Basic Web Applications 2. Search Engine: Why do we need search engines?

Upload: ambrose-beasley

Posted on 29-Dec-2015

Page 1:

Basic Web Applications 2

Page 2:

Search Engine

Why do we need search engines?
– Because there are hundreds of millions of pages available on the Web,
– most of them titled according to their author's own notion,
– almost all of them sitting on servers with cryptic names.
– We use search engines to get information on those pages.

Page 3:

What Is an Internet Search Engine?

Special sites on the Web that are designed to help people find information stored on other sites.

Various search engines work in different ways, but they all perform three basic tasks:
– Select pieces of the Internet based on important words.
– Keep an index of the words they find, and where they find them.
– Allow users to look for words or combinations of words found in that index.

Page 4:

Search Engine

1- Search engines use software called spiders, which comb the Internet looking for documents and their Web addresses.

2- The spiders spread out across the most widely used portions of the Web.

This process is called Web crawling.
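
A minimal sketch of Web crawling, assuming a toy in-memory "Web" (the hypothetical `TOY_WEB` dictionary, mapping each URL to the URLs its page links to) in place of real HTTP requests. The spider starts from seed addresses and spreads out breadth-first, recording each address once; `crawl` and `max_pages` are illustrative names, not any search engine's API.

```python
from collections import deque

# Hypothetical toy Web: each URL maps to the URLs its page links to.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}

def crawl(seeds, links_of, max_pages=100):
    """Breadth-first crawl: visit each URL once, following links outward."""
    seen = set(seeds)
    queue = deque(seeds)
    visited = []  # Web addresses in the order the spider reached them
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in links_of(url):
            if link not in seen:  # skip pages already queued or visited
                seen.add(link)
                queue.append(link)
    return visited

if __name__ == "__main__":
    print(crawl(["http://a.example/"], lambda u: TOY_WEB.get(u, [])))
```

Breadth-first order means pages close to the seeds are reached first, which loosely matches the idea of spreading out across the most widely used portions of the Web.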

Page 5:

Search Engine

The documents and Web addresses are collected and sent to the search engine's indexing software.

Page 6:

Search Engine

The indexing software extracts information from the documents (for example, every word and title) and stores it in a database.
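
A minimal sketch of the indexing step, assuming each document is a hypothetical `(title, body)` pair keyed by its Web address and using an in-memory SQLite database (Python's standard `sqlite3` module) as the store. The table name `postings` and the simple tokenizer are illustrative choices; real indexers keep far more detail.

```python
import re
import sqlite3

def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def index_documents(conn, docs):
    """Store one (word, url) row for every distinct word in each title and body."""
    conn.execute("CREATE TABLE IF NOT EXISTS postings (word TEXT, url TEXT)")
    for url, (title, body) in docs.items():
        words = set(tokenize(title)) | set(tokenize(body))
        conn.executemany(
            "INSERT INTO postings (word, url) VALUES (?, ?)",
            [(word, url) for word in sorted(words)],
        )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    index_documents(conn, {"http://a.example/": ("Search Engines", "Spiders crawl the Web.")})
    print(conn.execute("SELECT word, url FROM postings").fetchall())
```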

Page 7:

When you perform a search by entering keywords, the database is searched for documents that match.
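
A sketch of the matching step, assuming the database is a hypothetical SQLite table `postings(word, url)` with one row per word per document. A document matches when it contains every keyword, which the query expresses by counting the distinct keywords found per URL; the sample rows are made up for illustration.

```python
import sqlite3

def search(conn, keywords):
    """Return URLs of documents containing every keyword (case-insensitive)."""
    words = [k.lower() for k in keywords]
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    sql = (
        f"SELECT url FROM postings WHERE word IN ({placeholders}) "
        "GROUP BY url HAVING COUNT(DISTINCT word) = ? ORDER BY url"
    )
    return [row[0] for row in conn.execute(sql, (*words, len(words)))]

# Hypothetical sample database with one row per (word, url) pair.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postings (word TEXT, url TEXT)")
conn.executemany("INSERT INTO postings VALUES (?, ?)", [
    ("search", "http://a.example/"), ("engine", "http://a.example/"),
    ("search", "http://b.example/"), ("spider", "http://b.example/"),
])
print(search(conn, ["Search", "engine"]))  # ['http://a.example/']
```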

Page 8:

Search Engine

Page 9:

Search Engine

In the early Google system, multiple spiders ran at one time.

Each spider kept 300 connections to Web pages open at a time.

The system crawled over 100 pages per second, generating around 600 kilobytes of data each second.

To minimize delays, it used its own DNS to translate server names into IP addresses.
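
The two crawler ideas on this slide, a cap on open connections per spider and a local DNS, can be sketched with `asyncio`. This is an illustration under assumptions, not Google's code: `fetch` and `resolve` are hypothetical stand-in coroutines for real network calls, the `Spider` class is invented for the example, and the default limit of 300 simply mirrors the figure quoted on the slide.

```python
import asyncio

class Spider:
    """Sketch of a spider that caps open connections and reuses DNS lookups."""

    def __init__(self, fetch, resolve, max_connections=300):
        self._fetch = fetch        # hypothetical coroutine: fetch(ip, url) -> page
        self._resolve = resolve    # hypothetical coroutine: resolve(host) -> IP address
        self._limit = asyncio.Semaphore(max_connections)
        self._dns = {}             # host -> Task resolving that host (a local DNS cache)
        self.dns_lookups = 0

    def _lookup(self, host):
        # Cache the lookup task itself, so concurrent requests for one host share it.
        if host not in self._dns:
            self.dns_lookups += 1
            self._dns[host] = asyncio.create_task(self._resolve(host))
        return self._dns[host]

    async def get(self, host, url):
        ip = await self._lookup(host)
        async with self._limit:    # never more than max_connections fetches in flight
            return await self._fetch(ip, url)

    async def get_all(self, requests):
        """Fetch every (host, url) pair concurrently; results keep request order."""
        return await asyncio.gather(*(self.get(host, url) for host, url in requests))
```

Caching the pending lookup rather than only its result matters here: without it, many concurrent requests to the same host would each trigger their own DNS query before the first one finished.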

Page 10:

Search Engine

The Google spider takes note of two things:
– The words within the page
– Where the words were found

It may also take into account:
– The frequency and location of keywords within the Web page
– How long the Web page has existed
– The number of other Web pages that link to the page in question
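
A sketch of how two of these signals might be computed, assuming page text is split on whitespace and links are given as a hypothetical `link_graph` dictionary (page to list of pages it links to). Page age is left out because it needs crawl history; the function names are illustrative only.

```python
from collections import Counter, defaultdict

def word_stats(text):
    """Per-page word frequency and the position where each word first appears."""
    words = text.lower().split()
    frequency = Counter(words)
    first_position = {}
    for i, word in enumerate(words):
        first_position.setdefault(word, i)
    return frequency, first_position

def inbound_link_counts(link_graph):
    """How many *other* pages link to each page (repeat links from one page count once)."""
    counts = defaultdict(int)
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return dict(counts)
```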

Page 11:

Search Engine

Lycos keeps track of:
– The words in the title, subheadings and links
– The 100 most frequently used words on the page
– Each word in the first 20 lines of text

Each commercial search engine uses a different formula for assigning weight to the words in its index.
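
A sketch of Lycos-style word selection followed by a weighting step. The slide lists which words Lycos keeps but not its formula, so the `WEIGHTS` values are invented purely to show what "assigning weight" could look like; every name here is hypothetical.

```python
import re
from collections import Counter

# Hypothetical weights: each engine uses its own formula; these numbers are made up.
WEIGHTS = {"title": 3.0, "subheading": 2.0, "link": 2.0, "top100": 1.0, "first20": 1.0}

def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def lycos_style_terms(title, subheadings, links, body):
    """Collect the word groups the slide lists for Lycos, keyed by where they came from."""
    lines = body.splitlines()
    return {
        "title": set(words(title)),
        "subheading": {w for s in subheadings for w in words(s)},
        "link": {w for link in links for w in words(link)},
        "top100": {w for w, _ in Counter(words(body)).most_common(100)},
        "first20": {w for line in lines[:20] for w in words(line)},
    }

def term_weights(groups, weights=WEIGHTS):
    """A word's weight is the sum of the weights of every group it appears in."""
    scores = Counter()
    for group, group_words in groups.items():
        for w in group_words:
            scores[w] += weights[group]
    return dict(scores)
```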

Page 12:

Meta Tags

Meta tags let the page owner specify the keywords and concepts under which the page will be indexed.

Meta tags can guide the search engine.

Of course, a careless page owner might add irrelevant meta tags.

Page 13:

Meta Tags

To protect against this:
– Spiders correlate meta tags with page content, rejecting meta tags that do not match it.

– The owner of a page may or may not want it to be included in the results of a search engine's activities.

– An exclusion protocol was developed and implemented in the meta-tag section at the beginning of a Web page to tell a spider to leave the page alone, for example:
<meta name="googlebot" content="noindex">
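
A sketch of both checks using Python's standard `html.parser` module. Assumptions: the page opts out through a `robots` or bot-specific (`googlebot`) meta tag containing `noindex`, declared keywords come from a `keywords` meta tag, and "correlating with page content" is reduced to a crude substring test. `MetaScanner` and `check_page` are hypothetical names.

```python
from html.parser import HTMLParser

class MetaScanner(HTMLParser):
    """Collects <meta name="..." content="..."> pairs and all text on the page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_data(self, data):
        # Simplification: text inside <title>, <script> etc. is collected too.
        self.text.append(data)

def check_page(html, bot="googlebot"):
    """Return None if the page opts out via noindex, else its keywords found in the text."""
    scanner = MetaScanner()
    scanner.feed(html)
    scanner.close()
    # Exclusion: a "robots" or bot-specific meta tag containing noindex means "leave it alone".
    directives = []
    for name in ("robots", bot):
        directives += [d.strip().lower() for d in scanner.meta.get(name, "").split(",")]
    if "noindex" in directives:
        return None
    # Correlation: drop declared keywords that never appear in the page text.
    page_text = " ".join(scanner.text).lower()
    keywords = [k.strip().lower() for k in scanner.meta.get("keywords", "").split(",")]
    return [k for k in keywords if k and k in page_text]
```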

Page 14:

Building the Index

Once the spiders finish finding information on Web pages, the search engine must store the information in a useful way. Two key components are involved:
– The information stored with the data (for simplicity, word + URL)
– The method by which the information is indexed

Page 15:

Building the Index

Different search engines will produce different lists of results, with the pages presented in different orders.

Page 16:

Building the Index

The indexing process allows information to be found as quickly as possible.

One way to build an index is to build a hash table.

In hashing, a formula is applied to attach a numerical value to each word.
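
The slide does not name the formula. A common textbook choice is a polynomial string hash, sketched below; the multiplier 31 and the 1,024 buckets are arbitrary assumptions, and `word_hash` is a hypothetical name. The same word always gets the same number, which is what lets the index jump straight to it later.

```python
def word_hash(word, buckets=1024):
    """Attach a number in [0, buckets) to a word with a polynomial string hash."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % buckets
    return value

print(word_hash("ab"))  # (97 * 31 + 98) % 1024 = 33
```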

Page 17:

Building the Index

In English, the "M" section of the dictionary is much thicker than the "X" section, so finding a word that begins with a very "popular" letter takes more time.

Hashing evens out the difference and reduces the average time it takes to find an entry.

It also separates the index from the actual entry.
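
A toy comparison of the two ways of grouping words, assuming the same polynomial hash with 31 as multiplier and 26 buckets (one per letter, to make the comparison fair). With a list where every word starts with the "popular" letter m, dictionary-style grouping puts everything in one bucket, while hashing spreads the words out. The word list is made up and far too small to prove anything in general.

```python
from collections import Counter

def first_letter_buckets(words):
    """Dictionary-style grouping: one bucket per initial letter."""
    return Counter(w[0] for w in words)

def hash_buckets(words, buckets=26):
    """Grouping by a polynomial hash into the same number of buckets."""
    def h(word):
        value = 0
        for ch in word:
            value = (value * 31 + ord(ch)) % buckets
        return value
    return Counter(h(w) for w in words)

# Toy word list where every word starts with the "popular" letter m.
words = ["map", "mat", "men", "mix", "mud", "mob"]
print(max(first_letter_buckets(words).values()))  # 6: all words share one bucket
print(max(hash_buckets(words).values()))          # 2: the hash spreads them out
```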

Page 18:

Building the Index

The hash table contains the hashed number along with a pointer to the actual data, which can be sorted in whichever way is most efficient.
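
A toy hash table in this spirit, assuming separate chaining (each bucket holds a short list of entries) and alphabetical order as the "efficient" sort for each word's list of URLs. `HashIndex` is a hypothetical class for illustration; production indexes use far more elaborate structures.

```python
from bisect import bisect_left

class HashIndex:
    """Toy hash table: each bucket holds (word, sorted URL list) entries."""

    def __init__(self, buckets=1024):
        self.buckets = [[] for _ in range(buckets)]

    def _hash(self, word):
        value = 0
        for ch in word:
            value = (value * 31 + ord(ch)) % len(self.buckets)
        return value

    def _find(self, word, create=False):
        bucket = self.buckets[self._hash(word)]
        for entry in bucket:          # separate chaining: scan the short bucket list
            if entry[0] == word:
                return entry
        if not create:
            return None
        entry = (word, [])
        bucket.append(entry)
        return entry

    def add(self, word, url):
        urls = self._find(word, create=True)[1]
        i = bisect_left(urls, url)
        if i == len(urls) or urls[i] != url:   # keep the list sorted and duplicate-free
            urls.insert(i, url)

    def lookup(self, word):
        entry = self._find(word)
        return list(entry[1]) if entry else []
```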

Page 19:

Building a Search

Searching through an index involves a user building a query and submitting it through the search engine.

Boolean operators:
– AND - All the terms joined by "AND" must appear in the pages or documents. Some search engines substitute the operator "+" for the word AND.
– OR - At least one of the terms joined by "OR" must appear in the pages or documents.
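
AND and OR map directly onto set operations over an index. The sketch assumes a hypothetical `INDEX` dictionary from each word to the set of documents containing it: AND intersects the sets, OR takes their union.

```python
# Hypothetical index: word -> set of documents containing it.
INDEX = {
    "search": {"d1", "d2", "d3"},
    "engine": {"d1", "d3"},
    "spider": {"d2"},
}

def docs_with(term, index=INDEX):
    return index.get(term.lower(), set())

def query_and(*terms, index=INDEX):
    """AND (or "+"): every term must appear, so intersect the document sets."""
    sets = [docs_with(t, index) for t in terms]
    return set.intersection(*sets) if sets else set()

def query_or(*terms, index=INDEX):
    """OR: at least one term must appear, so take the union."""
    return set().union(*(docs_with(t, index) for t in terms))
```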

Page 20:

Building a Search

NOT - The term following "NOT" must not appear in the pages or documents. Some search engines substitute the operator "-" for the word NOT.

FOLLOWED BY - One of the terms must be directly followed by the other.

NEAR - One of the terms must be within a specified number of words of the other.

Quotation Marks - The words between the quotation marks are treated as a phrase, and that phrase must be found within the document or file.
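
NOT is a set difference, but FOLLOWED BY, NEAR and phrase search need word positions. The sketch assumes a positional index (word to document to list of positions) built from whitespace-split, lower-cased text; all function names are hypothetical, and FOLLOWED BY is expressed as an ordered NEAR with a distance of 1.

```python
from collections import defaultdict

def positional_index(docs):
    """word -> {doc_id: [positions]}, from whitespace-split, lower-cased text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def query_not(include, exclude, index):
    """include NOT exclude: documents with the first term but without the second."""
    return set(index.get(include, {})) - set(index.get(exclude, {}))

def query_near(a, b, k, index, ordered=False):
    """NEAR: b within k words of a. With ordered=True and k=1 this is FOLLOWED BY."""
    pa, pb = index.get(a, {}), index.get(b, {})
    hits = set()
    for doc in set(pa) & set(pb):
        for i in pa[doc]:
            for j in pb[doc]:
                d = j - i
                close = (0 < d <= k) if ordered else (0 < abs(d) <= k)
                if close:
                    hits.add(doc)
    return hits

def query_phrase(phrase, index):
    """Quotation marks: the words must appear consecutively and in order."""
    words = phrase.lower().split()
    if not words:
        return set()
    hits = set()
    for doc, starts in index.get(words[0], {}).items():
        for s in starts:
            if all(s + n in index.get(w, {}).get(doc, []) for n, w in enumerate(words[1:], 1)):
                hits.add(doc)
                break
    return hits
```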

Page 21:

Overall view