how search engines work general search strategies dr. dania bilal is 587 sis fall 2007
How Search Engines Work
General Search Strategies
Dr. Dania Bilal
IS 587
SIS Fall 2007
Fun Quiz
- Take the search engine quiz located at http://websearch.about.com/library/quizzes/search_engine_quiz/blsearchenginequiz.htm
- Record the number of incorrect answers
- Share the results of the quiz with a classmate.
How Do Search Engines Work?
- They collect information from selected web sites
- They employ special software robots, called spiders, to crawl web pages
  - Spiders build lists of the words found in Web sites.
  - When a spider is building its lists, the spider is Web crawling.
- Spiders store the lists in the engine’s database
- The engine’s indexing software builds an index of words
- Information is matched against query input and retrieved (processing algorithm)
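The collect, index, and match steps above can be sketched in a few lines. This is a minimal illustration, not how any particular engine is implemented: the `PAGES` data, the URLs, and the function names `build_index` and `search` are all made up for the example, and the query is treated as "all the words must appear".

```python
from collections import defaultdict

# Hypothetical mini "web": page URL -> page text (invented for illustration).
PAGES = {
    "http://example.com/a": "search engines crawl web pages",
    "http://example.com/b": "spiders build lists of words",
    "http://example.com/c": "engines match words against a query",
}


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted by URL."""
    words = query.lower().split()
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


index = build_index(PAGES)
print(search(index, "engines"))      # pages a and c
print(search(index, "words query"))  # page c only
```

The index is built once, so answering a query only requires looking up each query word rather than rereading every page.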
How Do Spiders and Crawlers Work?
- They begin with popular and heavily used web servers.
- They begin with a popular site, collect the words on its pages, and follow every link found within the site.
  - Spiders travel across pages and the most widely used portions of the Web
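The "start at a popular site, collect its words, follow every link" behavior can be sketched as a breadth-first traversal. The sketch below runs over a small in-memory link graph instead of the live Web; `SITE`, the page names, and `crawl` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical link graph standing in for the Web:
# page name -> (page text, outgoing links).
SITE = {
    "home": ("welcome to the popular site", ["news", "about"]),
    "news": ("latest search engine news", ["home", "archive"]),
    "about": ("about this site", ["home"]),
    "archive": ("old news stories", []),
}


def crawl(site, start):
    """Breadth-first crawl from `start`; return {page: word list} in visit order."""
    seen = {start}
    frontier = deque([start])
    word_lists = {}
    while frontier:
        page = frontier.popleft()
        text, links = site[page]
        word_lists[page] = text.split()      # collect the words on the page
        for link in links:                   # follow every link found
            if link in site and link not in seen:
                seen.add(link)
                frontier.append(link)
    return word_lists


result = crawl(SITE, "home")
print(list(result))  # ['home', 'news', 'about', 'archive']
```

The `seen` set keeps the spider from revisiting a page when several pages link to it.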
How Do Spiders and Crawlers Work?
- A dedicated server of URLs is built by a search engine company (e.g., Google) so that spiders collect information quickly
- More than one spider is used to crawl web pages at a time
  - Google uses 3-4 spiders and collects over 100 pages per second
How Do Spiders and Crawlers Work?
- When no dedicated URL server is used, the search engine company relies on an ISP for the domain names (translated into addresses) to use for crawling the web
  - Delay in gathering information
  - Delay in updating information
  - Lack of control over URL addresses
Google Spider and How it Works
- A spider looks at the HTML, XML, or other coding used to build a web page and collects information from the meta tags
- It indexes words within the actual text of a page
- It indicates where the words were found (URL, title, headings, etc.)
- It disregards initial articles
- It disregards pages that should not be crawled or indexed
Google Spider and How it Works
- It uses the Robot Exclusion Protocol in disregarding pages
  - Implemented in the meta-tag section at the beginning of a Web page
  - Tells a spider to leave the page alone, neither index the words on the page nor try to follow its links
Franklin, C. How Internet Search Engines Work. http://computer.howstuffworks.com/search-engine.htm
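A spider that honors the meta-tag form of robot exclusion has to read the page's `<meta name="robots" ...>` tag before indexing or following links. The sketch below uses Python's standard `html.parser`; the class `RobotsMetaParser`, the function `robots_policy`, and the sample page are assumptions made for illustration, not any engine's actual code.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_policy(html):
    """Return (may_index, may_follow) for a page's HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    blocked_all = "none" in parser.directives
    may_index = not (blocked_all or "noindex" in parser.directives)
    may_follow = not (blocked_all or "nofollow" in parser.directives)
    return may_index, may_follow


# Hypothetical page that asks spiders to leave it alone.
PAGE = ('<html><head><meta name="robots" content="noindex, nofollow">'
        "</head><body>Private notes</body></html>")
print(robots_policy(PAGE))  # (False, False)
```

A page with no robots meta tag yields `(True, True)`, meaning the spider may both index the words and follow the links.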
How Do Search Engines Store Indexed Words?
- The process varies among engines
- Words are stored with the number of times they appear on a page (posting)
- Weight is assigned to each word.
- Words appearing near the top of a page may have more weight than those appearing in subheadings, in links, in meta tags, in the title, etc.
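A posting with a count and a location-based weight might be computed as follows. This is only a sketch: the numbers in `LOCATION_WEIGHTS` are invented (engines differ and do not publish theirs), and `score_page` and the sample page are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical per-location weights, invented for this example.
# Top-of-page text is weighted highest here, following the slide.
LOCATION_WEIGHTS = {"top": 3.0, "title": 2.0, "heading": 2.0, "body": 1.0}


def score_page(fields):
    """fields maps a location name to its text; return (counts, weights) per word."""
    counts = Counter()
    weights = defaultdict(float)
    for location, text in fields.items():
        for word in text.lower().split():
            counts[word] += 1
            weights[word] += LOCATION_WEIGHTS.get(location, 1.0)
    return counts, weights


page = {
    "title": "squirrel facts",
    "top": "squirrel habits in winter",
    "body": "a squirrel stores food for winter",
}
counts, weights = score_page(page)
print(counts["squirrel"], weights["squirrel"])  # 3 6.0
print(counts["winter"], weights["winter"])      # 2 4.0
```

The count records how often a word occurs, while the weight also reflects where each occurrence was found.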
How Do Search Engines Store Indexed Words?
- Information is encoded to save space
- Information is indexed
  - An index of words is built by the automatic indexer (indexing software)
  - A hash table is created with an assigned weight or value for each word indexed
  - Hashing allows for the even distribution of popular entries (e.g., letter M) with those that are less popular (e.g., letter X) for quick retrieval
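The contrast between filing words by first letter and filing them by hash value can be sketched as below. The word list, the bucket count of 4, and the helper names are assumptions for illustration; SHA-256 is used only because it gives the same result on every run, not because search engines are known to use it.

```python
import hashlib
from collections import Counter

# Hypothetical vocabulary: many words start with "m", few with "x".
WORDS = ["map", "mail", "music", "movie", "money", "market", "media", "xray"]


def letter_bucket(word):
    """Alphabetical filing: the bucket is simply the first letter."""
    return word[0]


def hash_bucket(word, n_buckets=4):
    """Hashed filing: the bucket is derived from a hash of the whole word."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n_buckets


letter_counts = Counter(letter_bucket(w) for w in WORDS)
hash_counts = Counter(hash_bucket(w) for w in WORDS)

print("by letter:", dict(letter_counts))  # "m" holds 7 of the 8 words
print("by hash:  ", dict(hash_counts))    # spread depends on the hash values
```

With alphabetical filing the "m" section is crowded while "x" is nearly empty; a hash ignores spelling patterns, so over a large vocabulary the entries tend to spread across buckets more evenly.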
Using General Directories
- Yahoo and its family
  - Browsing directory
    - Directory database
    - Small, human-selected and human-indexed
  - Searching using keywords
    - Search database
    - Larger and non-selective database
    - Spider and machine indexing
Yahoo
- Yahoo.com
  - Works like a search engine rather than a directory
  - Searches the web
  - Exercise: search under my name and see how Yahoo processes the query while you’re inputting information
  - Directory found under "more" or at http://search.yahoo.com/dir
Yahoo Search Engine
- Search
  - Web
  - Images
  - Videos
  - Local information
  - Shopping
  - More…
Yahoo Advanced Search
- Advanced Search feature
  - Shown on screen after you perform a search, or by going directly to http://search.yahoo.com/web/advanced?ei=UTF-8&p=dr+dania+bilal&fr=yfp-t-471
- Lots of search features to explore
Yahoo Advanced Search Features
- Boolean
- Phrase
- Currency
- Domain
- File format
- Country
- Language
- Other
Yahoo Advanced Search Features
- Exercise
  - Perform a search on a topic of your choice
  - Use Boolean equivalents
    - All the words = AND
    - The exact phrase = phrase; proximity search
    - Any of these words = OR
    - None of these words = NOT
  - Choose part of page to search
  - Choose language other than English
  - Report results in class
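The four Boolean equivalents in the exercise can be sketched as simple filters over a small document collection. The documents in `DOCS` and the function names are hypothetical, and this is an illustration of the logic only, not of how Yahoo evaluates its advanced search form.

```python
# Hypothetical documents: id -> text.
DOCS = {
    1: "search engine strategies for the web",
    2: "web directory browsing",
    3: "search engine spiders crawl the web",
    4: "library catalog search",
}


def words(text):
    return text.lower().split()


def all_words(docs, terms):
    """All the words = AND."""
    return {d for d, t in docs.items() if all(w in words(t) for w in terms)}


def any_words(docs, terms):
    """Any of these words = OR."""
    return {d for d, t in docs.items() if any(w in words(t) for w in terms)}


def none_words(docs, terms):
    """None of these words = NOT."""
    return {d for d, t in docs.items() if not any(w in words(t) for w in terms)}


def exact_phrase(docs, phrase):
    """The exact phrase: the words must be adjacent and in order."""
    target = words(phrase)
    n = len(target)
    if n == 0:
        return set()
    hits = set()
    for d, t in docs.items():
        toks = words(t)
        if any(toks[i:i + n] == target for i in range(len(toks) - n + 1)):
            hits.add(d)
    return hits


print(all_words(DOCS, ["search", "web"]))         # {1, 3}
print(any_words(DOCS, ["directory", "catalog"]))  # {2, 4}
print(none_words(DOCS, ["web"]))                  # {4}
print(exact_phrase(DOCS, "search engine"))        # {1, 3}
```

Note that the exact phrase is stricter than AND: "engine search" matches no document here even though both words occur in documents 1 and 3.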
Yahoo Search Services
For searching specific content areas such as:
Search Services
- Web Search: Find anything from across the Web
- Answers: Ask questions and get answers from real people
- Audio Search: Find over 50 million audio files from across the Web
- Creative Commons Search: Find Creative Commons content that you can share or re-use in your own works
- Directory Search: Search or browse Yahoo!'s categorized guide to the Web
- Image Search: Find over 1.6 billion photos and illustrations from all over the Web
- Job Search: Search for jobs, post your resume and more on Yahoo! HotJobs
- Local: Find everything in your area from dry cleaners to day spas
- Maps: Find maps and driving directions for anywhere you want to go
- Mobile Search: Find whatever, wherever you are
- My Web (Beta): The newest way to save, share and organize any page you want on the Web
- News Search: Search for news stories and related photos, videos and audio clips
Yahoo Next
- http://next.yahoo.com/
- Cutting edge technology at Yahoo
  - Blogs, Web 2.0, use of alltheweb, Yahoo Maps, Podcasts, audio and all other features that are in Beta testing
Yahoo Preferences
- Customize Yahoo to fit your needs
- Go to Preferences from the Web search page
- Edit preferences based on your needs
- Edited preferences are saved in the browser on the desktop
Strategies
- Boolean
- Boolean equivalents
- Proximity and phrase searching
- Searching within a field
- Search limits
Yahoo Search Strategies
- Explore Yahoo’s help page
- Read the Search Tips
- Read the search limit parameters such as
  - intitle:
  - url:
  - inurl:
- Read how to use Boolean equivalents and other search parameters
Engines and Information Need
- Several general search engines are available on the Web
- Select the engine(s) that best fit your need
- Visit the Web Search Guide for the latest information: http://websearch.about.com/od/generalsearchengines/General_AllPurpose_Search_Engines.htm
Hands-on Activity
- Browse the list of general search engines in the Web Search Guide
- Explore 4 of the engines listed
  - Wisenut, Snap.com, Lycos, Exalead
  - Search under my name in each engine
  - Compare the results by viewing the first two pages retrieved
  - How many overlaps were found among the engines
  - How many unique results were found in each engine
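Once the result URLs from each engine have been written down, the overlap and uniqueness counts asked for in this activity can be tallied with set operations. The result sets below are placeholders, not real search results, and `overlap_report` is a hypothetical helper; substitute the URLs you actually retrieve.

```python
from collections import Counter

# Placeholder result URLs; replace with what each engine actually returns.
RESULTS = {
    "Wisenut": {"http://example.org/a", "http://example.org/b", "http://example.org/c"},
    "Snap.com": {"http://example.org/a", "http://example.org/c", "http://example.org/d"},
    "Lycos": {"http://example.org/a", "http://example.org/e"},
    "Exalead": {"http://example.org/a", "http://example.org/c", "http://example.org/f"},
}


def overlap_report(results):
    """Return (URLs found by every engine, {engine: URLs found by it alone})."""
    tally = Counter(url for urls in results.values() for url in urls)
    shared_by_all = {u for u, n in tally.items() if n == len(results)}
    unique = {
        engine: {u for u in urls if tally[u] == 1}
        for engine, urls in results.items()
    }
    return shared_by_all, unique


shared, unique = overlap_report(RESULTS)
print("Found by every engine:", sorted(shared))
for engine, urls in unique.items():
    print(engine, "unique results:", len(urls))
```

With the placeholder data, one URL is shared by all four engines and each engine contributes exactly one result that no other engine returned.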
Specialized Search Engines
- Web Search Guide has a listing of specialized search engines
- The Web companion to the textbook, chapter 3, describes a variety of specialized engines
- Explore chapter 3 and familiarize yourself with the engines described
Hands-on Activity
- Find the answer or relevant information for these two queries using an appropriate, specialized search engine:
  - Do squirrels hibernate?
  - Find me a list of foreign-owned companies based in the U.S., organized by state.