The Search for Quality: Productive Web Searching
John Cox, James Hardiman Library, NUI Galway
The Problem
7.3 million new Web pages daily
Quality varies, mainly due to ease of publication and lack of checks
Quality is in the eye of the beholder
Over-dependence on general search engines
Simplistic use of search tools
Some Usage Findings
NUI, Galway Library survey, March 2000:
Search engines cited by 79 out of 167 respondents
Used exclusively for some topics, eg Nazism, defamation law, hepatitis C
Fewer than 50% satisfied
Other surveys show very simplistic use:
33% of users enter one word only
A further 33% enter two words only
A UK survey indicates 80% of searchers waste some time
A US survey shows “search rage” within 12 minutes
Key Question
“How much better than users are information staff at finding high-quality information on the Web and what leadership do we provide?”
5 key actions needed
5 Key Actions
Get the best from the search engines
Go vertical: subject-specific sources
Take time to experiment, eg helper software
Exploit the invisible Web
Actively promote quality searching
1: Get the Best from the Search Engines
Understand how they work
Know their limitations
Use advanced features
Search more than one
Know when not to use them
Search Engine Components
Crawler: follows links
Indexer: builds database
Query processor: lets us search
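The three components can be sketched in a few lines of Python. This is a toy model only: it "crawls" a small in-memory set of pages (the page names and text are invented for illustration), builds an inverted index, and answers a simple all-words query.

```python
# Toy search engine: crawler, indexer and query processor.
# Uses an invented in-memory "Web" instead of real HTTP fetching.
from collections import defaultdict

PAGES = {  # hypothetical pages: url -> (text, outgoing links)
    "a.html": ("web search quality varies widely", ["b.html"]),
    "b.html": ("search engines index the visible web", ["c.html"]),
    "c.html": ("databases form part of the invisible web", []),
}

def crawl(start):
    """Crawler: follow links from a start page, visiting each page once."""
    seen, queue = set(), [start]
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        text, links = PAGES[url]
        yield url, text
        queue.extend(links)

def build_index(pages):
    """Indexer: build an inverted index mapping each word to its pages."""
    index = defaultdict(set)
    for url, text in pages:
        for word in text.split():
            index[word].add(url)
    return index

def query(index, *words):
    """Query processor: return pages containing ALL the given words."""
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

index = build_index(crawl("a.html"))
print(query(index, "web", "search"))  # pages matching both words
```

Real engines differ mainly in scale and in the ranking applied by the query processor; the structure, though, is the same.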
Common Limitations
Profit-oriented: paid entries listed at top
Out of date
Partial site indexing
Technically must exclude many sites, eg password-protected, registration needed, database-driven
Hidden search facilities
Understanding Google
Strengths
Coverage
Cached pages
File types, eg PDF, .doc, .ppt
Relevance: link popularity
Beyond pages: images, newsgroups

Weaknesses
Poor Boolean support
No truncation
Limited date searching
Invisible search facilities
Two pages per site displayed by default
Google: hidden features 2
Partial URL vs specific site search:
Not possible on the Advanced Search page, despite its “Domains” limit
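The distinction can be shown with operators typed directly into the query box (this assumes Google’s `site:` and `inurl:` operators; the domain and terms below are invented examples):

```text
library site:nuigalway.ie    restrict results to one specific site
library inurl:ie             match a fragment anywhere in the URL
```

Neither form is offered on the Advanced Search page, which is why they count as hidden features.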
Other Search Engines
Always worth searching more than one, eg:
All the Web (FAST)
AltaVista
Lycos/HotBot
Northern Light (?)
Overlap may be limited
Different ranking criteria
2: Go Vertical: Specific Tools
Type: Example(s)
Region: Doras, Yahoo Australia & NZ
Domain: SearchEdu.com
Genre: Newsindex
Discipline: EEVL, LawCrawler
Subject: Politicalinformation.com
4: Explore the “Invisible Web”
Material, often of high quality, that general search engines can’t or won’t index:
Unlinked pages
Non-HTML file types, eg audio, video, PDF
Authenticated sites
Databases
Much greater in size than the visible Web
Old Habits
Search strategy formulation
Concept analysis
Critical source selection
Critical appraisal of search hits
Patience
Flexibility
Towards a Brighter Future
Automatically-generated, accurate metadata
Smarter search engines: more quality-sensitive, more penetrative
XML: structured data
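The promise of structured data can be illustrated with a small XML fragment using Dublin Core element names (a real metadata convention; the values here are invented for illustration):

```text
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>The Search for Quality: Productive Web Searching</dc:title>
  <dc:creator>John Cox</dc:creator>
  <dc:subject>Web searching; information quality</dc:subject>
</record>
```

Unlike free-text HTML, each field is labelled, so a search engine could match on author or subject directly rather than guessing from page text.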
References
•Sherman, Chris and Price, Gary. The Invisible Web: Uncovering Information Sources Search Engines Can’t See. Medford, NJ: Information Today, 2001. ISBN 091096551X. (Accompanying database at http://invisible-web.net)
•Search Engine Watch: http://www.searchenginewatch.com
•Search Engine Showdown: http://www.searchengineshowdown.com