Hyper-Searching the Web
Search Engines
Basic Search (index)
Cluster Search (themes)
Meta-search (outsource)
“Smarter” meta-search (themes + outsource)
Basic search engine
• Examples: AltaVista, InfoSeek, HotBot, Lycos, Excite, Google, etc.
• Maintains an index for every word found
• Works by crawling, indexing, and returning results
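
A minimal sketch of the "index for every word" idea, assuming the crawl has already produced a small dictionary of page texts. The page names, the `build_index` and `search` helpers, and the whitespace tokenizer are illustrative assumptions, not how any of the engines above actually work.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word of the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical crawled pages (stand-ins for real crawl output).
pages = {
    "a.html": "jaguar is a big cat",
    "b.html": "jaguar makes a fast car",
    "c.html": "the automobile show",
}
index = build_index(pages)
results = search(index, "jaguar car")
```

Here `results` holds only `b.html`, the one page containing both words. The index answers "which pages contain this word" without rescanning any page text at query time.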
Basic search engine
• Different ranking systems used
-most use heuristics (easiest solution), e.g. counting the number of query keywords that appear on a page
-Google uses PageRank
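
The keyword-counting heuristic can be sketched roughly as follows. The `keyword_score` and `rank` helpers and the sample pages are assumptions made for illustration; real engines combine many more signals.

```python
def keyword_score(text, query):
    """Count how many times the query terms occur in the page text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def rank(pages, query):
    """Order pages by descending keyword count, dropping pages with no match."""
    scored = [(keyword_score(text, query), name) for name, text in pages.items()]
    matching = [(score, name) for score, name in scored if score > 0]
    return [name for score, name in sorted(matching, key=lambda p: (-p[0], p[1]))]

# Hypothetical page texts.
pages = {
    "a.html": "car car car dealer",
    "b.html": "used car prices",
    "c.html": "automobile reviews",
}
ranking = rank(pages, "car")
```

With this data `ranking` puts `a.html` ahead of `b.html`, and `c.html` never appears because it says "automobile" rather than "car". Repeating a keyword is also enough to climb the list, which is one reason pure counting is a weak signal.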
Basic search engine
• No idea of searcher’s intent so “best” result hard to achieve
• Problems with synonymy and polysemy
-synonymy ex. car and automobile
-polysemy ex. jaguar
• One solution: store semantic relations
-can only help w/synonymy
• Can’t identify concepts/author intent ex. IBM site does not say “computer”
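
A small sketch of why stored semantic relations help with synonymy but not polysemy. The `SYNONYMS` table and both helpers are hypothetical, chosen only to mirror the car/automobile and jaguar examples on this slide.

```python
# Hypothetical hand-built table of semantic relations.
SYNONYMS = {
    "car": {"automobile"},
    "automobile": {"car"},
}

def expand(query):
    """Return the query terms plus any stored synonyms."""
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms |= SYNONYMS.get(word, set())
    return terms

def matches(text, query):
    """True if the page text contains any expanded query term."""
    return bool(expand(query) & set(text.lower().split()))

synonym_hit = matches("automobile reviews", "car")
animal_hit = matches("jaguar the big cat", "jaguar")
vehicle_hit = matches("jaguar sports car", "jaguar")
```

The synonym table lets a search for "car" find the "automobile" page, but both jaguar pages still match a search for "jaguar": nothing in the table says which meaning the searcher had in mind.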
Cluster search engine
• Example: Clusty
• Clusters results into categories/themes
• Can show results that would be ranked lower in another search engine
-because words have different meanings, the less searched-for meanings still get shown
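
A toy sketch of grouping results into themes. The slides do not say how Clusty forms its clusters, so the `THEME_CUES` table, the `cluster` helper, and the sample result titles are purely illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical cue words per theme; a real engine would derive themes from the results.
THEME_CUES = {
    "animal": {"cat", "rainforest", "habitat"},
    "car": {"dealer", "engine", "sedan"},
}

def cluster(results):
    """Group result titles under the first theme whose cue words they contain."""
    clusters = defaultdict(list)
    for title in results:
        words = set(title.lower().split())
        theme = next((t for t, cues in THEME_CUES.items() if words & cues), "other")
        clusters[theme].append(title)
    return dict(clusters)

results = [
    "Jaguar sedan dealer",
    "Jaguar rainforest habitat",
    "Jaguar engine specs",
    "Jaguar fan club",
]
clusters = cluster(results)
```

Even though car pages outnumber animal pages here, the animal theme gets its own visible group instead of sinking to the bottom of a single ranked list.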
Meta-search engine
• Examples: Dogpile, Surfwax, Copernic, etc.
• Sends searcher’s query to a database of search engines
• Claimed to be no better than the search engines in its database; the referenced search engines are often small, free, commercial ones
• Users can create their own on Google, with up to 5,000 URLs as the “database”
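
A sketch of the "outsource" step: forward one query to several engines and merge what comes back. The two engine functions are hypothetical stand-ins that return fixed lists, and the reciprocal-rank voting used to merge them is my own simple choice, not something the slides attribute to Dogpile or the others.

```python
def engine_a(query):
    """Hypothetical engine: returns URLs best-first."""
    return ["x.com", "y.com", "z.com"]

def engine_b(query):
    """Hypothetical engine: returns URLs best-first."""
    return ["y.com", "w.com", "x.com"]

def meta_search(query, engines):
    """Send the query to every engine and merge results by summed 1/position."""
    scores = {}
    for engine in engines:
        for position, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / position
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = meta_search("cat", [engine_a, engine_b])
```

With these stand-ins `merged` starts with `y.com`, which both engines rank highly. The merged list can only contain what the underlying engines return, which is the point of the "no better than its database" criticism.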
“Smarter” meta-search engine
• Example: Clever project (not yet available online)
• Includes clustering and linguistic analysis
[Diagram: query “cat” → AltaVista, Yahoo → Clever → themed results for “cat”:]
Cat – feline
Cat – power
Cat – equipment
Cat – scans
etc.
The Clever Project
• Uses hyperlinks to locate hubs and authorities
“a respected authority is a page that is referred to by many good hubs; a useful hub is a location that points to many valuable authorities”
The Clever Project
• Obtains a list of webpages from a standard index & follows hyperlinks to enlarge its own collection
-resulting collection = “root set”
-each page gets numerical hub & authority score
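
The hub and authority scoring can be sketched as the iteration below, assuming the root set has already been collected into a `links` dictionary. The page names, the iteration count, and the length normalization are assumptions for illustration; this follows the general hubs-and-authorities idea rather than Clever's actual code.

```python
import math

def hits(links, iterations=50):
    """links maps each page to the set of pages it links to.
    Returns (hub, authority) score dictionaries."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}    # initial guess
    auth = {p: 1.0 for p in pages}   # initial guess
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # A good hub points to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize so the scores stay bounded between rounds.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical root set: three hub pages pointing at two candidate authorities.
links = {
    "hub1": {"auth1", "auth2"},
    "hub2": {"auth1", "auth2"},
    "hub3": {"auth1"},
}
hub, auth = hits(links)
```

In this toy graph `auth1` ends up with the highest authority score because all three hubs point to it, and `hub1` and `hub2` outscore `hub3` because they point to both authorities.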
The Clever Project
• Similar to PageRank in how scores are determined: starts from guesses & recalculates repeatedly
-useful by-product: clusters sites
• Copes with competition: competitors don’t have to acknowledge each other through hyperlinks, since hubs that point to both still connect them
Clever vs. Google
GOOGLE
- gives initial rankings
- keeps page rankings independent of queries
- faster
- looks forward “link to link”
CLEVER
- root sets per keyword
- page priority through query context
- looks forwards & backwards “hub and authority”
- sometimes too broad, e.g. Fallingwater
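
For the Google side of the comparison, a rough sketch of a query-independent score computed by following links forward. The damping factor of 0.85, the iteration count, and the three-page graph are assumptions; this is a textbook-style PageRank iteration, not Google's production method.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the set of pages it links to.
    Returns a score per page; the scores sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}   # initial guess: all pages equal
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            # A page with no outgoing links spreads its score over every page.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph.
links = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
rank = pagerank(links)
```

The scores are computed once for the whole graph and reused for every query, which matches the "independent of queries" and "faster" points above. A hub-and-authority approach instead rebuilds a root set and rescoring per keyword.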