metasearch vs harvesting and indexing
Post on 19-Jan-2015
MetaSearch vs Harvesting and Indexing
Lukas Koster, Library of the University of Amsterdam
http://commonplace.net
2009
http://www.flickr.com/photos/donpezzano/3044975399/
MetaSearch vs Harvesting and Indexing - Lukas Koster - 2009
So many databases to search
MetaSearch – Federated Search
Searching and data fetching: one integrated, interdependent, on-the-fly procedure
[Diagram: Search Engine, MetaSearch tool, Database Connectors, Databases]
Search: translate search syntax
Database connectors: Z39.50, SRU, proprietary
Record formats: MARC21, MARCXML, DC
Results: conversion, merging, deduplication, ranking (first 30 per DB)
Technical bottlenecks
[Diagram: the MetaSearch set-up (Search Engine, MetaSearch tool, Database Connectors, Databases) with the bottlenecks marked]
• Connection
• Access Authorisation
Technical bottlenecks
Changes in:
• Remote database server IP address
• Remote database server hostname
• Remote database server configuration
• Remote database authentication
• Firewall
• Database system
• Network
MetaSearch limitations
Differences in searches, indexes:
• Author
• Subject
• Multiple languages
Speed (slowness)
Limited number of searchable databases
Not all results in first set
Relevance
Author searches
Variations in author name storage formats:
• Henry James
• James, Henry
• James, H.
• H.James
Which Henry James? Or is it: Henry, J./James Henry?
Variations in supported search formats:
• Only one?
• All of the above?
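The author-name problem can be made concrete with a small sketch. It generates the four storage formats listed on the slide and then builds an order-insensitive match key. The function names `author_variants` and `match_key` are hypothetical, chosen for this illustration only; no real MetaSearch tool is implied to work this way.

```python
import re


def author_variants(forename: str, surname: str) -> list[str]:
    """Return the storage formats from the slide for one personal name."""
    initial = forename[0] + "."
    return [
        f"{forename} {surname}",   # Henry James
        f"{surname}, {forename}",  # James, Henry
        f"{surname}, {initial}",   # James, H.
        f"{initial}{surname}",     # H.James
    ]


def match_key(name: str) -> tuple[str, ...]:
    """Order-insensitive key: lowercase name parts, punctuation dropped, sorted."""
    parts = re.findall(r"[^\W\d_]+", name.lower())
    return tuple(sorted(parts))


print(author_variants("Henry", "James"))
# A key like this matches "Henry James" with "James, Henry", but it also
# matches "James Henry": forename and surname can no longer be told apart.
print(match_key("Henry James") == match_key("James Henry"))
```

The trade-off is visible in the last line: a key loose enough to bridge the storage formats is also too loose to answer "Which Henry James?".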
Variations in author names
Subject searches
Different qualification, keyword schemes per database:
• LoC Subject Headings
• Dutch Basic Classification
• Local subject schemes
Different use of subjects per database:
• Cooking
• Cookery
• Food
Different use of subjects within one database
Errors
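One way to picture the subject problem is a crosswalk that maps each database's local term to a shared concept, so that a search on one term can be expanded to the others. The sketch below is an assumption made for illustration: the `CROSSWALK` table and `expand_subject` are hypothetical, and treating "Food" as equivalent to "Cooking" is a simplification taken from the slide's example, not a real mapping between subject schemes.

```python
# Hypothetical crosswalk: local subject term -> shared concept label.
CROSSWALK = {
    "cooking": "cooking",
    "cookery": "cooking",
    "food": "cooking",
}


def expand_subject(term: str) -> set[str]:
    """Return every local term that maps to the same concept as `term`."""
    concept = CROSSWALK.get(term.lower())
    if concept is None:
        # Unknown term: nothing to expand, search it as given.
        return {term.lower()}
    return {local for local, mapped in CROSSWALK.items() if mapped == concept}


print(sorted(expand_subject("Cookery")))
```

A table like this has to be built and maintained per pair of schemes, and it cannot repair inconsistent use of subjects within a single database.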
Multilingual searches
All words searches, subject searches:
• English “cooking”
• Japanese “???”
Title searches:
• Translations (We need FRBR!)
Author searches (historical names):
• See: Erasmus
All processing on the fly
Issues, dependent on each other:
• Speed (slowness)
• Limited number of searchable databases
• Not all results in first set
• Relevance
Speed (slowness)
Dependent on:
1. Search term transformation
2. Response time of external databases
3. Speed of internet connection
4. Conversion of results to presentation format
5. Merging of results
6. Deduplication of results
7. Relevance ranking
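The seven factors above add up to the time the user waits. A rough model, under the assumption that the remote databases are queried in parallel (the slides do not say whether they are), is that the slowest database sets the pace and every local step is added on top. All names and numbers below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Timings:
    """Seconds spent per step; every value used below is hypothetical."""
    transform: float          # 1. search term transformation
    remote: dict[str, float]  # 2 + 3. per database: response time plus network
    convert: float            # 4. conversion to presentation format
    merge: float              # 5. merging
    dedupe: float             # 6. deduplication
    rank: float               # 7. relevance ranking


def total_seconds(t: Timings, parallel: bool = True) -> float:
    """Waiting time: slowest database if parallel, sum of all if sequential."""
    remote = max(t.remote.values()) if parallel else sum(t.remote.values())
    return t.transform + remote + t.convert + t.merge + t.dedupe + t.rank


timings = Timings(
    transform=0.25,
    remote={"db_a": 1.0, "db_b": 4.0, "db_c": 0.5},
    convert=0.25, merge=0.25, dedupe=0.5, rank=0.25,
)
print(total_seconds(timings))                  # 5.5
print(total_seconds(timings, parallel=False))  # 7.0
```

Even in the parallel case one slow database (`db_b` here) dominates the total, which is why the number of searchable databases ends up being limited.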
Limited number of databases
Searching too many databases takes too long
Local processing time influenced by:
• Merging (takes time)
• Deduplication (takes time)
• Ranking (takes time)
Not all results in first set
Merging, deduplication, ranking of all results takes too long
Only first 30 or so of each database are processed initially
Get more: next 30 per database are fetched and processed
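The "first 30, then the next 30" behaviour can be sketched as a cursor over each database's result set. This is a simulation under stated assumptions: the result sets are plain in-memory lists standing in for remote databases, and `ResultCursor` and `fetch_more` are hypothetical names.

```python
class ResultCursor:
    """Tracks how far into each (simulated) remote result set we have read."""

    def __init__(self, result_sets: dict[str, list[str]], batch_size: int = 30):
        self.result_sets = result_sets
        self.batch_size = batch_size
        self.offset = 0
        self.processed: list[str] = []

    def fetch_more(self) -> int:
        """Fetch the next batch from every database; return records held so far."""
        start, end = self.offset, self.offset + self.batch_size
        for hits in self.result_sets.values():
            self.processed.extend(hits[start:end])
        self.offset = end
        return len(self.processed)


result_sets = {
    "A": [f"A{i}" for i in range(100)],
    "B": [f"B{i}" for i in range(45)],
}
cursor = ResultCursor(result_sets)
print(cursor.fetch_more())  # 60: the first 30 from each database
print(cursor.fetch_more())  # 105: 30 more from A, the remaining 15 from B
```

After two rounds, 40 of the 145 matching records have still not been seen by the merging, deduplication, and ranking steps.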
Relevance
Dependent on default sort order (relevance?, date?) of each external database
Dependent on default ranking mechanism of each database
Local ranking initially performed on first batches of 30 records per database
After fetching additional records, ranking is done again: initial top results may go down
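A toy example shows why the top of the list is unstable. It assumes each database returns records in its own default order (for instance by date), so the record that scores best locally may not be in the first batch. The scores, record ids, and the batch size of 2 (instead of 30) are invented to keep the example small.

```python
def rank(records: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Local relevance ranking: highest score first."""
    return sorted(records, key=lambda record: record[1], reverse=True)


# (record id, local relevance score), listed in each database's own default order.
db_a = [("a1", 0.4), ("a2", 0.3), ("a3", 0.9), ("a4", 0.2)]
db_b = [("b1", 0.6), ("b2", 0.5), ("b3", 0.1), ("b4", 0.8)]
BATCH = 2  # stands in for the 30 records per database on the slides

first_screen = rank(db_a[:BATCH] + db_b[:BATCH])
after_more = rank(db_a[:2 * BATCH] + db_b[:2 * BATCH])

print(first_screen[0][0])                        # b1 is the initial top result
print(after_more[0][0])                          # a3 takes over after fetching more
print([r[0] for r in after_more].index("b1"))    # b1 has dropped to index 2
```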
Solution?
1. Don’t rank
2. Don’t deduplicate
3. Don’t merge (in advance)
If you don’t merge, there is no point in deduplicating or ranking!!
1. “Does not make much sense anyway”
2. “Does not always work anyway”
3. “So, you have separate lists that you can merge later on”
Search with MetaSearch
Translate search syntax on the fly
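As a sketch of what translating search syntax on the fly involves, the same author search can be rendered once as a Z39.50 prefix query (PQF, using Bib-1 use attributes) and once as an SRU searchRetrieve URL with a CQL query. The `FIELD_MAP` table, the function names, and the base URL are assumptions for this example; which indexes a given server actually supports differs per database, and quotes inside the search term are not escaped here.

```python
from urllib.parse import urlencode

# One user-facing field, two target syntaxes.
# Bib-1 use attributes: 1003 = author, 4 = title, 21 = subject heading.
FIELD_MAP = {
    "author": {"pqf": "@attr 1=1003", "cql": "dc.creator"},
    "title": {"pqf": "@attr 1=4", "cql": "dc.title"},
    "subject": {"pqf": "@attr 1=21", "cql": "dc.subject"},
}


def to_pqf(field: str, term: str) -> str:
    """Render the search as a Z39.50 prefix query."""
    return f'{FIELD_MAP[field]["pqf"]} "{term}"'


def to_sru_url(base_url: str, field: str, term: str, max_records: int = 30) -> str:
    """Render the search as an SRU 1.1 searchRetrieve request URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'{FIELD_MAP[field]["cql"]} = "{term}"',
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


print(to_pqf("author", "James, Henry"))
# The host below is a placeholder, not a real SRU endpoint.
print(to_sru_url("https://sru.example.org/catalogue", "author", "James, Henry"))
```

A proprietary connector would need a third column in the table and its own rendering function, which is where the maintenance burden of database connectors comes from.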
Fetching results
Conversion of results on the fly
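Converting results on the fly means mapping each incoming record format to one presentation format. A minimal sketch for one direction, MARCXML to a Dublin Core style dictionary, could look like this. The sample record is hand-written for the example, only two fields are mapped (MARC 100 $a to creator, 245 $a to title), and `marcxml_to_dc` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">James, Henry,</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="4">
    <subfield code="a">The portrait of a lady /</subfield>
    <subfield code="c">Henry James.</subfield>
  </datafield>
</record>"""


def subfield(record: ET.Element, tag: str, code: str):
    """Return one subfield's text without trailing ISBD punctuation, or None."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    if node is None or not node.text:
        return None
    return node.text.strip(" /,.:;")


def marcxml_to_dc(xml_text: str) -> dict:
    """Map a single MARCXML record to a small Dublin Core style dictionary."""
    record = ET.fromstring(xml_text)
    return {
        "creator": subfield(record, "100", "a"),
        "title": subfield(record, "245", "a"),
    }


print(marcxml_to_dc(SAMPLE))
```

In a MetaSearch tool this step runs for every record of every batch while the user is waiting, once per source format.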
Results with MetaSearch
Harvesting and indexing
Searching and data fetching: two completely separate procedures
[Diagram: Databases, Harvesting, H&I tool (Normalising, Indexing, Ranking), Central index, Search Engine, Search, Results]
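The separation of the two procedures can be sketched with a tiny in-memory index. Harvesting runs ahead of time and fills the index; searching only reads the index and never contacts a remote database. `CentralIndex` is a hypothetical toy, with lowercase whitespace tokenising standing in for real normalising and with no ranking at all.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central index: harvest ahead of time, search locally."""

    def __init__(self):
        self.records: dict[str, str] = {}
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def harvest(self, source: str, records: dict[str, str]) -> None:
        """Batch step: copy records from one source and index their words."""
        for local_id, text in records.items():
            record_id = f"{source}:{local_id}"
            self.records[record_id] = text
            for token in text.lower().split():
                self.postings[token].add(record_id)

    def search(self, query: str) -> list[str]:
        """Query step: records containing every query word, from the index only."""
        tokens = query.lower().split()
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)


index = CentralIndex()
index.harvest("catalogue", {"1": "The Portrait of a Lady", "2": "Italian cooking"})
index.harvest("repository", {"9": "Cooking in Japan"})
print(index.search("cooking"))
```

Because `search` touches only local data, its speed does not depend on how many sources were harvested or on whether they are reachable at query time; the price is that the index is only as fresh as the last harvest.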
Advantages of H&I
Speed
No maximum number of searchable databases
All results in first set
No differences in searches, indexes
Relevance
Fewer technical bottlenecks
Central index always available in case of connection problem
H&I: Aquabrowser
H&I: Primo
MetaSearch = “Just in time”
Bookshop – Central Book Deposit
Always order on request
Risk of logistics problems
http://www.flickr.com/photos/stijnnieuwendijk/125159282/
H&I = “just in case”
Bookshop with large stock
Customers always find something
Maybe not the most recent stuff
http://www.flickr.com/photos/brewbooks/2131521680/
Images
• http://www.flickr.com/photos/donpezzano/3044975399/
• http://www.flickr.com/photos/halighalie/663414371/
• http://www.flickr.com/photos/notionscapital/2280408255/
• http://www.flickr.com/photos/giveawayboy/2691195763/
• http://www.flickr.com/photos/stijnnieuwendijk/125159282/
• http://www.flickr.com/photos/brewbooks/2131521680/
• http://www.flickr.com/photos/joshb/444529511/
• http://www.flickr.com/photos/eaglelover2006/3168378578/
• http://www.flickr.com/photos/robbie73/3387189144/
• http://www.flickr.com/photos/saralparker/2602254206/
• http://www.flickr.com/photos/manchesterlibrary/2034771121/
• http://www.flickr.com/photos/bk/158637798/
• http://www.flickr.com/photos/saamiam/3802869384/
• http://www.flickr.com/photos/roboppy/37024023/