MetaSearch vs Harvesting and Indexing
DESCRIPTION
A comparison between metasearch/federated search and harvesting & indexing in libraries.
TRANSCRIPT
![Page 1: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/1.jpg)
MetaSearch vs Harvesting and Indexing
Lukas Koster
Library of the University of Amsterdam
http://commonplace.net
2009
http://www.flickr.com/photos/donpezzano/3044975399/
![Page 2: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/2.jpg)
MetaSearch vs Harvesting and Indexing - Lukas Koster - 2009
So many databases to search
![Page 3: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/3.jpg)
MetaSearch – Federated Search
Diagram: Search Engine, MetaSearch tool, Database Connectors, Databases
Search: translate search syntax
Connector protocols: Z39.50, SRU, proprietary
Record formats: MARC21, MARCXML, DC
Results: conversion, merging, deduplication, ranking (first 30 per DB)
Searching and data fetching: one integrated, interdependent, on-the-fly procedure
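The "translate search syntax" step can be sketched as follows. This is a minimal illustration, not how any particular MetaSearch product works: the field-to-index mappings, the function names and the `sru.example.org` base URL are assumptions made for the example, and which indexes a given remote database actually supports has to be configured per connector.

```python
from urllib.parse import urlencode

# Assumed mappings from local search fields to remote index names.
# CQL index names are used for SRU, Bib-1 "use" attributes for Z39.50.
CQL_INDEX = {"author": "dc.creator", "title": "dc.title",
             "subject": "dc.subject", "any": "cql.anywhere"}
BIB1_USE = {"author": 1003, "title": 4, "subject": 21, "any": 1016}


def to_cql(field: str, term: str) -> str:
    """Translate a (field, term) search into a CQL query for an SRU target."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{CQL_INDEX[field]} = "{escaped}"'


def to_pqf(field: str, term: str) -> str:
    """Translate the same search into a PQF query for a Z39.50 target.

    No escaping is done here: terms containing double quotes are not handled.
    """
    return f'@attr 1={BIB1_USE[field]} "{term}"'


def sru_url(base_url: str, field: str, term: str, batch_size: int = 30) -> str:
    """Build an SRU searchRetrieve URL asking for the first batch of records."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": to_cql(field, term),
        "maximumRecords": batch_size,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical target; the same user search becomes two different queries.
print(to_cql("author", "James, Henry"))
print(to_pqf("author", "James, Henry"))
print(sru_url("https://sru.example.org/catalogue", "subject", "cooking"))
```

Every connector needs its own mapping table like the ones above, which is one reason the same author or subject search behaves differently per database.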
![Page 4: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/4.jpg)
Technical bottlenecks
(MetaSearch diagram as on the previous slide, with two bottlenecks marked)
Connection
Access Authorisation
![Page 5: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/5.jpg)
Technical bottlenecks
Changes in:
• Remote database server IP address
• Remote database server hostname
• Remote database server configuration
• Remote database authentication
• Firewall
• Database system
• Network
![Page 6: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/6.jpg)
MetaSearch limitations
Differences in searches, indexes
• Author
• Subject
• Multiple languages
Speed (slowness)
Limited number of searchable databases
Not all results in first set
Relevance
![Page 7: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/7.jpg)
Author searches
Variations in author name storage formats
• Henry James
• James, Henry
• James, H.
• H. James
Which Henry James? Or is it: Henry, J. / James Henry?
Variations in supported search formats
• Only one?
• All of the above?
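A rough sketch of why these variations matter for matching. The function name `author_key` and the rule "without a comma, the last token is the surname" are assumptions for this example; real authority control is considerably more involved.

```python
import re


def author_key(name: str) -> tuple[str, str]:
    """Reduce a personal name to (surname, first initial), lower-cased."""
    name = name.strip()
    if "," in name:
        # "Surname, Given": the comma marks the surname unambiguously.
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        # "Given Surname" or "G.Surname": ASSUME the last token is the surname.
        tokens = [t for t in re.split(r"[.\s]+", name) if t]
        surname, given = tokens[-1], " ".join(tokens[:-1])
    return surname.lower(), given[:1].lower()


# The four storage formats from the slide collapse to one key ...
variants = ["Henry James", "James, Henry", "James, H.", "H.James"]
print({author_key(v) for v in variants})

# ... but "Henry, J." and "James Henry" collapse to a different, shared key,
# so the question "which Henry James?" cannot be answered from the string alone.
print(author_key("Henry, J."), author_key("James Henry"))
```

A MetaSearch tool would have to apply something like this on the fly for every remote database, in both directions: when sending the search and when comparing the returned records.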
![Page 8: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/8.jpg)
Variations in author names
![Page 9: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/9.jpg)
Subject searches
Different qualification, keyword schemes per database
• LoC Subject Headings
• Dutch Basic Classification
• Local subject schemes
Different use of subjects per database
• Cooking
• Cookery
• Food
Different use of subjects within one database
Errors
![Page 10: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/10.jpg)
Multilingual searches
All words searches
Subject searches
• English “cooking”
• Japanese “???”
Title searches
• Translations (We need FRBR!)
Author searches (historical names)
• See: Erasmus
![Page 11: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/11.jpg)
All processing on the fly
Issues, dependent on each other:
• Speed (slowness)
• Limited number of searchable databases
• Not all results in first set
• Relevance
![Page 12: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/12.jpg)
Speed (slowness)
Dependent on:
1. Search term transformation
2. Response time of external databases
3. Speed of internet connection
4. Conversion of results to presentation format
5. Merging of results
6. Deduplication of results
7. Relevance ranking
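The seven factors can be combined into a back-of-the-envelope latency model. This is only a sketch: it assumes the remote databases are queried in parallel (so the slowest one sets the waiting time), that local processing cost grows linearly with the number of fetched records, and all cost figures are invented defaults, not measurements.

```python
from dataclasses import dataclass


@dataclass
class RemoteDb:
    name: str
    response_s: float   # 2. response time of the external database
    network_s: float    # 3. time spent on the internet connection
    records: int = 30   # records fetched in the first batch


def metasearch_latency(dbs, translate_s=0.01, convert_s=0.001,
                       merge_s=0.0005, dedup_s=0.002, rank_s=0.001):
    """Estimate seconds until the merged first result set can be shown.

    All per-step costs are hypothetical defaults chosen for illustration.
    """
    if not dbs:
        return 0.0
    # 1. one search term transformation per target database
    translate = translate_s * len(dbs)
    # 2 + 3. parallel querying: wait for the slowest database
    remote = max(db.response_s + db.network_s for db in dbs)
    # 4-7. conversion, merging, deduplication, ranking per fetched record
    total_records = sum(db.records for db in dbs)
    local = (convert_s + merge_s + dedup_s + rank_s) * total_records
    return translate + remote + local


two = [RemoteDb("catalogue", 0.5, 0.1), RemoteDb("repository", 1.5, 0.25)]
three = two + [RemoteDb("fast-db", 0.1, 0.1)]
print(metasearch_latency(two), metasearch_latency(three))
```

Even in this simplified model, adding a fast third database makes the search slower, because every extra batch of records adds local processing time.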
![Page 13: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/13.jpg)
Limited number of databases
Searching too many databases takes too long
Local processing time influenced by
• Merging (takes time)
• Deduplication (takes time)
• Ranking (takes time)
![Page 14: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/14.jpg)
Not all results in first set
Merging, deduplication, ranking of all results takes too long
Only first 30 or so of each database are processed initially
Get more: next 30 per database are fetched and processed
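The batch behaviour described on this slide might look roughly like this. The class name `MetaSearchSession`, the in-memory `sources` dictionary standing in for remote databases, and deduplication on a case-folded title are all simplifying assumptions for the sketch.

```python
from itertools import islice


class MetaSearchSession:
    """Fetch results in batches of `batch_size` per database, merging as we go."""

    def __init__(self, sources, batch_size=30):
        # `sources` maps a database name to its result list (hypothetical
        # stand-in for a remote result set, in the database's own order).
        self.iterators = {name: iter(records) for name, records in sources.items()}
        self.batch_size = batch_size
        self.merged = []
        self.seen = set()

    def fetch_more(self):
        """Fetch the next batch from every database; return how many were added."""
        added = 0
        for name, results in self.iterators.items():
            for record in islice(results, self.batch_size):
                key = record["title"].casefold()  # naive deduplication key
                if key not in self.seen:
                    self.seen.add(key)
                    self.merged.append({**record, "source": name})
                    added += 1
        return added


sources = {
    "A": [{"title": f"Title {i}"} for i in range(70)],
    "B": [{"title": f"TITLE {i}"} for i in range(20)]
         + [{"title": f"Other {i}"} for i in range(15)],
}
session = MetaSearchSession(sources)
print(session.fetch_more())  # first set: at most 30 records per database
print(session.fetch_more())  # "Get more": the next 30 per database
```

After the first call the user sees only part of what each database holds; records further down a remote result list stay invisible until more batches are requested.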
![Page 15: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/15.jpg)
Relevance
Dependent on default sort order (relevance?, date?) of each external database
Dependent on default ranking mechanism of each database
Local ranking initially performed on first batches of 30 records per database
After fetching additional records, ranking is done again: initial top results may go down
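A small sketch of why the top results can shift. It assumes a remote database that returns records in its own default order (not by relevance) and a deliberately crude local score (how often the search term occurs in the title); both the scoring rule and the function names are illustrative assumptions.

```python
def local_score(record, term):
    """Crude local relevance: occurrences of the term among the title words."""
    return record["title"].lower().split().count(term.lower())


def ranked_view(databases, term, batches, batch_size=30):
    """Rank only the records fetched so far (`batches` batches per database)."""
    visible = []
    for records in databases.values():
        visible.extend(records[: batches * batch_size])
    # sorted() is stable, so equally scored records keep their fetch order.
    return sorted(visible, key=lambda r: local_score(r, term), reverse=True)


# Hypothetical database in its own default order; the best match sits at
# position 36, outside the first batch of 30.
db = (
    [{"title": "Cooking today"}]
    + [{"title": f"History of food {i}"} for i in range(34)]
    + [{"title": "Cooking for beginners: cooking made easy"}]
)
first = ranked_view({"A": db}, "cooking", batches=1)
second = ranked_view({"A": db}, "cooking", batches=2)
print(first[0]["title"])
print(second[0]["title"], "|", second[1]["title"])
```

The record that led the first view drops to second place once the next batch has been fetched and everything is ranked again.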
![Page 16: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/16.jpg)
Solution?
1. Don’t rank
2. Don’t deduplicate
3. Don’t merge (in advance)
If you don’t merge, there is no point in deduplicating or ranking!!
1. “Does not make much sense anyway”
2. “Does not always work anyway”
3. “So, you have separate lists that you can merge later on”
![Page 17: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/17.jpg)
Search with MetaSearch
![Page 18: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/18.jpg)
Translate search syntax on the fly
![Page 19: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/19.jpg)
Fetching results
![Page 20: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/20.jpg)
Conversion of results on the fly
![Page 21: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/21.jpg)
Conversion of results on the fly
![Page 22: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/22.jpg)
Conversion of results on the fly
![Page 23: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/23.jpg)
Results with MetaSearch
![Page 24: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/24.jpg)
Results with MetaSearch
![Page 25: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/25.jpg)
Harvesting and indexing
Diagram: Search Engine, H&I tool, Central index, Databases
Search
Harvesting
Normalising, indexing, ranking
Results
Searching and data fetching: two completely separate procedures
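The separation of the two procedures can be sketched with a tiny in-memory index. The class `CentralIndex`, the record layout (`id`, `title`) and the AND-only search are assumptions for this example; ranking is left out, and a real H&I tool would normalise and index far more than titles.

```python
import re
from collections import defaultdict


def normalise(text):
    """Normalising step: case-fold and split into word tokens."""
    return re.findall(r"\w+", text.casefold())


class CentralIndex:
    def __init__(self):
        self.records = {}                 # record id -> harvested record
        self.postings = defaultdict(set)  # token -> ids of records containing it

    def harvest(self, source, records):
        """Procedure 1 (scheduled, offline): normalise and index one database."""
        for rec in records:
            rec_id = f"{source}:{rec['id']}"
            self.records[rec_id] = rec
            for token in normalise(rec["title"]):
                self.postings[token].add(rec_id)

    def search(self, query):
        """Procedure 2 (at search time): answer from the local index only."""
        tokens = normalise(query)
        if not tokens:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return [self.records[i] for i in sorted(ids)]


index = CentralIndex()
index.harvest("catalogue", [{"id": 1, "title": "The Art of Cooking"},
                            {"id": 2, "title": "Cookery for Students"}])
index.harvest("repository", [{"id": 1, "title": "Cooking and Food Science"}])
print(index.search("cooking"))
```

No remote database is contacted inside `search`, so search speed and availability no longer depend on the external systems; the price is that the index is only as fresh as the last harvest.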
![Page 26: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/26.jpg)
Advantages of H&I
Speed
No maximum number of searchable databases
All results in first set
No differences in searches, indexes
Relevance
Fewer technical bottlenecks
Central index always available in case of connection problem
![Page 27: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/27.jpg)
H&I: Aquabrowser
![Page 28: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/28.jpg)
H&I: Primo
![Page 29: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/29.jpg)
MetaSearch = “Just in time”
Bookshop – Central Book Deposit
Always order on request
Risk of logistics problems
http://www.flickr.com/photos/stijnnieuwendijk/125159282/
![Page 30: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/30.jpg)
H&I = “Just in case”
Bookshop with large stock
Customers always find something
Maybe not the most recent stuff
http://www.flickr.com/photos/brewbooks/2131521680/
![Page 31: MetaSearch vs Harvesting and Indexing](https://reader036.vdocument.in/reader036/viewer/2022062417/54bc9c494a7959915c8b45eb/html5/thumbnails/31.jpg)
Images
• http://www.flickr.com/photos/donpezzano/3044975399/
• http://www.flickr.com/photos/halighalie/663414371/
• http://www.flickr.com/photos/notionscapital/2280408255/
• http://www.flickr.com/photos/giveawayboy/2691195763/
• http://www.flickr.com/photos/stijnnieuwendijk/125159282/
• http://www.flickr.com/photos/brewbooks/2131521680/
• http://www.flickr.com/photos/joshb/444529511/
• http://www.flickr.com/photos/eaglelover2006/3168378578/
• http://www.flickr.com/photos/robbie73/3387189144/
• http://www.flickr.com/photos/saralparker/2602254206/
• http://www.flickr.com/photos/manchesterlibrary/2034771121/
• http://www.flickr.com/photos/bk/158637798/
• http://www.flickr.com/photos/saamiam/3802869384/
• http://www.flickr.com/photos/roboppy/37024023/