Searching and retrieval performance of search engines: a comparative study
Sangeeta Narang
Librarian, All India Institute of Medical Sciences, New Delhi
[email protected]
ABSTRACT
Objective: This study investigated the number of hits and the types of information sources retrieved when sample queries, drawn from real reference questions, were run on Internet search engines.
Study design: The Google, Yahoo!, and Bing (previously MSN) search engines were used to run single-term queries, multiple-term queries, Boolean queries, and phrase searches. The first forty results retrieved for the search term “health literacy” were examined, and the organizations or individuals sponsoring the websites were then compared on the basis of their domains.
Results: The number of hits was high, and sponsorship differed markedly among the search engines. Many of the websites belonged to commercial and non-government organizations; few came from government organizations and educational institutes.
Conclusion: As the Internet has become a frequently used source of information, librarians should consider the best ways to disseminate the best educational information to users amid an overwhelming number of websites.
INTRODUCTION
A search engine is defined as a remotely accessible program that lets you do keyword searches for information on the Internet. There are several types of search engines; the search may cover titles of documents, URLs, headers, or the full text.
(www.ameris.co.uk/glossary_of_terms.cfm)
COMPONENTS OF SEARCH ENGINE
A search engine relies on web crawlers, also called spiders or robots. They visit remote sites and download their contents for indexing. During indexing, the program creates an outline of the document by stripping out all the headers and then takes the first 20% or 20 lines, whichever is smaller, as an excerpt or abstract. The statistically more salient terms in the document are taken as keywords. In this way a highly efficient data structure, a tree or index, is generated and associated with the specific web page. Whenever a user submits a query, it is this inverted index that is searched. Search engines generally accept a query as simple text and break the user's text into a sequence of search terms.
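The indexing and query steps described above can be illustrated with a minimal sketch. It assumes a tiny in-memory set of already crawled pages (the URLs and texts are hypothetical), a simple alphanumeric tokenizer, and an implicit AND between query terms; production search engines use far more elaborate indexing and ranking.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric search terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(pages):
    """Map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index

def search(index, query):
    """Return the pages that contain every term of the query (implicit AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

# Hypothetical crawled pages (URL -> extracted text), for illustration only.
pages = {
    "example.org/a": "Health literacy for patients",
    "example.org/b": "Adult literacy programmes",
    "example.org/c": "Public health statistics",
}
index = build_inverted_index(pages)
print(sorted(search(index, "health literacy")))
```

Because each term maps directly to the pages that contain it, a query is answered by intersecting a few sets rather than by rescanning every document.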
TYPES OF SEARCH ENGINE
There are various types of search engines:
General search engines: They have their own index of documents and web pages, which is generated by their web crawlers, e.g. Yahoo, Google, Bing, Ask.com, Lycos, AltaVista.
Metasearch engines: They combine the searches of multiple search engines and then deliver the results, e.g. Mamma, Dogpile, Clusty, Kartoo.
Specialty search engines: They index a specific field and provide in-depth coverage, e.g. PubMed, Scirus.
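As a rough illustration of how a metasearch engine might combine results, the sketch below interleaves the ranked lists of several engines and drops duplicate URLs. The round-robin strategy, the function name merge_results, and the sample URLs are assumptions made for this example; the metasearch engines named above each apply their own merging and ranking rules.

```python
def merge_results(result_lists):
    """Interleave ranked lists round-robin, keeping the first occurrence of each URL."""
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

# Hypothetical ranked results from two underlying engines.
engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example"]
print(merge_results([engine_a, engine_b]))
```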
MATERIALS AND METHODS
The following seven terms were chosen for study:
Health (single term)
Literacy (single term)
Health Literacy (multiple term)
Health AND Literacy (Boolean operator)
Role of health sciences librarian (phrase)
Role of librarians in health literacy (phrase)
"Health Literacy" (in quotes, as combined search)
FINDINGS
Term                                Google         Yahoo          Bing
Health                              971,000,000    5,760,000,000  615,000,000
Literacy                            36,300,000     137,000,000    18,500,000
Health Literacy                     4,620,000      54,900,000     17,100,000
Health and Literacy                 4,280,000      55,500,000     17,000,000
"Health Literacy"                   435,000        1,970,000      22,700,000
Role of health sciences librarians  3,780,000      5,880,000      764,000
Role of librarians + Literacy       2,450,000      1,690,000      500,000
Number of hits retrieved for each search term
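The comparison of hit counts across engines can be reproduced with a short script. The figures are transcribed from the table of hits above; the names HITS and rank_engines are introduced here purely for illustration.

```python
# Hit counts transcribed from the table of hits per search term.
HITS = {
    "Health": {"Google": 971_000_000, "Yahoo": 5_760_000_000, "Bing": 615_000_000},
    "Literacy": {"Google": 36_300_000, "Yahoo": 137_000_000, "Bing": 18_500_000},
    "Health Literacy": {"Google": 4_620_000, "Yahoo": 54_900_000, "Bing": 17_100_000},
    "Health and Literacy": {"Google": 4_280_000, "Yahoo": 55_500_000, "Bing": 17_000_000},
    '"Health Literacy"': {"Google": 435_000, "Yahoo": 1_970_000, "Bing": 22_700_000},
    "Role of health sciences librarians": {"Google": 3_780_000, "Yahoo": 5_880_000, "Bing": 764_000},
    "Role of librarians + Literacy": {"Google": 2_450_000, "Yahoo": 1_690_000, "Bing": 500_000},
}

def rank_engines(counts):
    """Return engine names ordered from most hits to fewest."""
    return sorted(counts, key=counts.get, reverse=True)

for term, counts in HITS.items():
    print(f"{term}: {' > '.join(rank_engines(counts))}")
```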
First ten websites retrieved for “health literacy” by the three search engines
Google                                          Yahoo                               Bing
www.nlm.nih.gov/                                en.wikipedia.org/                   nnlm.gov/outreach/consumer/
en.wikipedia.org/wiki/Health literacy           medindia.com/                       www.iom.edu/
nnlm.gov/outreach/consumer/                     www.healthliteracymonth.org         www.healthliteracy.com
healthliteracy.worlded.org/                     nnlm.gov/outreach/consumer/         www.hrsa.gov/healthliteracy
books.google.co.in/books? isbn                  www.healthliteracy.ie               www.health.gov/communication/literacy
2020ok.com/.../health-literacy-a prescription-  www.medindia.net/news/              en.wikipedia.org/wiki/Health literacy
www.flipkart.com/...health-literacy.            www.healthliteracy.ch/index.htm     www.hsph.harvard.edu/healthliteracy
wiki.literacytent.org/                          www.healthliteracyinnovations.com/  www.nlm.nih.gov/medlineplus/
video.google.com/video play                     www.hrsa.gov/healthliteracy         www.iom.edu/?id=19750
www.nap.edu/catalog/                            www.healthliteracy.com/             www.iha4health.org
Distribution of sponsorship of the websites retrieved by each search engine
Domain Names                  Google   Yahoo    Bing
Government agencies           17.5%    20%      17.5%
Educational institutes        7.5%     7.5%     22.5%
Non government organizations  40%      40%      35%
Commercial                    32.5%    22.5%    17.5%
Network organizations         2.5%     7.5%     5%
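A domain-based sponsorship breakdown of this kind could be computed along the following lines. The mapping from top-level domain to category is an assumption made for illustration, since the study does not state the exact classification rules it used; the sample URLs are taken from the list of retrieved websites.

```python
from collections import Counter

# Assumed mapping from top-level domain to sponsorship category;
# the study does not spell out the rules it actually applied.
TLD_CATEGORY = {
    "gov": "Government agencies",
    "edu": "Educational institutes",
    "org": "Non government organizations",
    "com": "Commercial",
    "net": "Network organizations",
}

def classify(url):
    """Classify a URL by the last label of its host name."""
    host = url.split("://", 1)[-1].split("/", 1)[0].lower()
    tld = host.rsplit(".", 1)[-1]
    return TLD_CATEGORY.get(tld, "Other")

def sponsorship_share(urls):
    """Return the percentage of URLs that fall into each category."""
    if not urls:
        return {}
    counts = Counter(classify(url) for url in urls)
    return {category: 100 * n / len(urls) for category, n in counts.items()}

sample = [
    "www.hrsa.gov/healthliteracy",
    "www.hsph.harvard.edu/healthliteracy",
    "www.healthliteracy.com",
    "www.iha4health.org",
]
print(sponsorship_share(sample))
```

Country-code domains such as .ie or .ch fall into "Other" under this mapping and would need to be classified by hand.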
DISCUSSION
There is wide variation in Internet use and search strategy. Web users spend a lot of their time using search engines to locate material on the vast and unorganized web. According to the Visualization and Usability Center user survey, about 85% of users use search engines to locate information. There is strong competition among search engines, each striving to outperform the others either by expanding its coverage or by adding more features. Since search engines came into existence, mergers, break-ups, and takeovers have been witnessed. For example, Ask.com was formed by the merger of Ask Jeeves and Teoma.
In the study it was observed that each search engine gives different results for the same search terms. For the single-word searches Health and Literacy, Yahoo returned the most hits, followed by Google and then Bing. For the multiword term Health Literacy, Yahoo again returned the most results, but the number of hits was lower than for the single-word searches. When “Health Literacy” was searched in quotes, Bing showed the most results and Google the fewest. For the Boolean query Health and Literacy, Yahoo again returned the most results, and for the phrase search Role of health sciences librarians, Yahoo was also ahead. When Role of librarians was combined with Literacy using the + operator, Google led in search results, followed by Yahoo and Bing. It was observed that the outcome is affected as the number of query terms increases.
When the websites of the first forty hits for health literacy were studied, it was observed that they included sponsored sites, commercial sites, non-governmental sites, and educational sites. The percentages of non-government and commercial organizations were high compared with the others.
Further, it was found that the hits and the number of returned documents change over time. Searches on Yahoo India give different results from those on Yahoo; also, the number of hits reported is only an estimate derived from the indexed terms. The study has a limitation in that not all the related websites could be studied; it is only a pilot study of the searching and retrieval behavior of search engines.
CONCLUSION
Single-term queries return a very high number of hits in each search engine. Multiple-term queries lead to fewer hits. Second, Health returns far more hits than Literacy on every search engine. Finally, librarians will play a very important role in refining search strategies to find the most appropriate source of information among abundant resources, where the sponsorship of websites has to be taken care of.
THANK YOU