Searching and retrieval performance of search engines: a comparative study

Sangeeta Narang, Librarian, All India Institute of Medical Sciences, New Delhi

Post on 12-Nov-2014


ABSTRACT

Objective: This study investigated the number of hits and the types of information sources retrieved when sample queries drawn from real reference questions were run on Internet search engines.

Study design: The Google, Yahoo!, and Bing (previously MSN) search engines were used to test single-term queries, multiple-term queries, boolean logic, and phrase searching. The first forty results retrieved for the search term “health literacy” were examined, and the organizations or individuals sponsoring the websites were then compared on the basis of a domain study.

Results: The number of hits was high, and sponsorship differed significantly among the search engines. Many of the websites belonged to commercial and non-government agencies; few belonged to government organizations and educational institutes.

Conclusion: As the Internet has become a frequently used source of information, librarians should consider the best ways to disseminate the best educational information to users amid an overwhelming number of websites.

INTRODUCTION

Search engines are defined as remotely accessible programs that let users run keyword searches for information on the Internet. There are several types of search engines; a search may cover the titles of documents, URLs, headers, or the full text.

(www.ameris.co.uk/glossary_of_terms.cfm)

COMPONENTS OF SEARCH ENGINE

Search engines rely on web crawlers, also called spiders or robots. They visit remote sites and download their contents for indexing. While indexing, the program creates an outline of each document by stripping out all the headers and then takes the first 20% or 20 lines, whichever is smaller, as an excerpt or abstract. The statistically more salient terms in the document are taken as keywords. In this way a highly efficient data structure, a tree or inverted index, is generated and associated with the specific web page. Whenever a user submits a query, it is this inverted index that is searched. Various search engines accept queries as simple text and break the user's text into a sequence of search terms.
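The indexing and lookup steps described here can be illustrated with a small Python sketch. It is not how any particular search engine is implemented: the page URLs and texts are invented for the example, the tokenizer is a deliberately simple assumption, and multi-term queries are treated as an implicit AND.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric search terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the pages containing every term of the query (implicit AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical crawled pages, invented purely for illustration.
PAGES = {
    "example.org/a": "Health literacy resources for patients",
    "example.org/b": "Public health statistics",
    "example.org/c": "Adult literacy programmes",
}

INDEX = build_inverted_index(PAGES)
print(sorted(search(INDEX, "health")))
print(sorted(search(INDEX, "health literacy")))
```

In this toy index the two-term query matches fewer pages than the single-term query, because each added term can only narrow the intersection.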

TYPES OF SEARCH ENGINE

There are various types of search engines:

General search engines: They have their own index of documents and web pages, which is generated by their web crawlers, e.g. Yahoo, Google, Bing, Ask.com, Lycos, Altavista.

Metasearch engines: They combine the searches of multiple search engines and then deliver the results, e.g. Mamma, Dogpile, Clusty, Kartoo.

Specialty search engines: They index a specific field and provide in-depth coverage, e.g. PubMed, Scirus.

MATERIALS AND METHODS

The following seven terms were chosen for the study:

Health (single term)
Literacy (single term)
Health literacy (multiple term)
Health AND Literacy (boolean operator)
Role of health sciences librarian (phrase)
Role of librarians in health literacy (phrase)
"Health Literacy" in quotes (combined search)

FINDINGS

Term                                  Google          Yahoo           Bing
Health                                971,000,000     5,760,000,000   615,000,000
Literacy                              36,300,000      137,000,000     18,500,000
Health Literacy                       4,620,000       54,900,000      17,100,000
Health and Literacy                   4,280,000       55,500,000      17,000,000
"Health Literacy"                     435,000         1,970,000       22,700,000
Role of health sciences librarians    3,780,000       5,880,000       764,000
Role of librarians + Literacy         2,450,000       1,690,000       500,000

Number of hits retrieved for each search term
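To make the comparison easier to read, the hit counts above can be ranked per term. The following Python sketch simply copies the figures from the table and sorts the engines from most to fewest hits; the names `ENGINES`, `HITS`, and `rank_engines` are illustrative and not part of the study.

```python
# Hit counts copied from the table above, in the order (Google, Yahoo, Bing).
ENGINES = ("Google", "Yahoo", "Bing")
HITS = {
    "Health": (971000000, 5760000000, 615000000),
    "Literacy": (36300000, 137000000, 18500000),
    "Health Literacy": (4620000, 54900000, 17100000),
    "Health and Literacy": (4280000, 55500000, 17000000),
    '"Health Literacy"': (435000, 1970000, 22700000),
    "Role of health sciences librarians": (3780000, 5880000, 764000),
    "Role of librarians + Literacy": (2450000, 1690000, 500000),
}


def rank_engines(counts):
    """Return the engine names ordered from most hits to fewest."""
    paired = sorted(zip(ENGINES, counts), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in paired]


for term, counts in HITS.items():
    print(f"{term}: {' > '.join(rank_engines(counts))}")
```

Run on these figures, the ranking puts Yahoo first for five of the seven terms, Bing first for the quoted phrase, and Google first for the last query.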

Graph

List of the first ten websites retrieved for health literacy by the three search engines

Google                                            Yahoo                                 Bing
www.nlm.nih.gov/                                  en.wikipedia.org/                     nnlm.gov/outreach/consumer/
en.wikipedia.org/wiki/Health literacy             medindia.com/                         www.iom.edu/
nnlm.gov/outreach/consumer/                       www.healthliteracymonth.org           www.healthliteracy.com
healthliteracy.worlded.org/                       nnlm.gov/outreach/consumer/           www.hrsa.gov/healthliteracy
books.google.co.in/books? isbn                    www.healthliteracy.ie                 www.health.gov/communication/literacy
2020ok.com/.../health-literacy-a prescription-    www.medindia.net/news/                en.wikipedia.org/wiki/Health literacy
www.flipkart.com/...health-literacy.              www.healthliteracy.ch/index.htm       www.hsph.harvard.edu/healthliteracy
wiki.literacytent.org/                            www.healthliteracyinnovations.com/    www.nlm.nih.gov/medlineplus/
video.google.com/video play                       www.hrsa.gov/healthliteracy           www.iom.edu/?id=19750
www.nap.edu/catalog/                              www.healthliteracy.com/               www.iha4health.org

Distribution of sponsorship of the websites retrieved by the search engines:

Domain Names                    Google  Yahoo   Bing
Government agencies             17.5%   20%     17.5%
Educational institutes          7.5%    7.5%    22.5%
Non government organizations    40%     40%     35%
Commercial                      32.5%   22.5%   17.5%
Network organizations           2.5%    7.5%    5%
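Because the percentages are based on a small sample, it can help to convert them back to site counts. The Python sketch below assumes that each percentage is a share of the first forty results examined per engine, as stated in the abstract; that assumption and the names `SHARES` and `to_counts` are illustrative and not taken from the study.

```python
SAMPLE_SIZE = 40  # assumption: forty results were examined for each engine

# Percentages copied from the sponsorship table above.
SHARES = {
    "Google": {
        "Government agencies": 17.5,
        "Educational institutes": 7.5,
        "Non government organizations": 40.0,
        "Commercial": 32.5,
        "Network organizations": 2.5,
    },
    "Yahoo": {
        "Government agencies": 20.0,
        "Educational institutes": 7.5,
        "Non government organizations": 40.0,
        "Commercial": 22.5,
        "Network organizations": 7.5,
    },
    "Bing": {
        "Government agencies": 17.5,
        "Educational institutes": 22.5,
        "Non government organizations": 35.0,
        "Commercial": 17.5,
        "Network organizations": 5.0,
    },
}


def to_counts(shares, sample_size=SAMPLE_SIZE):
    """Convert percentage shares into whole numbers of websites."""
    return {name: round(pct * sample_size / 100) for name, pct in shares.items()}


for engine, shares in SHARES.items():
    counts = to_counts(shares)
    print(engine, counts, "total sites:", sum(counts.values()),
          "total percent:", sum(shares.values()))
```

Under this assumption the Google column accounts for all 40 sites, while the Yahoo and Bing columns as reported total 97.5%, or 39 of 40 sites; the source does not say how the remaining site was classified.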

DISCUSSION

There is wide variation in Internet use and search strategy. Web users spend much of their time using search engines to locate material on the vast and unorganized web. According to the Visualization and Usability Center user survey, about 85% of users use search engines to locate information. There is strong competition among search engines, each striving to outperform the others either by expanding its coverage or by adding more features. Since search engines first appeared, mergers, break-ups, and takeovers have been witnessed. For example, Ask.com was formed by the merger of Ask Jeeves and Teoma.

The study found that each search engine gives different results for the same search terms. For the single-word searches Health and Literacy, Yahoo returned the most hits, followed by Google and then Bing. For the multiword term Health Literacy, Yahoo again returned the most results, but the number of hits was lower than for the single-word searches. When “Health Literacy” was searched in quotes, Bing showed the most results and Google the fewest. For the boolean query Health and Literacy, Yahoo again returned the most results, and for the phrase search Role of health sciences librarians Yahoo was also ahead. When Role of librarians was combined with Literacy using the + operator, Google led in search results, followed by Yahoo and Bing. The outcome is thus affected as the number of query terms increases.

When the websites behind the first forty hits for health literacy were studied, they included sponsored, commercial, nongovernmental, and educational sites. The percentages of non-government organizations and commercial organizations were high compared with the others.
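The study does not state the exact rule used to assign a site to a sponsorship category. One plausible approach, shown here only as a hedged Python sketch, is to look at the top-level domain of each URL. The mapping in `TLD_TO_SPONSOR` is an assumption made for this example, not the author's documented method, and country-code domains are simply left unclassified.

```python
from urllib.parse import urlparse

# Assumed mapping from top-level domain to sponsorship category.
TLD_TO_SPONSOR = {
    "gov": "Government agencies",
    "edu": "Educational institutes",
    "org": "Non government organizations",
    "com": "Commercial",
    "net": "Network organizations",
}


def sponsor_category(url):
    """Guess the sponsorship category of a URL from its top-level domain."""
    # URLs in the list have no scheme, so add '//' to let urlparse find the host.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = parsed.hostname or ""
    tld = host.rsplit(".", 1)[-1]
    return TLD_TO_SPONSOR.get(tld, "Unclassified")


# A few of the URLs listed in the results above.
for site in ("www.nlm.nih.gov/", "www.hsph.harvard.edu/healthliteracy",
             "www.healthliteracy.com/", "www.healthliteracy.ie"):
    print(site, "->", sponsor_category(site))
```

A rule like this is only a first approximation: a .org or .com address does not by itself establish who sponsors a site, so manual checking would still be needed.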

Further, the number of hits and of returned documents was found to change over time. Searches on Yahoo India give different results from those on Yahoo, and the number of hits is an estimate of the number of index terms. The study is limited in that it was not possible to examine all the related websites; it is only a pilot study of the searching and retrieval behavior of search engines.

CONCLUSION

Single-term queries return a very high number of hits in each search engine. Multiple terms lead to fewer hits. Secondly, Health is a more sought-after term on the search engines than Literacy. Finally, librarians will play a very important role in refining search strategies to find the most appropriate source of information among abundant resources, where the sponsorship of websites has to be taken care of.

 

THANK YOU