Maximizing Online Information Retrieval: How Theological Librarians Can Best Access the Gnostic Areas of the Internet

Libby Peterek, M.S.Info.St.

Division of Instructional Innovation and Assessment

The University of Texas at Austin

Surface v. Deep Web

• Surface Web - estimated at between 1% and 20% of the Internet

• Deep Web - content that commercial search engines (e.g., Google and Yahoo) can’t reach
  – Unindexed

Unindexed Web content

• Databases / dynamically generated content

• File types (Flash, PHP, etc.)

• Institution sites

• “Gated” content – Require password / registration
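One way to picture why database-driven content stays unindexed: a crawler that only follows links never submits a search form, so pages that exist only as answers to a query are never visited. The sketch below is a toy model under that assumption. The site map, page paths, and query strings are all hypothetical, and real crawlers are far more elaborate.

```python
from collections import deque

# Hypothetical site: each page lists the pages it links to.
LINKS = {
    "/": ["/about", "/catalog-search"],
    "/about": ["/"],
    "/catalog-search": [],  # a search form; results require a typed query
}

# Hypothetical pages that exist only as answers to a form query.
QUERY_RESULTS = {
    "/catalog-search?q=gnosticism": "Records matching 'gnosticism'",
    "/catalog-search?q=patristics": "Records matching 'patristics'",
}


def crawl(start):
    """Return every page reachable by following links only."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


surface = crawl("/")
deep = set(QUERY_RESULTS) - surface

print("Reached by following links:", sorted(surface))
print("Never reached (query-only):", sorted(deep))
```

In this model the form page itself is indexed, but the records behind it are not, which is the situation the list above describes for databases and dynamically generated content.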

Theological Librarianship

• Underserved user group

• Specialized content
  – Hidden
  – Database driven
  – Newly added

• Potential to add richness to research

Mining the Deep Web

• Deep Web search engines

• Federated searching

• RSS

Deep Web Search Engines

• Look like commercial engines

• Utilize different algorithms

• Vary in quality and result relevance

• Many free; a growing number fee-based and subscription-based
  – You get what you pay for…


• http://www.invisible-web.net

• http://www.dipsie.com/ (later this year)

• http://www.brightplanet.com
  – The leader and most expensive
  – Mainly competitive intelligence

• http://www.profusion.com/


Deep Web Issues

• Deep Web search engines underdeveloped

• Many of the same issues as commercial engines
  – Wait for search engines to improve?
    • Federated Searching
    • RSS

Federated Searching

• Programs written to connect catalogs and databases

• No need for same code

• Specialized search
  – Access to different information
  – Aggregated based on user preference
  – One simple interface
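The idea above can be sketched in a few lines: each catalog or database sits behind a small adapter, and one function sends the same query to whichever sources the user has enabled, then merges the results. This is only an illustration under stated assumptions. The holdings, source names, and the `Record` type are hypothetical, and real federated search products typically talk to remote systems over standard protocols rather than in-memory lists.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    source: str
    title: str


# Hypothetical holdings standing in for two separate systems.
CATALOG = ["The Nag Hammadi Library", "Introduction to Patristics"]
ARTICLE_INDEX = ["Gnostic Texts in Translation", "Nag Hammadi Codices: A Survey"]


def search_catalog(query: str) -> List[Record]:
    """Adapter for the (hypothetical) library catalog."""
    return [Record("catalog", t) for t in CATALOG if query.lower() in t.lower()]


def search_articles(query: str) -> List[Record]:
    """Adapter for the (hypothetical) article index."""
    return [Record("articles", t) for t in ARTICLE_INDEX if query.lower() in t.lower()]


# Each source only has to expose the same call signature, not the same code.
SOURCES: Dict[str, Callable[[str], List[Record]]] = {
    "catalog": search_catalog,
    "articles": search_articles,
}


def federated_search(query: str, enabled: List[str]) -> List[Record]:
    """Query every enabled source and merge the results into one list."""
    results: List[Record] = []
    for name in enabled:
        results.extend(SOURCES[name](query))
    return sorted(results, key=lambda record: record.title)


for hit in federated_search("nag hammadi", ["catalog", "articles"]):
    print(f"[{hit.source}] {hit.title}")
```

Passing a shorter `enabled` list models aggregation by user preference: the interface stays the same while the set of sources changes.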

Federated Searching

• Theological library databases, listservs, and indexes
  – Different form of content management
  – Access to all the tools available

Sherlock

Indeed

Library Use

• New York State Library Pilot Project
  • http://www.nysl.nysed.gov/library/novel/pilot/

• University of Toronto & British Columbia
  – Endeavor ENCompass
  • http://www.endinfosys.com/

• Library of Congress vendor list
  • http://www.loc.gov/catdir/lcpaig/portalproducts.html


Federated Searching Issues

• Need access to databases
  – Owned or agreed

• Can be expensive
  – Divide cost among interested parties or content holders

RSS

• Really Simple Syndication

• Rich Site Summary

• RDF Site Summary

• Comparable to personalized library “alerts”

RSS

• Application of eXtensible Markup Language (XML); the RSS 1.0 variant (RDF Site Summary) builds on W3C’s Resource Description Framework (RDF)

• What does this mean?
  – Metadata meets hyperlinks
  – Automates tasks
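To make "metadata meets hyperlinks" concrete, here is a minimal sketch that reads a small feed with Python's standard library. It assumes the RSS 2.0 layout (`channel` containing `item` elements) rather than the RDF-based RSS 1.0 layout, and the feed content and example.org URLs are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; titles and URLs are made up.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://example.org/news</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>New database trial</title>
      <link>http://example.org/news/1</link>
      <pubDate>Mon, 06 Jun 2005 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Summer reading room hours</title>
      <link>http://example.org/news/2</link>
      <pubDate>Tue, 07 Jun 2005 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_feed(feed_xml):
    """Return (feed title, list of item dicts) from an RSS 2.0 string."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = [
        {
            "title": item.findtext("title"),      # metadata
            "link": item.findtext("link"),        # hyperlink
            "published": item.findtext("pubDate"),
        }
        for item in channel.findall("item")
    ]
    return channel.findtext("title"), items


feed_title, entries = parse_feed(SAMPLE_FEED)
print(feed_title)
for entry in entries:
    print(f"- {entry['title']} <{entry['link']}>")
```

Because every item carries the same labeled fields, a program can list, sort, or forward new postings without anyone visiting the site, which is the sense in which a feed automates tasks.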

How RSS is used

• Feeds combine metadata and links
  – “Syndicate (XML)” or

• Typical sites with RSS
  – News
  – Blogs

• Explosion of “bloggers” opens arena for valid material from a wide user base and links to relevant resources

UT & RSS

• UT Austin strongly considering campus-wide blogging initiative
  – Content management
  – Content sharing
  – Archive
  – RSS

Aggregating RSS Feeds

• Browsers
  – Mozilla Firefox (Mac & PC)
  – Safari (Mac)

• Aggregators / News Readers (full list)
  – NetNewsWire Lite (Mac)

• Email

• Web

http://allrss.com/rssreaders.html
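Whatever form it takes (browser, news reader, email, or web), an aggregator does roughly the same job: read several feeds and present their items as one list, newest first. The sketch below shows that core step under simplifying assumptions. Both feeds are hypothetical RSS 2.0 strings held in memory, and a real reader would also fetch the feeds over the network and handle malformed dates.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two hypothetical RSS 2.0 feeds; titles, links, and dates are made up.
FEED_A = """<rss version="2.0"><channel><title>Hypothetical Library Blog</title>
<item><title>Reading room hours</title><link>http://example.org/a/1</link>
<pubDate>Mon, 06 Jun 2005 09:00:00 GMT</pubDate></item>
<item><title>New index added</title><link>http://example.org/a/2</link>
<pubDate>Wed, 08 Jun 2005 12:00:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Hypothetical Job Listings</title>
<item><title>Job posting: cataloger</title><link>http://example.org/b/1</link>
<pubDate>Tue, 07 Jun 2005 15:30:00 GMT</pubDate></item>
</channel></rss>"""


def read_items(feed_xml):
    """Parse one feed into a list of dicts, tagging each item with its source."""
    channel = ET.fromstring(feed_xml).find("channel")
    source = channel.findtext("title")
    return [
        {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # RSS 2.0 dates use the RFC 822 style, which email.utils can parse.
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in channel.findall("item")
    ]


def aggregate(feeds):
    """Merge the items of several feeds, newest first."""
    merged = [entry for feed in feeds for entry in read_items(feed)]
    return sorted(merged, key=lambda entry: entry["published"], reverse=True)


for entry in aggregate([FEED_A, FEED_B]):
    print(entry["published"].date(), entry["source"], "-", entry["title"])
```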

NetNewsWire Lite

How it works

• Library Jobs RSS feed from Chronicle of Higher Education
  – Blog
  – Organization site

• Elf
  – Library borrower RSS

http://blog.libraryassociates.com/?q=aggregator/sources/8

http://chronicle.com/jobs/100/600/6500/

Feedster

• RSS search engine

• Generates a unique RSS feed for each search to copy to an aggregator

• Notifications each time your subject is updated

• The better your search terms, the better your results
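The notification step can be sketched without reference to any particular service: an aggregator polls the feed for a saved search, remembers which item links it has already shown, and reports only the ones it has not seen. The code below is a generic illustration of that bookkeeping, not a description of how Feedster itself works. The item data and the `new_items` helper are hypothetical.

```python
def new_items(current_items, seen_links):
    """Return items whose link has not been seen before, and remember them."""
    fresh = [item for item in current_items if item["link"] not in seen_links]
    seen_links.update(item["link"] for item in fresh)
    return fresh


# Hypothetical results of polling the same search feed twice.
first_poll = [
    {"title": "Gnostic gospels overview", "link": "http://example.org/1"},
    {"title": "Coptic manuscripts online", "link": "http://example.org/2"},
]
second_poll = first_poll + [
    {"title": "New Nag Hammadi translation", "link": "http://example.org/3"},
]

seen = set()
print([item["title"] for item in new_items(first_poll, seen)])
print([item["title"] for item in new_items(second_poll, seen)])
```

On the second poll only the newly added item is reported, which is what makes a search feed behave like an alert.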

Feedster

Why RSS at your library?

• Two-way information exchange
  – Information retrieval and dissemination

• For patrons and librarians
  – Filter information overload
    • You designate the boundaries
  – Time sensitive
    • Be notified first when something is posted in your area of interest
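"You designate the boundaries" can be as simple as a list of interest terms applied to incoming feed items. The sketch below assumes items have already been parsed into dictionaries with a title and an optional description. The items and terms are hypothetical, and plain substring matching is only one possible filtering rule.

```python
def matches_interest(item, terms):
    """True if any interest term appears in the item's title or description."""
    text = (item["title"] + " " + item.get("description", "")).lower()
    return any(term.lower() in text for term in terms)


def filter_items(items, terms):
    """Keep only the items that fall inside the reader's chosen boundaries."""
    return [item for item in items if matches_interest(item, terms)]


# Hypothetical incoming items from several feeds.
incoming = [
    {"title": "Campus parking update"},
    {"title": "Lecture series", "description": "Gnosticism and early Christianity"},
    {"title": "New Coptic dictionary acquired"},
]

my_terms = ["gnostic", "coptic"]  # boundaries chosen by the patron or librarian
for item in filter_items(incoming, my_terms):
    print(item["title"])
```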

Online Content / Search Issues

• Information creation and migration speeds

• Standards - or lack thereof

• Competition v. collaboration

Looking forward

• Deep Web diminishing
  – XML
  – Commercial search engines
    • Sophistication
    • File types
  – Internet publishing increasing
    • More care about pages being indexed
    • Links

Sources

Bergman, M. 2001. The Deep Web: Surfacing Hidden Value. Journal of Electronic Publishing. http://www.press.umich.edu/jep/07-01/bergman.html

BrightPlanet. Deep Web FAQs. http://www.brightplanet.com/deepcontent/deep_web_faq.asp

Devine, J. and Egger-Sider, F. 2004. Beyond Google: The Invisible Web in the Academic Library. The Journal of Academic Librarianship. 30(4), 265-269.

Olsen, S. 2004. Yahoo crawls deep into the Web. http://news.com.com/2100-1024-5167931.html

Smith, C. Invisible Web. http://www.libraryspot.com/features/invisibleweb.htm

Wired. 2005. Surfing the Deep Web. http://www.wired.com/news/business/0,1367,67883,00.html

University at Albany. 2005. The Deep Web. http://library.albany.edu/internet/deepweb.html

Webster, P. 2004. Breaking Down Information Silos. Online. 30-34.

Wright, A. 2004. In Search of the Deep Web. Salon. http://www.salon.com/tech/feature/2004/03/09/deep_web/index_np.html

Questions?

Libby Peterek


http://www.ischool.utexas.edu/~libby/atla
