Maximizing Online Information Retrieval: How Theological Librarians Can Best Access the Gnostic Areas of the Internet
Libby Peterek, M.S.Info.St.
Division of Instructional Innovation and Assessment
The University of Texas at Austin
Surface v. Deep Web
• Surface Web - estimated at between 1% and 20% of the Internet
• Deep Web - content that commercial search engines (e.g., Google and Yahoo) can’t reach
  – Unindexed
Unindexed Web content
• Databases / dynamically generated content
• File types (Flash, php, etc.)
• Institution sites
• “Gated” content – Require password / registration
Theological Librarianship
• Underserved user group
• Specialized content
  – Hidden
  – Database driven
  – Newly added
• Potential to add richness to research
Mining the Deep Web
• Deep Web search engines
• Federated searching
• RSS
Deep Web Search Engines
• Look like commercial engines
• Utilize different algorithms
• Vary in quality and result relevance
• Many are free; a growing number are fee-based or subscription-based
  – You get what you pay for…
Deep Web Search Engines
• http://www.invisible-web.net
• http://www.dipsie.com/ (later this year)
• http://www.brightplanet.com
  – The leader and most expensive
  – Mainly competitive intelligence
• http://www.profusion.com/
Deep Web Issues
• Deep Web search engines underdeveloped
• Many of the same issues as commercial engines
  – Wait for search engines to improve?
• Federated Searching
• RSS
Federated Searching
• Programs written to connect catalogs and databases
• No need for same code
• Specialized search
  – Access to different information
  – Aggregated based on user preference
  – One simple interface
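The idea of connecting catalogs and databases that do not share the same code can be sketched in a few lines. This is a minimal illustration only: the two "sources", their contents, and the names `Record`, `search_catalog_a`, `search_database_b`, and `federated_search` are all hypothetical, and a real product would talk to remote systems rather than in-memory lists.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    source: str
    title: str


# Hypothetical sources with different internal shapes, standing in for
# catalogs and databases that were not built on the same code.
CATALOG_A = ["Gnostic Texts in Translation", "Church History Survey"]
DATABASE_B = [{"name": "Gnostic Gospels Index"}, {"name": "Hymnal Archive"}]


def search_catalog_a(term: str) -> List[Record]:
    """Connector for a source that stores plain title strings."""
    return [Record("catalog_a", t) for t in CATALOG_A if term.lower() in t.lower()]


def search_database_b(term: str) -> List[Record]:
    """Connector for a source that stores rows with a 'name' field."""
    return [
        Record("database_b", row["name"])
        for row in DATABASE_B
        if term.lower() in row["name"].lower()
    ]


def federated_search(
    term: str,
    connectors: Dict[str, Callable[[str], List[Record]]],
    preferred: List[str],
) -> List[Record]:
    """Query each preferred source in order and return one combined list."""
    results: List[Record] = []
    for name in preferred:
        connector = connectors.get(name)
        if connector is None:
            continue  # the user asked for a source we have no connector for
        results.extend(connector(term))
    return results


CONNECTORS = {"catalog_a": search_catalog_a, "database_b": search_database_b}
hits = federated_search("gnostic", CONNECTORS, preferred=["database_b", "catalog_a"])
for hit in hits:
    print(hit.source, "-", hit.title)
```

Each connector hides the differences of its source, so the user sees one interface and decides which sources are searched and in what order.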
Federated Searching
• Theological library databases, listservs, and indexes
  – Different form of content management
  – Access to all the tools available
Sherlock
Indeed
Library Use
• New York State Library Pilot Project
  – http://www.nysl.nysed.gov/library/novel/pilot/
• University of Toronto & British Columbia
  – Endeavor ENCompass
  – http://www.endinfosys.com/
• Library of Congress vendor list
  – http://www.loc.gov/catdir/lcpaig/portalproducts.html
Federated Searching Issues
• Need access to databases
  – Owned or agreed
• Can be expensive
  – Divide cost among interested parties or content holders
RSS
• Really Simple Syndication
• Rich Site Summary
• RDF Site Summary
• Comparable to personalized library “alerts”
RSS
• An application of eXtensible Markup Language (XML); the RSS 1.0 format also uses W3C’s Resource Description Framework (RDF)
• What does this mean?
  – Metadata meets hyperlinks
  – Automates tasks
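To make "metadata meets hyperlinks" concrete, here is a small sketch that reads an RSS 2.0 document with Python's standard library. The feed content, the example.org URLs, and the function name `parse_items` are invented for illustration; an RSS 1.0 (RDF) feed uses XML namespaces and would need slightly different lookups.

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed: each <item> pairs metadata (title,
# description) with a hyperlink (link).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Library News</title>
    <link>http://example.org/</link>
    <description>Hypothetical feed for illustration</description>
    <item>
      <title>New database added</title>
      <link>http://example.org/news/1</link>
      <description>A newly licensed index is now available.</description>
    </item>
    <item>
      <title>Reading room hours</title>
      <link>http://example.org/news/2</link>
      <description>Hours change next week.</description>
    </item>
  </channel>
</rss>"""


def parse_items(feed_xml: str) -> list:
    """Return one dict per <item>, holding its title, link, and description."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "description": item.findtext("description", default=""),
            }
        )
    return items


for entry in parse_items(SAMPLE_FEED):
    print(entry["title"], "->", entry["link"])
```

Because the structure is predictable, a program can collect headlines and links without a person visiting each site, which is the "automates tasks" point above.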
How RSS is used
• Feeds combine metadata and links
  – “Syndicate (XML)” or
• Typical sites with RSS
  – News
  – Blogs
• Explosion of “bloggers” opens arena for valid material from a wide user base and links to relevant resources
UT & RSS
• UT Austin is strongly considering a campus-wide blogging initiative
  – Content management
  – Content sharing
  – Archive
  – RSS
Aggregating RSS Feeds
• Browsers
  – Mozilla Firefox (Mac & PC)
  – Safari (Mac)
• Aggregators / News Readers (full list)
  – NetNewsWire Lite (Mac)
• Web
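What an aggregator does with several subscriptions can be sketched as a merge of entries into one list, newest first. This is a simplified illustration, not how any of the products named above is implemented: the feed names, entries, and the `aggregate` function are hypothetical, and the entries are given as already-extracted (title, date) pairs rather than fetched from the network.

```python
from email.utils import parsedate_to_datetime

# Hypothetical subscriptions. Dates use the RFC 822 style that RSS 2.0
# feeds carry in their pubDate element.
FEEDS = {
    "Job listings (hypothetical)": [
        ("Cataloging librarian opening", "Mon, 06 Jun 2005 09:00:00 +0000"),
        ("Archivist opening", "Wed, 08 Jun 2005 14:30:00 +0000"),
    ],
    "Library blog (hypothetical)": [
        ("New index licensed", "Tue, 07 Jun 2005 11:15:00 +0000"),
    ],
}


def aggregate(feeds: dict) -> list:
    """Merge entries from every feed into one list, newest first."""
    merged = []
    for feed_name, entries in feeds.items():
        for title, pub_date in entries:
            merged.append((parsedate_to_datetime(pub_date), feed_name, title))
    merged.sort(key=lambda entry: entry[0], reverse=True)
    return merged


for when, feed_name, title in aggregate(FEEDS):
    print(when.date(), feed_name, "-", title)
```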
NetNewsWire Lite
How it works
• Library Jobs RSS feed from Chronicle of Higher Education
  – Blog
  – Organization site
• Elf
  – Library borrower RSS
Feedster
• RSS search engine
• Generates a unique RSS feed for each search to copy to an aggregator
• Notifications each time your subject is updated
• The better your search terms, the better your results
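The behavior described above, a stored search that reports only what is new, can be approximated with a short sketch. It is not Feedster's implementation; the `SavedSearch` class, its methods, and the sample items are assumptions made for illustration. An item matches only if every search term appears in its title or description, which is one simple way to see why more precise terms give more precise results.

```python
class SavedSearch:
    """A hypothetical stored search that remembers which links it has reported."""

    def __init__(self, terms):
        self.terms = [term.lower() for term in terms]
        self.seen_links = set()

    def matches(self, item: dict) -> bool:
        """True if every search term occurs in the item's title or description."""
        text = (item["title"] + " " + item["description"]).lower()
        return all(term in text for term in self.terms)

    def new_matches(self, items: list) -> list:
        """Return matching items not reported before, and remember them."""
        fresh = []
        for item in items:
            if self.matches(item) and item["link"] not in self.seen_links:
                self.seen_links.add(item["link"])
                fresh.append(item)
        return fresh


# Hypothetical items, as they might look after parsing a feed.
ITEMS = [
    {
        "title": "Gnostic manuscripts digitized",
        "description": "A seminary library posts new scans.",
        "link": "http://example.org/a",
    },
    {
        "title": "Parking update",
        "description": "Lot C closed Friday.",
        "link": "http://example.org/b",
    },
]

search = SavedSearch(["gnostic", "library"])
first_poll = search.new_matches(ITEMS)
second_poll = search.new_matches(ITEMS)  # nothing new since the first poll
print(len(first_poll), len(second_poll))
```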
Why RSS at your library?
• Two-way information exchange
  – Information retrieval and dissemination
• For patrons and librarians
  – Filter information overload
    • You designate the boundaries
  – Time sensitive
    • Be notified first when something is posted in your area of interest
Online Content / Search Issues
• Information creation and migration speeds
• Standards - or lack thereof
• Competition v. collaboration
Looking forward
• Deep Web diminishing
  – XML
  – Commercial search engines
    • Sophistication
    • File types
  – Internet publishing increasing
    • More care about pages being indexed
    • Links
Sources
Bergman, M. 2001. The Deep Web: Surfacing Hidden Value. Journal of Electronic Publishing. http://www.press.umich.edu/jep/07-01/bergman.html
BrightPlanet. Deep Web FAQs. http://www.brightplanet.com/deepcontent/deep_web_faq.asp
Devine, J. and Egger-Sider, F. 2004. Beyond Google:The Invisible Web in the Academic Library. The Journal of Academic Librarianship. 30(4), 265-269.
Olsen, S. 2004. Yahoo crawls deep into the Web. http://news.com.com/2100-1024-5167931.html
Smith, C. Invisible Web. http://www.libraryspot.com/features/invisibleweb.htm
Wired. 2005. Surfing the Deep Web. http://www.wired.com/news/business/0,1367,67883,00.html
University at Albany. 2005. The Deep Web. http://library.albany.edu/internet/deepweb.html
Webster, P. 2004. Breaking Down Information Silos. Online. 30-34.
Wright, A. 2004. In Search of the Deep Web. Salon. http://www.salon.com/tech/feature/2004/03/09/deep_web/index_np.html