![Page 1: 1 Open Access to Digital Libraries. Must Research Libraries be Expensive? William Y. Arms Department of Computer Science Cornell University](https://reader033.vdocument.in/reader033/viewer/2022061615/55195e1d550346aa698b4758/html5/thumbnails/1.jpg)
Open Access to Digital Libraries. Must Research Libraries be Expensive?
William Y. Arms
Department of Computer Science
Cornell University
Before Digital Libraries
Access to scientific, medical, legal information
In the United States:
-- excellent if you belonged to a rich organization (e.g., a major university)
-- very poor otherwise
In many countries of the world:
-- very poor for everybody
Research Libraries are Expensive
library materials
buildings & facilities
staff
The Potential of Digital Libraries
materials
open access
buildings & facilities
staff
Economic Models for Open Access
Who pays for open access to information?
Two Fallacies
1. The Luddite Publishing Fallacy
Academic authors will never change. Prestige is determined by which journals a researcher publishes in. The prestigious journals make the rules.
2. The Free Lunch Fallacy
Web publishing costs nothing. Therefore groups of researchers should publish their own research. There is no need to waste money on publishers.
Four Economic Models
Example: Broadcast Television
Open Access
-- Advertising: network television
-- External funding: public broadcasting
Restricted Access
-- Subscription: cable
-- Pay-by-use: pay-per-view
Examples
Old                              New
Books in Print (subscription)    Amazon.com (advertising)
Medline (pay-by-use)             Grateful Med (external)
Journal (subscription)           ePrint archives (external)
Westlaw (pay-by-use)             Legal Information Institute (external)
Inspec (subscription)            Google (advertising)
Thoughts on the Future of Open Access
The dominant force is author pressure, which emphasizes open access rather than closed access.
1. A mixture of economic models will coexist.
2. Eventually, we will have open access to most scientific and professional information.
3. The most common economic model will be that information is published by the producing organization.
The producing organization may be a university (or part), a conference series, a laboratory, an association, etc.
A New Role For Academic Libraries and Associations
Academic libraries and associations can provide support for open access information:
-- Establish standards for academic quality
-- Maintain local archives (e.g., M.I.T.'s archive of local research)
-- Protect and preserve for the long-term
The Potential of Digital Libraries
materials
open access
buildings & facilities
computers & networks
staff
?
staff
Automated Digital Libraries
How effectively can computers be used for the skilled tasks of professional librarianship?
-- Time horizon: 5 to 20 years
-- All materials in digital form
Computers cannot imitate intelligence. Can automated digital libraries provide equivalent services?
Example: Catalogs and Indexes
Catalog, index and abstracting records are very expensive when created by skilled professionals
-- only available for certain categories of material (e.g., monographs, scientific journals)
-- contain limited fields of information (e.g., no contents page)
-- restricted to static information
Equivalent Services: Catalogs and Indexes
Cataloguing rules
-- Application of cataloguing rules is skilled
-- It is hard to imagine a computer system with these skills
but ...
-- Cataloguing rules are the means, not the end
Equivalent Services
Information discovery
I used to be a heavy user of Inspec. Now I use Google instead.
Why are web search services the most widely used information discovery tools in universities today?
Conventional Criteria
Web search services have many weaknesses
-- selection is arbitrary
-- index records are crude
-- no authority control
-- duplicate detection is weak
-- search precision is deplorable
yet they clearly satisfy some users ...
Effectiveness of Web Search
Why I use Google instead of Inspec:
=> Broader coverage
=> Better ranking
=> Immediate access to information (e.g., open access version of published paper)
Google is an equivalent service for information discovery (for some users)
Simple Algorithms
+
Immense Computing Power
Brute Force Computing
Few people really understand Moore's Law
-- Computing power doubles every 18 months
-- Increases about 100 times in 10 years
-- Increases about 10,000 times in 20 years
Simple algorithms + immense computing power may outperform human intelligence
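The growth figures on this slide follow from compounding the 18-month doubling period. A minimal sketch of the arithmetic, where the function name `growth_factor` is an illustrative choice and not something from the talk:

```python
def growth_factor(years: float, doubling_months: float = 18.0) -> float:
    """Growth multiple after `years` if capacity doubles every `doubling_months`."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


# Print the multiples for the two horizons quoted on the slide.
for years in (10, 20):
    print(f"{years} years: about {growth_factor(years):,.0f} times")
```

Ten years is 120 / 18, or roughly 6.7 doublings, which is why the slide's round figures of 100 and 10,000 are approximations rather than exact values.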
Brute Force Computing
Example
Creators of the world champion chess program (Deep Thought, later Deep Blue)
-- moderate chess players
-- simple tree-search algorithm
-- very, very fast computer hardware
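To give a feel for what a "simple tree-search algorithm" looks like, here is a plain minimax search over a toy game tree. This is a generic textbook sketch, not the Deep Thought or Deep Blue implementation; the tree encoding (nested lists with integer leaf scores) is an assumption made for illustration.

```python
from typing import Union

# A node is either a leaf score (int) or a list of child nodes.
Tree = Union[int, list]


def minimax(node: Tree, maximizing: bool = True) -> int:
    """Return the best score reachable when both players play optimally."""
    if isinstance(node, int):
        return node  # leaf: the score of this position
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)


# Hypothetical two-ply game: we choose a branch, then the opponent replies.
toy_tree = [[3, 5], [2, 9], [0, 7]]
best_score = minimax(toy_tree)
print(best_score)
```

The algorithm itself is only a few lines; the playing strength comes from how many positions the hardware can examine, which is the point of the slide.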
Examples of Automated Digital Library Services
Brute Force Computing: Web Search
Web search engines:
-- retrieve every page on the web
-- index every word
-- repeat every month
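"Index every word" can be illustrated with a tiny inverted index that maps each word to the pages containing it. This is a sketch under simplifying assumptions: the pages are held in memory, the URLs are hypothetical, and tokenization is a crude lowercase split. Production search engines are far more elaborate.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for a crawl of the web.
pages = {
    "http://example.org/a": "Open access to digital libraries",
    "http://example.org/b": "Research libraries are expensive",
    "http://example.org/c": "Automated digital library services",
}
index = build_index(pages)
print(search(index, "digital libraries"))
```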
Substitutes for Human Intelligence
Automated algorithms for information discovery
Closeness of match
-- vector space and statistical methods
(Salton, et al., c. 1970)
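As a rough illustration of "closeness of match" in the vector space model, the sketch below treats each text as a vector of raw term counts and scores documents by the cosine of the angle between query and document vectors. Raw counts are a simplification chosen here; weighting schemes such as tf-idf are commonly used in practice, and the sample documents are invented.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Represent a text as a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Return (score, document) pairs, best match first."""
    query_vector = term_vector(query)
    scored = [(cosine_similarity(query_vector, term_vector(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented sample documents.
documents = [
    "open access digital libraries",
    "chess tree search",
    "digital library costs",
]
for score, doc in rank("digital libraries", documents):
    print(f"{score:.2f}  {doc}")
```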
Importance of digital object
-- Google ranks web pages by how many other pages link to them
(NSF/DARPA/NASA Digital Libraries Initiative)
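The link-based idea on this slide can be sketched as a simple count of inbound links. This follows the slide's one-line description only: Google's actual ranking also weighs the importance of the linking pages, which this sketch deliberately omits. The link graph below is hypothetical.

```python
from collections import Counter


def inlink_counts(links: dict[str, list[str]]) -> Counter:
    """Count, for each page, how many distinct other pages link to it."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):  # repeated links from one page count once
            if target != source:     # ignore self-links
                counts[target] += 1
    return counts


# Hypothetical link graph: page -> pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c", "c"],
}
for page, count in inlink_counts(links).most_common():
    print(page, count)
```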
Brute Force Computing: Archiving and Preservation
Internet Archive
-- Each month, a web crawler gathers every open access web page with associated images
-- Web pages are preserved for future generations
-- Files are available for scholarly research
Brute Force Computing: Reference Linking
ResearchIndex (CiteSeer, ScienceIndex) (NEC)
-- fully automatic
-- all open access material in computer science
-- a free service
Contrast with the Web of Science (ISI)
-- input: a combination of automatic means and skilled people
-- limited number of journals
-- very expensive
Brute Force Computing: Automated Metadata Extraction
Informedia (Carnegie Mellon)
Automatic processing of segments of video, e.g., television news.
Algorithms for:
-- dividing raw video into discrete items
-- generating short summaries
-- indexing the sound track using speech recognition
-- recognizing faces
(NSF/DARPA/NASA Digital Libraries Initiative)
Automating Interoperability
Example: Cornell University's Core System for the NSDL
(The National Science Foundation's digital library for science, mathematics, engineering and technology education)
Levels of Interoperability
A comprehensive science library:
The NSDL must provide coherent services across a vast range of materials managed by organizations with many objectives.
Three levels of interoperability:
Federation
Harvesting
Gathering
Federation (e.g., Z39.50 and MARC)
Digital libraries that follow a full set of agreements form a federation.
Standards and agreements
-- Technical: formats, protocols, security systems, etc.
-- Content: data and metadata (including semantics)
-- Organizational: access, services, payment, authentication, etc.
Federations are desirable but very demanding and hence rare
Gathering (e.g., Internet Archive, Google)
Gathering: service for open access information, even if information providers do not follow standard agreements:
-- web crawlers gather open access information
-- web search engines index it
-- automated services are possible (e.g., ResearchIndex)
Entirely automated
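The gathering step can be sketched as a breadth-first crawl that follows links from a seed page. To stay self-contained, the sketch uses a hypothetical in-memory "web" and a caller-supplied `fetch_links` function instead of real HTTP requests; a real crawler would also need politeness rules, error handling, and robots.txt support.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seed: str, fetch_links: Callable[[str], Iterable[str]]) -> list[str]:
    """Visit every page reachable from `seed`, breadth first, each page once."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical link structure standing in for open access web pages.
toy_web = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
print(crawl("a", lambda url: toy_web.get(url, [])))
```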
Harvesting (e.g., Open Archives Initiative)
Digital libraries:
-- provide a brief metadata record for each item (e.g., minimal Dublin Core)
-- support a simple protocol for access to this metadata
Automated harvesters:
-- harvest the metadata automatically
-- build automated services
Mainly automated
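A harvester's two steps, requesting metadata and reading the Dublin Core fields, might look roughly like the sketch below. It assumes a repository that speaks the OAI protocol for metadata harvesting (the `verb=ListRecords` and `metadataPrefix=oai_dc` parameters); the base URL is hypothetical, and the sample XML is a simplified, hand-written stand-in for a real response, which would be wrapped in a larger envelope.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a request URL asking a repository for its metadata records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(xml_text: str) -> list[str]:
    """Pull every Dublin Core title out of a harvested XML document."""
    root = ET.fromstring(xml_text)
    return [element.text or "" for element in root.iter(f"{{{DC_NS}}}title")]


# Simplified, hand-written sample (not a complete protocol response).
SAMPLE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title>Automated digital libraries</dc:title>
    <dc:creator>William Y. Arms</dc:creator>
  </record>
  <record>
    <dc:title>Economic models for open-access publishing</dc:title>
    <dc:creator>William Y. Arms</dc:creator>
  </record>
</records>"""

print(list_records_url("http://repository.example.org/oai"))
print(extract_titles(SAMPLE))
```

Because each repository only has to expose short records through one simple request, the harvesting side can run without human intervention, which is what makes this level "mainly automated".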
Costs and Benefits
Costs of Automated Digital Libraries
The Google Company
-- 5.5 million searches daily
-- 85 people (half technical, 14 with Ph.D. in computing)
-- 2,500 PCs running Linux, with 80 terabytes of disk
The Internet Archive
-- 7 people plus support from Alexa
(March 2000)
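To put the Google figures in proportion, the short calculation below derives a few ratios from the numbers quoted on this slide. The input values come from the slide; the derived ratios and the conversion of 1 terabyte to 1,000 gigabytes are assumptions of this sketch, not figures from the talk.

```python
# Figures quoted on the slide (March 2000).
SEARCHES_PER_DAY = 5_500_000
PCS = 2_500
STAFF = 85
DISK_TERABYTES = 80

SECONDS_PER_DAY = 24 * 60 * 60

# Derived ratios (illustrative only).
searches_per_second = SEARCHES_PER_DAY / SECONDS_PER_DAY
searches_per_pc_per_day = SEARCHES_PER_DAY / PCS
searches_per_staff_per_day = SEARCHES_PER_DAY / STAFF
disk_gb_per_pc = DISK_TERABYTES * 1000 / PCS  # assumes 1 TB = 1000 GB

print(f"searches per second:        {searches_per_second:.1f}")
print(f"searches per PC per day:    {searches_per_pc_per_day:.0f}")
print(f"searches per staff per day: {searches_per_staff_per_day:.0f}")
print(f"disk per PC (GB):           {disk_gb_per_pc:.0f}")
```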
Overall
If you are rich ...
-- Research libraries, using commercial information services, provide excellent service at very high cost to a favored few
-- Automated digital libraries are far from providing the personal service available to a faculty member at a rich university
but ...
The Model T Library
The Model T Ford, with mass production, brought car travel to the masses ...
-- Automated digital libraries, with open access materials, can already provide good service at low cost
-- In the future, automated digital libraries can bring scientific, scholarly, medical and legal information to everybody
Some Light Reading
William Y. Arms, "Automated digital libraries." D-Lib Magazine, July/August 2000. http://www.dlib.org/dlib/july20/07contents.html
William Y. Arms, "Economic models for open-access publishing." iMP, March 2000. http://www.cisp.org/imp/march_2000/03_00arms.htm