Digital Libraries: Study into the features of the DSpace Suite
Devika P. Madalli
Documentation Research and Training Centre, Indian Statistical Institute
Bangalore 560059
Introduction
Digital libraries encompass a whole range of work related to information services, such as
– Organization of digital information
– Information retrieval
– User interface
– Archiving and preservation
– Services and social issues
– Evaluation and applications to particular areas
Desirable Features of DL Software
• Structured
• Accessible
• Searchable
• Extensible
• Massive
• Heterogeneous
• Persistent
DL’s operation should be examined under…
• Architectural design – Modular and Open
• Backend Database – scalable, robust, data formats
• Network capabilities – web-based and seamless operations, persistent Ids, security and authentication
• Metadata and Interoperability – compatible with world standards such as Dublin Core and OAI-PMH
Technical Issues
• Open source software vs. commercial OS
• Hardware and peripheral requirements
• Network Components
• Standards – data formats, metadata, network, access, interoperability, encoding
Approaches to Building DL
• Digitization – retro-conversion of non-digital resources to digital
• Digitally born resources – involves inter-conversion to standard formats and storage
Why DSpace Digital Library
• An open source technology platform which can be customized and its capabilities can be extended
• A service model for open access and/or digital archiving for perpetual access
• A platform for building an Institutional Repository whose collections are searchable and retrievable on the Web
• To make available institution-based scholarly material in digital formats. The collection will be open and interoperable.
DSpace is
Architecture and System Requirement
The DSpace system is organized into three layers
– The Storage Layer: responsible for physical storage of metadata and content
– The Business Layer: deals with managing the content of the archive, users of the archive (e-people), authorization, and workflow
– The Application Layer: contains components that communicate with the networked world outside the individual DSpace installation, for example the Web user interface and the modules for the metadata harvesting service
Features of a near ideal DL
• Low cost, including all hardware and software components
• Technically simple to install and manage
• Robust
• Scalable
• Open and interoperable
• Modular
• User friendly
• Multi-user (including both searching and maintenance)
• Multimedia digital object enabled
• Platform independent (including both client and server components)
What is DSpace?
• Digital Object management system
• Create, search and retrieve digital objects
• Facilitate preservation of digital objects
• An open source software
• Allows open access and digital archiving
• Allows building Institutional Repositories
H/W and S/W requirements
• UNIX recommended (Java-based program should run on anything)
• Open source, built on Apache web server and Tomcat Servlet engine
• Uses postgreSQL or Oracle relational database
What can DSpace do?
• Captures
– Digital content in any format directly from creators (e.g. researchers, authors)
• Describes
– Descriptive, technical, rights metadata
– Persistent identifiers
• OAI-PMH version 2.0 compliant
– Allows metadata creation
Possible types of Content
• Preprints, articles
• Postprints
• Technical Reports
• Conference Papers
• Theses/Dissertations
• Datasets
– e.g. statistical, geospatial, scientific
• Images
– visual, scientific, etc.
• Audio files
• Video files
• Digitized library collections
Formats of Content
File Formats
Supported: the repository administrator can tell submitters which file formats the organization will support in the future
Known: the format is recognized, but full support cannot be guaranteed
Unsupported: the format cannot be recognized; such files are listed as "application/octet-stream" (Unknown)
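The three support levels above can be pictured as a lookup against a format registry. The following is a minimal sketch, not DSpace's own code: the `FORMAT_REGISTRY` table, the `SupportLevel` enum and the `classify` function are hypothetical names, and which formats sit at which level is a local policy decision assumed here for illustration.

```python
from enum import Enum


class SupportLevel(Enum):
    SUPPORTED = "supported"
    KNOWN = "known"
    UNSUPPORTED = "unsupported"


# Hypothetical registry: the assignment of formats to levels is an
# assumption for this example, not a list taken from DSpace.
FORMAT_REGISTRY = {
    "application/pdf": SupportLevel.SUPPORTED,
    "text/xml": SupportLevel.SUPPORTED,
    "application/msword": SupportLevel.KNOWN,
}


def classify(mime_type):
    """Return the MIME type to record for a file and its support level."""
    level = FORMAT_REGISTRY.get(mime_type)
    if level is None:
        # Unrecognized formats are recorded as generic binary data.
        return "application/octet-stream", SupportLevel.UNSUPPORTED
    return mime_type, level
```

For example, `classify("application/x-custom")` falls through to the unsupported branch and is recorded as "application/octet-stream".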
Information Model
• Communities – Departments, Labs, Research Centers, Schools…
• Collections
• Items
• Files (bitstreams)
– Multiple formats - same content
– Complex objects – multiple files
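The hierarchy above can be sketched as nested data classes. This is a simplified illustration of the slide's four levels only; the class and field names are assumptions chosen for readability and do not reproduce DSpace's actual Java classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    """A single file belonging to an item."""
    name: str
    mime_type: str


@dataclass
class Item:
    """One archived work; it may hold the same content in several formats."""
    title: str
    bitstreams: List[Bitstream] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    """A department, lab, research center or school."""
    name: str
    collections: List[Collection] = field(default_factory=list)


def count_bitstreams(community):
    """Count every file stored anywhere under a community."""
    return sum(
        len(item.bitstreams)
        for collection in community.collections
        for item in collection.items
    )


# Illustrative data: one thesis stored in two formats.
thesis = Item(
    "Sample thesis",
    [Bitstream("thesis.pdf", "application/pdf"),
     Bitstream("thesis.doc", "application/msword")],
)
department = Community("Example Department", [Collection("Theses", [thesis])])
```

Here the single item carries two bitstreams, which is the "multiple formats, same content" case from the slide.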
Intellectual Property
• Click-through license during submission
• Grants DSpace non-exclusive right to acquire, manage, preserve, distribute the item
• Does not grant DSpace copyright
• Copy of license stored with item
Goodies
• Modular architecture, well-defined APIs
• 100% open source
– Programmed in Java
– RDBMS and SQL for metadata
• CNRI “handles” for persistent identifiers
• OpenURL linking
• OAI-PMH for exposing metadata
Backend Technology
• Apache, Tomcat, OpenSSL/mod_ssl
• Java
• PostgreSQL/Oracle
• CNRI Handle System 5 (persistent ids)
• Lucene Search Engine
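A handle consists of a naming-authority prefix and a local suffix, and can be resolved on the Web through the public proxy at hdl.handle.net. The sketch below only builds such a resolver URL; the `handle_url` helper is a hypothetical name and the prefix `123456789` is a placeholder, not a registered prefix of any particular institution.

```python
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net/"


def handle_url(prefix, suffix):
    """Build a resolvable URL for the handle prefix/suffix."""
    if not prefix or not suffix:
        raise ValueError("both prefix and suffix are required")
    if "/" in prefix:
        raise ValueError("the prefix must not contain '/'")
    # Percent-encode characters in the suffix that are unsafe in a URL path.
    return f"{HANDLE_PROXY}{prefix}/{quote(suffix)}"


# Placeholder handle, used for illustration only.
example = handle_url("123456789", "42")
```

Because the URL points at the proxy rather than at the repository server, it can stay stable even if the repository itself moves to a new address.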
Standards
• Dublin Core only
– Descriptive metadata only
• OAI-PMH v 2.0 (Open Archives Initiative Protocol for Metadata Harvesting)
• UNICODE Compliant
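An OAI-PMH request is an ordinary HTTP request whose query string carries one of six protocol verbs plus its arguments; `oai_dc` is the metadata prefix for unqualified Dublin Core. The following sketch builds such request URLs. The base URL is a made-up example, and the `oai_request_url` helper is a hypothetical name rather than part of any library.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **arguments):
    """Build the URL for one OAI-PMH request."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Skip arguments that were passed as None.
    query.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(query)


# Assumed base URL of a repository's OAI-PMH endpoint (illustrative only).
BASE_URL = "http://repository.example.org/oai/request"

list_records = oai_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

A harvester would fetch this URL and parse the XML response; that step is left out so the example stays free of network access.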
Capabilities
• Exports in XML format
• Supports crosswalks through OAI-PMH
– DC (Dublin Core)
– Qualified DC
– METS (Metadata Encoding and Transmission Standard)
– MODS (Metadata Object Description Schema – sibling of MARCXML)
• Can be extended to any Metadata Schema
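A crosswalk maps fields of one metadata schema onto another. As a minimal sketch of the idea, the function below reduces Qualified DC to simple DC by dropping the qualifier (the Dublin Core "dumb-down" principle). It is an illustration only: the `dumb_down` name, the dictionary layout and the sample record are assumptions, and this is not the crosswalk code that DSpace ships.

```python
from collections import defaultdict


def dumb_down(qualified_record):
    """Map fields like 'dc.date.issued' to their bare element ('date')."""
    simple = defaultdict(list)
    for field_name, values in qualified_record.items():
        parts = field_name.split(".")
        # parts[0] is the schema, parts[1] the element,
        # parts[2] (if present) the qualifier, which is discarded.
        if len(parts) < 2 or parts[0] != "dc":
            continue  # fields from other schemas are out of scope here
        simple[parts[1]].extend(values)
    return dict(simple)


# Illustrative record; the values are invented for the example.
record = {
    "dc.title": ["Sample report"],
    "dc.date.issued": ["2008"],
    "dc.date.available": ["2009"],
    "dc.contributor.author": ["Example, A."],
}
simple_record = dumb_down(record)
```

Both date fields collapse into a single `date` element with two values, which shows the information loss that comes with moving from a qualified to an unqualified schema.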
Customization
• Screens (Manakin)
• E-mails
• Any language interface
• Metadata
• Input forms
• Display of results
• Fields to be indexed
• Access restrictions
• License (in addition to Creative Commons)
Advanced Features
• Grid compliant (storage)
• LDAP authentication
• Usage statistics generation
• SFX server integration
• RSS (Really Simple Syndication)
• Item recommendation to a friend
• Use of thesaurus (though not OWL/SKOS/RDF)
• Full-text indexing of PDF and MS Word files
Important Sites
• http://www.dspace.org
• http://www.sourceforge.net/projects/dspace
• http://wiki.dspace.org
• http://mailman.mit.edu/mailman/listinfo/dspace-general
• http://lists.sourceforge.net/lists/listinfo/dspace-tech
• http://lists.sourceforge.net/lists/listinfo/dspace-devel
DRTC Sites
• https://drtc.isibang.ac.in (Librarians' Digital Library)
• http://drtc.isibang.ac.in/dlrg (Discussion Forum)
• http://drtc.isibang.ac.in/sdl (Harvester in LIS)
• http://drtc.isibang.ac.in
• http://drtc.isibang.ac.in/blog