dspace manual

DSpace 3.x Documentation
Author: The DSpace Developer Team
Date: 30 November 2012
URL: https://wiki.duraspace.org/display/DSDOC3x

Table of Contents

1 Preface ... 13
  1.1 Release Notes ... 13
2 Introduction ... 17
3 Functional Overview ... 19
  3.1 Data Model ... 20
  3.2 Plugin Manager ... 22
  3.3 Metadata ... 22
  3.4 Packager Plugins ... 23
  3.5 Crosswalk Plugins ... 24
  3.6 E-People and Groups ... 24
    3.6.1 E-Person ... 24
    3.6.2 Groups ... 25
  3.7 Authentication ... 25
  3.8 Authorization ... 25
  3.9 Ingest Process and Workflow ... 27
    3.9.1 Workflow Steps ... 28
  3.10 Supervision and Collaboration ... 29
  3.11 Handles ... 29
  3.12 Bitstream 'Persistent' Identifiers ... 30
  3.13 Storage Resource Broker (SRB) Support ... 31
  3.14 Search and Browse ... 31
  3.15 HTML Support ... 32
  3.16 OAI Support ... 33
  3.17 SWORD Support ... 33
  3.18 OpenURL Support ... 33
  3.19 Creative Commons Support ... 34
  3.20 Subscriptions ... 34
  3.21 Import and Export ... 34
  3.22 Registration ... 34
  3.23 Statistics ... 35
    3.23.1 System Statistics ... 35
    3.23.2 Item, Collection and Community Usage Statistics ... 35
  3.24 Checksum Checker ... 36
  3.25 Usage Instrumentation ... 36
  3.26 Choice Management and Authority Control ... 36
    3.26.1 Introduction and Motivation ... 37
4 Installation ... 39
  4.1 For the Impatient ... 40
  4.2 Prerequisite Software ... 40
    4.2.1 UNIX-like OS or Microsoft Windows ... 40
    4.2.2 Oracle Java JDK 6 or 7 (standard SDK is fine, you don't need J2EE) or OpenJDK 6 or 7 ... 41
    4.2.3 Apache Maven 2.2.x or higher (Java build tool) ... 41
    4.2.4 Apache Ant 1.8 or later (Java build tool) ... 42
    4.2.5 Relational Database: (PostgreSQL or Oracle) ... 42
    4.2.6 Servlet Engine (Apache Tomcat 5.5 or later, Jetty, Caucho Resin or equivalent) ... 43
    4.2.7 Perl (only required for [dspace]/bin/dspace-info.pl) ... 44
  4.3 Installation Instructions ... 44
    4.3.1 Overview of Install Options ... 44
    4.3.2 Overview of DSpace Directories ... 45
    4.3.3 Installation ... 46
  4.4 Advanced Installation ... 54
    4.4.1 'cron' Jobs ... 54
    4.4.2 Multilingual Installation ... 55
    4.4.3 DSpace over HTTPS ... 55
    4.4.4 The Handle Server ... 60
    4.4.5 Google and HTML sitemaps ... 61
    4.4.6 Statistics ... 62
  4.5 Windows Installation ... 62
  4.6 Checking Your Installation ... 63
  4.7 Known Bugs ... 63
  4.8 Common Problems ... 63
    4.8.1 Common Installation Issues ... 64
    4.8.2 General DSpace Issues ... 66
5 Upgrading a DSpace Installation ... 68
  5.1 Upgrading From 1.8.x to 3.x ... 68
    5.1.1 Backup your DSpace ... 69
    5.1.2 Upgrade Steps ... 69
  5.2 Upgrading From 1.8 to 1.8.x ... 72
    5.2.1 Backup your DSpace ... 73
    5.2.2 Upgrade Steps ... 74
  5.3 Upgrading From 1.7.x to 1.8.x ... 75
    5.3.1 Backup your DSpace ... 77
    5.3.2 Upgrade Steps ... 78
  5.4 Upgrading From 1.7 to 1.7.x ... 82
    5.4.1 Upgrade Steps ... 83
  5.5 Upgrading From 1.6.x to 1.7.x ... 84
    5.5.1 Upgrade Steps ... 84
  5.6 Upgrading From 1.6 to 1.6.x ... 94
    5.6.1 Upgrade Steps ... 95
  5.7 Upgrading From 1.5.x to 1.6.x ... 96
    5.7.1 Upgrade Steps ... 97
  5.8 Upgrading From 1.5 or 1.5.1 to 1.5.2 ... 110
    5.8.1 Upgrade Steps ... 110
  5.9 Upgrading From 1.4.2 to 1.5 ... 119
    5.9.1 Upgrade Steps ... 119
  5.10 Upgrading From 1.4.1 to 1.4.2 ... 124
    5.10.1 Upgrade Steps ... 124
  5.11 Upgrading From 1.4 to 1.4.x ... 124
    5.11.1 Upgrade Steps ... 124
  5.12 Upgrading From 1.3.2 to 1.4.x ... 126
    5.12.1 Upgrade Steps ... 126
  5.13 Upgrading From 1.3.1 to 1.3.2 ... 129
    5.13.1 Upgrade Steps ... 129
  5.14 Upgrading From 1.2.x to 1.3.x ... 130
    5.14.1 Upgrade Steps ... 130
  5.15 Upgrading From 1.2.1 to 1.2.2 ... 131
    5.15.1 Upgrade Steps ... 132
  5.16 Upgrading From 1.2 to 1.2.1 ... 133
    5.16.1 Upgrade Steps ... 133
  5.17 Upgrading From 1.1.x to 1.2 ... 135
    5.17.1 Upgrade Steps ... 135
  5.18 Upgrading From 1.1 to 1.1.1 ... 138
    5.18.1 Upgrade Steps ... 139
  5.19 Upgrading From 1.0.1 to 1.1 ... 139
    5.19.1 Upgrade Steps ... 139
6 Configuration ... 143
  6.1 General Configuration ... 145
    6.1.1 Input Conventions ... 145
    6.1.2 Update Reminder ... 146
  6.2 The build.properties Configuration Properties File ... 147
  6.3 The dspace.cfg Configuration Properties File ... 148
    6.3.1 Main DSpace Configurations ... 148
    6.3.2 DSpace Database Configuration ... 149
    6.3.3 DSpace Email Settings ... 151
    6.3.4 File Storage ... 154
    6.3.5 SRB (Storage Resource Brokerage) File Storage ... 156
    6.3.6 Logging Configuration ... 158
    6.3.7 Configuring Lucene Search Indexes ... 159
    6.3.8 Handle Server Configuration ... 163
    6.3.9 Delegation Administration: Authorization System Configuration ... 164
    6.3.10 Restricted Item Visibility Settings ... 169
    6.3.11 Proxy Settings ... 170
    6.3.12 Configuring Media Filters ... 170
    6.3.13 Crosswalk and Packager Plugin Settings ... 172
    6.3.14 Event System Configuration ... 177
    6.3.15 Embargo ... 180
    6.3.16 Checksum Checker Settings ... 181
    6.3.17 Item Export and Download Settings ... 182
    6.3.18 Subscription Emails ... 183
    6.3.19 Hiding Metadata ... 183
    6.3.20 Settings for the Submission Process ... 184
    6.3.21 Configuring Creative Commons License ... 184
    6.3.22 WEB User Interface Configurations ... 186
    6.3.23 Browse Index Configuration ... 190
    6.3.24 Author (Multiple metadata value) Display ... 196
    6.3.25 Links to Other Browse Contexts ... 197
    6.3.26 Recent Submissions ... 198
    6.3.27 Submission License Substitution Variables ... 199
    6.3.28 Syndication Feed (RSS) Settings ... 199
    6.3.29 OpenSearch Support ... 203
    6.3.30 Content Inline Disposition Threshold ... 205
    6.3.31 Multi-file HTML Document/Site Settings ... 206
    6.3.32 Sitemap Settings ... 206
    6.3.33 Authority Control Settings ... 207
    6.3.34 JSPUI Upload File Settings ... 208
    6.3.35 JSP Web Interface (JSPUI) Settings ... 209
    6.3.36 JSPUI Configuring Multilingual Support ... 213
    6.3.37 JSPUI Item Mapper ... 215
    6.3.38 Display of Group Membership ... 215
    6.3.39 JSPUI / XMLUI SFX Server ... 215
    6.3.40 JSPUI Item Recommendation Setting ... 217
    6.3.41 Controlled Vocabulary Settings ... 217
    6.3.42 XMLUI Specific Configuration ... 219
  6.4 Optional or Advanced Configuration Settings ... 223
    6.4.1 The Metadata Format and Bitstream Format Registries ... 223
    6.4.2 XPDF Filter ... 224
    6.4.3 Configuring Usage Instrumentation Plugins ... 227
  6.5 Authentication Plugins ... 227
    6.5.1 Stackable Authentication Method(s) ... 228
  6.6 Batch Metadata Editing Configuration ... 251
  6.7 Configurable Workflow ... 252
    6.7.1 Introduction ... 253
    6.7.2 Instructions for Enabling Configurable Reviewer Workflow in XMLUI ... 253
    6.7.3 Data Migration (Backwards compatibility) ... 255
    6.7.4 Configuration ... 256
    6.7.5 Authorizations ... 262
    6.7.6 Database ... 262
    6.7.7 Additional workflow steps/actions and features ... 264
    6.7.8 Known Issues ... 265
  6.8 Discovery ... 266
    6.8.1 What is DSpace Discovery ... 267
    6.8.2 Discovery Features ... 269
    6.8.3 DSpace 1.8 Improvements ... 269
    6.8.4 DSpace 3 Improvements ... 269
    6.8.5 Enabling Discovery ... 270
    6.8.6 Configuration files ... 275
    6.8.7 General Discovery settings (config/modules/discovery.cfg) ... 275
    6.8.8 Modifying the Discovery User Interface (config/spring/api/discovery.xml) ... 277
    6.8.9 Discovery Solr Index Maintenance ... 288
    6.8.10 Advanced Solr Configuration ... 288
  6.9 DSpace Service Manager ... 289
    6.9.1 Introduction ... 290
    6.9.2 Configuration ... 290
    6.9.3 Architectural Overview ... 292
    6.9.4 Tutorials ... 293
  6.10 DSpace Statistics ... 293
    6.10.1 What is exactly being logged? ... 294
    6.10.2 Web User Interface Elements ... 297
    6.10.3 Architecture ... 299
    6.10.4 Configuration settings for Statistics ... 299
    6.10.5 Upgrade Process for Statistics ... 303
    6.10.6 Statistics Administration ... 303
    6.10.7 Statistics differences between DSpace 1.7.x and 1.8.0 ... 304
    6.10.8 Statistics differences between DSpace 1.6.x and 1.7.0 ... 304
    6.10.9 Web UI Statistics Modification (XMLUI Only) ... 305
    6.10.10 Custom Reporting - Querying SOLR Directly ... 305
    6.10.11 Manually Installing/Updating GeoLite Database File ... 306
  6.11 Elastic Search Usage Statistics ... 307
    6.11.1 What data is being recorded? ... 307
    6.11.2 Enabling Elastic Search Statistics ... 308
    6.11.3 Importing Legacy Data into Elastic Search Statistics ... 309
    6.11.4 Viewing Data in Elastic Search Statistics ... 310
  6.12 Embargo ... 311
    6.12.1 What is an Embargo? ... 312
    6.12.2 DSpace 3.0 XMLUI Embargo Functionality ... 312
    6.12.3 Configuring and using Embargo's in DSpace 3.0 ... 313
    6.12.4 Technical Specifications ... 324
    6.12.5 Pre-DSpace 3.0 Embargo ... 326
  6.13 Google Scholar Metadata Mappings ... 330
  6.14 Item Level Versioning ... 331
    6.14.1 What is Item Level Versioning? ... 332
    6.14.2 Enabling Item Level Versioning ... 332
    6.14.3 Initial Requirements ... 333
    6.14.4 User Interface ... 334
    6.14.5 Architecture ... 335
    6.14.6 Configuration ... 341
    6.14.7 Identified Challenges & Known Issues in DSpace 3.0 ... 342
  6.15 OAI ... 343
    6.15.1 OAI Interfaces ... 343
    6.15.2 OAI 2.0 Server ... 349
    6.15.3 OAI-PMH Data Provider 2.0 (Internals) ... 359
  6.16 SWORDv1 Client ... 362
    6.16.1 Enabling the SWORD Client ... 362
    6.16.2 Configuring the SWORD Client ... 363
  6.17 SWORDv1 Server ... 363
    6.17.1 Enabling SWORD Server ... 364
    6.17.2 Configuring SWORD Server ... 364
  6.18 SWORDv2 Server ... 369
    6.18.1 Enabling SWORD v2 Server ... 370
    6.18.2 Configuring SWORD v2 Server ... 370
7 JSPUI Configuration and Customization ... 378
  7.1 Configuration ... 378
  7.2 Customizing the JSP pages ... 378
8 XMLUI Configuration and Customization ... 380
  8.1 Overview of XMLUI / Manakin ... 380
    8.1.1 Understanding the Flow of an XMLUI Request ... 381
  8.2 Manakin Configuration Property Keys ... 383
  8.3 Configuring Themes and Aspects ... 387
    8.3.1 Aspects ... 387
    8.3.2 Themes ... 388
  8.4 Multilingual Support ... 389
  8.5 Creating a New Theme ... 390
  8.6 Customizing the News Document ... 391
  8.7 Adding Static Content ... 392
  8.8 Harvesting Items from XMLUI via OAI-ORE or OAI-PMH ... 393
    8.8.1 Automatic Harvesting (Scheduler) ... 395
  8.9 Additional XMLUI Learning Resources ... 395
  8.10 Mirage Configuration and Customization ... 395
    8.10.1 Introduction ... 396
    8.10.2 Configuration Parameters ... 396
    8.10.3 Technical Features ... 397
    8.10.4 Troubleshooting ... 399
  8.11 XMLUI Base Theme Templates (dri2xhtml) ... 399
    8.11.1 dri2xhtml ... 400
    8.11.2 dri2xhtml-alt ... 400
  8.12 DRI Schema Reference ... 402
    8.12.1 Introduction ... 403
    8.12.2 DRI in Manakin ... 405
    8.12.3 Common Design Patterns ... 406
    8.12.4 Schema Overview ... 407
    8.12.5 Merging of DRI Documents ... 409
    8.12.6 Version Changes ... 410
    8.12.7 Element Reference ... 411
9 Advanced Customisation ... 449
  9.1 Additions module ... 449
  9.2 Maven WAR Overlays ... 449
  9.3 DSpace Source Release ... 449
10 System Administration ... 451
  10.1 AIP Backup and Restore ... 451
    10.1.1 Background & Overview ... 452
    10.1.2 Makeup and Definition of AIPs ... 456
    10.1.3 Running the Code ... 457
    10.1.4 Additional Packager Options ... 469
    10.1.5 Configuration in 'dspace.cfg' ... 474
    10.1.6 Common Issues or Error Messages ... 477
    10.1.7 DSpace AIP Format ... 478
  10.2 Batch Metadata Editing ... 499
    10.2.1 Batch Metadata Editing Tool ... 499
  10.3 Curation System ... 504
    10.3.1 Changes in 1.8 ... 505
    10.3.2 Tasks ... 505
    10.3.3 Activation ... 506
    10.3.4 Writing your own tasks ... 506
    10.3.5 Task Invocation ... 507
    10.3.6 Asynchronous (Deferred) Operation ... 511
    10.3.7 Task Output and Reporting ... 511
    10.3.8 Task Properties ... 512
    10.3.9 Task Annotations ... 514
    10.3.10 Scripted Tasks ... 515
    10.3.11 Starter Tasks ... 516
  10.4 Importing and Exporting Content via Packages ... 521
    10.4.1 Package Importer and Exporter ... 522
  10.5 Importing and Exporting Items via Simple Archive Format ... 528
    10.5.1 Item Importer and Exporter ... 529
  10.6 Importing Items via basic bibliographic formats (Endnote, BibTex, RIS, TSV, CSV) ... 535
    10.6.1 About the Biblio-Transformation-Engine (BTE) ... 535
  10.7 Importing Community and Collection Hierarchy ... 539
    10.7.1 Community and Collection Structure Importer ... 540
  10.8 Managing Community Hierarchy ... 542
    10.8.1 Sub-Community Management ... 542
  10.9 Managing Embargoed Content ... 543
    10.9.1 Embargo Lifter ... 543
  10.10 Managing Usage Statistics ... 544
    10.10.1 DSpace Log Converter ... 544
    10.10.2 Filtering and Pruning Spiders ... 546
    10.10.3 Routine Solr Index Maintenance ... 547
    10.10.4 Solr Sharding By Year ... 547
  10.11 Moving Items ... 548
    10.11.1 Moving Items via Web UI ... 548
    10.11.2 Moving Items via the Batch Metadata Editor ... 549
  10.12 Performance Tuning DSpace ... 549
    10.12.1 Give Tomcat (DSpace UIs) More Memory ... 549
    10.12.2 Give the Command Line Tools More Memory ... 551
    10.12.3 Give PostgreSQL Database More Memory ... 552
    10.12.4 SOLR Statistics Performance Tuning ... 553
  10.13 Registering (not Importing) Bitstreams via Simple Archive Format ... 553
    10.13.1 Overview ... 553
  10.14 ReIndexing Content (for Browse or Search) ... 556
    10.14.1 Overview ... 556
    10.14.2 Creating the Browse & Search Indexes ... 556
    10.14.3 Running the Indexing Programs ... 557
    10.14.4 Indexing Customization ... 558
  10.15 Testing Database Connection ... 559
    10.15.1 Test Database ... 560
  10.16 Transferring or Copying Content Between Repositories ... 560
    10.16.1 Transferring Content via Export and Import ... 560
    10.16.2 Transferring Items using Simple Archive Format ... 560
    10.16.3 Transferring Items using OAI-ORE/OAI-PMH Harvester ... 561
    10.16.4 Copying Items using the SWORD Client ... 561
  10.17 Transforming DSpace Content (MediaFilters) ... 561
    10.17.1 MediaFilters: Transforming DSpace Content ... 562
  10.18 Updating Items via Simple Archive Format ... 567
    10.18.1 Item Update Tool ... 567
  10.19 Validating CheckSums of Bitstreams ... 570
    10.19.1 Checksum Checker ... 570
11 Directories and Files ... 575
  11.1 Overview ... 575
  11.2 Source Directory Layout ... 575
  11.3 Installed Directory Layout ... 577
  11.4 Contents of JSPUI Web Application ... 577
11.5 Contents of XMLUI Web Application (aka Manakin) ____ 578
11.6 Log Files ____ 578
11.6.1 log4j.properties File. ____ 580
12 Architecture ____ 582
12.1 Overview ____ 582
12.1.1 DSpace System Architecture ____ 582
12.2 Application Layer ____ 584
12.2.1 Web User Interface ____ 584
12.2.2 OAI-PMH Data Provider ____ 594
12.2.3 DSpace Command Launcher ____ 595
12.3 Business Logic Layer ____ 596
12.3.1 Core Classes ____ 597
12.3.2 Content Management API ____ 601
12.3.3 Plugin Manager ____ 606
12.3.4 Workflow System ____ 616
12.3.5 Administration Toolkit ____ 617
12.3.6 E-person/Group Manager ____ 617
12.3.7 Authorization ____ 618
12.3.8 Handle Manager/Handle Plugin ____ 620
12.3.9 Search ____ 621
12.3.10 Browse API ____ 622
12.3.11 Checksum checker ____ 626
12.3.12 OpenSearch Support ____ 626
12.3.13 Embargo Support ____ 627
12.4 DSpace Services Framework ____ 629
12.4.1 Architectural Overview ____ 630
12.4.2 Basic Usage ____ 632
12.4.3 Providers and Plugins ____ 633
12.4.4 Core Services ____ 634
12.4.5 Examples ____ 635
12.4.6 Tutorials ____ 636
12.5 Storage Layer ____ 636
12.5.1 RDBMS / Database Structure ____ 636
12.5.2 Bitstream Store ____ 639
13 Submission User Interface ____ 645
13.1 Understanding the Submission Configuration File ____ 646
13.1.1 The Structure of item-submission.xml ____ 646
13.1.2 Defining Steps ( ) within the item-submission.xml ____ 647
13.2 Reordering/Removing Submission Steps ____ 649
13.3 Assigning a custom Submission Process to a Collection ____ 650
13.3.1 Getting A Collection's Handle ____ 651
13.4 Custom Metadata-entry Pages for Submission ____ 651
13.4.1 Introduction ____ 651

13.4.2 Describing Custom Metadata Forms ____ 651
13.4.3 The Structure of input-forms.xml ____ 652
13.4.4 Deploying Your Custom Forms ____ 658
13.5 Configuring the File Upload step ____ 659
13.6 Creating new Submission Steps ____ 659
13.6.1 Creating a Non-Interactive Step ____ 660
14 Appendices ____ 661
14.1 Appendix A ____ 661
14.1.1 Default Dublin Core Metadata Registry ____ 661
14.1.2 Default Bitstream Format Registry ____ 664
15 History ____ 666
15.1 Changes in DSpace 3.0 ____ 666
15.1.1 New Features ____ 666
15.1.2 General Improvements ____ 667
15.1.3 Bug Fixes ____ 670
15.2 Changes in DSpace 1.8.2 ____ 675
15.2.1 General Improvements ____ 675
15.2.2 Bug Fixes ____ 675
15.3 Changes in DSpace 1.8.1 ____ 676
15.3.1 General Improvements ____ 676
15.3.2 Bug Fixes ____ 676
15.4 Changes in DSpace 1.8.0 ____ 677
15.4.1 New Features ____ 677
15.4.2 General Improvements ____ 678
15.4.3 Bug Fixes ____ 681
15.5 Changes in DSpace 1.7.2 ____ 686
15.5.1 Bug Fixes ____ 686
15.6 Changes in DSpace 1.7.1 ____ 686
15.6.1 General Improvements ____ 687
15.6.2 Bug Fixes ____ 687
15.7 Changes in DSpace 1.7.0 ____ 689
15.7.1 New Features ____ 689
15.7.2 General Improvements ____ 690
15.7.3 Bug Fixes ____ 693
15.8 Changes in DSpace 1.6.2 ____ 698
15.8.1 General Improvements ____ 698
15.8.2 Bug Fixes ____ 699
15.9 Changes in DSpace 1.6.1 ____ 699
15.9.1 General Improvements ____ 699
15.9.2 Bug Fixes ____ 700
15.10 Changes in DSpace 1.6.0 ____ 702
15.10.1 New Features ____ 702
15.10.2 General Improvements ____ 704

15.10.3 Bug Fixes ____ 706
15.11 Changes in DSpace 1.5.2 ____ 712
15.11.1 New Features ____ 712
15.11.2 General Improvements ____ 712
15.11.3 Bug Fixes ____ 715
15.12 Changes in DSpace 1.5.1 ____ 720
15.12.1 General Improvements and Bug Fixes ____ 720
15.13 Changes in DSpace 1.5 ____ 722
15.13.1 General Improvements ____ 722
15.13.2 Bug fixes and smaller patches ____ 722
15.14 Changes in DSpace 1.4.1 ____ 723
15.14.1 General Improvements ____ 723
15.14.2 Bug fixes ____ 724
15.15 Changes in DSpace 1.4 ____ 725
15.15.1 General Improvements ____ 725
15.15.2 Bug fixes ____ 726
15.16 Changes in DSpace 1.3.2 ____ 726
15.16.1 General Improvements ____ 726
15.16.2 Bug fixes ____ 726
15.17 Changes in DSpace 1.3.1 ____ 726
15.17.1 Bug fixes ____ 727
15.18 Changes in DSpace 1.3 ____ 727
15.18.1 General Improvements ____ 727
15.18.2 Bug fixes ____ 727
15.19 Changes in DSpace 1.2.2 ____ 728
15.19.1 General Improvements ____ 728
15.19.2 Bug fixes ____ 728
15.19.3 Changes in JSPs ____ 728
15.20 Changes in DSpace 1.2.1 ____ 729
15.20.1 General Improvements ____ 729
15.20.2 Bug fixes ____ 729
15.20.3 Changed JSPs ____ 729
15.21 Changes in DSpace 1.2 ____ 730
15.21.1 General Improvements ____ 730
15.21.2 Administration ____ 730
15.21.3 Import/Export/OAI ____ 731
15.21.4 Miscellaneous ____ 731
15.21.5 JSP file changes between 1.1 and 1.2 ____ 731
15.22 Changes in DSpace 1.1.1 ____ 733
15.22.1 Bug fixes ____ 733
15.22.2 Improvements ____ 734
15.23 Changes in DSpace 1.1 ____ 734

1 Preface

Online Version of Documentation also available

This documentation was produced with Confluence software. A PDF version was generated directly from Confluence. An online, updated version of this 3.0 Documentation is also available at: https://wiki.duraspace.org/display/DSDOC3x

1.1 Release Notes

Welcome to Release 3.0. The developers have volunteered many hours to fix, re-write and contribute new software code for this release. Documentation has also been updated.

DSpace 3.0 ships with a number of new features. Certain features are automatically enabled by default while others require deliberate activation.

The following non-exhaustive list contains the major new features in 3.0 that are enabled by default:

- Completely rewritten OAI-PMH Interface (see page 349)
  - Driver and Open-AIRE compatible
  - Allows for multiple contexts (URL endpoints), each with a different configuration
  - 12 default metadata export formats and an easy way to write new ones using XSLT
  - Runs on Solr for great performance, legacy mode over DSpace database supported
  - Even faster thanks to caching
  - Kindly contributed by Lyncode
- Improvements to Solr-based Statistics (see page 293)
  - Workflow statistics
  - Search Query statistics
  - Solr version upgrade and performance optimization
  - Kindly contributed by @mire

- Batch import for Bibliographic formats (see page 535)
  - Support for Endnote, BibTex, RIS, TSV, CSV
  - Enhanced batch import routines
  - Kindly contributed by the Greek National Documentation Centre/EKT
- Controlled Vocabulary Support for XMLUI (see page 656)
  - Submission form vocabulary lookup
  - Includes The Norwegian Science Index and the Swedish Research Subject Categories
  - Kindly contributed by @mire's Kevin Van de Velde
- Google Analytics support for JSPUI
  - support for statistics collection by entering the GA key into dspace.cfg
  - Kindly contributed by Denys Slipetskyy
- Improvements to Authentication by Password (see page 229)
  - now stores salted hashes
  - old passwords will continue to work and will be automatically converted to salted hashes on next user login
  - Kindly contributed by Mark H. Wood with the support of IUPUI University Library

The following list contains all features that are included in the DSpace 3.0 release, but need to be enabled manually. Review the documentation for these features carefully, especially if you are upgrading from an older version of DSpace.

- Discovery: Search & Browse (see page 266)
  - Enhancements for XMLUI:
    - Search Snippets
    - Hit Highlighting
    - Related Items
    - Hiding restricted results
    - Kindly contributed by @mire with the support of the World Bank
  - Discovery is now supported (see page 272) in JSPUI (example)
    - Kindly contributed by CILEA

- Item Level Versioning (see page 331)
  - Create and preserve different item versions
  - Enhanced identifiers
  - XMLUI only
  - Kindly contributed by @mire with the support of MBLWHOI Library and Dryad
- Advanced Embargo (see page 311)
  - Time based restrictions on both bitstreams and metadata
  - Advanced mode for additional user group restrictions
  - XMLUI only
  - Kindly contributed by @mire with the support of the University of Michigan Libraries
- Mobile Theme for XMLUI (beta)
  - Documentation
  - Kindly contributed by Elias Tzoc and James Russell with the support of Miami University
- Type-based submissions (see page 655)
  - Show or hide metadata fields in the submission forms, based on the type of content submitted
  - Kindly contributed by Nestor Oviedo and SeDiCI
- ElasticSearch-based Usage Statistics (see page 307)
  - scalable ElasticSearch backend, runs on embedded node by default
  - uses Google Chart API for graphs and maps
  - export to CSV available
  - displaying can be either public or restricted
  - Kindly contributed by Peter Dietz with the support of Ohio State University Libraries

- Improvements to LDAP Authentication (see page 244)
  - LDAPHierarchicalAuthentication superseded by LDAPAuthentication, see Enabling Hierarchical LDAP Authentication (see page 244)
  - New option to map LDAP group membership to internal DSpace groups (see page 247)
  - Kindly contributed by Samuel Ottenhoff

A full list of all changes / bug fixes in 3.0 is available in the History (see page 666) section.

The following individuals have contributed directly to this release of DSpace: Linna R. Agne, Jacob Andersson, Andrea Bollini, José Carvalho, David Chandek-Stark, Peter Dietz, Mark Diggory, Tim Donohue, Sands Fish, Brian Freels-Stendel, Àlex Magaz Graça, Bo Gundersen, Bill Hays, Ivan Masár, Onivaldo Rosa Junior, Claudia Jürgen, Artur Konczak, Dirk Leinders, Alex Lemann, Ariel J. Lira, Emilio Lorenzo, Bram Luyten, João Melo, Samuel Ottenhoff, Nestor Oviedo, Christina Paschou, Scott Phillips, Hardy Pottinger, James Russell, Andrea Schweer, Jonathon Scott, Milton Shintaku, Denys Slipetskyy, Kostas Stamatis, Rania Stathopoulou, Keiji Suzuki, Steve Swinsburg, Robin Taylor, Elias Tzoc, Kevin Van de Velde, Jennifer Whalan, Jennifer Whitney, and Mark H. Wood. Many of them could not do this work without the support (release time and financial) of their associated institutions. We offer thanks to those institutions for supporting their staff to take time to contribute to the DSpace project.

A big thank you also goes out to the DSpace Community Advisory Team (DCAT), who helped the developers to prioritize and plan out several of the new features that made it into this release. The current DCAT members include: Amy Lana, Augustine Gitonga, Bram Luyten, Ciarán Walsh, Claire Bundy, Dibyendra Hyoju, Elena Feinstein, Elin Stangeland, Iryna Kuchma, Jim Ottaviani, Leonie Hayes, Maureen Walsh, Michael Guthrie, Sarah Molloy, Sarah Shreeves, Sue Kunda, Valorie Hollister and Yan Han.

We apologize to any contributor accidentally left off this list. DSpace has such a large, active development community that we sometimes lose track of all our contributors. Our ongoing list of all known people/institutions that have contributed to DSpace software can be found on our DSpace Contributors page. Acknowledgments to those left off will be made in future releases.

Want to see your name appear in our list of contributors? All you have to do is report an issue, fix a bug, improve our documentation or help us determine the necessary requirements for a new feature! Visit our Issue Tracker to report a bug, or join the dspace-devel mailing list to take part in development work. If you'd like to help improve our current documentation, please get in touch with one of our Committers with your ideas. You don't even need to be a developer! Repository managers can also get involved by volunteering to join the DSpace Community Advisory Team and helping our developers to plan new features.

The Release Team consisted of Sands Fish, Ivan Masár, Hardy Pottinger, and Robin Taylor. Additional thanks to Tim Donohue from DuraSpace for keeping all of us focused on the work at hand, for calming us when we got excited, and for the general support for the DSpace project.

2 Introduction

DSpace is an open source software platform that enables organisations to:

- capture and describe digital material using a submission workflow module, or a variety of programmatic ingest options
- distribute an organisation's digital assets over the web through a search and retrieval system
- preserve digital assets over the long term

This system documentation includes a functional overview of the system (see page 19), which is a good introduction to the capabilities of the system, and should be readable by non-technical folk. Everyone should read this section first because it introduces some terminology used throughout the rest of the documentation.

For people actually running a DSpace service, there is an installation guide (see page 39), and sections on configuration (see page 143) and the directory structure (see page 575).

Finally, for those interested in the details of how DSpace works, and those potentially interested in modifying the code for their own purposes, there is a detailed architecture and design section (see page 582).

Other good sources of information are:

- The DSpace Public API Javadocs. Build these with the command mvn javadoc:javadoc
- The DSpace Wiki contains stacks of useful information about the DSpace platform and the work people are doing with it. You are strongly encouraged to visit this site and add information about your own work. Useful Wiki areas are:
  - A list of DSpace resources (Web sites, mailing lists etc.)
  - Technical FAQ
  - A list of projects using DSpace
  - Guidelines for contributing back to DSpace
- www.dspace.org has announcements and contains useful information about bringing up an instance of DSpace at your organization.
- The DSpace General List. Join DSpace-General to ask questions or join discussions about non-technical aspects of building and running a DSpace service. It is open to all DSpace users. Ask questions, share news, and spark discussion about DSpace with people managing other DSpace sites. Watch DSpace-General for news of software releases, user conferences, and announcements from the DSpace Federation.
- The DSpace Technical List. DSpace developers help answer installation and technology questions, share information and help each other solve technical problems through the DSpace-Tech mailing list. Post questions or contribute your expertise to other developers working with the system.

- The DSpace Development List. Join discussions among DSpace developers. The DSpace-Devel listserv is for DSpace developers working on the DSpace platform to share ideas and discuss code changes to the open source platform. Join other developers to shape the evolution of the DSpace software. The DSpace community depends on its members to frame functional requirements and high-level architecture, and to facilitate programming, testing, and documentation for the project.

3 Functional Overview

The following sections describe the various functional aspects of the DSpace system.

- Data Model (see page 20)
- Plugin Manager (see page 22)
- Metadata (see page 22)
- Packager Plugins (see page 23)
- Crosswalk Plugins (see page 24)
- E-People and Groups (see page 24)
  - E-Person (see page 24)
  - Groups (see page 25)
- Authentication (see page 25)
- Authorization (see page 25)
- Ingest Process and Workflow (see page 27)
  - Workflow Steps (see page 28)
- Supervision and Collaboration (see page 29)
- Handles (see page 29)
- Bitstream 'Persistent' Identifiers (see page 30)
- Storage Resource Broker (SRB) Support (see page 31)
- Search and Browse (see page 31)
- HTML Support (see page 32)
- OAI Support (see page 33)
- SWORD Support (see page 33)
- OpenURL Support (see page 33)
- Creative Commons Support (see page 34)
- Subscriptions (see page 34)
- Import and Export (see page 34)
- Registration (see page 34)
- Statistics (see page 35)
  - System Statistics (see page 35)
  - Item, Collection and Community Usage Statistics (see page 35)
- Checksum Checker (see page 36)
- Usage Instrumentation (see page 36)
- Choice Management and Authority Control (see page 36)
  - Introduction and Motivation (see page 37)
  - Definitions (see page 37)
  - About Authority Control (see page 37)
  - Some Terminology (see page 38)

3.1 Data Model

Figure: Data Model Diagram

The way data is organized in DSpace is intended to reflect the structure of the organization using the DSpace system. Each DSpace site is divided into communities, which can be further divided into sub-communities reflecting the typical university structure of college, department, research center, or laboratory.

Communities contain collections, which are groupings of related content. A collection may appear in more than one community.

Each collection is composed of items, which are the basic archival elements of the archive. Each item is owned by one collection. Additionally, an item may appear in additional collections; however every item has one and only one owning collection.

Items are further subdivided into named bundles of bitstreams. Bitstreams are, as the name suggests, streams of bits, usually ordinary computer files. Bitstreams that are somehow closely related, for example HTML files and images that compose a single HTML document, are organized into bundles.

In practice, most items tend to have these named bundles:

- ORIGINAL: the bundle with the original, deposited bitstreams
- THUMBNAILS: thumbnails of any image bitstreams
- TEXT: extracted full-text from bitstreams in ORIGINAL, for indexing
- LICENSE: contains the deposit license that the submitter granted the host organization; in other words, specifies the rights that the hosting organization has
- CC_LICENSE: contains the distribution license, if any (a Creative Commons license) associated with the item. This license specifies what end users downloading the content can do with the content

Each bitstream is associated with one Bitstream Format. Because preservation services may be an important aspect of the DSpace service, it is important to capture the specific formats of files that users submit. In DSpace, a bitstream format is a unique and consistent way to refer to a particular file format. An integral part of a bitstream format is an either implicit or explicit notion of how material in that format can be interpreted. For example, the interpretation for bitstreams encoded in the JPEG standard for still image compression is defined explicitly in the Standard ISO/IEC 10918-1. The interpretation of bitstreams in Microsoft Word 2000 format is defined implicitly, through reference to the Microsoft Word 2000 application. Bitstream formats can be more specific than MIME types or file suffixes. For example, application/msword and .doc span multiple versions of the Microsoft Word application, each of which produces bitstreams with presumably different characteristics.

Each bitstream format additionally has a support level, indicating how well the hosting institution is likely to be able to preserve content in the format in the future. There are three possible support levels that bitstream formats may be assigned by the hosting institution. The host institution should determine the exact meaning of each support level, after careful consideration of costs and requirements. MIT Libraries' interpretation is shown below:

- Supported: The format is recognized, and the hosting institution is confident it can make bitstreams of this format usable in the future, using whatever combination of techniques (such as migration, emulation, etc.) is appropriate given the context of need.
- Known: The format is recognized, and the hosting institution will promise to preserve the bitstream as-is, and allow it to be retrieved. The hosting institution will attempt to obtain enough information to enable the format to be upgraded to the 'supported' level.

- Unsupported: The format is unrecognized, but the hosting institution will undertake to preserve the bitstream as-is and allow it to be retrieved.

Each item has one qualified Dublin Core metadata record. Other metadata might be stored in an item as a serialized bitstream, but we store Dublin Core for every item for interoperability and ease of discovery. The Dublin Core may be entered by end-users as they submit content, or it might be derived from other metadata as part of an ingest process.

Items can be removed from DSpace in one of two ways: They may be 'withdrawn', which means they remain in the archive but are completely hidden from view. In this case, if an end-user attempts to access the withdrawn item, they are presented with a 'tombstone' that indicates the item has been removed. For whatever reason, an item may also be 'expunged' if necessary, in which case all traces of it are removed from the archive.

Object            Example
----------------  -------------------------------------------------------------
Community         Laboratory of Computer Science; Oceanographic Research Center
Collection        LCS Technical Reports; ORC Statistical Data Sets
Item              A technical report; a data set with accompanying description; a video recording of a lecture
Bundle            A group of HTML and image bitstreams making up an HTML document
Bitstream         A single HTML file; a single image file; a source code file
Bitstream Format  Microsoft Word version 6.0; JPEG encoded image format
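
The containment hierarchy described in this section can be pictured with a few plain Java classes. The following is only a minimal sketch for orientation: the class names mirror the terms used above, but they are hypothetical stand-ins written for this example, not the classes of the DSpace content API, and the file names and format labels are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the hierarchy described above; NOT the DSpace content API. */
public class DataModelSketch {

    /** A stream of bits (usually a file) associated with one bitstream format. */
    static class Bitstream {
        final String name;
        final String format;

        Bitstream(String name, String format) {
            this.name = name;
            this.format = format;
        }
    }

    /** A named group of closely related bitstreams, such as ORIGINAL. */
    static class Bundle {
        final String name;
        final List<Bitstream> bitstreams = new ArrayList<>();

        Bundle(String name) {
            this.name = name;
        }
    }

    /** A grouping of related items. */
    static class Collection {
        final String name;
        final List<Item> items = new ArrayList<>();

        Collection(String name) {
            this.name = name;
        }

        /** Lets an item appear here without changing its owning collection. */
        void include(Item item) {
            if (!items.contains(item)) {
                items.add(item);
            }
        }
    }

    /** The basic archival element; it has exactly one owning collection. */
    static class Item {
        final String title;
        final Collection owningCollection;
        final Map<String, Bundle> bundles = new LinkedHashMap<>();

        Item(String title, Collection owningCollection) {
            this.title = title;
            this.owningCollection = owningCollection;
            owningCollection.include(this);
        }

        /** Returns the bundle with this name, creating it on first use. */
        Bundle bundle(String name) {
            return bundles.computeIfAbsent(name, Bundle::new);
        }
    }

    /** A community holds sub-communities and collections. */
    static class Community {
        final String name;
        final List<Community> subCommunities = new ArrayList<>();
        final List<Collection> collections = new ArrayList<>();

        Community(String name) {
            this.name = name;
        }
    }

    /** Builds one community, collection, and item using the table's examples. */
    static Item buildExample() {
        Community lab = new Community("Laboratory of Computer Science");
        Collection reports = new Collection("LCS Technical Reports");
        lab.collections.add(reports);

        Item report = new Item("A technical report", reports);
        Bundle original = report.bundle("ORIGINAL");
        original.bitstreams.add(new Bitstream("report.html", "HTML"));
        original.bitstreams.add(new Bitstream("figure1.jpg", "JPEG"));
        return report;
    }

    public static void main(String[] args) {
        Item report = buildExample();
        for (Bundle bundle : report.bundles.values()) {
            for (Bitstream b : bundle.bitstreams) {
                System.out.println(report.owningCollection.name + " / " + report.title
                        + " / " + bundle.name + " / " + b.name + " (" + b.format + ")");
            }
        }
    }
}
```

The owning collection is fixed in the constructor because every item has one and only one owner, while `include` models an item additionally appearing in other collections.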

3.2 Plugin Manager

The PluginManager is a very simple component container. It creates and organizes components (plugins), and helps select a plugin in the cases where there are many possible choices. It also gives some limited control over the lifecycle of a plugin. A plugin is defined by a Java interface. The consumer of a plugin asks for its plugin by interface. A Plugin is an instance of any class that implements the plugin interface. It is interchangeable with other implementations, so that any of them may be "plugged in". The mediafilter is a simple example of a plugin implementation. Refer to the Business Logic Layer (see page 596) for more details on Plugins.
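
To make "asking for a plugin by interface" concrete, here is a minimal sketch of the idea in plain Java. Everything in it is an assumption made for illustration: `ToyPluginRegistry`, `TextFilter`, and the two implementations are hypothetical names, and the registry is a stand-in for the concept only. It does not reproduce the actual PluginManager class or its configuration mechanism, which are covered in the Business Logic Layer section.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Toy illustration of interface-based plugin lookup; not the DSpace PluginManager. */
public class PluginSketch {

    /** The plugin interface: consumers only ever depend on this type. */
    interface TextFilter {
        String apply(String input);
    }

    /** One interchangeable implementation. */
    static class UpperCaseFilter implements TextFilter {
        @Override
        public String apply(String input) {
            return input.toUpperCase(Locale.ROOT);
        }
    }

    /** Another implementation that can be "plugged in" instead. */
    static class TrimFilter implements TextFilter {
        @Override
        public String apply(String input) {
            return input.trim();
        }
    }

    /** Hypothetical registry: maps (interface, name) to a factory for the plugin. */
    static class ToyPluginRegistry {
        private final Map<Class<?>, Map<String, Supplier<?>>> plugins = new HashMap<>();

        <T> void register(Class<T> iface, String name, Supplier<? extends T> factory) {
            plugins.computeIfAbsent(iface, k -> new HashMap<>()).put(name, factory);
        }

        /** Creates a new instance of the named implementation of the interface. */
        <T> T getNamedPlugin(Class<T> iface, String name) {
            Map<String, Supplier<?>> byName = plugins.get(iface);
            if (byName == null || !byName.containsKey(name)) {
                throw new IllegalArgumentException(
                        "No plugin '" + name + "' for " + iface.getName());
            }
            return iface.cast(byName.get(name).get());
        }
    }

    /** Registers both implementations under illustrative names. */
    static ToyPluginRegistry defaultRegistry() {
        ToyPluginRegistry registry = new ToyPluginRegistry();
        registry.register(TextFilter.class, "upper", UpperCaseFilter::new);
        registry.register(TextFilter.class, "trim", TrimFilter::new);
        return registry;
    }

    public static void main(String[] args) {
        // The consumer names the interface and a plugin name, never a concrete class.
        TextFilter filter = defaultRegistry().getNamedPlugin(TextFilter.class, "upper");
        System.out.println(filter.apply("dspace"));
    }
}
```

Because the consumer holds only a `TextFilter` reference, swapping "upper" for "trim" changes behavior without touching the calling code, which is the interchangeability the paragraph above describes.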

3.3 Metadata

Broadly speaking, DSpace holds three sorts of metadata about archived content:

- Descriptive Metadata: DSpace can support multiple flat metadata schemas for describing an item. A qualified Dublin Core metadata schema loosely based on the Library Application Profile set of elements and qualifiers is provided by default. The set of elements and qualifiers used by MIT Libraries comes pre-configured with the DSpace source code. However, you can configure multiple schemas and select metadata fields from a mix of configured schemas to describe your items. Other descriptive metadata about items (e.g. metadata described in a hierarchical schema) may be held in serialized bitstreams. Communities and collections have some simple descriptive metadata (a name, and some descriptive prose), held in the DBMS.
- Administrative Metadata: This includes preservation metadata, provenance and authorization policy data. Most of this is held within DSpace's relational DBMS schema. Provenance metadata (prose) is stored in Dublin Core records. Additionally, some other administrative metadata (for example, bitstream byte sizes and MIME types) is replicated in Dublin Core records so that it is easily accessible outside of DSpace.
- Structural Metadata: This includes information about how to present an item, or bitstreams within an item, to an end-user, and the relationships between constituent parts of the item. As an example, consider a thesis consisting of a number of TIFF images, each depicting a single page of the thesis. Structural metadata would include the fact that each image is a single page, and the ordering of the TIFF images/pages. Structural metadata in DSpace is currently fairly basic; within an item, bitstreams can be arranged into separate bundles as described above. A bundle may also optionally have a primary bitstream. This is currently used by the HTML support to indicate which bitstream in the bundle is the first HTML file to send to a browser. In addition to some basic technical metadata, a bitstream also has a 'sequence ID' that uniquely identifies it within an item. This is used to produce a 'persistent' bitstream identifier for each bitstream. Additional structural metadata can be stored in serialized bitstreams, but DSpace does not currently understand this natively.

3.4 Packager Plugins

Packagers are software modules that translate between DSpace Item objects and a self-contained external representation, or "package". A Package Ingester interprets, or ingests, the package and creates an Item. A Package Disseminator writes out the contents of an Item in the package format.

A package is typically an archive file such as a Zip or "tar" file, including a manifest document which contains metadata and a description of the package contents. The IMS Content Package is a typical packaging standard. A package might also be a single document or media file that contains its own metadata, such as a PDF document with embedded descriptive metadata.

Package ingesters and package disseminators are each a type of named plugin (see Plugin Manager (see page 22)), so it is easy to add new packagers specific to the needs of your site. You do not have to supply both an ingester and disseminator for each format; it is perfectly acceptable to just implement one of them.

Most packager plugins call upon Crosswalk Plugins (see page 24) to translate the metadata between DSpace's object model and the package format.


More information about calling Packagers to ingest or disseminate content can be found in the Package Importer and Exporter (see page ) section of the System Administration documentation.
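As a rough illustration of the ingester/disseminator pairing described in this section, the following Python sketch writes an item out as a Zip package with a manifest and reads it back. It is not DSpace's packager API (DSpace plugins are Java classes); the `Item` class, the `manifest.json` file name, the `simple-zip` format name and the `PACKAGERS` registry are all assumptions made up for this example.

```python
import io
import json
import zipfile
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for a DSpace Item: metadata plus named files."""
    metadata: dict
    files: dict = field(default_factory=dict)  # filename -> bytes


def disseminate_zip(item):
    """Write the item out as a Zip package holding a manifest and its files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        manifest = {"metadata": item.metadata, "contents": sorted(item.files)}
        zf.writestr("manifest.json", json.dumps(manifest))
        for name, data in item.files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def ingest_zip(package):
    """Interpret a Zip package produced above and build an Item from it."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        manifest = json.loads(zf.read("manifest.json"))
        files = {name: zf.read(name) for name in manifest["contents"]}
    return Item(metadata=manifest["metadata"], files=files)


# Named-plugin style lookup: a format name maps to whichever halves exist.
# A format could register only "ingest" or only "disseminate".
PACKAGERS = {"simple-zip": {"ingest": ingest_zip, "disseminate": disseminate_zip}}

original = Item({"title": "A thesis"}, {"page1.tif": b"\x00\x01"})
package = PACKAGERS["simple-zip"]["disseminate"](original)
restored = PACKAGERS["simple-zip"]["ingest"](package)
```

In this sketch the metadata is copied into the manifest unchanged; a real packager would typically hand that translation to a crosswalk.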

3.5 Crosswalk Plugins

Crosswalks are software modules that translate between DSpace object metadata and a specific external representation. An Ingestion Crosswalk interprets the external format and crosswalks it to DSpace's internal data structure, while a Dissemination Crosswalk does the opposite.

For example, a MODS ingestion crosswalk translates descriptive metadata from the MODS format to the metadata fields on a DSpace Item. A MODS dissemination crosswalk generates a MODS document from the metadata on a DSpace Item.

Crosswalk plugins are named plugins (see Plugin Manager, page 22), so it is easy to add new crosswalks. You do not have to supply both an ingester and disseminator for each format; it is perfectly acceptable to just implement one of them.

There is also a special pair of crosswalk plugins which use XSL stylesheets to translate the external metadata to or from an internal DSpace format. You can add and modify XSLT crosswalks simply by editing the DSpace configuration and the stylesheets, which are stored in files in the DSpace installation directory.

The Packager plugins and OAI-PMH server make use of crosswalk plugins.
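The two crosswalk directions can be pictured with a small Python sketch. The external `<record>` vocabulary and the `FIELD_MAP` table below are invented for illustration (this is not MODS, and not DSpace's Java crosswalk interface); only the idea of mapping in both directions comes from the text above.

```python
import xml.etree.ElementTree as ET

# Assumed mapping between a made-up external XML vocabulary and
# DSpace-style qualified Dublin Core field names.
FIELD_MAP = {"title": "dc.title", "creator": "dc.contributor.author"}


def ingest_crosswalk(xml_text):
    """External XML -> internal metadata (field name -> list of values)."""
    metadata = {}
    for element in ET.fromstring(xml_text):
        field = FIELD_MAP.get(element.tag)
        if field is not None:  # elements with no mapping are skipped
            metadata.setdefault(field, []).append(element.text or "")
    return metadata


def disseminate_crosswalk(metadata):
    """Internal metadata -> external XML, using the same mapping in reverse.
    Raises KeyError for a field that has no external equivalent."""
    reverse = {field: tag for tag, field in FIELD_MAP.items()}
    root = ET.Element("record")
    for field, values in metadata.items():
        for value in values:
            ET.SubElement(root, reverse[field]).text = value
    return ET.tostring(root, encoding="unicode")


source = "<record><title>On Repositories</title><creator>Doe, J.</creator></record>"
internal = ingest_crosswalk(source)
round_trip = disseminate_crosswalk(internal)
```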

3.6 E-People and Groups

Although many of DSpace's functions such as document discovery and retrieval can be used anonymously, some features (and perhaps some documents) are only available to certain "privileged" users. E-People and Groups are the way DSpace identifies application users for the purpose of granting privileges. This identity is bound to a session of a DSpace application such as the Web UI or one of the command-line batch programs. Both E-People and Groups are granted privileges by the authorization system described below.

3.6.1 E-Person

DSpace holds the following information about each e-person:

- E-mail address
- First and last names
- Whether the user is able to log in to the system via the Web UI, and whether they must use an X.509 certificate to do so
- A password (encrypted), if appropriate
- A list of collections for which the e-person wishes to be notified of new items
- Whether the e-person 'self-registered' with the system; that is, whether the system created the e-person record automatically as a result of the end-user independently registering with the system, as opposed to the e-person record being generated from the institution's personnel database, for example
- The network ID for the corresponding LDAP record, if LDAP authentication is used for this E-Person

3.6.2 Groups

Groups are another kind of entity that can be granted permissions in the authorization system. A group is usually an explicit list of E-People; anyone identified as one of those E-People also gains the privileges granted to the group.

However, an application session can be assigned membership in a group without being identified as an E-Person. For example, some sites use this feature to identify users of a local network so they can read restricted materials not open to the whole world. Sessions originating from the local network are given membership in the "LocalUsers" group and gain the corresponding privileges.

Administrators can also use groups as "roles" to manage the granting of privileges more efficiently.
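The two ways a session can acquire group membership (through an identified E-Person, or implicitly from where the session originates) might be sketched as follows. The network range, the "Reviewers" group and the e-mail address are illustrative assumptions; only the "LocalUsers" group name comes from the text.

```python
import ipaddress

# Assumed site configuration: sessions from this network join "LocalUsers".
# 192.0.2.0/24 is a range reserved for documentation examples.
LOCAL_NETWORK = ipaddress.ip_network("192.0.2.0/24")

# Explicit group membership lists, keyed by group name (illustrative data).
GROUP_MEMBERS = {"Reviewers": {"alice@example.edu"}}


def session_groups(client_ip, eperson_email=None):
    """Collect the groups a session belongs to."""
    groups = set()
    # Explicit membership: only applies when the session identified an E-Person.
    if eperson_email is not None:
        for name, members in GROUP_MEMBERS.items():
            if eperson_email in members:
                groups.add(name)
    # Implicit membership: granted from the session's origin alone.
    if ipaddress.ip_address(client_ip) in LOCAL_NETWORK:
        groups.add("LocalUsers")
    return groups
```

For example, `session_groups("192.0.2.17")` yields only the "LocalUsers" group even though no E-Person was identified.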

3.7 Authentication

Authentication is when an application session positively identifies itself as belonging to an E-Person and/or Group. In DSpace 1.4 and later, it is implemented by a mechanism called Stackable Authentication: the DSpace configuration declares a "stack" of authentication methods. An application (like the Web UI) calls on the Authentication Manager, which tries each of these methods in turn to identify the E-Person to which the session belongs, as well as any extra Groups. The E-Person authentication methods are tried in turn until one succeeds. Every authenticator in the stack is given a chance to assign extra Groups. This mechanism offers the following advantages:

- Separates authentication from the Web user interface so the same authentication methods are used for other applications such as non-interactive Web Services
- Improved modularity: The authentication methods are all independent of each other. Custom authentication methods can be "stacked" on top of the default DSpace username/password method.
- Cleaner support for "implicit" authentication where username is found in the environment of a Web request, e.g. in an X.509 client certificate.
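A minimal sketch of the stacking behaviour described above, written in Python rather than DSpace's Java: the first method that identifies an E-Person wins, while every method may contribute extra groups. The class names, the `authenticate` and `special_groups` method names, and the request dictionary are assumptions for this example, and the plain-text password table is a toy, not a recommendation.

```python
class PasswordAuth:
    """Explicit method: checks a username/password pair (toy in-memory store)."""

    def __init__(self, passwords):
        self.passwords = passwords

    def authenticate(self, request):
        user = request.get("username")
        if user in self.passwords and self.passwords[user] == request.get("password"):
            return user
        return None

    def special_groups(self, request):
        return set()


class NetworkGroupAuth:
    """Implicit method: never identifies an E-Person, only assigns a group."""

    def authenticate(self, request):
        return None

    def special_groups(self, request):
        return {"LocalUsers"} if request.get("ip", "").startswith("10.") else set()


def authenticate(stack, request):
    """Try each method in order until one identifies the E-Person;
    every method gets a chance to add extra groups regardless."""
    eperson = None
    groups = set()
    for method in stack:
        if eperson is None:
            eperson = method.authenticate(request)
        groups |= method.special_groups(request)
    return eperson, groups


stack = [NetworkGroupAuth(), PasswordAuth({"alice": "s3cret"})]
who, groups = authenticate(
    stack, {"username": "alice", "password": "s3cret", "ip": "10.0.0.8"}
)
```

Because the stack is just an ordered list, a site-specific method can be added without touching the others, which is the modularity the section refers to.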

3.8 Authorization


DSpace's authorization system is based on associating actions with objects and the lists of EPeople who can perform them. The associations are called Resource Policies, and the lists of EPeople are called Groups. There are two built-in groups: 'Administrators', who can do anything in a site, and 'Anonymous', which is a list that contains all users. Assigning a policy for an action on an object to anonymous means giving everyone permission to do that action. (For example, most objects in DSpace sites have a policy of 'anonymous' READ.) Permissions must be explicit - lack of an explicit permission results in the default policy of 'deny'. Permissions also do not 'commute'; for example, if an e-person has READ permission on an item, they might not necessarily have READ permission on the bundles and bitstreams in that item. Currently Collections, Communities and Items are discoverable in the browse and search systems regardless of READ authorization.

The following actions are possible:

Collection
- ADD/REMOVE: add or remove items (ADD = permission to submit items)
- DEFAULT_ITEM_READ: inherited as READ by all submitted items
- DEFAULT_BITSTREAM_READ: inherited as READ by Bitstreams of all submitted items. Note: only affects Bitstreams of an item at the time it is initially submitted. If a Bitstream is added later, it does not get the same default read policy.
- COLLECTION_ADMIN: collection admins can edit items in a collection, withdraw items, map other items into this collection.

Item
- ADD/REMOVE: add or remove bundles
- READ: can view item (item metadata is always viewable)
- WRITE: can modify item

Bundle
- ADD/REMOVE: add or remove bitstreams to a bundle

Bitstream
- READ: view bitstream
- WRITE: modify bitstream

Note that there is no 'DELETE' action. In order to 'delete' an object (e.g. an item) from the archive, one must have REMOVE permission on all objects (in this case, collection) that contain it. The 'orphaned' item is automatically deleted.


Policies can apply to individual e-people or groups of e-people.
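The default-deny rule and the special treatment of the two built-in groups can be sketched in a few lines of Python. The `ResourcePolicy` record, the object identifiers such as `item:1`, and the "Reviewers" group are hypothetical; for brevity the sketch only models policies granted to groups, not to individual e-people.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourcePolicy:
    """Grants one action on one object to one group (hypothetical model)."""
    obj: str
    action: str
    group: str


POLICIES = {
    ResourcePolicy("item:1", "READ", "Anonymous"),
    ResourcePolicy("bitstream:7", "READ", "Reviewers"),
}


def is_authorized(obj, action, groups):
    """Default deny: allow only if an explicit policy matches.
    Every session counts as a member of 'Anonymous';
    members of 'Administrators' may do anything."""
    if "Administrators" in groups:
        return True
    effective = set(groups) | {"Anonymous"}
    return any(
        p.obj == obj and p.action == action and p.group in effective
        for p in POLICIES
    )
```

Here an anonymous session may READ `item:1` but not `bitstream:7`, even if that bitstream belongs to the item: permission on a container does not carry over to its contents.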

3.9 Ingest Process and Workflow

Rather than being a single subsystem, ingesting is a process that spans several. Below is a simple illustration of the current ingesting process in DSpace.

Figure: DSpace Ingest Process

The batch item importer is an application that turns an external SIP (an XML metadata document with some content files) into an "in progress submission" object. The Web submission UI is similarly used by an end-user to assemble an "in progress submission" object.

Depending on the policy of the collection to which the submission is targeted, a workflow process may be started. This typically allows one or more human reviewers or 'gatekeepers' to check over the submission and ensure it is suitable for inclusion in the collection.

When the Batch Ingester or Web Submit UI completes the InProgressSubmission object and invokes the next stage of ingest (be that workflow or item installation), a provenance message is added to the Dublin Core which includes the filenames and checksums of the content of the submission. Likewise, each time a workflow changes state (e.g. a reviewer accepts the submission), a similar provenance statement is added. This allows us to track how the item has changed since a user submitted it.

Once any workflow process is successfully and positively completed, the InProgressSubmission object is consumed by an "item installer" that converts the InProgressSubmission into a fully blown archived item in DSpace. The item installer:

- Assigns an accession date
- Adds a "date.available" value to the Dublin Core metadata record of the item
- Adds an issue date if none already present
- Adds a provenance message (including bitstream checksums)
- Assigns a Handle persistent identifier
- Adds the item to the target collection, and adds appropriate authorization policies
- Adds the new item to the search and browse index
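The installer's checklist can be read as a short sequence of operations on the submission. The Python sketch below follows that order under several assumptions: the submission and archive are plain dictionaries and lists, the field names `dc.date.accessioned`, `dc.date.issued` and `dc.description.provenance` are chosen for illustration, MD5 is used as the checksum, and the Handle prefix and local ID are supplied by the caller. Authorization policies and indexing are left out.

```python
import hashlib
from datetime import datetime, timezone


def install_item(submission, collection, handle_prefix, next_local_id, now=None):
    """Turn an in-progress submission (a dict here) into an archived item."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    metadata = dict(submission["metadata"])

    metadata["dc.date.accessioned"] = stamp            # accession date
    metadata["dc.date.available"] = stamp              # "date.available" value
    metadata.setdefault("dc.date.issued", stamp[:10])  # only if none is present

    # Provenance message listing each file with its checksum, appended to
    # any provenance statements recorded earlier in the ingest process.
    checksums = ", ".join(
        f"{name} ({hashlib.md5(data).hexdigest()})"
        for name, data in sorted(submission["files"].items())
    )
    earlier = metadata.get("dc.description.provenance", [])
    metadata["dc.description.provenance"] = list(earlier) + [
        f"Installed {stamp}: {checksums}"
    ]

    item = {
        "handle": f"{handle_prefix}/{next_local_id}",  # persistent identifier
        "metadata": metadata,
        "files": submission["files"],
    }
    collection.append(item)  # policies and search/browse indexing omitted
    return item
```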

3.9.1 Workflow Steps

A collection's workflow can have up to three steps. Each collection may have an associated e-person group for performing each step; if no group is associated with a certain step, that step is skipped. If a collection has no e-person groups associated with any step, submissions to that collection are installed straight into the main archive.

In other words, the sequence is this: The collection receives a submission. If the collection has a group assigned for workflow step 1, that step is invoked, and the group is notified. Otherwise, workflow step 1 is skipped. Likewise, workflow steps 2 and 3 are performed if and only if the collection has a group assigned to those steps.

When a step is invoked, the submission is put into the 'task pool' of the step's associated group. One member of that group takes the task from the pool, and it is then removed from the task pool, to avoid the situation where several people in the group may be performing the same task without realizing it.

The member of the group who has taken the task from the pool may then perform one of three actions:

Workflow Step   Possible actions
1               Can accept submission for inclusion, or reject submission.
2               Can edit metadata provided by the user with the submission, but cannot change the submitted files. Can accept submission for inclusion, or reject submission.
3               Can edit metadata provided by the user with the submission, but cannot change the submitted files. Must then commit to archive; may not reject submission.


Figure: Submission Workflow in DSpace

If a submission is rejected, the reason (entered by the workflow participant) is e-mailed to the submitter, and the submission is returned to the submitter's 'My DSpace' page. The submitter can then make any necessary modifications and re-submit, whereupon the process starts again.

If a submission is 'accepted', it is passed to the next step in the workflow. If there are no more workflow steps with associated groups, the submission is installed in the main archive.

One last possibility is that a workflow can be 'aborted' by a DSpace site administrator. This is accomplished using the administration UI.

The reason for this apparently arbitrary design is that it was the simplest case that covered the needs of the early adopter communities at MIT. The functionality of the workflow system will no doubt be extended in the future.
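The skip-if-no-group rule and the accept/reject outcomes can be summarised as a small state machine. The following Python sketch is only an illustration of that logic: the function names, the group names and the `decisions` dictionary (standing in for what each reviewer chooses) are assumptions, and task pools, notifications and administrator aborts are not modelled.

```python
def next_step(step_groups, current_step=0):
    """Return the next workflow step (1-3) that has a group assigned,
    or None when no such step remains and the item can be installed."""
    for step in range(current_step + 1, 4):
        if step_groups.get(step):
            return step
    return None


def run_workflow(step_groups, decisions):
    """Walk a submission through the configured steps.
    `decisions` maps a step number to 'accept' or 'reject'."""
    visited = []
    step = next_step(step_groups)
    while step is not None:
        visited.append(step)
        if decisions[step] == "reject":
            if step == 3:
                raise ValueError("step 3 may not reject a submission")
            return "returned to submitter", visited
        step = next_step(step_groups, step)
    return "installed", visited
```

A collection with no groups at all goes straight to "installed" with no steps visited, matching the first paragraph of this section.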

3.10 Supervision and Collaboration

In order to facilitate, as a primary objective, the opportunity for thesis authors to be supervised in the preparation of their e-theses, a supervision order system exists to bind groups of other users (thesis supervisors) to an item in someone's pre-submission workspace. The bound group can have system policies associated with it that allow different levels of interaction with the student's item; a small set of default policy groups is provided:

- Full editorial control
- View item contents
- No policies

Once the default set has been applied, a system administrator may modify it as they would any other policy set in DSpace.

This functionality could also be used in situations where researchers wish to collaborate on a particular submission, although there is no particular collaborative workspace functionality.

3.11 Handles

Researchers require a stable point of reference for their works. The simple evolution from sharing of citations to emailing of URLs broke when Web users learned that sites can disappear or be reconfigured without notice, and that their bookmark files containing critical links to research results couldn't be trusted in the long term. To help solve this problem, a core DSpace feature is the creation of a persistent identifier for every item, collection and community stored in DSpace. To persist identifiers, DSpace requires a storage- and location-independent mechanism for creating and maintaining identifiers. DSpace uses the CNRI Handle System for creating these identifiers. The rest of this section assumes a basic familiarity with the Handle system.


DSpace uses Handles primarily as a means of assigning globally unique identifiers to objects. Each site running DSpace needs to obtain a unique Handle 'prefix' from CNRI, so we know that if we create identifiers with that prefix, they won't clash with identifiers created elsewhere.

Presently, Handles are assigned to communities, collections, and items. Bundles and bitstreams are not assigned Handles, since over time, the way in which an item is encoded as bits may change, in order to allow access with future technologies and devices. Older versions may be moved to off-line storage as a new standard becomes de facto. Since it's usually the item that is being preserved, rather than the particular bit encoding, it only makes sense to persistently identify and allow access to the item, and allow users to access the appropriate bit encoding from there.

Of course, it may be that a particular bit encoding of a file is explicitly being preserved; in this case, the bitstream could be the only one in the item, and the item's Handle would then essentially refer just to that bitstream. The same bitstream can also be included in other items, and thus would be citable as part of a greater item, or individually.

The Handle system also features a global resolution infrastructure; that is, an end-user can enter a Handle into any service (e.g. Web page) that can resolve Handles, and the end-user will be directed to the object (in the case of DSpace, community, collection or item) identified by that Handle. In order to take advantage of this feature of the Handle system, a DSpace site must also run a 'Handle server' that can accept and resolve incoming resolution requests. All the code for this is included in the DSpace source code bundle.

Handles can be written in two forms:

hdl:1721.123/4567
http://hdl.handle.net/1721.123/4567

The above represent the same Handle. The first is possibly more convenient to use only as an identifier; however, by using the second form, any Web browser becomes capable of resolving Handles. An end-user need only access this form of the Handle as they would any other URL. It is possible to enable some browsers to resolve the first form of Handle as if they were standard URLs using CNRI's Handle Resolver plug-in, but since the first form can always be simply derived from the second, DSpace displays Handles in the second form, so that it is more useful for end-users.

It is important to note that DSpace uses the CNRI Handle infrastructure only at the 'site' level. In the above example, the DSpace site has been assigned the prefix '1721.123'. It is still the responsibility of the DSpace site to maintain the association between a full Handle (including the '4567' local part) and the community, collection or item in question.
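Since the two written forms carry the same information, converting between them is purely mechanical. The Python sketch below splits either form into its site prefix and local part and rebuilds the URL form; the function names are made up for this example, and the code performs no resolution against a Handle server.

```python
RESOLVER = "http://hdl.handle.net/"


def parse_handle(text):
    """Split either written form of a Handle into (prefix, local part)."""
    if text.startswith("hdl:"):
        handle = text[len("hdl:"):]
    elif text.startswith(RESOLVER):
        handle = text[len(RESOLVER):]
    else:
        raise ValueError(f"not a recognised Handle form: {text!r}")
    prefix, sep, local = handle.partition("/")
    if not (prefix and sep and local):
        raise ValueError(f"Handle must look like prefix/local: {text!r}")
    return prefix, local


def as_url(prefix, local):
    """Render a Handle in the URL form that any Web browser can follow."""
    return f"{RESOLVER}{prefix}/{local}"
```

With the example above, both `parse_handle("hdl:1721.123/4567")` and `parse_handle("http://hdl.handle.net/1721.123/4567")` give the prefix `1721.123` (assigned to the site) and the local part `4567` (maintained by the site itself).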

3.12 Bitstream 'Persistent' Identifiers


Similar to handles for DSpace items, bitstreams also have 'Persistent' identifiers. They are more volatile than Handles, since if the content is moved to a different server or organization, they will no longer work (hence the quotes around 'persistent'). However, they are more easily persisted than the simple URLs based on database primary key previously used. This means that external systems can more reliably refer to specific bitstreams stored in a DSpace instance.

Each bitstream has a sequence ID, unique within an item. This sequence ID is used to create a persistent ID, of the form:

dspace url/bitstream/handle/sequence ID/filename

For example:

https://dspace.myu.edu/bitstream/123.456/789/24/foo.html

The above refers to the bitstream with sequence ID 24 in the item with the Handle hdl:123.456/789. The foo.html is really just there as a hint to browsers: although DSpace will provide the appropriate MIME type, some browsers only function correctly if the file has an expected extension.
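The identifier pattern above is simple enough to build and take apart with string operations. This Python sketch assumes the DSpace URL has no extra path component of its own (as in the example), and its function names are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit


def bitstream_url(base_url, handle, sequence_id, filename):
    """Compose: dspace url / bitstream / handle / sequence ID / filename."""
    safe_name = quote(filename, safe="")
    return f"{base_url.rstrip('/')}/bitstream/{handle}/{sequence_id}/{safe_name}"


def parse_bitstream_url(url):
    """Recover (handle, sequence ID, filename) from a URL of that form."""
    parts = urlsplit(url).path.split("/")
    # Expected path: /bitstream/<prefix>/<local>/<sequence ID>/<filename>
    if len(parts) != 6 or parts[1] != "bitstream":
        raise ValueError(f"not a bitstream identifier: {url!r}")
    _, _, prefix, local, sequence_id, filename = parts
    return f"{prefix}/{local}", int(sequence_id), unquote(filename)
```

Only the Handle and the sequence ID identify the bitstream; the trailing filename is carried along as the browser hint described above.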

3.13 Storage Resource Broker (SRB) Support

DSpace offers two means for storing bitstreams. The first is in the file system on the server. The second is using SRB (Storage Resource Broker). Both are achieved using a simple, lightweight API.

SRB is purely an option but may be used in lieu of the server's file system or in addition to the file system. Without going into a full description, SRB is a very robust, sophisticated storage manager that offers essentially unlimited storage and straightforward means to replicate (in simple terms, back up) the content on other local or remote storage resources.

3.14 Search and Browse

DSpace allows end-users to discover content in a number of ways, including:

- Via external reference, such as a Handle
- Searching for one or more keywords in metadata or extracted full-text
- Browsing through title, author, date or subject indices, with optional image thumbnails

Search is an essential component of discovery in DSpace. Users' expectations from a search engine are quite high, so a goal for DSpace is to supply as many search features as possible. DSpace's indexing and search module has a very simple API which allows for indexing new content, regenerating the index, and performing searches on the entire corpus, a community, or collection. Behind the API is the open-source Java search engine Lucene. Lucene gives us fielded searching, stop word removal, stemming, and the ability to incrementally add new indexed content without regenerating the entire index. The specific Lucene search indexes are configurable, enabling institutions to customize which DSpace metadata fields are indexed.

Another important mechanism for discovery in DSpace is the browse. This is the process whereby the user views a particular index, such as the title index, and navigates around it in search of interesting items. The browse subsystem provides a simple API for achieving this by allowing a caller to specify an index, and a subsection of that index. The browse subsystem then discloses the portion of the index of interest. Indices that may be browsed are item title, item issue date, item author, and subject terms. Additionally, the browse can be limited to items within a particular collection or community.
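The idea of asking the browse subsystem for "an index, and a subsection of that index" can be illustrated with a sorted list and a binary search. This is a toy model in Python, not DSpace's browse API: the `INDICES` data, the `browse` function and its parameters are assumptions, and restricting results to a collection or community is left out.

```python
import bisect

# Toy browse indices: each index name maps to a sorted list of
# (sort value, item handle) pairs. The entries are illustrative.
INDICES = {
    "title": sorted(
        [("Algebra", "1/3"), ("Botany", "1/1"), ("Chemistry", "1/2"), ("Drama", "1/4")]
    ),
}


def browse(index_name, starts_with="", limit=2):
    """Return up to `limit` entries of the named index, beginning at the
    first entry whose sort value is not less than `starts_with`."""
    index = INDICES[index_name]
    # A one-element tuple sorts before any pair with the same first value,
    # so bisect_left lands on the first matching entry.
    start = bisect.bisect_left(index, (starts_with,))
    return index[start:start + limit]
```

Calling `browse("title", "B")` returns the window starting at "Botany"; a caller pages through the index by passing the last value it saw as the next starting point.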

3.15 HTML Support

For the most part, at present DSpace simply supports uploading and downloading of bitstreams as-is. This is fine for the majority of commonly-used file formats, for example PDFs, Microsoft Word documents, spreadsheets and so forth. HTML documents (Web sites and Web pages) are far more complicated, and this has important ramifications when it comes to digital preservation:

- Web pages tend to consist of several files: one or more HTML files that contain references to each other, and stylesheets and image files that are referenced by the HTML files.
- Web pages also link to or include content from other sites, often imperceptibly to the end-user. Thus, in a few years' time, when someone views the preserved Web site, they will probably find that many links are now broken or refer to other sites that are now out of context. In fact, it may be unclear to an end-user when they are viewing content stored in DSpace and when they are seeing content included from another site, or have navigated to a page that is not stored in DSpace. This problem can manifest when a submitter uploads some HTML content. For example, the HTML document may include an image from an external Web site, or even their local hard drive. When the submitter views the HTML in DSpace, their browser is able to use the reference in the HTML to retrieve the appropriate image, and so to the submitter, the whole HTML document appears to have been deposited correctly. However, later on, when another user tries to view that HTML, their browser might not be able to retrieve the included image since it may have been removed from the external server. Hence the HTML will seem broken.
- Often Web pages are produced dynamically by software running on the Web server, and represent the state of a changing database underneath it.

Dealing with these issues is the topic of much active research. Currently, DSpace bites off a small, tractable chunk of this problem. DSpace can store and provide on-line browsing capability for self-contained, non-dynamic HTML documents. In practical terms, this means:

- No dynamic content (CGI scripts and so forth)
- All links to preserved content must be relative links that do not refer to 'parents' above the 'root' of the HTML document/site:
  - diagram.gif is OK
  - image/foo.gif is OK
  - ../index.html is only OK in a file that is at least a directory deep in the HTML document/site hierarchy
  - /stylesheet.css is not OK (the link will break)
  - http://somedomain.com/content.html is not OK (the link will continue to link to the external site which may change or disappear)
- Any 'absolute links' (e.g. http://somedomain.com/content.html) are stored 'as is', and will continue to link to the external content (as opposed to relative links, which will link to the copy of the content stored in DSpace). Thus, over time, the content referred to by the absolute link may change or disappear.
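The link rules listed above can be checked mechanically before deposit. The Python sketch below is one possible pre-submission check, not something DSpace itself provides; the function name and the convention that `source_file` is a path relative to the root of the HTML document/site are assumptions.

```python
import posixpath
from urllib.parse import urlsplit


def link_is_preservable(link, source_file):
    """True if `link`, found in `source_file` (a path relative to the site
    root), is a relative link that stays inside the deposited site."""
    parts = urlsplit(link)
    if parts.scheme or parts.netloc:   # e.g. http://somedomain.com/content.html
        return False
    if parts.path.startswith("/"):     # e.g. /stylesheet.css
        return False
    base_dir = posixpath.dirname(source_file)
    resolved = posixpath.normpath(posixpath.join(base_dir, parts.path))
    # A result of ".." or one starting with "../" has climbed above the root.
    return resolved != ".." and not resolved.startswith("../")
```

The `../index.html` case shows why the source file matters: the same link is acceptable from `docs/page.html` but not from a file at the root of the site.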

3.16 OAI Support

The Open Archives Initiative has developed a protocol for metadata harvesting. This allows sites to programmatically retrieve or 'harvest' the metadata from several sources, and offer services using that metadata, such as indexing or linking services. Such a service could allow users to access information from a large number of sites from one place.

DSpace exposes the Dublin Core metadata for items that are publicly (anonymously) accessible. Additionally, the collection structure is exposed via the OAI protocol's 'sets' mechanism. OCLC's open source OAICat framework is used to provide this functionality.

You can also configure the OAI service to make use of any crosswalk plugin to offer additional metadata formats, such as MODS.

DSpace's OAI service does support the exposing of deletion information for withdrawn items, but not for items that are 'expunged' (see above). DSpace also supports OAI-PMH resumption tokens.
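From a harvester's point of view, sets and resumption tokens show up as request arguments. The Python sketch below builds OAI-PMH `ListRecords` request URLs using the protocol's standard argument names (`verb`, `metadataPrefix`, `set`, `resumptionToken`); the base URL and the set name in the usage note are placeholders, not values taken from a real DSpace installation.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # In the protocol, resumptionToken is an exclusive argument: a
        # follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec  # restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"
```

A harvester would first call something like `list_records_url("https://dspace.example.edu/oai/request", set_spec="col_1")`, then keep requesting with the `resumptionToken` returned in each response until none is given.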

3.17 SWORD Support

SWORD (Simple Web-service Offering Repository Deposit) is a protocol that allows the remote deposit of items into repositories. SWORD was further developed in SWORD version 2 to add the ability to retrieve, update, or delete deposits. DSpace supports the SWORD protocol via the 'sword' web application and SWORD v2 via the 'swordv2' web application. The specification and further information can be found at http://swordapp.org.

3.18 OpenURL Support


DSpace supports the OpenURL protocol from SFX, in a rather simple fashion. If your institution has an SFX server, DSpace will display an OpenURL link on every item page, automatically using the Dublin Core metadata. Additionally, DSpace can respond to incoming OpenURLs. Presently it simply passes the information in the OpenURL to the search subsystem. A list of results is then displayed, which usually gives the relevant item (if it is