The ARROW Project: A consortial institutional repository solution, combining Open Source and proprietary software
David Groenewegen
ARROW Project Manager
DORDSL Workshop, 21 September 2006
Outline
Why did we want a repository?
What is ARROW?
What is VITAL and how does it relate to Fedora?
Where is ARROW now?
What have we learnt so far?
ARROW Stage-2
Why did we want a repository?
provides a platform for promoting research output in the ARROW context
safeguards digital information
gathers an institution’s research output into one place
provides consistent ways of finding similar objects
allows information to be preserved over the long term
allows information from many repositories to be gathered and searched in one step
enables resources to be shared, while respecting access constraints (when software allows access controls)
enables effective communication and collaboration between researchers
What is ARROW?
ARROW Project: Originally funded for 3 years until December 31, 2006, recently extended for 12 months.
Funded by the Australian Commonwealth Department of Education, Science and Training (DEST), under the Research Information Infrastructure Framework for Australian Higher Education.
“The ARROW project will identify and test software or solutions to support best practice institutional digital repositories comprising e-prints, digital theses and electronic publishing.”
Who is ARROW?
Founding ARROW Partners:
  Monash University (lead institution)
  National Library of Australia
  The University of New South Wales
  Swinburne University of Technology
ARROW Members:
  University of South Australia
  University of Southern Queensland
  Queensland University of Technology
  Central Queensland University
  University of Western Sydney
  La Trobe University
  4 other RUBRIC members are expected to sign soon
Together they form the ARROW Community
What did the ARROW project set out to achieve?
Solution for storing any digital output
  Initial focus on print equivalents – theses, journal articles
  Now looking at other datasets, learning objects
More than just Open Access – some things need to be restricted
  Copyright
  Confidentiality/ethical considerations
  Work in progress
What did the ARROW project set out to achieve? (2)
Meeting DEST reporting requirements
  Expected move to Research Quality Framework (RQF) has increased the focus on repositories
Employ Open Standards
  Making sure the data is transferable in the future
Deliver Open Source Tools back to the FEDORA Community
Solution that could offer on-going technical support and development past the end of the funding period
What is ARROW now?
A development project
Combining Open Source and proprietary software:
  Fedora™
  VITAL
  Open Journal Systems (OJS)
NOT a centralised or hosting solution
  Every member has their own hardware and software
Why Fedora?
ARROW wanted:
a robust, well architected underlying platform
a flexible object-oriented data model
to be able to have persistent identifiers down to the level of individual datastreams, accommodating its compound content model
to be able to version both content and disseminators (think of software behaviours for content)
clean and open exposure of APIs with well-documented SOAP/REST web services.
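The data-model requirements above can be illustrated with a minimal sketch. This is not Fedora's API: the class names `DigitalObject`, `Datastream` and `DatastreamVersion`, the identifier layout, and the sample PID are all hypothetical, chosen only to show a compound object whose datastreams are individually addressable and versioned.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatastreamVersion:
    """One stored version of a datastream's content (hypothetical type)."""
    number: int
    content: bytes


@dataclass
class Datastream:
    """A named component of a compound object; every version is kept."""
    ds_id: str
    versions: List[DatastreamVersion] = field(default_factory=list)

    def add_version(self, content: bytes) -> DatastreamVersion:
        # Versions are numbered from 1 and appended, never overwritten.
        version = DatastreamVersion(number=len(self.versions) + 1, content=content)
        self.versions.append(version)
        return version

    def latest(self) -> DatastreamVersion:
        return self.versions[-1]


@dataclass
class DigitalObject:
    """A compound object: one persistent identifier, many datastreams."""
    pid: str
    datastreams: Dict[str, Datastream] = field(default_factory=dict)

    def datastream(self, ds_id: str) -> Datastream:
        # Return the named datastream, creating it on first use.
        return self.datastreams.setdefault(ds_id, Datastream(ds_id))

    def identifier_for(self, ds_id: str) -> str:
        # Assumed layout: the object's identifier followed by the datastream id.
        if ds_id not in self.datastreams:
            raise KeyError(ds_id)
        return f"{self.pid}/{ds_id}"


thesis = DigitalObject(pid="example:1")  # hypothetical identifier
thesis.datastream("PDF").add_version(b"draft text")
thesis.datastream("PDF").add_version(b"final text")
print(thesis.identifier_for("PDF"), thesis.datastream("PDF").latest().number)
```

Keeping every version in a list, rather than overwriting content, is what lets an earlier state of a datastream be retrieved later.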
ARROW and Fedora™
Since the beginning of the project ARROW has worked actively and closely with Fedora™ and the Fedora Community
  ARROW Project Technical Architect is a member of Fedora Advisory Board
  ARROW Project Technical Architect sits on Fedora Development Group
This is reinforced by VTLS Inc.
  VTLS President is a member of Fedora Advisory Board
  VITAL Lead Developer sits on Fedora Development Group
Partnering for success, support and survivability
ARROW needed to partner with a developer who could not only produce the software but could provide ongoing user support and development after December 31, 2006
Why VTLS Inc.?
  VTLS wanted to be a development partner
  Had begun work on a repository solution already
  Familiar with library sector
  Willing to produce a combination of a proprietary solution, Fedora and other Open Source software
What is VITAL?
ARROW specified software created and fully supported by VTLS Inc. built on top of Fedora™ that currently provides:
  VITAL Manager
  VITAL Portal
  VITAL Access Portal
  VALET - Web Self-Submission Tool
  Batch Loader Tool
  Handles Server (CNRI)
  Google Indexing and Exposure
  SRU / SRW Support
VITAL architecture overview
[Diagram: VITAL components shown with Fedora™: Indexes, Handles server, Web services, Google exposure, SRU/SRW, Batch Loading Tool, Access Portal, Valet, VITAL Manager]
Where are we now?
2004
• Developed architecture
• Selected, tested Fedora™ and OJS
• VITAL 1.0
2005
• VITAL 1.3
• Started populating repositories
• OAI-PMH harvesting
• ARROW Discovery Service
• Open source tools released
• VITAL 2.0
2006
• VITAL 2.1
• VITAL 3.0 (in test)
• Authentication/Authorization Services
• Enhanced Content Models
• Usage and access statistics
• User configurable interfaces
• Movement towards a pure Web based interface
• Support for OAI sets
• Integration with 3rd party modules like federated search
2007
• ARROW Stage-2
ARROW Repositories
Monash University http://arrowprod.lib.monash.edu.au:8000/access
University of New South Wales http://arrow.unsw.edu.au/
Swinburne University of Technology http://researchbank.swinburne.edu.au/access/
Central Queensland University http://library-resources.cqu.edu.au:8888/access/
Implementation decisions
Atomistic or compound objects?
Descriptive metadata
  adopt one or enjoy MANY types?
  JHOVE validation
  JHOVE metadata extraction
Use cases and content modelling
What import/export formats?
  honouring what standards?
  validation, when and how?
Policy frameworks and decisions
Direct or mediated deposit?
  managing workflows
Open or closed access?
  LDAP authentication?
  XACML authorisation
  creating policies - who can do what?
  Shibboleth
Persistent URL format?
External searching and harvesting?
  OAI-PMH
  spidering
post ARROW project support
For more detail see Andrew Treloar’s talk at:
http://www.lib.virginia.edu/digital/fedoraconf/schedule.shtml
External searching and harvesting
Realised need to develop a discovery service for Australian institutional repositories
The ARROW Discovery Service, developed by the NLA, provides consolidated searching across many Australian repositories (uses OAI-PMH)
Picture Australia, developed by the NLA, harvests image collections (uses OAI-PMH)
SRU/SRW interface released as Open Source Software
Harvesting
Google and other service providers
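As an illustration of the OAI-PMH harvesting mentioned above, here is a minimal sketch of how a harvester might build a `ListRecords` request and read identifiers and titles out of a response. The request arguments (`verb`, `metadataPrefix`, `set`, `resumptionToken`) and the XML namespaces come from the OAI-PMH 2.0 protocol; the function names, the base URL and the sample response are assumptions made for the example, and this is not the ARROW Discovery Service's actual code. No network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build a ListRecords request URL (helper name is hypothetical)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title = next((el.text for el in record.iter(f"{{{DC_NS}}}title")), None)
        records.append((identifier, title))
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


# Hand-written sample response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.edu:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repository.example.edu/oai", set_spec="theses"))
print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester would repeat the request with each returned resumption token until none is returned, which is how one service can gather records from many repositories.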
ARROW Discovery Service
http://search.arrow.edu.au/
Provides a national resource discovery service including:
providing an appropriate search interface
  simple search, advanced search, & browse options
contributing to other networks
  OAIster, Yahoo, Google
ensuring appropriate local institutional and national “branding” of the service
  occurs throughout the ADS interface and the exchanged metadata
providing appropriate subject-based access
  The Australian Standard Research Classification list
Open Source contributions for Fedora
Already made:
  SRU/SRW
  HANDLES
  JHOVE Metadata extraction
  Exposure to Web indexing crawlers.
Coming in 2006:
  LDAP Authentication
  Administrative Reporting
  Bulk Citation Export
  Statistics for Public Users
  Metadata Synchronisation Requirements
Future of VITAL
Upcoming VITAL Version 3.0
  Authentication/Authorization Services.
    XACML (Policy enforcement)
  Enhanced Content Models.
  Usage and access statistics.
  User configurable interfaces.
  Movement towards a pure Web based interface.
  Support for OAI sets.
  Integration with 3rd party modules like federated search.
  Access to content via VTLS reseller arrangements.
What have we learnt so far?
Multiple partners are good:
  Sharing of information and experiences
  Sharing of development work
  Multiple perspectives on issues
and bad:
  Multiple perspectives on issues
  Scope creep
  Managing expectations
  Pressure on the project management team
  Pressure on development team and partners
  Deadline conflicts
Software development feels slow, both commercial and open source
Development with a commercial partner can be tricky
What have we learnt so far? (2)
That there aren’t enough real standards in this area
Open versus closed repositories, or information management versus accessibility, is a BIG ISSUE
Repositories are only partly about software - advocacy, policy, institutional engagement and grunt work need equal attention
Constraints of dealing with copyright
ARROW Stage 2
Funded to the end of 2007
Supporting the RQF
Creative development of institutional repositories
Supporting Australian engagement with institutional repositories
Building partnerships to further enhance repositories
Identifier Management Infrastructure for e-Research Resources (PILIN)
Some changes in direction
Trying to do more development ourselves to:
  Spread the knowledge
  Leverage our use of Fedora
Want to work with VTLS in new ways
  Contract is finished now
  Some work we need to do is too local for VTLS
  VINES
Supporting the RQF
Inclusion of all discrete pieces of evidence, regardless of content type
  Including traditional text evidence and less traditional evidence, such as art works and music compositions or performances
Provision for maximum possible exposure of content
  Subject to copyright constraints.
Inclusion of metadata and links to content in commercial resources.
Reporting to DEST through multiple channels
  Such as Research Master, or direct to the repository.
Support for access and authorisation regimes.
Retention of all evidence
  To build institutional research profiles over time.
Creative development of ARROW institutional repositories
Inclusion of multimedia and creative works produced in Australian universities.
  To date have had limited exposure nationally or internationally.
Addition of annotation capability
Inclusion of datasets and other research output not easily provided in any other publishing channel.
  In conjunction with the DART (ARCHER) Project.
Exploration of the research-teaching nexus by facilitating multiple uses of content held in repositories.
Integration with or development of new tools that will allow value added services for repositories.
  For instance the creation of e-portfolios or CVs of research output of individual academics.
ARROW Projects
ARROW is planning a number of local projects targeting local and community needs. These will interact directly with Fedora™ and VITAL where appropriate. The development is being done within the ARROW Community.
Partner projects in 2007
Gathering research output from websites (UNSW)
Displaying outputs through websites (portfolios) (UNSW/Swinburne)
Understanding workflows and needs of academics (UNSW/Swinburne)
Improving the ARROW Discovery Service (NLA)
  OAI Sets support
  Greater automation
  Statistics capture and reporting
  Integration of e-journals
Usability analysis (Swinburne)
Data needs survey (Swinburne)
Building Rules for Access to Controlled Electronic Resources (BRACER) (Monash)
Supporting Australian engagement with institutional repositories
FRODO and MERRI projects have resulted in a significant leap in the levels of understanding and engagement with repositories in Australia.
Now the challenge is to translate this into substantial repository activity.
The newly formed ARROW Community is intended to provide a central platform for support and the exchange of information.
The ARROW community
Sharing knowledge and experiences
  Annual meeting – inaugural one September 8, 2006
  Regular workshops
  Working Groups
    ARROW Repository Managers Group
    ARROW Development Group
    Possibly groups for:
      Portfolio design
      Metadata: METS, MODS, DC and the future
  Discussion group
    GoogleGroup
ARROW provides logistical and admin support
Building partnerships to further enhance repositories
Through partnerships with other projects ARROW will endeavor to use best practice and new innovations to further enhance Australian repositories beyond their current limitations.
These include:
  APSR: http://www.apsr.edu.au/
  DART/ARCHER: http://www.dart.edu.au/
  ICE: http://ice.usq.edu.au/
  MAMS: http://www.melcoe.mq.edu.au/projects/MAMS/
  OAK-Law: http://www.oaklaw.qut.edu.au/
  RUBRIC: http://www.rubric.edu.au/
PILIN - Persistent Identifiers and Linking INfrastructure
Growing realisation that sustainable identifier infrastructure is required to deal with the vast amount of digital assets being produced and stored within universities.
This is a particular challenge for e-research communities where massive amounts of data are being generated without any means of managing this data over any length of time.
The broad objectives are to:
  Support adoption and use of persistent identifiers and shared persistent identifier management services by the project stakeholders.
  Plan for a sustainable, shared identifier management infrastructure that enables persistence of identifiers and associated services over archival lengths of time.
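The idea behind identifier persistence can be sketched as a level of indirection: citations use an identifier that never changes, and a registry maps it to wherever the object currently lives. The sketch below is only an illustration of that idea. The `IdentifierRegistry` class, the sample identifier and the URLs are hypothetical, and it is neither PILIN's design nor the Handle System's API.

```python
class IdentifierRegistry:
    """Maps persistent identifiers to current locations (hypothetical)."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # An identifier is assigned once and never reused for another object.
        if identifier in self._locations:
            raise ValueError(f"{identifier} is already registered")
        self._locations[identifier] = url

    def relocate(self, identifier, new_url):
        # The object moved; the identifier stays the same.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        return self._locations[identifier]


registry = IdentifierRegistry()
registry.register("example/1234", "http://old-server.example.edu/item/1234")
registry.relocate("example/1234", "http://new-server.example.edu/objects/1234")
print(registry.resolve("example/1234"))
```

Only the registry needs updating when a repository changes servers or software, so identifiers already published in citations keep working.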
Questions?
ARROW Project [email protected] http://arrow.edu.au/
ARROW Project Manager [email protected]