Institutional Repositories: Populating and Searching IRs
Anthony W. Ferguson
University of Hong Kong
Librarian and Acting Director of IT in Learning
Today’s Goals
Discuss how to get the members of your academic community to populate your institutional repository (IR) and
Discuss how to maximize for your faculty the value of the items added to it through the optimal use of metadata
Background: What is an Institutional Repository?
“islands of information across the landscape of the Web.”
“A recognition that the intellectual life and scholarship of our universities will increasingly be represented, documented, and shared in digital form, and that a primary responsibility of our universities is to exercise stewardship over these riches: both to make them available and to preserve them.”
“. . . the means by which our universities will address this responsibility both to the members of their communities and to the public.”
“It is a new channel for structuring the university's contribution to the broader world, and as such invites policy and cultural reassessment of this relationship.”
Basic Components
Digital information
A computer server
A relational database placed on the server
A client system used to load content into the database
An end-user interface to search/harvest the database and display what is sought
Some sort of system that screens what data can be viewed and by whom
People to make all sorts of decisions about what content can be added to the database, how the content should be formatted, and how to describe the data so that search engines can find what users want
Information Characteristics
Grouped into communities, archives
Communities might be departments, etc.
Can hold the full-text data itself or pointers to other servers where the full text resides
Data are given persistent URLs
Data are described using metadata (e.g., author, title, creator) so that search engines like Google can find them
Types of Digital Information
Bibliographies
Campus magazines
Campus newspapers
Committee reports and memoranda
Conference proceedings
Consulting (technical) reports
Departmental publications
Grant applications and other documentation
Journal articles
Learning objects
Maps
Multimedia publications
Pamphlets
More Types
Photographs/slides
Poster session displays
Preprints
Reformatted digital library collections
Research and technical reports
Sound recordings
Statistical reports
Surveys and survey results
Technical documentation
Technical drawings
Theses and dissertations
Videos
Working papers
IR’s No. 1 Problem
Getting Content
An April 2004 survey of 45 IRs found the average number of documents to be only 1,250 per repository, with a median of 290.
This is a small number when considering the hundreds of thousands of dollars and staff hours that go into establishing and maintaining an IR.
For example, MIT Libraries estimate that their IR will cost $285,000 annually in staffing, operating expenses, and equipment escrow. With approximately 4,000 items currently in their IR, that is over $71 spent per item, per year.
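The per-item figure follows from simple division. A minimal sketch of the arithmetic in Python, using only the two numbers quoted above (the function name is illustrative, not part of any IR software):

```python
def cost_per_item(annual_cost: float, item_count: int) -> float:
    """Return the annual cost divided by the number of items held."""
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    return annual_cost / item_count


# Figures quoted above: $285,000 per year and approximately 4,000 items.
mit_cost = cost_per_item(285_000, 4_000)
print(f"${mit_cost:.2f} per item, per year")  # prints "$71.25 per item, per year"
```

The same function shows how quickly the per-item cost falls as a repository fills: doubling the item count at a fixed annual cost halves the figure.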
What is the problem? Fills few needs: Foster/Gibbons
Author Needs (F/G): What an IR Can Do
Work with co-authors: Not a concern
Track variant versions: Not a concern
Make work known: Yes – can help a lot
Access others’ work: Only if they have an IR
Keep up with field: Yes – can help a bit
Organize own papers: Possible
Control ownership, security, & access: Yes – but allows many more people to access
More of What is the problem?
Author Needs (F/G): What an IR Can Do
Preserve access: Yes. Strong point.
Not responsible for servers: Yes. Strong point.
Avoid copyright issues: Maybe a larger problem
Avoid computer issues: Yes – but still complicated
Reduce chaos: One more thing to do
No more work: This is more work
Foster and Gibbons’ Solution
Focus on solving faculty concerns
Showcase them personally
Preserve their work
Provide links to their work
Let them control their own data
Maintain server for them
Don’t ask them to do anything complicated
Top 18 Things You Can Do To Get People to Populate Your Institutional Repository
This is one time that it is not true that “if you build it they will come”
No. 1 Solution: Low hanging fruit
1. Look for low-hanging fruit
Find out who is already contributing to other repositories: OAIster harvester
Surf your own university’s home page
Find out who is publishing in OAI journals
Load existing digital content you might have
Google your own university
No. 2: Actively educate
2. Conduct an active education campaign
F2F introductions
Departmental meetings
Newsletters
A researcher at MIT said it took 5-7 contacts with the IR before people participate
No. 3 Showcase individuals not departments
Showcase individuals
Make it look like a clickable resume
Let them know that Google will crawl it
Tell them about higher citation rates
Tell them how easy it will be to refer others to their work
Make sure it is easy to do
No. 4: Tell them when someone reads their content
Provide faculty with statistics on use of their materials
Feature top 10 downloaded papers/objects
Publicize, publicize, publicize
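A "top 10 downloads" list can be produced from whatever download log the repository keeps. The sketch below assumes a hypothetical log with one item identifier per download; the function name, the log format, and the sample identifiers are illustrative assumptions, not features of any particular IR platform.

```python
from collections import Counter


def top_downloads(download_log, n=10):
    """Return the n most-downloaded item identifiers with their counts."""
    counts = Counter(download_log)
    # most_common sorts by count, highest first.
    return counts.most_common(n)


# Hypothetical log: one entry per download, keyed by an item identifier.
sample_log = ["paper-a", "paper-b", "paper-a", "thesis-c", "paper-a", "paper-b"]

for item_id, downloads in top_downloads(sample_log, n=10):
    print(item_id, downloads)
```

Filtering the log by author before counting would give the per-faculty usage statistics mentioned above.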
No. 5: Look for retirees
Look for recently retired or soon-to-retire faculty
They have a lot of content
They may not be sure what to do with it
Show them a better way to be remembered
They have time to work on copyright problems, metadata, etc.
No. 6: Ask Library Liaisons For Help
Ask them about top publishers
Ask about who is up for tenure
Ask about who is creating datasets
Ask them to give presentations at faculty meetings
Ask them to talk to the graduate students who need to get known
No. 7: Send Out Surveys
Ask who is contributing to other repositories or preprint archives?
Ask them where they store their digital content?
Ask about number of times they send out copies of articles to others?
See who has their own pages with data?
Ask them if they think their digital materials will be there in 5 to 10 years?
No. 8: Create an IR Web Page
Give step by step instructions on how to add content to their part of the IR
Show how they can point from their web page to the IR
Give information about the value of persistent URLs
Provide links to information about publishers who support IRs
No. 9: Contact Those Using Blackboard, etc.
Ask them what they do with their inactive files when they are not currently teaching a class
Invite them to deposit their files
No. 10: Check University Calendar
Most departmental conferences, symposia, etc. will have unpublished papers, PowerPoints, and other digital content
Help them put these together in the form of an online book of readings, a transaction
Nos. 11 & 12
11. Hold pizza and beer/tea lunches for graduate students to introduce them to IRs and how they can get their name/work out there. Ask Faculties and Departments to select Top X papers each year and put them into the IR.
12. Ask departments that process professional leave/travel requests to insert a brochure about contributing their papers to the repository.
Nos. 13 & 14
13. Carefully get the University Administration to encourage/require deposit of sponsored research in the repository
14. Give faculty members an easy-to-use form/addendum securing IR deposit rights, which they can send along with their paper once a journal publisher has accepted it
Nos. 15 & 16
15. Look for departments trying to publish their own journals and show them how an IR can help them do it online for much less money
16. Ask your university press/others in your area whether they would like to digitize their out-of-print books and add them to the IR
Nos. 17 & 18
17. Work with your university archive to use the IR as a medium for exposing their digital resources to the world of scholarship
18. Point out that an IR’s contents capture and reflect a university’s intellectual capital, the quality and quantity of the research being done there.
2nd Topic: Maximizing Value Through Metadata
What is metadata?
In the old days we had author, title, and subject headings in our card catalogues
Now we have author, title, etc., metadata headings in our online catalogues and in IRs
Metadata Schemes
Libraries use the MARC (MAchine-Readable Cataloguing) metadata standard
There are US, China, UK, etc. flavours
IRs use the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard
OAI-PMH employs the 15 Dublin Core headings
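In practice, a harvester talks to an OAI-PMH repository through ordinary HTTP requests that carry a `verb` parameter, and `oai_dc` is the metadata prefix for unqualified Dublin Core. As a rough sketch, the following builds a `ListRecords` request URL; the base URL is a hypothetical placeholder, and no request is actually sent.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for the given repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "https://repository.example.edu/oai/request"

print(list_records_url(BASE_URL))
# prints "https://repository.example.edu/oai/request?verb=ListRecords&metadataPrefix=oai_dc"
```

A service such as the OAIster harvester mentioned earlier issues requests of this kind against many repositories and aggregates the records it gets back.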
Dublin Core Elements
1. Title
2. Creator
3. Subject
4. Description
5. Publisher
6. Contributor
7. Date
8. Type
9. Format
10. Identifier
11. Source
12. Language
13. Relation
14. Coverage
15. Rights
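To make the 15 elements concrete, here is a minimal sketch that serializes a record as `oai_dc` XML with Python's standard library. The two namespace URIs are the standard OAI and Dublin Core ones; the function name and the sample record values are invented for illustration.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# The 15 Dublin Core elements, in the order listed above.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_dc_record(fields):
    """Build an oai_dc element from a dict of element name -> list of values."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        # Every element is optional and repeatable, so each value gets its own tag.
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


# Invented sample values, for illustration only.
sample_record = build_dc_record({
    "title": ["An Example Working Paper"],
    "creator": ["Author, First", "Author, Second"],
    "date": ["2005"],
    "type": ["Text"],
})
print(ET.tostring(sample_record, encoding="unicode"))
```

Because every element is optional and repeatable, a depositor can supply as few or as many fields as they have, which is much of what makes the scheme simpler than MARC.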
Dublin Core Pros and Cons
Pro
Compared to MARC, DC is wonderfully simple
Search engines like Google can trawl it and know what they are looking at, since the XML markup language is familiar to them; users can do systematic searches and find what they want
Con
Still time-consuming
Time costs money
HKU’s library adds 50,000 books a year
Employs 20 or so cataloguers
Most universities would be loath to create another costly library bureaucracy
How Metadata is Added to an IR
Possible Solutions
Possible solution: Make the faculty do all the work
Possible reaction: Faculty members like to do research, not fill out forms.
Possible Solutions
Idea: Consequence
Make the contributor do all the work: Faculty want to do research, not fill in forms
Have the computer do it automatically: Easier said than done
Train faculty secretaries/staff to do all the work: Will still need editors to review and fill in holes
Use existing metadata for items that already have it: Good idea when you have it, e.g., digital theses, etc.
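Reusing existing metadata usually means a crosswalk from the scheme the item already has to Dublin Core. The sketch below maps a few MARC field tags to DC elements. The four-tag mapping is a deliberately simplified, illustrative subset rather than an official crosswalk, and the input format (tag to list of strings) is an assumption made for the example.

```python
# Illustrative subset only; a real crosswalk covers many more tags and subfields.
MARC_TO_DC = {
    "100": "creator",      # main entry, personal name
    "245": "title",        # title statement
    "520": "description",  # summary or abstract
    "650": "subject",      # topical subject heading
}


def marc_to_dc(marc_fields):
    """Map MARC-tagged values to Dublin Core; also return tags left unmapped."""
    dc = {}
    unmapped = []
    for tag, values in marc_fields.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            # Left for a human editor to review and fill in.
            unmapped.append(tag)
            continue
        dc.setdefault(element, []).extend(values)
    return dc, unmapped


# Invented catalogue record for a digital thesis.
sample_marc = {
    "100": ["Student, Example"],
    "245": ["An Example Thesis Title"],
    "650": ["Digital libraries", "Metadata"],
    "300": ["120 leaves"],
}
dc_fields, needs_review = marc_to_dc(sample_marc)
print(dc_fields)
print("Needs review:", needs_review)
```

The list of unmapped tags is the automated counterpart of the point made in the table: editors are still needed to review and fill in holes.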
Conclusion:
The IR movement was initially dominated by the anti-commercial-publishing movement
Now, it isn’t. More and more publishers are allowing IR postings.
IRs are now becoming mainstream
The suggestions given here will hopefully help gain community support and maximize the value of what is contributed.