Metadata Issues for e-Prints: experiences from setting up an Institutional Repository

Jessie Hey

Research Fellow, TARDis Project

University of Southampton

ePrints UK Workshop, Ashmolean Museum, Oxford

22 Mar 2004

e-Prints

A simple illustration of diversity in metadata!

• EPrints (software)

• e-Prints (Soton)

• ePrints (UK project)

• eprints (in URLs, emails)

• E-print (Network – US gateway)

Searching for e-Prints in Google: e-Prints 1,200,000; eprints 225,000

Plam pilot?

• Looking for a PDA?

• Just try searching for plam pilot on eBay

• Even a sale is not incentive enough

Metadata

• The modern word for ‘Data about data’

• Generally structured data describing an e-Print in this context

• Describing an object such as a journal article or book chapter or thesis

Metadata issues for today

• Who needs the quality?

• What kind of quality?

• How we approached it in TARDis

– the depositor

– the process

– classification

– mediation

• Balancing demands the pragmatic way

Who needs the quality?

Service providers (i.e. search services)

• Analysis in both e-learning and e-prints communities showed concern about quality of metadata in individual databases to give good search results when combined in cross-domain search services

Barton, Jane, Currier, Sarah and Hey, Jessie M.N. (2003) Building quality assurance into metadata creation: an analysis based on the learning objects and e-Prints communities of practice. In: 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice - Metadata Research and Applications, DCMI, 39-48. http://eprints.soton.ac.uk/archive/00000020/

As I am in Oxford…

• a tribute in Elvish to JRR Tolkien from the Lord of the Rings

Gandalf on Dublin Core metadata

• ‘I cannot read the fiery letters,’ said Frodo in a quavering voice.

• ‘No’ said Gandalf ‘but I can. ……this in the Common Tongue is what is said, close enough:

• One Ring to rule them all, One Ring to find them,

• One Ring to bring them all and in the darkness bind them.’

Standards for e-Prints: Dublin Core Metadata Sets

• Define minimal metadata elements for simple resource discovery

e.g. title, creator, subject and keywords, publisher, date, rights management

• Fundamental building blocks for Open Archives Initiative compliant repositories

• Software such as GNU EPrints is OAI compliant (in DSpace this may need ‘switching on’)

• Full text searching (in latest version) will give additional help to compensate for weaknesses
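A minimal sketch of what such a record might look like in code. The element names (title, creator, subject, publisher, date, rights) are the Dublin Core elements listed above, and the sample values come from the Barton, Currier and Hey paper cited in this talk. The `SimpleRecord` class, the `missing_elements` helper and the choice of which elements to treat as required are assumptions for illustration only (Dublin Core itself makes every element optional); they are not taken from GNU EPrints or DSpace.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleRecord:
    """Hypothetical container for a handful of Dublin Core elements."""
    title: str
    creator: List[str]
    date: str
    subject: List[str] = field(default_factory=list)
    publisher: str = ""
    rights: str = ""


def missing_elements(record, required=("title", "creator", "date")):
    """Return the names of 'required' elements that are empty.

    Which elements count as required is a local policy choice (assumed here).
    """
    return [name for name in required if not getattr(record, name)]


record = SimpleRecord(
    title=("Building quality assurance into metadata creation: an analysis "
           "based on the learning objects and e-Prints communities of practice"),
    creator=["Barton, Jane", "Currier, Sarah", "Hey, Jessie M.N."],
    date="2003",
    publisher="DCMI",
)

print(missing_elements(record))
```

A record with an empty title or creator list would be reported by `missing_elements`, which is the kind of light check a deposit form could run before accepting a submission.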

Who needs the quality?

• Academics (the depositors) need reasonable quality for their publication record whether full text is available or not

– Tendency to think a good citation matters less if access leads straight to the full text

An institutional repository needs

• To represent their own work well

• To represent their faculty and university well

• For publicity and communication

• For research assessment and proposals

• For promotion

What kind of quality?

• Fit for purpose – visibility and citability

• Rolls Royce or Volkswagen Golf or a Skoda?

• The Rolls Royce may not produce a sustainable repository

• Library of Congress had to think again with a backlog of millions

• A departmental archive had to scrap its editors (too slow)

• Need a model with a light touch

Examples to correct

From an academic’s current departmental publication record:

• Co-author given as Fadden on older references

• Given as McFadden on newer ones

• McFadden would not find all his papers!
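One way to picture the fix is a small table of name variants, so that a search under either form finds every paper. This is only a sketch: the `VARIANTS` table, the helper names and the two placeholder records are invented for illustration and do not describe how any repository software handles author names.

```python
# Hypothetical table mapping known surname variants to a preferred form.
VARIANTS = {
    "fadden": "McFadden",
    "mcfadden": "McFadden",
}


def preferred_surname(surname, variants=VARIANTS):
    """Return the preferred form of a surname, or the surname unchanged."""
    cleaned = surname.strip()
    return variants.get(cleaned.lower(), cleaned)


def papers_by(surname, records):
    """Return records where any author matches the surname or a variant."""
    target = preferred_surname(surname)
    return [
        rec for rec in records
        if any(preferred_surname(author) == target for author in rec["authors"])
    ]


# Placeholder records, not real publications.
records = [
    {"title": "older paper", "authors": ["Fadden"]},
    {"title": "newer paper", "authors": ["McFadden"]},
]

print([rec["title"] for rec in papers_by("McFadden", records)])
```

Keeping the variants in a table rather than editing the old records means the citation can still be shown as originally published.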

Examples to correct

• Authors are not perfect but neither are information specialists or other sources

Recent examples:

• Author’s assistant put a conference in year 2400

• ‘Web of Knowledge’ put a conference in 2010

NB Amazon proved useful for checking book information from the title page (new Amazon ‘search inside’ service) but main entries may be less accurate

Quality Assurance Procedures

• Would like to pick up these and obvious examples of metadata in the wrong field, e.g. book title used for title of chapter

• Options include regular checking (e.g. at or close to time of deposit or for annual reporting) or random checking

• Visualisation techniques promising but still expensive
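The simplest of these checks can be automated cheaply. The sketch below flags a publication year that lies in the future (as with the conferences dated 2400 and 2010) and a chapter title that merely repeats the book title. The field names `year`, `title` and `book_title`, and the one-year allowance for forthcoming items, are assumptions made for this example.

```python
def metadata_warnings(record, current_year, allowance=1):
    """Return a list of warnings for a record given as a plain dict.

    `allowance` is the number of years ahead that is still accepted,
    to leave room for forthcoming items (an assumed policy).
    """
    warnings = []

    year = record.get("year")
    if year is not None and year > current_year + allowance:
        warnings.append(f"year {year} is in the future")

    title = record.get("title", "").strip().casefold()
    book_title = record.get("book_title", "").strip().casefold()
    if title and title == book_title:
        warnings.append("chapter title is identical to book title")

    return warnings


# Checked from the viewpoint of 2004, the year of this talk.
print(metadata_warnings({"year": 2400}, current_year=2004))
print(metadata_warnings({"year": 2010}, current_year=2004))
```

Checks like these only catch the obvious cases; a plausible but wrong year would still need a human eye.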

How we approached it in TARDis

• Looked at process from point of view of depositor

– to decrease the barriers to deposit

– to improve quality by design or example

• Looked at metadata required for a good citation

– academics using e-print records for many purposes, not just visibility

• Some information may be easier to strip out if required but harder to add later, e.g.

– first name or initials – although cultural variations too

– journal title or abbreviation

Simple things deter

• Questions you can’t answer

• No place to put it

• Errors which force you to enter it again

• On a credit card payment

– Date on the card: 06/05

– Date to enter: 06/2005

How many times do I do this incorrectly!
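The credit card case shows how a form could remove this barrier by accepting the date as the user sees it. A minimal sketch, assuming that a two-digit year always means 20xx; the function name `parse_card_date` is made up for this example.

```python
import re


def parse_card_date(text):
    """Parse 'MM/YY' or 'MM/YYYY' and return (month, four_digit_year).

    Two-digit years are assumed to fall in the 2000s.
    """
    match = re.fullmatch(r"\s*(\d{1,2})\s*/\s*(\d{2}|\d{4})\s*", text)
    if match is None:
        raise ValueError(f"unrecognised date: {text!r}")

    month = int(match.group(1))
    year = int(match.group(2))
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    if year < 100:
        year += 2000
    return month, year


# Both the form on the card and the form the site asks for are accepted.
print(parse_card_date("06/05"))
print(parse_card_date("06/2005"))
```

The same idea applies to a deposit form: accept the variants a depositor is likely to type and normalise them afterwards, rather than rejecting the entry.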

To help the depositor

• Aimed to enter information as the depositor sees it on the full text

• Arranged input in the order the information is seen

• With relevant information grouped together

• With ‘pages’ of daunting size

• Fields of a size to view as much of the text as possible

TARDis - Aiding deposit – relevant fields – relevant help

The Process

• Added help where examples are useful

• Added extra buttons at top to ease navigation

• Made mandatory fields where essential

• Tension between full details and deterrent

– commentary field currently not included although some might find useful

Some ‘quality’ traditions may be less practical

• Search service recommendations: capitals only for first word of title except proper nouns

• Process is generally ‘cut and paste’ so result is variable and advice ignored

• Get Caps, non-caps, rarely ALL CAPS

• Found in practice likely to be too time consuming to insist

• Think retrieval first rather than consistency
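"Retrieval first" can be pictured as leaving the title exactly as deposited and comparing on a normalised key instead. This is a sketch of that idea only, not the behaviour of any particular search service; `search_key` is a hypothetical helper.

```python
def search_key(title):
    """Return a case-insensitive, whitespace-normalised key for matching.

    The stored title is left untouched; only the key is normalised.
    """
    return " ".join(title.casefold().split())


deposited = [
    "Metadata issues for e-Prints",
    "Metadata Issues For e-Prints",
    "METADATA  ISSUES FOR E-PRINTS",
]

# All three capitalisation styles collapse to the same key.
print({search_key(title) for title in deposited})
```

With matching handled this way, the variable capitalisation produced by cut and paste no longer affects whether a title is found.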

Classification – a specific area of debate

• ePrints UK exploring automatic classification with Dewey

• TARDis looked at current practice:

– Reviewed subject classification in discipline-based and early institutional archives

– Found whole variety of choices and levels of complexity

TARDis on subject classification

• Discussion of issues and snapshot chart http://tardis.eprints.org

• Using basic Library of Congress with view to harvesting, e.g. papers in Oceanography

• Added search box to find subject

• Departments could use an additional scheme if they wish (software option)

• Keywords can be added (cut and paste) if available (sometimes papers also have classification categories added for a journal)

• Computer classification generally expensive and requires learning examples but accuracy is improving

Towards the future – subject classification – on the fly

Mediation

• TARDis is experimenting with deposit choices

• Branch to:

– Self archiving (author or local assistant) with light review as pass through submission buffer

– Assisted archiving – give us the file with essential details not evident from the full text

Mediation in practice

• Current experience:

– Assisted archiving often time consuming – meeting the difficult ones – but can add value (e.g. fuller publisher location details such as DOI)

– Self archiving less accurate but author may know details which may be missing from full text

– Balance likely to change as authors become either more familiar with early deposit or perhaps happy to delegate to save time

– Learning curve for us – later may devolve some quality responsibility (use editorial options)

– Give additional feedback into software

The challenge of cutting and pasting from PDFs

• Sometimes rather like the Hyperbookworms (Jasper Fforde, The Eyre Affair)

• Who produce spurious capitals, apostrophes, hyphens

• Problems with hyphens, accents and words starting with f!

• LaTeX usually the culprit so Humanities have an advantage here
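Some of this debris can be tidied automatically after pasting. The sketch below assumes that the trouble with words starting with f comes from ligature characters such as "ﬁ" and "ﬂ", and that stray hyphens come from words split across line ends; both are plausible causes rather than anything established in this talk. Unicode NFKC normalisation (Python's standard `unicodedata` module) expands the ligatures and recombines accents that were pasted as separate characters.

```python
import re
import unicodedata


def clean_pasted_text(text):
    """Tidy text pasted from a PDF (a sketch under the assumptions above)."""
    # Expand ligatures (e.g. U+FB01 -> "fi") and compose split accents.
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words hyphenated across a line break: "meta-\ndata" -> "metadata".
    # Caveat: a genuine hyphenated compound split at a line end loses its hyphen.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining line breaks and runs of spaces.
    return " ".join(text.split())


pasted = "\ufb01rst meta-\ndata \ufb02ow"
print(clean_pasted_text(pasted))
```

Spurious capitals and apostrophes are harder to fix mechanically and are probably still best caught by a light review at deposit.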

Balancing demands the pragmatic way

• Author deposit changes the equation

• Incentives can increase accuracy

– Deposit support

– Requests by department or university or funding council for up to date records

• Collaboration between author, department and information specialist may be best way forward

• Aim: light quality control to achieve visibility and citability

The New World of e-Prints

• Not so elegant to work in as an Oxford College Library such as Brasenose

• But should be just as satisfying to use as it meets new needs

Thank you

For further information:

TARDis http://tardis.eprints.org/

e-Prints Soton (Research Soton) http://eprints.soton.ac.uk/

FAIR (Focus on Access to Institutional Resources) Programme

“Improving the Quality of Metadata in Eprint Archives”, Marieke Guy and Andy Powell, Ariadne Issue 38, 30-January-2004

Barton, Jane, Currier, Sarah and Hey, Jessie M.N. (2003) Building quality assurance into metadata creation: an analysis based on the learning objects and e-Prints communities of practice. In: 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice - Metadata Research and Applications, DCMI, 39-48. http://eprints.soton.ac.uk/archive/00000020/
