Managing Your Metadata Quality 2010 CrossRef Workshops

Patricia Feeney Metadata Quality Coordinator Managing your metadata quality

Upload: crossref

Post on 10-May-2015


TRANSCRIPT

Page 1: Managing Your Metadata Quality 2010 CrossRef Workshops

Patricia Feeney, Metadata Quality Coordinator

Managing your metadata quality

Page 2: Managing Your Metadata Quality 2010 CrossRef Workshops

Agenda

I. Metadata quality audit
II. DOI registration
III. Conflicts overhaul (discussion)
IV. Metadata quality tools

Page 3: Managing Your Metadata Quality 2010 CrossRef Workshops

Best query ever -> bad metadata = no match

Mediocre query -> bad metadata = no match

Horrible query -> bad metadata = no match

Best query ever -> good metadata = match ✓+

Mediocre query -> good metadata = match (probably) ✓

Horrible query -> good metadata = match (maybe) ✓-

Metadata Quality Audit: Overview

Accurate and complete metadata is vital to querying and citation linking.

If the metadata for a DOI is incorrect, incomplete, or messy, a match can't be made, regardless of the quality of the query.

Page 4: Managing Your Metadata Quality 2010 CrossRef Workshops

Current efforts include:

Reports:
Resolution report (emailed monthly)
Depositor report (on website)
Crawler (on website)
Field report (on website)
Conflict report (on website, emailed monthly)
Schematron reports (emailed weekly)
Failed query report (on website)
DOI error reports (emailed daily)

Contact members individually (as issues arise)

Documentation and communication

Page 5: Managing Your Metadata Quality 2010 CrossRef Workshops

Metadata Quality Audit

A Metadata Quality Audit will:
provide publishers with detailed feedback on the quality of their metadata by identifying problem areas
identify members who need attention
provide motivation and support to members with metadata issues

The intent of the audit is to provide information, but there may be consequences for extreme abusers.

Page 6: Managing Your Metadata Quality 2010 CrossRef Workshops

Audit Scope

I. DOI resolution
II. Conflicts
III. Overall metadata quality
IV. Metadata maintenance

"Hello, I’d like to audit you."

"Great, let's get started!"

"Hooray!"

Page 7: Managing Your Metadata Quality 2010 CrossRef Workshops

I. DOI Resolution

Level I: DOIs that have been distributed but not deposited and resolve to the Handle error page. *

Level II: DOIs resolving to an error page *

Level III: DOIs with response page blocked by access control

Level IV: DOIs that resolve to an inadequate response page.

* actionable transgressions

Page 8: Managing Your Metadata Quality 2010 CrossRef Workshops

II. Conflicts

Conflicts occur when two (or more) DOIs are deposited with identical metadata.

Level I: conflicts created between members * 

Level II: conflicts within a publisher prefix(es) *

Level III: conflicts created due to insufficient metadata +

Level IV: conflicts created due to item/content type +

* actionable transgressions
+ this may change, more later

Page 9: Managing Your Metadata Quality 2010 CrossRef Workshops

Quality of deposited metadata

I. Missing metadata: is all available metadata deposited?

II. Accuracy: is metadata correct?

III. Unusual metadata: does metadata fit into the correct content type?

IV. Overall quality: is metadata messy?

Page 10: Managing Your Metadata Quality 2010 CrossRef Workshops

Maintenance

I. Gaps in coverage - this usually indicates undeposited DOIs (very very bad)

II. Currency of deposits - are deposits made ahead of DOIs being distributed?

III. Title maintenance - less of a problem with recent title restrictions, but we still have problems, such as title abbreviations

IV. Reference linking compliance

Page 11: Managing Your Metadata Quality 2010 CrossRef Workshops

Actionable Areas

DOI Resolution:
Level I (undeposited DOIs)
Level II (DOIs resolving to error page)

If action is not taken within a reasonable time period (TBD), DOIs will be registered on behalf of the member (eventually for a fee). Continual distribution of unregistered DOIs may affect membership.

Conflicts:
Level I: conflicts created between members
Level II: conflicts within a publisher prefix

A $2 per DOI conflict penalty fee may be imposed for conflicts of this type if they are not resolved within a reasonable time period (TBD).

Metadata Maintenance:
Outbound linking compliance

Members found to not be linking during the audit will be subject to non-linking penalties.

Page 12: Managing Your Metadata Quality 2010 CrossRef Workshops

Audit Process

Page 13: Managing Your Metadata Quality 2010 CrossRef Workshops

Questions?

Page 14: Managing Your Metadata Quality 2010 CrossRef Workshops

II. DOI Registration Pilot

DOIs should without exception be registered before they are released to the public.

Most DOIs resolve, but the ones that don’t are a big problem.

Solution: we’re going to register them*

*(ideal solution: publisher registers them)

Page 15: Managing Your Metadata Quality 2010 CrossRef Workshops

DOI selection: At the moment, we will register DOIs reported by end users, using the DOI error report as a source.

Page 16: Managing Your Metadata Quality 2010 CrossRef Workshops
Page 17: Managing Your Metadata Quality 2010 CrossRef Workshops

DOI error report:

Implemented mid-2008

~4,000 DOI errors reported monthly

> 1,400 fixed monthly through publisher deposits

Some of the unfixed DOIs are not ‘real’ DOIs, but many are.

Page 18: Managing Your Metadata Quality 2010 CrossRef Workshops

We will register DOIs that meet the following criteria:
Have been distributed publicly by the publisher/prefix owner
Have an identifiable response page
Have been reported to the publisher’s technical and business contacts

Page 19: Managing Your Metadata Quality 2010 CrossRef Workshops

DOI Registration Process

1. DOI reported: a user reports an unresolving DOI using the DOI error form

2. Technical contact notified (DOI error report email)

3. CrossRef review: CR staff reviews reported DOIs and expires DOIs that do not meet our registration criteria

4. Business contact notified: 2 weeks from the initial report, business contact is notified of remaining valid unregistered DOIs.

5. CR deposit: after 2 weeks have passed from business contact notification, CrossRef will register any undeposited DOIs.
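The two waiting periods in steps 4 and 5 can be turned into dates. This is a minimal sketch only, assuming "2 weeks" means exactly 14 calendar days; the function name and the example report date are made up for illustration.

```python
from datetime import date, timedelta

# Assumption: each "2 weeks" in the process is exactly 14 calendar days.
NOTICE_PERIOD = timedelta(weeks=2)


def registration_timeline(reported_on):
    """Return the earliest date of each later step for a DOI reported on `reported_on`."""
    business_contact_notified = reported_on + NOTICE_PERIOD
    crossref_deposit = business_contact_notified + NOTICE_PERIOD
    return {
        "reported": reported_on,
        "business_contact_notified": business_contact_notified,
        "crossref_deposit": crossref_deposit,
    }


# Hypothetical report date, used only to show the arithmetic.
timeline = registration_timeline(date(2010, 11, 1))
for step, when in timeline.items():
    print(f"{step}: {when.isoformat()}")
```

Under these assumptions, a DOI that stays unregistered is deposited by CrossRef no earlier than four weeks after the initial report.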

Page 20: Managing Your Metadata Quality 2010 CrossRef Workshops

Questions?

Page 21: Managing Your Metadata Quality 2010 CrossRef Workshops

Conflicts overhaul

Conflicts occur when two (or more) DOIs share the same metadata, suggesting two DOIs are assigned to a single item.

Page 22: Managing Your Metadata Quality 2010 CrossRef Workshops

Why are conflicts bad?

Only one DOI should be assigned per item

Queries will return multiple DOIs, causing confusion

Some queries (OpenURL) may not return a DOI if multiple results are present

Conflicts between two DOIs often result in one of the DOIs being neglected***

Page 23: Managing Your Metadata Quality 2010 CrossRef Workshops

We currently have ~200,000+ conflicts in our system. Not all of them are a problem:

For some items, our schema only allows minimal metadata

Some content types require matching metadata (for example, standards, and book chapters with minimal metadata, such as those in dictionaries)

Page 24: Managing Your Metadata Quality 2010 CrossRef Workshops

Legitimate conflicts

Conflict between 2 prefixes:

http://dx.doi.org/10.1639/0044-7447(2001)030[0037:IOPOFU]2.0.CO;2

http://dx.doi.org/10.1579/0044-7447-30.1.37

Journal Title | Year | Vol | Issue | Page | Author | Article Title
AMBIO | 2001 | 30 | 1 | 37 | Köhlin | Impact of Plantations on Forest Use a...

Sample query

Conflict within 1 prefix:

http://dx.doi.org/10.3724/SP.J.1006.2008.00070

http://dx.doi.org/10.3724/SP.J.1006.2008.00770

Journal Title | Year | Vol | Iss | Page | Author | Article Title
ACTA AGRONOMICA SINICA | 2008 | 34 | 5 | 770 | Zhang | Differential Gene Expression in Upper…

Page 25: Managing Your Metadata Quality 2010 CrossRef Workshops

‘Bad’ conflicts

Conflicts with minimal metadata:

10.1002/ijc.11095
10.1002/ijc.11093

Journal Title | Year | Vol | Issue | Page | Author | Article Title
International Journal of Cancer | 2003 | 104 | 6 | 798 | | Errata

Conflict due to content type:

10.1520/C0506-10
10.1520/C0506-10A
10.1520/C0506-10B

Book Title | Year | Edition | Page | Author | Title
Specification for Reinforced Concrete... | 2010 | 2010 | | C13 Committee |

Page 26: Managing Your Metadata Quality 2010 CrossRef Workshops

Elements considered during conflict generation:
Content type
Journal, book and/or series title
Article title / content_item title (book chapters)
Publication year
Volume
Issue
First page
Author
Edition

If there is a match between all deposited elements, a conflict is generated.

Two items with matching journal title, volume, issue, and article title will cause a conflict.
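The matching rule can be sketched in a few lines. This is an illustration only, not CrossRef's actual implementation: the field names and the `journal_article` label are hypothetical, and it makes the simplifying assumption that two records conflict only when they deposited the same set of elements and every one of them matches. The sample values come from the conflict examples on the earlier slides.

```python
from collections import defaultdict

# Hypothetical field names standing in for the elements considered
# during conflict generation.
CONFLICT_FIELDS = (
    "content_type", "title", "item_title", "year",
    "volume", "issue", "first_page", "author", "edition",
)


def conflict_key(record):
    """Build a comparison key from the deposited (non-empty) elements only."""
    return tuple(
        (field, str(record[field]).strip().casefold())
        for field in CONFLICT_FIELDS
        if record.get(field)
    )


def find_conflicts(records):
    """Return lists of DOIs whose deposited elements all match."""
    groups = defaultdict(list)
    for record in records:
        groups[conflict_key(record)].append(record["doi"])
    return [dois for dois in groups.values() if len(dois) > 1]


errata = {
    "content_type": "journal_article",
    "title": "International Journal of Cancer",
    "item_title": "Errata",
    "year": 2003, "volume": 104, "issue": 6, "first_page": 798,
}
records = [
    {"doi": "10.1002/ijc.11095", **errata},
    {"doi": "10.1002/ijc.11093", **errata},
    {
        "doi": "10.1579/0044-7447-30.1.37",
        "content_type": "journal_article",
        "title": "AMBIO",
        "year": 2001, "volume": 30, "issue": 1, "first_page": 37,
        "author": "Köhlin",
    },
]
conflicts = find_conflicts(records)
print(conflicts)
```

Because the two errata records carry no author and share every other element, they fall into the same group. The AMBIO record stands alone.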

Page 27: Managing Your Metadata Quality 2010 CrossRef Workshops

Ideas?

What should our minimum set of metadata be?

How should conflicts be monitored/reported?

Page 28: Managing Your Metadata Quality 2010 CrossRef Workshops

Managing your metadata quality

Page 29: Managing Your Metadata Quality 2010 CrossRef Workshops

Sample #1: incorrect metadata

Q: My link resolver is retrieving the wrong metadata for DOI 10.1002/rra.1288, causing our links to break - here is my query*:

http://www.crossref.org/[email protected]&aulast=Null&title=River Research and Applications&volume=26&issue=6&page=663&year=2010

*query metadata matches the response page metadata

A: Two problems with deposited metadata (DOI query):

#1 <year media_type="print">2009</year>

#2 <pages>
     <first_page>n/a</first_page>
     <last_page>n/a</last_page>
   </pages>
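One way to spot this kind of problem is to compare the query values with the deposited record field by field. This is a minimal sketch, assuming both are available as plain dictionaries; the field names are illustrative. The deposited title, volume and issue are assumed to match the query, since the answer above reports only two problems.

```python
def mismatched_fields(query, deposited):
    """Return {field: (query_value, deposited_value)} for every field that differs."""
    return {
        field: (value, deposited.get(field))
        for field, value in query.items()
        if str(value) != str(deposited.get(field))
    }


# Values taken from the query in the question.
query = {
    "title": "River Research and Applications",
    "volume": "26",
    "issue": "6",
    "page": "663",
    "year": "2010",
}
# Year and first page as deposited; the other values are assumed to match.
deposited = {
    "title": "River Research and Applications",
    "volume": "26",
    "issue": "6",
    "page": "n/a",
    "year": "2009",
}

problems = mismatched_fields(query, deposited)
for field, (asked, found) in problems.items():
    print(f"{field}: query has {asked!r}, deposit has {found!r}")
```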

Page 30: Managing Your Metadata Quality 2010 CrossRef Workshops

Sample #2: messy metadata

Q: I know DOI 10.1068/p6742 exists, why doesn’t my query work?

A: Let’s check the guest query form

Metadata for article:

Newport R, Preston C, 2010, "Pulling the finger off disrupts agency, embodiment and peripersonal space" Perception 39(9) 1296 – 1298

Problem is: author surname is deposited as:

<person_name sequence="first" contributor_role="author">
  <given_name><given_name>Roger</given_name></given_name>
  <surname><surname>Newport</surname></surname>
</person_name>
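A depositor could catch this kind of stray markup before submitting. This is a rough sketch, assuming the name values are available as plain strings; the pattern and function name are illustrative and are not part of any CrossRef tool.

```python
import re

# Matches anything that looks like an XML start or end tag, e.g. <surname> or </surname>.
TAG_LIKE = re.compile(r"</?[A-Za-z_][\w.-]*[^<>]*>")


def contains_markup(value):
    """Return True if a name value appears to contain element tags."""
    return bool(TAG_LIKE.search(value))


for value in ("Newport", "<surname>Newport</surname>"):
    print(f"{value!r}: markup found = {contains_markup(value)}")
```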

Page 31: Managing Your Metadata Quality 2010 CrossRef Workshops

Sample #3: duplicate authors

Q: Why does DOI 10.2307/1382491 have multiple versions of the same author?

A: attempt to improve query matching

<contributors>
  <person_name sequence="first" contributor_role="author">
    <given_name>Erling Johan</given_name>
    <surname>Solberg</surname>
  </person_name>
  <person_name sequence="additional" contributor_role="author">
    <given_name>Bernt-Erik</given_name>
    <surname>Sæther</surname>
  </person_name>
  <person_name sequence="additional" contributor_role="author">
    <given_name>Bernt-Erik</given_name>
    <surname>Saether</surname>
  </person_name>
</contributors>
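Spelling variants like these can be detected by folding names to plain ASCII before comparing them. This is a sketch under stated assumptions: the replacement table is hand-written and far from exhaustive, and the function names are made up for illustration. Letters such as æ and ø do not decompose under Unicode normalization, so they need an explicit mapping.

```python
import unicodedata

# Assumed, non-exhaustive replacements for letters that NFKD does not decompose.
SPECIAL = str.maketrans({"æ": "ae", "Æ": "AE", "ø": "o", "Ø": "O", "ß": "ss"})


def ascii_fold(name):
    """Replace special letters, then strip combining accents."""
    decomposed = unicodedata.normalize("NFKD", name.translate(SPECIAL))
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def variant_duplicates(authors):
    """Return pairs of (given, surname) entries that differ only by folding."""
    seen = {}
    pairs = []
    for author in authors:
        key = tuple(ascii_fold(part).casefold() for part in author)
        if key in seen and seen[key] != author:
            pairs.append((seen[key], author))
        seen.setdefault(key, author)
    return pairs


authors = [
    ("Erling Johan", "Solberg"),
    ("Bernt-Erik", "Sæther"),
    ("Bernt-Erik", "Saether"),
]
duplicates = variant_duplicates(authors)
print(duplicates)
```

The same folding could be applied on the query side instead, which would avoid depositing the author twice.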

Page 32: Managing Your Metadata Quality 2010 CrossRef Workshops

New(ish) tools for managing metadata and deposit problems

Schema documentation: http://www.crossref.org/schema/documentation/ or linked from help doc

Reporting problems / asking for help:

Help documentation (http://www.crossref.org/help/)

Support portal and forums (http://support.crossref.org)

Contact [email protected]

Page 33: Managing Your Metadata Quality 2010 CrossRef Workshops

Schematron update

Schematron reports notify depositors of non-fatal deposit issues

35-40 emails sent out weekly

Alerts are generated for < 1% of deposits

Tend to identify ‘messy’ deposits

Rules updated periodically

Page 34: Managing Your Metadata Quality 2010 CrossRef Workshops

Schematron Warnings

Page number contains underscore: 2%
First page contains dash: 4%
Last page contains dash: 7%
'Jr.' in surname: 61%
Punctuation in surname: 26%

Jr. in surname:
Araújo Jr
Prata Jr.
Szezech Jr.

Punctuation in surname:
(Earven) Tribble
Frederick (Frikkie) J.
Arch Marin [email protected]
********

Other rules:

‘ed’ ‘iss’ ‘vol’ in edition, issue, volume elements

Publication year exceeds current year by >2

Surname / title all upper case
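Depositors can run similar checks locally before submitting. The sketch below restates the warnings above as plain Python and is not the actual Schematron rule set: the record layout (a dictionary with hypothetical keys such as `first_page` and `surname`) and the exact patterns are assumptions, including the choice to allow apostrophes and hyphens in surnames.

```python
import re
from datetime import date

JR_SUFFIX = re.compile(r"\bJr\b", re.IGNORECASE)
# Assumption: apostrophes and hyphens are legitimate in surnames; underscores and
# any other character that is not a letter, digit or whitespace count as punctuation.
SURNAME_PUNCTUATION = re.compile(r"[^\w\s'’\-]|_")
LABELS = {"edition": "ed", "issue": "iss", "volume": "vol"}


def check_record(record, today=None):
    """Return a list of warning strings for one deposited record (a plain dict)."""
    today = today or date.today()
    warnings = []

    for field in ("first_page", "last_page"):
        value = record.get(field, "")
        if "_" in value:
            warnings.append("page number contains underscore")
        if "-" in value:
            warnings.append(f"{field.replace('_', ' ')} contains dash")

    surname = record.get("surname", "")
    if JR_SUFFIX.search(surname):
        warnings.append("'Jr.' in surname")
    elif SURNAME_PUNCTUATION.search(surname):
        warnings.append("punctuation in surname")

    for field, label in LABELS.items():
        if re.search(rf"\b{label}", record.get(field, ""), re.IGNORECASE):
            warnings.append(f"'{label}' in {field} element")

    year = record.get("year")
    if year is not None and int(year) > today.year + 2:
        warnings.append("publication year exceeds current year by more than 2")

    for field in ("surname", "title"):
        if record.get(field, "").isupper():
            warnings.append(f"{field} all upper case")

    return warnings


print(check_record({"surname": "Szezech Jr."}))
print(check_record({"surname": "(Earven) Tribble"}))
```

The two surnames come from the examples above. Any other values passed in are up to the caller.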