
© Keith G Jeffery, Anne Asserson

Auditing Grey in a CRIS Environment

Keith G Jeffery
Consultant, Keith G Jeffery Consultants
[email protected]

Anne Asserson
University Library, University of Bergen
[email protected]

2-3 Dec 2013, Bratislava

Prologue

• Metadata and data
• Real world
• ‘library’ metadata: MARC, DC etc
• Key dependencies
  – Functional
  – Referential
• No AUDIT without QUALITY METADATA
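The two key dependencies named above can be checked mechanically. The following is a minimal sketch, assuming metadata held as simple rows; the field names (`person_id`, `org_id`, `name`) and the sample values are hypothetical and are not taken from the slides or from any standard.

```python
from collections import defaultdict

# Hypothetical metadata rows; the field names are illustrative only.
persons = [
    {"person_id": "p1", "name": "A. Author", "org_id": "o1"},
    {"person_id": "p2", "name": "B. Writer", "org_id": "o9"},  # o9 does not exist
]
organisations = [{"org_id": "o1", "name": "University Library"}]


def referential_violations(rows, fk, targets, pk):
    """Return rows whose foreign key value has no matching primary key."""
    known = {t[pk] for t in targets}
    return [r for r in rows if r[fk] not in known]


def functional_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value."""
    seen = defaultdict(set)
    for r in rows:
        seen[r[determinant]].add(r[dependent])
    return {k: v for k, v in seen.items() if len(v) > 1}


bad_refs = referential_violations(persons, "org_id", organisations, "org_id")
bad_fds = functional_violations(persons, "person_id", "name")
print(bad_refs)
print(bad_fds)
```

In this sketch the second person points at an organisation that is not recorded, so it is reported as a referential violation; no identifier maps to two names, so the functional check finds nothing.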

Structure

• Introduction
• Reliable Information
• Open Data
• ENGAGE
• Conclusion

Introduction

• The vast majority of (research) information is grey
  – It is not peer-reviewed scholarly publication
• We use ‘information object’ to mean any digital grey object encoded in any format on any medium
  – Document, data file, video, software…
• Mechanisms are required to audit grey to assure quality
• We assert that audit of grey requires high-quality metadata

Reliable Information

• Quality
  – Accurately represents the world of interest
• Context
  – Environment within which the information was collected; related entities
    • Persons, organisations, projects, funding, equipment, publications…
• Availability
  – Persistence (preservation / curation)
  – Conditions of use (open access)

We have to encode this as metadata for audit

Reliable Information: Quality

• Data integrity
  – Schema
  – Constraints
• Accuracy, precision
• Incomplete and inconsistent information
• Temporal validity
• Independent validation
  – Quality rating

(With acknowledgements to FINETIK)
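Several of the quality criteria above (schema, constraints, incomplete information, temporal validity, quality rating) lend themselves to automated checks. The following is a rough sketch only: the schema, the field names and the 1-5 rating scale are assumptions made for illustration and are not taken from CERIF or from the slides.

```python
from datetime import date

# Hypothetical schema: field name -> (type, required). Not taken from CERIF.
SCHEMA = {
    "title": (str, True),
    "created": (date, True),
    "valid_from": (date, False),
    "valid_to": (date, False),
    "quality_rating": (int, False),
}


def audit_record(record, today):
    """Return a list of human-readable problems found in one metadata record."""
    problems = []
    # Schema check: required fields present, values of the declared type.
    for field, (ftype, required) in SCHEMA.items():
        value = record.get(field)
        if value is None:
            if required:
                problems.append(f"missing required field: {field}")
        elif not isinstance(value, ftype):
            problems.append(f"wrong type for {field}")
    # Constraint and temporal validity checks.
    start, end = record.get("valid_from"), record.get("valid_to")
    if isinstance(start, date) and isinstance(end, date) and start > end:
        problems.append("valid_from is after valid_to")
    if isinstance(end, date) and end < today:
        problems.append("record is no longer temporally valid")
    # Independent validation is assumed to be recorded as a 1-5 rating.
    rating = record.get("quality_rating")
    if isinstance(rating, int) and not 1 <= rating <= 5:
        problems.append("quality_rating outside the assumed 1-5 scale")
    return problems


record = {
    "title": "Field survey notes",
    "valid_from": date(2012, 1, 1),
    "valid_to": date(2012, 12, 31),
    "quality_rating": 7,
}
problems = audit_record(record, today=date(2013, 12, 2))
print(problems)
```

The sample record is deliberately incomplete and out of date, so the audit reports a missing field, an expired validity period and an out-of-range rating.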

Reliable Information: Context

• Related entities that give confidence that the information of interest is understood in context
• CERIF (Common European Research Information Format)
• EU Recommendation to member states
• Used in 42 countries
• National standard in 10
• Maintained, developed, promoted by euroCRIS (not for profit) www.eurocris.org

CERIF

[Figure: CERIF data model diagram, with callouts ‘Dataset is here’ and ‘Document is here’]

Reliable Information: Availability

• Persistence
  – Media migration
    • Who can read an 8 inch floppy disk? Or a 3420 IBM tape?
  – Declared syntax and semantics
    • Machine readable AND machine understandable
  – Preservation of related software
    • Changing languages, compilers / interpreters
    • Changing operating environment (sequential, parallel, distributed, data dependencies)
    • Specifications
• Access
  – Open
  – Toll-free (conditions, licences)

Open Data

• Semantic Web
• LOD: Linked Open Data
• RDF
  – Triples
  – Expressed as XML
• Metadata
  – DC
  – CKAN
• Most portals clickable lists of datasets
• Most datasets pdf or xls
  – Essentially documents
• Very little metadata
• Metadata ‘flat’ and poor
• Not linked to underlying research datasets

Open data implies open access to any digital information object
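To make "RDF triples expressed as XML" concrete, here is a small sketch that serialises a few triples as RDF/XML using only the Python standard library. The subject URI and the literal values are hypothetical; the RDF and Dublin Core element namespaces are the standard ones. It handles literal-valued DC properties only and is not a general RDF serialiser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

# Hypothetical triples: (subject URI, DC element name, literal value).
triples = [
    ("http://example.org/dataset/1", "title", "Survey results"),
    ("http://example.org/dataset/1", "creator", "A. Author"),
]


def triples_to_rdfxml(triples):
    """Serialise (subject, dc-property, literal) triples as RDF/XML text."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    descriptions = {}
    for subject, prop, value in triples:
        # One rdf:Description element per distinct subject.
        node = descriptions.get(subject)
        if node is None:
            node = ET.SubElement(root, f"{{{RDF_NS}}}Description")
            node.set(f"{{{RDF_NS}}}about", subject)
            descriptions[subject] = node
        ET.SubElement(node, f"{{{DC_NS}}}{prop}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = triples_to_rdfxml(triples)
print(rdf_xml)
```

The result is exactly the kind of ‘flat’ description the slide refers to: a subject with a handful of attribute values, and no roles, dates or links to related entities.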

Open Data

• An Opportunity
  – Semantic Web, LOD (Linked Open Data), RDF (triples, expressed as XML)
  – Metadata: DC, CKAN
• A Problem
  – Most portals clickable lists of datasets
  – Most datasets pdf or xls: essentially documents
  – Very little metadata; metadata ‘flat’ and poor
  – Not linked to underlying research datasets

The Vision: Metadata for Data Model

• DISCOVERY (DC, eGMS…)
• CONTEXT (CERIF)
• DETAIL (SUBJECT OR TOPIC SPECIFIC)

[Figure: the three metadata layers. Labels recoverable from the diagram: ‘Generate’, ‘Point to’, ‘Linked open data’, ‘Formal Information Systems’]

Open Data and the worlds of information processing

• LOD, Semantic Web, RDF: browsing, ease of use
  – Example: summary data in semantic web/LOD environment (RDF) with associated processing
  – Manual download, manual connection to software, manual integration
• Relational (links): integrity, performance
  – Example: research datasets in relational DB environment with associated analysis, visualisation, data mining…
  – Automated download, automatic connection to software, automated integration

[Figure labels between the two worlds: ‘generate’, ‘provide access to’]

The Vision: The Models

Complete ICT environment for research
Complete cohort of researchers, research managers, innovators, media

• User Model: interaction with data, processing, persons
• Processing Model: providing what the user requires
• Data Model: representing research
• Resource Model: representing ICT

We are talking about this

Conclusion

• Architecture underpinning open data with quality research information
• CERIF provides formality and assurance
• Metadata interconvertors: CERIF as the superset, generating the less rich metadata formats (DC, CKAN…)

The provision of quality metadata assures quality to be confirmed by audit
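The interconvertor idea from the conclusion can be sketched as a one-way mapping from a rich record to a flat one. The record layout below is a hypothetical, heavily simplified stand-in for CERIF-style linked entities (each link carries a role and a date range); the role names and the mapping table are assumptions for illustration, not part of CERIF or Dublin Core.

```python
# Hypothetical, simplified stand-in for a CERIF-style record: entities are
# linked with a role and a date range. Real CERIF is far richer.
rich_record = {
    "title": "Field survey notes",
    "type": "dataset",
    "links": [
        {"entity": "person", "name": "A. Author", "role": "creator",
         "start": "2012-01-01", "end": "2012-12-31"},
        {"entity": "organisation", "name": "University Library",
         "role": "publisher", "start": "2013-01-01", "end": None},
        {"entity": "project", "name": "Survey Project", "role": "funded-by",
         "start": "2011-06-01", "end": "2012-12-31"},
    ],
}

# Roles assumed to have an obvious Dublin Core element; others are dropped.
ROLE_TO_DC = {"creator": "creator", "publisher": "publisher"}


def to_dublin_core(record):
    """Flatten the rich record into DC element -> list of string values."""
    dc = {"title": [record["title"]], "type": [record["type"]]}
    for link in record["links"]:
        element = ROLE_TO_DC.get(link["role"])
        if element is not None:
            dc.setdefault(element, []).append(link["name"])
    return dc


dc_record = to_dublin_core(rich_record)
print(dc_record)
```

The conversion is lossy by design: the date ranges and the project link do not survive in the flat output, which is why the richer format can generate the poorer one but not the reverse.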