© Keith G Jeffery, Anne G S Asserson GL 10 Amsterdam 2008 200812 1

Keith G Jeffery
Director, IT & International Strategy, STFC
keith.jeffery@stfc.ac.uk

Anne G S Asserson
Research Department
University of Bergen
[email protected]

INTEREST
INTERoperation for Exploitation, Science and Technology


Authors

Anne Asserson UiB

Keith G Jeffery STFC-RAL


Structure
• Background
• The Hypothesis
  – Remote Wrapper
  – Local Wrapper
  – Catalog
  – Catalog Plus Pull (ERGO2++)
  – Full CERIF
  – Harvesting
• Conclusion


Background: GL
• Grey literature is important but is only a small component of the total research information environment and must be seen in the context of the overall research process
• Grey literature is a product
• To understand the product, one needs information on the sources and the process, i.e. the research context

Do not try to obtain information through a ‘fog’ backwards from GL metadata

Get it moving forwards through the research process; much GL metadata is then derived directly and consistently


Background: Access
• Interoperation: homogeneous access to distributed heterogeneous information
  – Query against schema (of user)
  – Translation to other schemas (of sources)
  – Answer reconciled to original schema (of user)
  – If common interoperation format: n interfaces
  – If not: n(n-1) interfaces
• Utilise one common interoperation format
• [Character set, language, syntax, semantics]
• The alternative is ‘google-like’ where the end-user has to do the translations and reconciliations
• This does not scale
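
The interface counts on this slide can be checked with a little arithmetic. The sketch below is illustrative only: it assumes each schema-to-schema translation counts as one directed interface, and the function names are invented for the example.

```python
def interfaces_with_common_format(n: int) -> int:
    """With one common interoperation format, each of the n sources needs one interface."""
    return n


def interfaces_pairwise(n: int) -> int:
    """Without a common format, each source needs a converter to every other source."""
    return n * (n - 1)


# Compare the two growth rates for a few federation sizes.
for n in (3, 10, 50):
    print(n, interfaces_with_common_format(n), interfaces_pairwise(n))
```

The pairwise count grows quadratically, which is the sense in which the alternative does not scale.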


Background: Metadata
• Grey literature repositories can be interoperated without CERIF-CRIS using OAI-PMH and DC (OAISTER)
• Grey literature repositories provide better recall and relevance when interlinked via CERIF-CRIS – research context
• formal syntax, declared semantics
• Metadata
  – Schema, Navigational, Associative {descriptive, restrictive, supportive}
• The key to everything is quality metadata
  – input validation, query/retrieval, relationship linking, INTEROPERATION


Background
CERIF: EU Recommendation to Member States

[Diagram: CERIF entities. Core entities PROJECT, PERSON and ORGUNIT, surrounded by Skills, CV, General Facility, Particular Equipment, Contact, Results Publication, Results Patent, Results Product, Service, Funding Programme, Event, Classification and Prize/Award.]


Result Publication Instance Diagram

[Diagram: Person A, Publication X, OrgUnit O, OrgUnit M, OrgUnit N and Project P, connected by the relationships member, member, employee, part of, part of, owns IPR, author and project leader.]

Metadata in CERIF-CRIS much richer than usual repository
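
One way to picture this richer metadata is as a set of typed links between entities. The sketch below is illustrative only: the entity and role names come from the diagram, but which role connects which pair of entities is an assumption made for the example, and the `links` structure and helper functions are hypothetical rather than part of CERIF.

```python
# Typed links: (source entity, role, target entity).
# The pairing of roles to entities is assumed for illustration.
links = [
    ("Person A", "author", "Publication X"),
    ("Person A", "employee", "OrgUnit O"),
    ("Person A", "project leader", "Project P"),
    ("OrgUnit O", "part of", "OrgUnit M"),
    ("OrgUnit M", "part of", "OrgUnit N"),
    ("OrgUnit N", "owns IPR", "Publication X"),
]


def related(entity):
    """Return (role, target) pairs for links that start at the given entity."""
    return [(role, target) for source, role, target in links if source == entity]


def publications_of_project(project):
    """Follow project leader and author links to find a project's publications."""
    leaders = {s for s, role, t in links if role == "project leader" and t == project}
    return {t for s, role, t in links if role == "author" and s in leaders}


print(related("Person A"))
print(publications_of_project("Project P"))
```

A query such as `publications_of_project` relies on the research context (who leads which project) that a publication record on its own would not normally carry.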


CERIF-CRIS + Repositories at 1 institution

[Diagram: the end-user accesses a CRIS holding the research context (projects, persons, organisational units, funding, products, patents, publications, facilities, equipment, events). The CRIS is linked via CERIF to an OA repository of (hypermedia) documents and to an e-Research repository of datasets and software. Labels on the repository connections: OAI-PMH and various protocols.]


… and multiple institutions

[Diagram: Institution A, Institution B and Institution C, each with its own CRIS, OA repository, e-Research repository and end-user.]


Hypothesis
• Comparison of possible architectures for interoperation of grey repositories
  – (of publications or data and software)
• Leads inexorably to the conclusion that CERIF should be used either:
  – as the native storage format,
  – as the storage format of a derived data warehouse (transformed copy of the CRIS), or
  – as the export format converted from the CRIS native format using a wrapper.
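
The third option, a wrapper that converts the native format on export, can be sketched as a field-renaming step. This is a minimal illustration, not a CERIF implementation: the native field names, the target field names and the `wrap_for_export` helper are all invented for the example and do not reproduce the real CERIF schema.

```python
# Hypothetical native CRIS record; field names are invented for this example.
native_record = {"pub_title": "Publication X", "writer": "Person A", "internal_code": "Z9"}

# Mapping from native field names to illustrative common export names.
# The target names are placeholders, not the real CERIF schema.
FIELD_MAP = {"pub_title": "publication_title", "writer": "person_name"}


def wrap_for_export(record, field_map):
    """Rename native fields to the common export names, dropping unmapped fields."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


exported = wrap_for_export(native_record, FIELD_MAP)
print(exported)
```

The native database is left untouched; only the exported view follows the common format.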


Remote Wrapper

[Diagram: remote wrapper architecture. On the user's LAN: query form, presentation form, presentation converter, integration, schemas, and a dispatcher and receiver with addresses. Across the network, each non-CERIF CRIS host has a receiver, query converter, query schema, answer converter and dispatcher with addresses.]


Remote Wrapper
• the user needs only a web browser and a simple query form
• the host has to write a query converter
• the host has to write an answer (XML?) converter (to a specific XML DTD?)
• the query expressivity is very limited
• the user client has to write an integrator for the answers
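
The division of labour in the remote wrapper can be sketched as follows. This is an illustrative toy, assuming two hosts with invented native schemas and a keyword-only query form; the host data, field names and function names are hypothetical.

```python
# Two hypothetical hosts with different native schemas.
HOST_A = [{"titel": "Publication X", "forfatter": "Person A"}]
HOST_B = [{"name": "Publication Y", "creator": "Person B"}]


def host_a_wrapper(keyword):
    """Written by host A: converts the simple query, then converts the answer."""
    hits = [r for r in HOST_A if keyword.lower() in r["titel"].lower()]
    return [{"title": r["titel"], "author": r["forfatter"]} for r in hits]


def host_b_wrapper(keyword):
    """Written by host B: same job, against host B's own schema."""
    hits = [r for r in HOST_B if keyword.lower() in r["name"].lower()]
    return [{"title": r["name"], "author": r["creator"]} for r in hits]


def integrate(keyword, wrappers):
    """Client-side integrator: send the same simple query to each host and merge the answers."""
    results = []
    for wrapper in wrappers:
        results.extend(wrapper(keyword))
    return sorted(results, key=lambda r: r["title"])


answers = integrate("publication", [host_a_wrapper, host_b_wrapper])
print(answers)
```

Because the query form carries only a keyword, nothing more expressive than a title match reaches the hosts in this sketch, which mirrors the limited query expressivity noted above.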


Local Wrapper

[Diagram: local wrapper architecture. On the user's LAN: query form, query converter, presentation converter, presentation form, integration, schemas, and a dispatcher and receiver with addresses. Across the network, each non-CERIF CRIS host has only a receiver and dispatcher with addresses.]


Local Wrapper
• each host has only to supply and update its schema to the client (all clients if there is not a central query server)
• each host has no software to provide except receiver and dispatcher
• the client (if it is a central service) has a very large workload
• if there is no central service then each client has to have all schemas supplied and updated
• the client software has to include a complex query refiner
• the client software has to include multiple complex query converters
• the client software has to include a complex answer integrator
• the client software has to include a presentation converter (complexity depends on specification of presentation required and complexity of the answer structure)
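
In the local wrapper the conversion work moves to the client, which must hold every host's schema. A minimal sketch, assuming two hosts with invented schemas and exact-match queries; `HOST_SCHEMAS`, `HOST_DATA` and all function names are hypothetical.

```python
# Schemas supplied by each host: common field name -> host's native field name.
HOST_SCHEMAS = {
    "host_a": {"title": "titel", "author": "forfatter"},
    "host_b": {"title": "name", "author": "creator"},
}

# Stand-ins for the remote databases behind each host's receiver and dispatcher.
HOST_DATA = {
    "host_a": [{"titel": "Publication X", "forfatter": "Person A"}],
    "host_b": [{"name": "Publication Y", "creator": "Person B"}],
}


def convert_query(query, schema):
    """Client-side query converter: rewrite field names into the host's schema."""
    return {schema[field]: value for field, value in query.items()}


def run_on_host(host, native_query):
    """Stand-in for the host side: exact-match lookup on native fields."""
    return [r for r in HOST_DATA[host] if all(r.get(k) == v for k, v in native_query.items())]


def convert_answer(record, schema):
    """Client-side answer converter: map native field names back to the common ones."""
    reverse = {native: common for common, native in schema.items()}
    return {reverse[k]: v for k, v in record.items() if k in reverse}


def federated_query(query):
    """Client-side integrator: query every host and merge the converted answers."""
    answers = []
    for host, schema in HOST_SCHEMAS.items():
        for record in run_on_host(host, convert_query(query, schema)):
            answers.append(convert_answer(record, schema))
    return answers


print(federated_query({"author": "Person A"}))
```

Every new or changed host schema means updating `HOST_SCHEMAS` on each client, which is the maintenance burden the slide describes.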


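Catalog

[Diagram: catalog architecture in two phases. Construction phase from each host: CRIS, schema, converter, query (standard), dispatcher, then across the network to a receiver and loader that populate the CERIF metadata catalog. Retrieve phase by user (user phase 1, on the LAN): query form, query against the CERIF metadata catalog, hit list.]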


Catalog
• simple query on union catalog (which may be centralised or replicated)
• possibly not all required entities and attributes in catalog
• effort to populate catalog; requires converter at each host to supply CERIF metadata
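
The two phases of the catalog architecture can be sketched as a load step and a search step. This is an illustrative toy: the host records, field names and the `load` and `search` helpers are invented, and the common format shown is a placeholder rather than real CERIF metadata.

```python
catalog = []  # the union catalog: one common-format entry per host record


def load(host, records, to_common):
    """Construction phase: the host's converter maps each record, then it is loaded."""
    for record in records:
        entry = to_common(record)
        entry["host"] = host
        catalog.append(entry)


def search(field, value):
    """Retrieve phase: a simple query on the catalog returns a hit list."""
    return [entry for entry in catalog if entry.get(field) == value]


# Host A's converter keeps title and author but omits the page count ("sider").
load(
    "host_a",
    [{"titel": "Publication X", "forfatter": "Person A", "sider": 12}],
    lambda r: {"title": r["titel"], "author": r["forfatter"]},
)
load(
    "host_b",
    [{"name": "Publication Y", "creator": "Person B"}],
    lambda r: {"title": r["name"], "author": r["creator"]},
)

hits = search("author", "Person A")
print(hits)
```

The dropped page count illustrates the second bullet: an attribute that no converter supplies cannot be queried in the catalog.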


Catalog Plus Pull (ERGO2++)

[Diagram: catalog plus pull architecture. User phase 1 (on the LAN): query form, query against the CERIF metadata catalog, hit list processing. User phase 2: unique id queries sent via dispatcher and receiver (with addresses) across the network to the non-CERIF CRISs, each with its own receiver and dispatcher with addresses; results shown in the presentation form.]


Catalog Plus Pull (ERGO2++)
• advantage of simplicity as for catalog-only architecture
• advantage of additional information provision
• disadvantage that additional information is heterogeneous (unless converted to CERIF export data model)
• disadvantage of hosts having to maintain entries representing their database content in the CERIF metadata catalog
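
The second phase, pulling the full record by unique id, can be sketched on top of a small catalog. This is an illustrative toy, assuming invented ids, hosts and field names; `CATALOG`, `HOSTS`, `hit_list` and `pull` are hypothetical and are not taken from ERGO.

```python
# Phase 1 data: catalog entries in a common format, each with its host and unique id.
CATALOG = [
    {"id": "a-1", "host": "host_a", "title": "Publication X"},
    {"id": "b-7", "host": "host_b", "title": "Publication Y"},
]

# Phase 2 data: full native records held at each host, keyed by unique id.
HOSTS = {
    "host_a": {"a-1": {"titel": "Publication X", "sider": 12}},
    "host_b": {"b-7": {"name": "Publication Y", "pages": "8"}},
}


def hit_list(term):
    """User phase 1: simple term query against the catalog."""
    return [entry for entry in CATALOG if term.lower() in entry["title"].lower()]


def pull(hit):
    """User phase 2: fetch the full native record from the owning host by unique id."""
    return HOSTS[hit["host"]][hit["id"]]


details = [pull(hit) for hit in hit_list("publication")]
print(details)
```

The pulled records keep their native field names and types, so the additional information is heterogeneous unless each host also converts it to a common export model.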


Full CERIF

[Diagram: full CERIF architecture. On the user's LAN: query form, presentation form, and a dispatcher and receiver with addresses. The query is sent unchanged across the network to the CERIF CRISs, each with a receiver and dispatcher with addresses.]


Full CERIF
• very simple and easy to use for the end-user
• each host has to either run a full CERIF model database or provide a full CERIF model version of the host database
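
When every host exposes the same model, one query can be sent everywhere with no converters on either side. A minimal sketch, assuming an invented shared schema and in-memory stand-ins for the hosts; the field names are placeholders rather than real CERIF attributes.

```python
# Every host exposes the same (illustrative) schema, so one query works everywhere.
HOSTS = {
    "host_a": [{"title": "Publication X", "author": "Person A", "year": 2008}],
    "host_b": [{"title": "Publication Y", "author": "Person B", "year": 2007}],
}


def query_all(predicate):
    """Send the same structured query to every host and concatenate the answers."""
    return [record for records in HOSTS.values() for record in records if predicate(record)]


recent = query_all(lambda r: r["year"] >= 2008)
print(recent)
```

Unlike the keyword-only forms in the wrapper sketches, the predicate here can constrain any attribute, because all hosts agree on what the attributes are.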


Harvesting (construction phase)

[Diagram: each non-CERIF CRIS has a converter that publishes its content as HTML pages. A crawling robot reads the HTML pages over the network and builds a catalog of documents with associative descriptive metadata.]


Harvesting (search phase)

[Diagram: user phase 1 (on the LAN): query form, query against the harvester's associative descriptive metadata catalog, hit list processing. User phase 2: URL queries sent via dispatcher and receiver (with addresses) across the network retrieve HTML pages from each CRIS; results shown in the presentation form.]


Harvesting
• The host has to provide a copy of the database as web pages, available to the search robot and to subsequent accesses based on clicks from the URL in the metadata.
• The query is based on existence of term(s); constraining by entity or attribute is not possible (without sophisticated XML form processing).
• The results are unstructured and one page at a time (click on URL in metadata catalog to see page); this inhibits statistical processing or report generation.
• It is easy to implement and maintain (although the database may be ~2 weeks out of date) and has a familiar interface for many WWW users.
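
The term-existence query can be sketched with a small inverted index. This is an illustrative toy, not a real crawler: the URLs and page contents are invented, and `build_index` and `search` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Hypothetical HTML pages published by two CRIS hosts.
PAGES = {
    "http://host-a.example/pub/1": "<h1>Publication X</h1><p>Author: Person A</p>",
    "http://host-b.example/person/2": "<h1>Person A</h1><p>Member of OrgUnit O</p>",
}


def build_index(pages):
    """Construction phase: strip the tags and index every term by URL."""
    index = defaultdict(set)
    for url, html in pages.items():
        text = re.sub(r"<[^>]+>", " ", html)
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(url)
    return index


def search(index, *terms):
    """Search phase: return the URLs of pages that contain all the terms."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*sets) if sets else set()


index = build_index(PAGES)
hits = search(index, "person", "a")
print(sorted(hits))
```

Both pages match "person a", whether the person is the author of a publication or the subject of the page; the index has no notion of entity or attribute to tell the two apart.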


Conclusion

To interoperate grey repositories, link them to a CRIS

Best: Full CERIF architecture

Else: wrap CRIS to interoperate using CERIF