NEEO project
EC Final review meeting: Gateway and portal
23 March 2010
Benoit Pauwels, Université Libre de Bruxelles, Belgium
Plan
• Overview of technical infrastructure
• EO (Economists Online) as a network of data providers – descriptive metadata
• EO as a network of data providers – usage statistics
• Added value services
  • Publication lists
  • Enriched metadata
  • Full-text searching
  • Multilinguality
• Collaboration with RePEc
• EO gateway and portal
[Figure: technical infrastructure. Components shown: Meresco gateway (metadata harvester, HTTP crawler for objects, Lucene metadata index); EO portal and exporter engine (both home-made, FOSS); logs; enrichment service; RePEc; other portals. Interfaces and formats shown: OAI-PMH (DIDL/MODS metadata, SWUP usage data), SRU, RSS/Atom.]
Descriptive metadata exchange format
Desired EO functionality → Technical decision
• Faceted search & find experience → Normalized/normalizable metadata
• APA-formatted citations → Granular metadata
• Publication list per EO author → Unambiguous identification of authors
• Full-text indexing/searching → Unambiguous links to full texts
• Enrichment of metadata (JEL, datasets, citations) → Extensible metadata format
Descriptive metadata exchange format
• DIDL (MPEG-21 Digital Item Declaration Language) – XML container structure that can hold semantically distinct metadata
  • Descriptive metadata, object files (by reference), splash page, enriched metadata
  • Based on the existing container structure defined by SurfShare
• MODS (Metadata Object Description Schema, version 3.2) – granular descriptive metadata
  • Based on the existing metadata structure defined by SurfShare
• DAI (Digital Author Identifier) – unambiguous identification of authors
  • National or institution-unique persistent identifier
• Continuous aim of standardization at a level that surpasses the NEEO project
  • NEEO adaptations are fed back to SurfShare
DIDL[1]
  Item[1]
    Descriptor/Identifier (persistent identifier)
    Descriptor/modified
    Item[1..∞] (of type descriptiveMetadata)
      Descriptor/type (« descriptiveMetadata »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource: representation by value (XML)
    Item[0..∞] (of type objectFile)
      Descriptor/type (« objectFile »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource: representation by reference (URL)
    Item[0..1] (of type humanStartPage)
      Descriptor/type (« humanStartPage »)
      Component/Resource: representation by reference (URL)
EO descriptive metadata model
• A publication is described as a complex (compound) object, with a persistent identifier
• Aggregation of 3 types of components:
  – descriptiveMetadata (MODS)
  – objectFiles
  – humanStartPage
• Extensible: additional items can be stored within the complex object
• MODS contains the DAI of the EO author
• Semantic Web / Linked Data: OAI-ORE ready
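To make the compound-object model concrete, here is a minimal Python sketch (standard library only) that assembles a DIDL container with the three item types listed above. It is a simplified illustration, not the NEEO/SurfShare application profile: the identifier and URLs are hypothetical, item types and the persistent identifier are recorded as plain-text statements (the real profile uses controlled vocabularies and also carries modification dates), and the MODS payload is reduced to a title.

```python
import xml.etree.ElementTree as ET

DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("didl", DIDL_NS)
ET.register_namespace("mods", MODS_NS)


def d(tag):
    """Return the namespace-qualified name of a DIDL element."""
    return f"{{{DIDL_NS}}}{tag}"


def add_statement(item, value):
    """Simplified Descriptor/Statement holding a plain-text value."""
    descriptor = ET.SubElement(item, d("Descriptor"))
    statement = ET.SubElement(descriptor, d("Statement"), mimeType="text/plain")
    statement.text = value


def add_resource(item, mime_type, ref=None):
    """Add Component/Resource: by reference if ref is given, else by value."""
    component = ET.SubElement(item, d("Component"))
    attrs = {"mimeType": mime_type}
    if ref is not None:
        attrs["ref"] = ref
    return ET.SubElement(component, d("Resource"), attrs)


def build_publication(pid, title, files, start_page=None):
    root = ET.Element(d("DIDL"))
    top = ET.SubElement(root, d("Item"))
    add_statement(top, pid)  # persistent identifier of the whole object

    # descriptiveMetadata item: MODS record carried by value
    meta = ET.SubElement(top, d("Item"))
    add_statement(meta, "descriptiveMetadata")
    mods = ET.SubElement(add_resource(meta, "application/xml"), f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # objectFile items: full texts referenced by URL
    for url, mime_type in files:
        obj = ET.SubElement(top, d("Item"))
        add_statement(obj, "objectFile")
        add_resource(obj, mime_type, ref=url)

    # optional humanStartPage item (splash page), also by reference
    if start_page is not None:
        page = ET.SubElement(top, d("Item"))
        add_statement(page, "humanStartPage")
        add_resource(page, "text/html", ref=start_page)
    return root


doc = build_publication(
    "urn:nbn:example:123",  # hypothetical persistent identifier
    "An example working paper",
    [("https://repository.example.org/123.pdf", "application/pdf")],
    start_page="https://repository.example.org/123",
)
print(ET.tostring(doc, encoding="unicode"))
```

Because every component is just another child Item, extending the object (for example with an enrichment record) means appending one more Item rather than changing the schema.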
Descriptive metadata exchange format
• Central EO gateway
• DIDL and MODS application profiles
  • Vocabularies in DIDL and MODS
• Technical guidelines for project partners
  • All documentation is openly available (OA)
• Partner solutions: home-made or with external support
  • ARNO: home-made
  • DSpace: home-made, AtMire
  • EPrints: home-made, ECS (University of Southampton)
  • Fedora: METS/MODS -> DIDL/MODS
  • DigiTool: METS/MARC -> DIDL/MODS
• Implemented by all original partners + 2 new partners
Decentralized registry service
• Aim: a sustainable solution for a large network with many partners
• Decentralized Admin file
  • Format: RDF/XML, using FOAF + a NEEO-specific vocabulary
  • The file sits on the local web server of each project partner
  • Content:
    - information on the institution: name, description, ...
    - OAI baseURL + OAI sets to harvest
    - EO authors: DAI, photograph, full name, affiliation
  • The EO gateway retrieves (HTTP GET) and validates the file at regular intervals
  • Used for:
    - information in EO portal screens
    - publication lists (matched on DAI)
    - the automated harvesting process
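The Admin file workflow can be sketched in Python: parse the RDF/XML, check that the essentials are present, and derive what the gateway needs (OAI-PMH harvesting URLs and the DAIs used to match publication lists). The FOAF namespace is real, but the `neeo:` namespace URI and its property names (`oaiBaseURL`, `oaiSet`, `dai`) are hypothetical placeholders for the NEEO-specific vocabulary, `metadataPrefix=didl` is an assumption, and the periodic HTTP fetch is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "neeo": "http://example.org/neeo#",  # hypothetical NEEO vocabulary
}

# Sample Admin file; all values, including the DAI, are placeholders
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:neeo="http://example.org/neeo#">
  <foaf:Organization>
    <foaf:name>Example University</foaf:name>
    <neeo:oaiBaseURL>https://repository.example.org/oai</neeo:oaiBaseURL>
    <neeo:oaiSet>economics</neeo:oaiSet>
  </foaf:Organization>
  <foaf:Person>
    <foaf:name>Jane Doe</foaf:name>
    <neeo:dai>0000000000</neeo:dai>
  </foaf:Person>
</rdf:RDF>"""


def parse_admin_file(xml_text):
    """Return (institution name, OAI-PMH ListRecords URLs, {DAI: author name})."""
    root = ET.fromstring(xml_text)
    org = root.find("foaf:Organization", NS)
    if org is None or org.findtext("neeo:oaiBaseURL", namespaces=NS) is None:
        raise ValueError("Admin file lacks an institution or an OAI baseURL")
    base_url = org.findtext("neeo:oaiBaseURL", namespaces=NS)
    harvest_urls = [
        base_url + "?" + urlencode({"verb": "ListRecords",
                                    "metadataPrefix": "didl",  # assumed prefix
                                    "set": s.text})
        for s in org.findall("neeo:oaiSet", NS)
    ]
    authors = {
        p.findtext("neeo:dai", namespaces=NS): p.findtext("foaf:name", namespaces=NS)
        for p in root.findall("foaf:Person", NS)
    }
    return org.findtext("foaf:name", namespaces=NS), harvest_urls, authors


name, urls, authors = parse_admin_file(SAMPLE)
print(name, urls, authors)
```

Keeping the file on the partner's own server means the partner edits its authors and sets locally, and the gateway only has to re-fetch and re-validate.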
Usage statistics – EO use case
• EO use case: present download rates through the EO portal per publication, scholar and institution
• Normalization of exchange format and communication protocol: OAI-PMH exchange of SWUP (Scholarly Works Usage Community Profile) OpenURL ContextObjects
• Special considerations:
  • Anonymization of the requester's IP address (MD5 hash)
  • Filtering out robot requests (list of 50 regular expressions)
  • Filtering out double clicks
• Similar initiatives come together at the Knowledge Exchange workshop, Berlin, 29-30 March 2010:
  • JISC (Usage Statistics Review project), PIRUS2, SurfSure, COUNTER, MESUR, OA-Statistik, Economists Online
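The three special considerations can be sketched as one filter over raw download events. The two robot patterns stand in for the project's list of 50 regular expressions, and the 30-second double-click window is an assumed value, not one stated in the slides.

```python
import hashlib
import re
from datetime import datetime, timedelta

# Illustrative stand-ins for the project's list of 50 robot patterns
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"bot", r"spider")]
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)  # assumed window


def anonymize_ip(ip):
    """Replace the requester's IP address by its MD5 hex digest."""
    return hashlib.md5(ip.encode("ascii")).hexdigest()


def is_robot(user_agent):
    return any(p.search(user_agent) for p in ROBOT_PATTERNS)


def filter_events(events):
    """events: (timestamp, ip, user_agent, item_id) tuples sorted by time.

    Drops robot requests and collapses repeated requests for the same item by
    the same (hashed) requester within the window, keeping the first one.
    """
    last_seen = {}
    kept = []
    for ts, ip, agent, item in events:
        if is_robot(agent):
            continue
        key = (anonymize_ip(ip), item)
        previous = last_seen.get(key)
        last_seen[key] = ts
        if previous is not None and ts - previous <= DOUBLE_CLICK_WINDOW:
            continue  # double click: counted once
        kept.append((ts, key[0], item))
    return kept


t0 = datetime(2010, 3, 1, 12, 0, 0)
events = [
    (t0, "192.0.2.1", "Mozilla/5.0", "item-1"),
    (t0 + timedelta(seconds=10), "192.0.2.1", "Mozilla/5.0", "item-1"),  # double click
    (t0 + timedelta(seconds=20), "192.0.2.2", "Googlebot/2.1", "item-1"),  # robot
    (t0 + timedelta(minutes=5), "192.0.2.1", "Mozilla/5.0", "item-1"),
]
kept = filter_events(events)
print(len(kept))  # 2
```

Which request of a burst to keep is a design choice; this sketch keeps the first. Note also that an unsalted MD5 of an IPv4 address can be reversed by enumerating the address space, so adding a secret salt would strengthen the pseudonymization.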
Usage statistics – implementation status
• Central EO gateway – DoDoCo (Document Download Counter)
  • OAI-PMH harvesting of SWUP ContextObjects into an SQL database
  • Enrichment with information on item, scholar and institution
  • Web service: download counts per level (item, scholar, institution) and date range
• Technical guidelines for project partners (openly available)
• Partners
  • Implementation:
    - for all major institutional repository (IR) platforms
    - a solution for web logs in Combined Log Format
  • Registration through the Admin file
  • 7 original + 1 new partner
• Not enough data available
• Not yet visible through the EO portal, although the DoDoCo software is ready
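For partners that only have web server logs, the Combined Log Format route and a DoDoCo-style date-range query can be sketched together: parse the log lines, store successful downloads in an SQL table (in-memory SQLite here), and count them per item for a date range. The URL scheme `/download/<item>.pdf` and the table layout are hypothetical, since the slides do not describe the DoDoCo schema. Robot and double-click filtering, and the scholar/institution levels, are omitted.

```python
import re
import sqlite3
from datetime import datetime

# Combined Log Format: host ident user [time] "request" status size "referer" "agent"
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
ITEM_PATH = re.compile(r"^/download/(?P<item>[\w-]+)\.pdf$")  # hypothetical URL scheme


def load(db, lines):
    """Store one row per successful (HTTP 200) download of a known item."""
    db.execute("CREATE TABLE IF NOT EXISTS downloads (item TEXT, day TEXT)")
    for line in lines:
        m = CLF.match(line)
        if not m or m["status"] != "200":
            continue
        item = ITEM_PATH.match(m["path"])
        if not item:
            continue
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        db.execute("INSERT INTO downloads VALUES (?, ?)",
                   (item["item"], ts.date().isoformat()))


def count_item(db, item, start, end):
    """Download count for one item between two ISO dates (inclusive)."""
    (n,) = db.execute(
        "SELECT COUNT(*) FROM downloads WHERE item = ? AND day BETWEEN ? AND ?",
        (item, start, end),
    ).fetchone()
    return n


LOG = [
    '192.0.2.1 - - [10/Mar/2010:13:55:36 +0100] "GET /download/item-1.pdf HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [11/Mar/2010:09:12:01 +0100] "GET /download/item-1.pdf HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [11/Mar/2010:09:13:00 +0100] "GET /download/item-2.pdf HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
    '192.0.2.4 - - [02/Apr/2010:10:00:00 +0200] "GET /download/item-1.pdf HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
]
db = sqlite3.connect(":memory:")
load(db, LOG)
print(count_item(db, "item-1", "2010-03-01", "2010-03-31"))  # 2
```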
Added value services
• Publication lists
  • Per DAI of the authors registered in the Admin file
  • Publications are extracted from the EO gateway over SRU and formatted as:
    - APA+ in HTML, with links to the full text in the EO partner repository and to publisher sites (through OpenURL resolution)
    - APA in PDF
    - APA in RTF
    - RIS
    - BibTeX
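A sketch of the publication-list pipeline: build an SRU searchRetrieve request for one author's DAI, then render a retrieved record as BibTeX and RIS. The SRU parameters (`operation`, `version`, `query`, `maximumRecords`, `recordSchema`) are standard SRU, but the gateway URL, the CQL index name `dai` and the schema name `mods` are hypothetical. The record is shown as an already-parsed dict rather than the MODS returned by the gateway, and APA rendering is left out.

```python
from urllib.parse import urlencode

GATEWAY_SRU = "https://gateway.example.org/sru"  # hypothetical endpoint


def sru_url_for_author(dai, max_records=100):
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'dai="{dai}"',  # hypothetical CQL index name
        "maximumRecords": str(max_records),
        "recordSchema": "mods",  # assumed schema name
    }
    return GATEWAY_SRU + "?" + urlencode(params)


def to_bibtex(rec):
    authors = " and ".join(rec["authors"])
    return (
        f"@{rec['type']}{{{rec['key']},\n"
        f"  author = {{{authors}}},\n"
        f"  title = {{{rec['title']}}},\n"
        f"  year = {{{rec['year']}}},\n"
        f"  url = {{{rec['url']}}}\n"
        "}"
    )


def to_ris(rec):
    lines = ["TY  - RPRT"]  # RIS type for a report / working paper
    lines += [f"AU  - {a}" for a in rec["authors"]]
    lines += [f"TI  - {rec['title']}", f"PY  - {rec['year']}",
              f"UR  - {rec['url']}", "ER  - "]
    return "\n".join(lines)


rec = {  # hypothetical record
    "type": "techreport", "key": "doe2009",
    "authors": ["Doe, Jane", "Roe, Richard"],
    "title": "An example working paper", "year": 2009,
    "url": "https://repository.example.org/123",
}
print(sru_url_for_author("0000000000"))
print(to_bibtex(rec))
print(to_ris(rec))
```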
Added value services
• Enriched descriptive metadata
  • JEL classification
    - the enrichment service (ES) retrieves the records to be enriched from EO over SRU
    - ES creates enrichment record(s) using text mining technology
    - ES makes the enrichment record(s) available to EO over OAI-PMH
    - EO harvests the enrichment records from ES and integrates them into the original record
    - EO reuses the enrichment information in its services: indexing and presentation
  • Bibliographic references
    - through collaboration with RePEc/CitEc
    - visible through the EO portal
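The five JEL enrichment steps can be mimicked in memory to show the data flow. This is a toy: a keyword lookup stands in for the text-mining classifier, and plain function calls stand in for the SRU and OAI-PMH exchanges. The JEL codes are real categories, but the keyword table is illustrative only.

```python
# Toy keyword table standing in for the text-mining classifier (illustrative only)
KEYWORD_TO_JEL = {"monetary policy": "E52", "asset pricing": "G12", "unemployment": "J64"}


def es_fetch(eo_store):
    """Step 1: ES retrieves records without JEL codes (stands in for SRU)."""
    return [r for r in eo_store.values() if "jel" not in r]


def es_enrich(record):
    """Step 2: ES creates an enrichment record for one original record."""
    text = (record["title"] + " " + record.get("abstract", "")).lower()
    codes = sorted({code for kw, code in KEYWORD_TO_JEL.items() if kw in text})
    return {"id": record["id"], "jel": codes}


def es_publish(records):
    """Step 3: ES exposes its enrichment records (stands in for OAI-PMH)."""
    return [es_enrich(r) for r in records]


def eo_harvest_and_merge(eo_store, feed):
    """Step 4: EO harvests the feed and integrates it into the original records."""
    for enrichment in feed:
        eo_store[enrichment["id"]]["jel"] = enrichment["jel"]


def eo_search_by_jel(eo_store, code):
    """Step 5: EO reuses the enrichment, here as a search facet."""
    return sorted(r["id"] for r in eo_store.values() if code in r.get("jel", []))


store = {
    "a1": {"id": "a1", "title": "Monetary policy and unemployment in the euro area"},
    "a2": {"id": "a2", "title": "Asset pricing with heterogeneous agents"},
}
eo_harvest_and_merge(store, es_publish(es_fetch(store)))
print(store["a1"]["jel"])  # ['E52', 'J64']
print(eo_search_by_jel(store, "G12"))  # ['a2']
```

Keeping enrichment records separate from the originals until the merge step mirrors the slide's design: the ES never writes to EO directly, it only publishes records that EO chooses to harvest.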
Added value services
• Full-text search service
  • Process:
    - a full-text indexer component in Meresco fetches relevant records from the EO gateway over SRU
    - it follows the links to the PDF object files
    - text is extracted from the PDF and added to the record through SRU Update
    - EO can then index and present the full text
  • A prototype exists
  • Not yet fully deployed in the EO portal
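The full-text process amounts to attaching an extracted-text field to each record and indexing it alongside the metadata. The sketch below uses a small in-memory inverted index. `extract_text` is a hypothetical stub standing in for real PDF text extraction and for the SRU Update that writes the text back to the gateway. In the actual setup, Meresco/Lucene does the tokenizing and ranking.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def extract_text(pdf_url):
    """Hypothetical stub: a real indexer would download and parse the PDF."""
    fake_pdfs = {"https://repository.example.org/123.pdf":
                 "We estimate a Taylor rule for the euro area"}
    return fake_pdfs.get(pdf_url, "")


def add_full_text(record):
    """Follow the object-file links and attach the extracted text to the record."""
    record["fulltext"] = " ".join(extract_text(url) for url in record["files"])
    return record


def build_index(records):
    """Map each lower-cased token to the ids of the records containing it."""
    index = defaultdict(set)
    for rec in records:
        for field in ("title", "fulltext"):
            for token in TOKEN.findall(rec.get(field, "").lower()):
                index[token].add(rec["id"])
    return index


def search(index, query):
    """AND-search: ids of records containing every query token."""
    tokens = TOKEN.findall(query.lower())
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


records = [add_full_text({"id": "123", "title": "Monetary policy rules",
                          "files": ["https://repository.example.org/123.pdf"]})]
index = build_index(records)
print(search(index, "Taylor rule"))  # {'123'}
print(search(index, "inflation"))  # set()
```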
Added value services
• Multilinguality (English, French, German, Spanish)
  • Complete EO portal interface
  • JEL classification
  • MLIA (multilingual information access) functionality in the EO portal
    - Student thesis (Prof. Bouillon, University of Geneva, multilingual information processing department): uncustomized Systran and Google Translate show equivalent results
    - Contacts with CACAO (also through Europeana): CACAO comes as a complete portal solution, not as an add-in for existing portals like EO
  • Considerations:
    - The lingua franca in economics is English
    - NEEO is not a research project in linguistics; the aim is to reuse the best existing technology
  • Decision: use Google Translate to translate queries
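The adopted approach is to translate the user's query into English, the lingua franca of the indexed content, and then search. It can be sketched with the translator as a pluggable function. No real translation service is called here: the phrase table is a hypothetical stand-in for Google Translate, and the search function is a toy.

```python
from typing import Callable

# Hypothetical stand-in for a machine translation service
PHRASES = {("fr", "chômage"): "unemployment", ("de", "arbeitslosigkeit"): "unemployment"}


def toy_translate(text: str, source_lang: str) -> str:
    return PHRASES.get((source_lang, text.lower()), text)


def multilingual_search(query: str, lang: str, search: Callable[[str], list],
                        translate: Callable[[str, str], str] = toy_translate) -> list:
    """Translate non-English queries into English before searching."""
    english_query = query if lang == "en" else translate(query, lang)
    return search(english_query)


titles = {"a1": "Unemployment and wage rigidity", "a2": "Asset pricing puzzles"}


def title_search(q: str) -> list:
    return sorted(k for k, t in titles.items() if q.lower() in t.lower())


print(multilingual_search("Chômage", "fr", title_search))  # ['a1']
```

Passing the translator in as a parameter keeps the portal independent of the chosen service, which fits the stated aim of reusing the best existing technology rather than building one.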
Collaboration with RePEc
• Harvesting metadata from RePEc into EO
  • AMF to DIDL/MODS mapping
• Pushing metadata from EO to RePEc
  • “RePEc:ner” archive, with a separate series for each EO institution
  • According to an agreed-upon, reviewed ReDIF format
  • Admin file directives to limit overlap
• Contributing to LogEc
• Reusing CitEc data in the EO portal
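Pushing EO metadata to RePEc means serializing each record as a ReDIF template in the “RePEc:ner” archive, with one series per EO institution. The sketch writes a ReDIF-Paper 1.0 template. The series code, record fields and numbering are hypothetical, and the agreed-upon NEEO profile may use more fields than the few shown.

```python
def to_redif(rec, series):
    """Serialize one EO record as a ReDIF-Paper 1.0 template (simplified)."""
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    lines += [f"Author-Name: {name}" for name in rec["authors"]]
    lines.append(f"Title: {rec['title']}")
    if rec.get("abstract"):
        lines.append(f"Abstract: {rec['abstract']}")
    lines.append(f"Creation-Date: {rec['date']}")
    for url in rec.get("files", []):
        # assumes every object file is a PDF
        lines += [f"File-URL: {url}", "File-Format: Application/pdf"]
    lines.append(f"Number: {rec['number']}")
    # Handle = RePEc:<archive>:<series>:<item>; the archive code "ner" is from the slide
    lines.append(f"Handle: RePEc:ner:{series}:{rec['number']}")
    return "\n".join(lines) + "\n"


rec = {  # hypothetical record
    "authors": ["Jane Doe"], "title": "An example working paper",
    "date": "2009-11", "files": ["https://repository.example.org/123.pdf"],
    "number": "123",
}
print(to_redif(rec, "exinst"))  # "exinst" is a hypothetical series code
```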
EO gateway and portal
• Gateway: metadata store and search engine
  • Choice between Summa, SOLR/Lucene and Meresco; Meresco was selected:
    - open source solution, based on the Lucene search engine
    - support available from the software developers (CQ2 company)
    - has proven its qualities in the past (DAREnet)
• Portal
  • First version: home-made
  • Final version:
    - design outsourced to a private company
    - HTML, CSS, JavaScript, all images