ESA-SAPS: Science Archives Publication System


Upload: planetek-italia-srl

Post on 21-Jun-2015


DESCRIPTION

Poster presentation of ESA-SAPS: Science Archives Publication System at ADASS XXIV, held in Calgary, Canada, October 5-9, 2014 (http://www.adass2014.org/). The poster presents the main characteristics of the ESA SAPS (Science Archives Publications System), a system that will homogeneously extract and classify the relations between information published in papers and observational data from ESA space-based missions. ESA has awarded a contract for under geo-returned countries to the consortium Planetek Hellas - National Observatory of Athens to build this system. Details of the ESA-SAPS project: http://www.planetek.it/progetti/esa_science_archives_publication_system

TRANSCRIPT

Page 1: ESA-SAPS: Science Archives Publication System

ESA SAPS
Science Archives Publications System

European Space Astronomy Centre, Madrid, Spain
Pedro Osuna (ESA)

Stratos Gerakakis (Planetek Hellas-NOA)
[email protected]

ADASS XXIV
October 5-9, 2014
Calgary, Canada

Scientific data from many of ESA’s space missions are archived at the European Space Astronomy Centre’s system of Scientific Archives. These archives store and provide access to data from astronomy, planetary and solar missions (see http://archives.esac.esa.int/). Scientists use the data produced by the missions to publish their findings in the peer-reviewed scientific literature. The data used to produce a particular paper are often not routinely recorded, although for many missions authors are requested to provide this information in the paper. Currently, the link between papers’ bibcodes and the observational data used has been made systematically for only some of ESA’s scientific missions, by reading the papers and recording the identifiers of the data used (referred to as the “OBSID”). Once these links have been established, it is possible to gain valuable insights into the scientific productivity of a mission. Interested parties can investigate which scientific areas are being contributed to, how the scientific productivity is evolving with time, the delay between making an observation and publication, the number of new scientists

Objectives

The main objective of the activity is to develop a system that can provide information on the scientific performance of ESA’s operating missions by examining the publications and the observational data used to produce them. This will be performed by providing:

● a human user interface that presents information from publications in the archive together with the associated archival data

● a human interface allowing the listed publications to be selected using various criteria, which may be mission dependent

● a human interface to allow standard statistical summaries to be produced for the selected publications

● a human interface that will allow the production of on-the-fly statistics on the scientific publications and any parameter in the associated archived data

● a machine interface that will allow the ESAC Science Archives to query the system and retrieve the relevant relations between observational data and papers, to be shown within the archives

● a contribution to the ADS tagging effort for Linking Literature and Data
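
As an illustration of the kind of statistical summary these interfaces could produce, the following minimal sketch counts papers per publication year and computes the median delay between observation and publication. The class `PublicationDelayStats`, the `Paper` type and its fields are hypothetical names chosen for this example, and the dates are invented sample values, not mission data.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not SAPS code: summary statistics over paper/observation links.
class PublicationDelayStats {

    /** One paper and the date of the observation it used (illustrative fields only). */
    static final class Paper {
        final LocalDate observed;
        final LocalDate published;

        Paper(LocalDate observed, LocalDate published) {
            this.observed = observed;
            this.published = published;
        }
    }

    /** Counts papers per publication year, sorted by year. */
    static Map<Integer, Integer> publicationsPerYear(List<Paper> papers) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Paper paper : papers) {
            counts.merge(paper.published.getYear(), 1, Integer::sum);
        }
        return counts;
    }

    /** Median number of days between observation and publication. */
    static double medianDelayDays(List<Paper> papers) {
        if (papers.isEmpty()) {
            throw new IllegalArgumentException("no papers given");
        }
        List<Long> delays = new ArrayList<>();
        for (Paper paper : papers) {
            delays.add(ChronoUnit.DAYS.between(paper.observed, paper.published));
        }
        Collections.sort(delays);
        int mid = delays.size() / 2;
        return delays.size() % 2 == 1
                ? delays.get(mid)
                : (delays.get(mid - 1) + delays.get(mid)) / 2.0;
    }

    public static void main(String[] args) {
        // Invented sample values, for demonstration only.
        List<Paper> papers = Arrays.asList(
                new Paper(LocalDate.of(2010, 1, 1), LocalDate.of(2011, 1, 1)),
                new Paper(LocalDate.of(2012, 1, 1), LocalDate.of(2012, 1, 31)),
                new Paper(LocalDate.of(2011, 6, 1), LocalDate.of(2012, 6, 1)));
        System.out.println("Papers per year: " + publicationsPerYear(papers));
        System.out.println("Median delay (days): " + medianDelayDays(papers));
    }
}
```

The median is used here rather than the mean because a few very late archival papers would otherwise dominate the figure; that choice is an assumption of this sketch, not something stated on the poster.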

High Level Overview

1. Consumes
● PDF files
● Groups of zipped PDFs
● Excel files with URLs of PDFs

2. Classifies
● Automatically detects Observations in the PDFs
● Requests Human Intervention only if unsure about the detection

3. Reports
● Web based Search page
● Full text searching
● Faceted searching
● OLAP reports

4. Integrates
● ESAC Mission Archives AIO
● Machine to Machine RESTful API server
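
The "groups of zipped PDFs" input can be sketched with the standard `java.util.zip` classes. This is a minimal illustration under stated assumptions, not the SAPS intake code: the class `ZippedPdfIntake`, its method names and the sample file names are hypothetical, and PDFs are recognised purely by file extension.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical intake helper, not SAPS code.
class ZippedPdfIntake {

    /** Reads a ZIP stream and returns its PDF entries (name -> bytes) in archive order. */
    static Map<String, byte[]> extractPdfs(InputStream zipStream) {
        Map<String, byte[]> pdfs = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                boolean isPdf = name.toLowerCase(Locale.ROOT).endsWith(".pdf");
                if (entry.isDirectory() || !isPdf) {
                    continue; // skip folders and non-PDF files
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                while ((n = zip.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                pdfs.put(name, out.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pdfs;
    }

    /** Builds a small in-memory ZIP with placeholder contents, for demonstration only. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            String[] names = {"paper-a.pdf", "notes.txt", "batch/paper-b.PDF"};
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("placeholder for " + name).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> pdfs = extractPdfs(new ByteArrayInputStream(sampleZip()));
        pdfs.forEach((name, data) -> System.out.println(name + " (" + data.length + " bytes)"));
    }
}
```

Each extracted byte array would then be handed to the classification step; checking the file signature instead of the extension would be a more robust filter.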

Architectural Design

● User uploads Publications
● System tries to locate in the publication references to Observation IDs
● If none are found, it tries to locate references to Observation Dates
● Dates are filtered to remove invalid matches
● They are scored according to location and matching keywords in the surrounding text
● An aggregated score is calculated for each Date
● Top scoring Observation Dates are used to pull Observations performed on those Dates
● These Observations are scored according to references in the publication (instrument names, targets, etc.)
● The scored Observations are displayed in the Mission Administrator's dashboard
● The Mission Administrator:
● Approves or Rejects Observation suggestions
● Can manually specify Observations missed by automatic parsing
● Can specify threshold limits for scored Observations so that the verification process can be automated
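
The date-based steps above (filter, score by nearby keywords, aggregate per date, keep the top scorers) can be sketched as follows. This is a minimal illustration, not the SAPS algorithm: the class `ObservationDateScorer`, the ISO-only date pattern, the keyword list, the weights and the 60-character context window are all assumptions made for the example, and the location-based part of the scoring is left out for brevity.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of date detection and scoring; all constants are assumptions.
class ObservationDateScorer {
    // Assumed notation (YYYY-MM-DD); real papers use many more date formats.
    private static final Pattern ISO_DATE = Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");
    // Illustrative keywords; each one found near a date adds 1.0 to its score.
    private static final List<String> KEYWORDS = Arrays.asList("observed", "observation", "exposure");
    private static final int CONTEXT_CHARS = 60;

    /** Returns an aggregated score per valid date found in the text. */
    static Map<LocalDate, Double> scoreDates(String text, LocalDate missionStart, LocalDate missionEnd) {
        Map<LocalDate, Double> scores = new HashMap<>();
        Matcher m = ISO_DATE.matcher(text);
        while (m.find()) {
            LocalDate date;
            try {
                date = LocalDate.parse(m.group());
            } catch (DateTimeParseException e) {
                continue; // filter: not a real calendar date
            }
            if (date.isBefore(missionStart) || date.isAfter(missionEnd)) {
                continue; // filter: outside the mission lifetime
            }
            int from = Math.max(0, m.start() - CONTEXT_CHARS);
            int to = Math.min(text.length(), m.end() + CONTEXT_CHARS);
            String context = text.substring(from, to).toLowerCase(Locale.ROOT);
            double score = 1.0; // base score for any valid match
            for (String keyword : KEYWORDS) {
                if (context.contains(keyword)) {
                    score += 1.0;
                }
            }
            scores.merge(date, score, Double::sum); // aggregate repeated mentions
        }
        return scores;
    }

    /** Dates whose aggregated score reaches the threshold, highest score first. */
    static List<LocalDate> topDates(Map<LocalDate, Double> scores, double threshold) {
        List<LocalDate> dates = new ArrayList<>();
        for (Map.Entry<LocalDate, Double> entry : scores.entrySet()) {
            if (entry.getValue() >= threshold) {
                dates.add(entry.getKey());
            }
        }
        dates.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return dates;
    }

    public static void main(String[] args) {
        String text = "The target was observed on 2003-06-14 with an exposure of 30 ks. "
                + "Received 2010-13-45. Launch was on 1999-12-10.";
        Map<LocalDate, Double> scores =
                scoreDates(text, LocalDate.of(2000, 1, 1), LocalDate.of(2014, 12, 31));
        System.out.println(topDates(scores, 2.0) + " " + scores);
    }
}
```

The `threshold` argument plays a role similar to the administrator-defined limits: candidates above it could be accepted automatically, while the rest would be queued for manual review.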

1. Project based on
● Java
● Spring Boot Framework

2. Front-end built with
● Java
● Google Web Toolkit (GWT)
● Twitter Bootstrap

3. Indexing / Searching provided by
● ElasticSearch

4. PDF parsing
● PDFBox
● TIKA

5. Reporting – OLAP Cubes
● JasperServer

6. Backend Datastore
● PostgreSQL

7. Communication between services
● RESTful calls with JSON payloads

8. Charting
● HighCharts

9. Development Stack
● Eclipse IDE
● Apache Tomcat
● Maven
● Jenkins CI
● Trello Project management

10. Powered by
● Lots and lots of developer love
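
To make "RESTful calls with JSON payloads" concrete, here is a minimal round trip using only classes shipped with the JDK. It is a sketch under stated assumptions: the `/links` path, the JSON field names `bibcode` and `observations`, the class `PublicationLinkApiSketch` and the placeholder identifiers are all invented for this example and do not describe the actual SAPS API, which is built on Spring Boot.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of a machine-to-machine JSON exchange; not the SAPS API.
class PublicationLinkApiSketch {

    /** Serialises one paper-to-observations link (values assumed to need no JSON escaping). */
    static String toJson(String bibcode, List<String> obsIds) {
        String ids = obsIds.stream().map(id -> "\"" + id + "\"").collect(Collectors.joining(","));
        return "{\"bibcode\":\"" + bibcode + "\",\"observations\":[" + ids + "]}";
    }

    /** Starts a throwaway local server on a free port that answers /links with the payload. */
    static HttpServer startServer(String payload) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/links", exchange -> {
                byte[] body = payload.getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Performs a GET against the local server and returns the response body as text. */
    static String get(int port, String path) {
        try {
            URL url = new URL("http://127.0.0.1:" + port + path);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
            conn.setRequestProperty("Accept", "application/json");
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    body.write(buffer, 0, n);
                }
                return new String(body.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder identifiers, not real bibcodes or observation IDs.
        String payload = toJson("EXAMPLE-BIBCODE", Arrays.asList("OBS-0001", "OBS-0002"));
        HttpServer server = startServer(payload);
        try {
            System.out.println(get(server.getAddress().getPort(), "/links"));
        } finally {
            server.stop(0);
        }
    }
}
```

In a real service a JSON library would handle escaping and parsing; the hand-built string here only keeps the example free of external dependencies.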

Classification Workflow

Log output from Observation ID and Observation Date matching showing a list of the matches and their relative location in the surrounding text. (WIP)
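
A log of this kind could be produced along the following lines. This is a minimal sketch, not the SAPS matcher: the class `ObsIdMatchLog`, the 10-digit identifier pattern, the 20-character context window and the log line layout are assumptions made for illustration, and the identifier in the sample text is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: lists identifier matches with their position and surrounding text.
class ObsIdMatchLog {
    // Assumed identifier shape (exactly 10 digits); real missions use different schemes.
    private static final Pattern OBS_ID = Pattern.compile("\\b\\d{10}\\b");
    private static final int CONTEXT_CHARS = 20;

    /** Returns one log line per match: the ID, its offset, relative position and context. */
    static List<String> logMatches(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = OBS_ID.matcher(text);
        while (m.find()) {
            int from = Math.max(0, m.start() - CONTEXT_CHARS);
            int to = Math.min(text.length(), m.end() + CONTEXT_CHARS);
            // Relative position of the match within the document, in percent.
            long percent = Math.round(100.0 * m.start() / text.length());
            String context = text.substring(from, to).replace('\n', ' ');
            lines.add(String.format(Locale.ROOT,
                    "id=%s offset=%d position=%d%% context=\"%s\"",
                    m.group(), m.start(), percent, context));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Placeholder identifier, not a real observation.
        for (String line : logMatches("We used ObsID 0123456789 from the archive.")) {
            System.out.println(line);
        }
    }
}
```

Recording the relative position alongside the context makes it possible to weight matches by where they occur in the paper, for instance favouring an observations section over the reference list.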

using a mission, whether the data for a publication is obtained from an archive or via a successful response to an observing Announcement of Opportunity, the nationalities of the authors, and so on. For multi-instrument missions, the productivity of the different instruments and their different operation modes, if applicable, can be assessed. These insights are useful to a mission’s Project Scientist, to management and to those involved in the selection of ESA’s future science missions.

Some of the missions already provide links to their publications, with some relevant information extracted (see, e.g., http://herschel.esac.esa.int/hpt). In some cases the archives provide links from the literature to the observational data (e.g. the XMM-Newton Science Archive, XSA; see http://archives.esac.esa.int/xsa/).

ESA has awarded a contract for under geo-returned countries to the consortium Planetek Hellas - National Observatory of Athens to build a system that will homogeneously extract and classify the relations between information published in papers and observational data from ESA space-based missions. This poster presents the main characteristics of this system: the ESA SAPS (Science Archives Publications System).