Data Access Layer Servers. NVO Summer School, Aspen, September 2006. Doug Tody (NRAO), National Virtual Observatory



Data Access Layer Servers
NVO Summer School, Aspen, Sept. 2006

Doug Tody (NRAO)

US National Virtual Observatory


Data Access Layer (DAL) Services

• Goals
  – Understand what the DAL services are … and what is involved to implement them
• Agenda
  – Review current and planned DAL services
  – Introduce options/issues faced in implementing the DAL services


Current and Planned DAL Services

• Dataset         Generic dataset, complex data aggregates and associations (proposed)

• Cone (SCS)      Catalog data (released)
• SIAP V1.0       Image data (released)
• SSAP            1D Spectra (near PR; 2nd gen DAL prototype)
• SLAP            Spectral line lists (near PR)

• [S]TAP          Table/Catalog access (proposed)
• SSAP followon   Spectral Energy Distributions (SEDs)
• SSAP followon   Time series
• SIAP V2.0       Major upgrade - cube data etc.
• SNAP            Numerical Models / Theory data


Major elements of a DAL service

• Discovery query (queryData)
  – Discover data matching query
  – Access metadata ("headers") for candidate datasets
  – Negotiate contract for virtual data generation
  This is a web/database type operation
• Data access (getData; acref URL)
  – Retrieve selected datasets (URL-based)
  – May be archival data, or virtual data computed on the fly
  – In general the dataset may be computed, like a CGI web page
  This is a numerical/scientific computing type operation
• Interface
  – RESTful; only parameter-based queries currently available
  – Syntax-based query (ADQL/SQL) will be added as an option
  – SOAP will be added but the RESTful interface will be retained


Simple Cone Search

• Summary
  – Simplest possible access to astronomical catalogs
  – By far the most widely implemented VO data service
  – Prototypical DAL service
• Query Parameters
  – RA, DEC   Position on the sky (J2000, DDEG)
  – SR        Search radius (DDEG)
  – VERB      Verbosity (levels 1-3, optional)
• Query Response
  – VOTable; UCDs describe columns
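
A cone search query is just an HTTP GET with the parameters listed above. The following is a minimal sketch of a client-side URL builder, not a reference implementation; the function name `cone_search_url` and the base URL `http://example.org/cone` are hypothetical, and the range checks are an assumption about what a careful client might do.

```python
from urllib.parse import urlencode

def cone_search_url(base_url, ra, dec, sr, verb=None):
    """Build a Simple Cone Search query URL (all angles in decimal degrees)."""
    if not 0.0 <= ra <= 360.0:
        raise ValueError("RA must be in [0, 360] degrees")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if sr < 0:
        raise ValueError("SR must be non-negative")
    params = {"RA": ra, "DEC": dec, "SR": sr}
    if verb is not None:
        if verb not in (1, 2, 3):
            raise ValueError("VERB must be 1, 2 or 3")
        params["VERB"] = verb
    # Append with '&' when the base URL already carries a query string.
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode(params)

# Hypothetical service endpoint, used only for illustration.
url = cone_search_url("http://example.org/cone", 180.0, -30.5, 0.25, verb=2)
print(url)
```

The response to such a request would be a VOTable whose columns are identified by UCDs.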


Simple Image Access (SIA V1.0)

• Summary
  – Uniform access to 2+ dimensional images
    • Basically 2-D, but data model and interface are more general
  – Same service profile as Cone, but adds getData
    • The query is now used for data discovery instead of data access as for Cone; data access is a separate operation
  – Prototype for 2nd generation DAL interfaces
    • Data models, multiple output formats, virtual data generation, etc.


SIA Concepts

• Types of Services
  – Atlas     Precomputed survey image (entire image)
  – Pointed   Image from pointed observation (entire image)
  – Cutout    Cutout existing image (pixels unchanged)
  – Mosaic    Reprojected image (pixels resampled)
• Virtual Data
  – Data model mediation
  – Subsetting, filtering, transformation, etc. on the fly
  – Possible to view same data in different ways
• SIA data model is the familiar "astronomical image"
  – Generally this means a 2D sky projection, but cubes too
  – Data array is logically a regular grid of pixels
  – Encoded as a FITS image, GIF/JPEG, etc.


SIA Input Parameters

• Required parameters
  – POS      center of ROI (ra, dec decimal degrees ICRS)
  – SIZE     width; or width, height
  – FORMAT   ALL, GRAPHIC, image/fits, image/jpeg, text/html, …
             FORMAT=metadata returns service metadata
• Optional parameters
  – INTERSECT   values: covers, enclosed, center, overlaps
  – VERB        table verbosity
• Service-defined parameters
  – used to further refine queries, but not yet standardized
    • e.g., BAND, SURVEY, etc.
• Image generation parameters
  – NAXIS, CFRAME, EQUINOX, CRPIX, CRVAL, CDELT, ROTANG, PROJ
    • used for cutout/mosaic services to specify image to be generated
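
The parameters above map directly onto a query string. This sketch shows one way a client might assemble an SIA V1.0 discovery query; `sia_query_url` and the endpoint are hypothetical, and sending INTERSECT in upper case is an assumption, not something the slide specifies.

```python
from urllib.parse import urlencode

def sia_query_url(base_url, ra, dec, size, fmt="image/fits",
                  intersect=None, **service_params):
    """Build an SIA V1.0 discovery query URL.

    size is a single width or a (width, height) pair, in decimal degrees.
    service_params carries service-defined extras such as BAND or SURVEY.
    """
    if isinstance(size, (int, float)):
        size = (size,)
    params = {
        "POS": f"{ra},{dec}",
        "SIZE": ",".join(str(s) for s in size),
        "FORMAT": fmt,
    }
    if intersect is not None:
        allowed = ("COVERS", "ENCLOSED", "CENTER", "OVERLAPS")
        if intersect.upper() not in allowed:
            raise ValueError(f"INTERSECT must be one of {allowed}")
        params["INTERSECT"] = intersect.upper()  # upper case is an assumption
    params.update(service_params)
    sep = "&" if "?" in base_url else "?"
    # Keep ',' and '/' readable in POS, SIZE and MIME types.
    return base_url + sep + urlencode(params, safe=",/")

# Hypothetical endpoint and service-defined BAND value.
url = sia_query_url("http://example.org/sia", 12.5, -45.0, (0.2, 0.1),
                    intersect="covers", BAND="optical")
print(url)
```

Passing `fmt="metadata"` to the same function would produce the FORMAT=metadata request that returns service metadata.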


SIA Query Response

• Output is a VOTable
  – Must contain a RESOURCE element with type="results", containing the results of the query.
• The ‘results’ resource contains a single table
  – Each row of the table describes a single data object which can be retrieved.
• The fields of the table describe the attributes of the dataset
  – These are the attributes of the SIA data model
  – In SIA 1.0, the UCD is used to identify the data model attribute
    • e.g., POS_EQ_RA_MAIN, VOX:Image_Scale, etc.
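
Reading the response amounts to locating the results resource and keying each column by its UCD. The sketch below uses only the standard library and assumes the results RESOURCE is marked with the attribute `type="results"` and that the table uses the TABLEDATA serialization; the sample document, the helper names, and the `VOX:Image_AccessReference` column are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]

def parse_results(votable_text):
    """Return the rows of the 'results' table as dicts keyed by UCD."""
    root = ET.fromstring(votable_text)
    for res in root.iter():
        if local(res.tag) == "RESOURCE" and res.get("type") == "results":
            break
    else:
        raise ValueError("no RESOURCE with type='results'")
    # Fall back to the field name when a column carries no UCD.
    ucds = [f.get("ucd") or f.get("name")
            for f in res.iter() if local(f.tag) == "FIELD"]
    rows = []
    for tr in res.iter():
        if local(tr.tag) == "TR":
            cells = [td.text or "" for td in tr if local(td.tag) == "TD"]
            rows.append(dict(zip(ucds, cells)))
    return rows

# Hand-written sample response, for illustration only.
SAMPLE = """<VOTABLE version="1.1">
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="title" ucd="VOX:Image_Title" datatype="char" arraysize="*"/>
      <FIELD name="ra" ucd="POS_EQ_RA_MAIN" datatype="double"/>
      <FIELD name="url" ucd="VOX:Image_AccessReference" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>Field 1</TD><TD>180.0</TD><TD>http://example.org/img/1.fits</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""

rows = parse_results(SAMPLE)
print(rows[0]["POS_EQ_RA_MAIN"])
```

A production client would more likely rely on a VOTable package, which also handles binary serializations and data types.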


SIA Query Response

• Image metadata
  – Describes the image object (required)
• Coordinate system metadata
  – Image WCS
• Spectral bandpass metadata
  – Prototype data model describing spectral bandpass of image
• Processing metadata
  – Tells whether the service modified the image data
• Access metadata
  – Tells client how to access the dataset (required)
• Resource-specific metadata
  – Additional optional service-defined metadata describing image


SIA Image Metadata (UCDs)

VOX:Image_Title       Brief description of image
POS_EQ_RA_MAIN        RA (ICRS)
POS_EQ_DEC_MAIN       Dec (ICRS)
INST_ID               Instrument name
VOX:Image_MJDateObs   MJD of observation
VOX:Image_Naxes       Number of image axes
VOX:Image_Naxis       Length of each axis
VOX:Image_Scale       Image scale, deg/pix
VOX:Image_Format      Image file format


Image Retrieval

• Retrieval is optional
  – Typically only a fraction of the available images are retrieved
• Based on query response
  – If an access reference is provided, the data can be retrieved
  – SIAP can also be used to describe data which is not online
  – The same data may be available in multiple formats
• Image retrieval
  – Very simple; access reference is a URL
  – Standard tools can be used to fetch the data
    • (browser, wget, curl, i/o library, etc.)
  – Data is often computed on-the-fly
  – All retrieval is synchronous (currently)
  – No provision for restricting access (currently)
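
Because retrieval is optional and the same image may be offered in several formats, a client typically filters the query response before fetching anything. This is a rough sketch under stated assumptions: the row keys `format` and `acref` are hypothetical names for the image format and access reference columns, and rows with an empty access reference stand for data that is described but not online.

```python
import shutil
import urllib.request

def pick_access_refs(rows, wanted_format="image/fits"):
    """Return access reference URLs of rows offered in the wanted format."""
    return [r["acref"] for r in rows
            if r.get("format") == wanted_format and r.get("acref")]

def fetch(acref, filename):
    """Synchronously download one dataset; the access reference is just a URL."""
    with urllib.request.urlopen(acref) as resp, open(filename, "wb") as out:
        shutil.copyfileobj(resp, out)

# Hand-written rows standing in for a parsed query response.
rows = [
    {"title": "A", "format": "image/fits", "acref": "http://example.org/a.fits"},
    {"title": "A", "format": "image/jpeg", "acref": "http://example.org/a.jpg"},
    {"title": "B", "format": "image/fits", "acref": ""},  # described, not online
]
refs = pick_access_refs(rows)
print(refs)
```

Calling `fetch(refs[0], "a.fits")` would then perform the actual download; it is not invoked here because the URLs are placeholders.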


Simple Spectral Access (SSA)

• Summary
  – Uniform access to 1-D spectra
    • Can also handle spectral aggregates via association
    • Support for SEDs and time series will be added
  – First of the 2nd generation DAL interfaces
    • Basic approach does not change (queryData, getData)
    • Query interface and metadata are generalized
    • SIA upgrade (etc.) will share the same basic interface
  – Includes a standard data model for spectral datasets
    • Needed, as there is no standard way to represent spectra
    • Standard serializations are defined (VOTable, FITS, etc.)
    • Returned data is typically generated on the fly
    • External stored spectra may be in any form


SSA Interface Overview

• Service Operations
  – queryData         Discovery query
  – (getData)         URL-based currently, as for SIA
  – (stageData)       Reserved; used to asynchronously stage data
  – getCapabilities   Query service metadata and capabilities
• Complexity
  – Basic usage is quite simple
    • queryData; examine VOTable
    • fetch data by access reference URL
  – Basic Spectrum object
    • general metadata ("header")
    • spectral coordinate vector
    • flux vector
    • optional error vector
  – Formats
    • VOTable, FITS, XML, etc.; user or service choice


SSA Query Interface

• Mandatory query parameters
  – POS      X, Y, [FRAME (ICRS)]
  – SIZE     diameter (decimal degrees)
  – BAND     spectral region (1-2 num or name)
  – TIME     date1/date2 (ISO8601)
  – FORMAT   VOTable, FITS, XML, text, graphics, html, native


SSA Query Interface

• Optional query parameters
  – specres       minimum spectral resolution (L/dL)
  – spatres       minimum spatial resolution (DDEG)
  – timeres       minimum time resolution (seconds)
  – SNR           minimum SNR
  – redshift      redshift interval (1-2 decimal values)
  – targetname    target name, e.g., "mars"
  – targetclass   target class, e.g., star, QSO, AGN, etc.


SSA Query Interface

• Optional query parameters
  – pubDID       publisherID string
  – creatorDID   creatorID string
  – collection   collection ID (shortName, minimum match)
  – top          max top-ranked entries to be returned
  – token        continuation token for multipage queries
  – maxrec       maximum records in query response
  – mtime        create/modify time in given range (ISO8601)
  – runid        passed on to any other services
  – compress     enable compression
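
The SSA query parameters listed on these slides can be assembled much like an SIA query. The following is a sketch only: `ssa_query_url` and the endpoint are hypothetical, carrying the operation name in a `REQUEST` parameter is an assumption, and ranges are joined with "/" following the date1/date2 pattern shown for TIME.

```python
from urllib.parse import urlencode

def ssa_query_url(base_url, pos, size, band=None, time=None, fmt=None,
                  **optional):
    """Build an SSA queryData URL from the parameters listed above."""
    params = {
        "REQUEST": "queryData",                 # assumed parameter name
        "POS": ",".join(str(v) for v in pos),   # X,Y (ICRS assumed)
        "SIZE": size,                           # diameter, decimal degrees
    }
    if band is not None:
        # A name or single value passes through; a pair becomes lo/hi.
        if isinstance(band, (str, int, float)):
            params["BAND"] = str(band)
        else:
            params["BAND"] = "/".join(str(b) for b in band)
    if time is not None:
        params["TIME"] = "/".join(time)         # date1/date2, ISO 8601
    if fmt is not None:
        params["FORMAT"] = fmt
    # Optional constraints (specres, targetclass, maxrec, ...) pass through.
    params.update({k: v for k, v in optional.items() if v is not None})
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode(params, safe=",/")

# Hypothetical endpoint and illustrative constraint values.
url = ssa_query_url("http://example.org/ssa", (180.0, -30.5), 0.1,
                    band=("3.45E-7", "8.76E-6"),
                    time=("2005-01-01", "2005-12-31"),
                    fmt="votable", targetclass="QSO", maxrec=100)
print(url)
```

Keyword arguments keep the optional constraints open-ended, so a new parameter does not require changing the function.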


SSA Query Response

• Classes of Query Metadata
  – Query           Describes the query itself
  – Association     Logical associations (aggregation)
  – Access          Access metadata for data retrieval
  – Dataset         General dataset metadata (type etc.)
  – DataID          Dataset identification - what is it
  – Curation        How data is published and made available
  – Target          Astronomical target observed, if any
  – Derived         Derived quantities (SNR, redshift, etc.)
  – Char.Coverage   Coverage of spatial, spectral, time axes
  – Char.Accuracy   Calibration, resolution, sampling, errors
  – CoordSys        Coordinate system reference frames (STC)


SSA Query Response

• Query Metadata
  – Query.Score         Degree of match to query params
  – Query.Token         Step through large query response
• Association Metadata
  – Association.Type    Type of association
  – Association.ID      Instance ID linking associated records
  – Association.Key     Unique key identifying each member
• Access Metadata
  – Access.Reference    URL of data product to be retrieved
  – Access.ServiceDID   DataID of virtual data product
  – Access.Format       MIME type of dataset
  – Access.Size         approximate dataset size (bytes)
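
One way a client might use this metadata is to gather records that share an Association.ID and order each group by Query.Score. The sketch below assumes records have already been parsed into dicts keyed by the names above, that a higher score means a better match, and that a record without an Association.ID forms its own group; none of this is prescribed by the slides.

```python
from collections import defaultdict

def group_by_association(records):
    """Group query response records by Association.ID, best score first."""
    groups = defaultdict(list)
    for rec in records:
        # Unassociated records are keyed by their own access reference.
        key = rec.get("Association.ID") or rec["Access.Reference"]
        groups[key].append(rec)
    for members in groups.values():
        # Assumption: a larger Query.Score is a better match.
        members.sort(key=lambda r: float(r.get("Query.Score", 0)), reverse=True)
    return dict(groups)

# Hand-written records, for illustration only.
records = [
    {"Association.ID": "a1", "Association.Key": "k1", "Query.Score": "0.4",
     "Access.Reference": "http://example.org/1"},
    {"Association.ID": "a1", "Association.Key": "k2", "Query.Score": "0.9",
     "Access.Reference": "http://example.org/2"},
    {"Query.Score": "0.7", "Access.Reference": "http://example.org/3"},
]
groups = group_by_association(records)
print(sorted(groups))
```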


SSA Query Response

• DataID - Dataset Identification Metadata
  – DataID.Title          One-line description of dataset (String)
  – DataID.Collection     Collection name (shortName)
  – DataID.Creator        Creator of dataset (String)
  – DataID.CreatorID      Identifier for VO Creator (URI)
  – DataID.CreatorDID     Dataset ID assigned by creator (URI)
  – DataID.CreatorLogo    URL for Creator logo (URI)
  – DataID.Contributor    Contributor (may be multiple instances)
  – DataID.Date           Date last modified (ISO Date string)
  – DataID.Version        Version of dataset instance (String)
  – DataID.Instrument     Instrument description (String)
  – DataID.Bandpass       Spectral bandpass, e.g., filter (String)
  – DataID.DataSource     Original source of data (String)
  – DataID.CreationType   How was dataset created (String)


Some SSA Concepts

• DataSource
  – survey, pointed, theory, artificial
• CreationType
  – native, archival, cutout, filtered, mosaic, projection, spectral extraction, catalog extraction, etc.
• Provenance
  – Where did this data come from?
    • especially important for virtual data generated by service
  – DataID (Collection, CreatorDID, etc.) refers to original data
  – Curation (PublisherDID etc.) refers to data from service
  – CreationType indicates how the data was derived


Some SSA Concepts

• Associations
  – Use association metadata to link related records (datasets)
  – An association is a “complex dataset”
• Data Models
  – Data models formalize the “content” of data or metadata
  – Container/component architecture
    • Component data models aggregated in a container and associated logically (similar to a relational database)
  – Dataset, Spectrum, Characterization, STC, etc.
• Characterization
  – Physically characterize the data
    • Spatial, spectral, and temporal axes
    • Coverage, sampling, resolution, accuracy
  – Applies to any dataset (not specific to spectra)


SIA Upgrade Preview (SIA V2.0)

• Main objectives
  – Upgrade metadata, query interface as for SSA
    • standard generic dataset metadata
    • more powerful query interface
    • more comprehensive output metadata
  – Precision image data access enhancements
    • e.g., cube data, image slicing, projection, filtering
    • (TBD whether this is folded into “basic SIA” or done as a separate service class)
  – Advanced service capabilities
    • versioning, metadata query
    • asynchronous data staging, authentication, VOStore integration


Cube Data

• Overview
  – Motivated primarily by radio data surveys (CGPS, Arecibo)
  – Many O/IR integral field unit (IFU) instruments coming online as well
  – Challenge: datasets can be both large and complex
• Large datasets
  – Current data cubes are several hundred MB up to several GB
  – Future wide-field wide-band: 2048x2048x8192x4 = 128 GB
  – With polarization, multiple bands, could have 1/2 TB datasets!
• Complex datasets
  – e.g., CGPS: HI cube, CO cube, continuum, IQUV, IRAS same field
  – Multiple ways to view the same data
  – Multi-band surveys are a simpler example of this trend
• Use-Cases for recent study
  – CGPS, SGPS, GALFA (Arecibo), SINFONI (ESO IFU)
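
The size figures under "Large datasets" can be checked with a few lines of arithmetic. This sketch assumes the trailing "x4" is 4 bytes per pixel (for example 32-bit floats) and that the 1/2 TB case corresponds to four polarization planes (IQUV); both readings are interpretations of the slide, and the sizes are expressed in binary units.

```python
def cube_bytes(nx, ny, nchan, bytes_per_pixel=4, planes=1):
    """Uncompressed size of a data cube in bytes."""
    return nx * ny * nchan * bytes_per_pixel * planes

GIB = 2 ** 30
TIB = 2 ** 40

single = cube_bytes(2048, 2048, 8192)            # one spectral cube
stokes = cube_bytes(2048, 2048, 8192, planes=4)  # four planes, e.g. IQUV

print(single // GIB, "GiB")  # 128 GiB
print(stokes / TIB, "TiB")   # 0.5 TiB
```

At these sizes, downloading whole cubes over the network quickly becomes impractical, which motivates server-side subsetting.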


Cube Data

• Data access considerations
  – Network download of large cubes can be impractical
  – VO-style virtual data access to remote data is required
    • subsetting, filtering (spectral or time regions), transformations (projections, spectrum extraction)
  – Strategy: iteratively download data subset, visualize locally
• Typical access modes
  – Whole image
  – Spectrum extraction
  – Cutout 2D planes
  – Cutout 3D sub-cube (permits local full 3D analysis)
  – 2D projection along one axis
  – 3D projection (general 3D transformation)
  – 2D slice through 3D cube at arbitrary 3D pos, orientation


Cube Data

• Typical access scenario
  – Discovery query to discover data, get access metadata
  – Access query to set up virtual data access (WCS based)
  – Data access, dynamically generating virtual data
  – Repeat for a different region or view
• Example: Compute 2D projection with spectral filtering
  – View 2D preview or projection, e.g., continuum
  – Extract 1D spectra in sky regions (SSA with synthetic aperture)
  – Analyze sky spectrum to determine night sky lines (SLAP)
  – Compute 2D projection of cube excluding sky emission, absorption
• Other examples
  – Extract 3D sub-cube for full 3D analysis locally
  – 2D slice at arbitrary position and orientation


Cube Examples

• Extract 2-D plane from cube, same orientation
  – queryData
    • PubDID=<desired cube dataset>
    • POS=<center of 2-D plane>
    • SIZE=<spatial extent of 2-D plane>
      – (cutout of smaller region also possible here)
    • BAND=<spectral-coord of desired plane>
    • NAXES=2
    • FORMAT=FITS


Cube Examples

• 2-D Projection with spectral filtering
  – queryData
    • PubDID=<desired cube dataset>
    • POS=<center of 2-D plane>
    • SIZE=<spatial extent of 2-D plane>
      – (cutout of smaller region also possible here)
    • BAND=<range-list of “good” spectral regions>
    • NAXES=2
    • FORMAT=FITS
      (in SINFONI case original cube is in Euro-3D format)


Cube Examples

• Extract 3-D Sub-Cube
  – queryData
    • PubDID=<desired cube dataset>
    • POS=<spatial center of region>
    • SIZE=<spatial extent of sub-cube>
    • BAND=3.45E-7/8.76E-6
    • NAXES=3
    • FORMAT=FITS
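
The three cube examples differ only in BAND and NAXES, so one helper can express all of them. This is a sketch of a proposed interface, not a released one: the endpoint, the dataset identifier, the numeric values, and the comma-separated encoding of a BAND range-list are all assumptions made for illustration.

```python
from urllib.parse import urlencode

def band_value(band):
    """Format BAND: a single coordinate, a (lo, hi) range, or a list of ranges."""
    if isinstance(band, (int, float, str)):
        return str(band)
    if isinstance(band, tuple):
        return f"{band[0]}/{band[1]}"
    # Assumption: a range-list is written as comma-separated ranges.
    return ",".join(band_value(b) for b in band)

def cube_query_url(base_url, pubdid, pos, size, band, naxes, fmt="FITS"):
    """Build a queryData URL for a cube access request."""
    params = {
        "PubDID": pubdid,
        "POS": f"{pos[0]},{pos[1]}",
        "SIZE": size,
        "BAND": band_value(band),
        "NAXES": naxes,
        "FORMAT": fmt,
    }
    return base_url + "?" + urlencode(params, safe=",/:")

BASE = "http://example.org/sia2"   # hypothetical endpoint
CUBE = "ivo://example/cubes/ngc1"  # hypothetical dataset identifier

# 2-D plane at one spectral coordinate.
plane = cube_query_url(BASE, CUBE, (83.8, -5.4), 0.5, "2.1E-1", 2)
# 2-D projection over a list of "good" spectral regions.
filtered = cube_query_url(BASE, CUBE, (83.8, -5.4), 0.5,
                          [("1.0E-6", "1.4E-6"), ("1.5E-6", "1.8E-6")], 2)
# 3-D sub-cube over one spectral range.
subcube = cube_query_url(BASE, CUBE, (83.8, -5.4), 0.5,
                         ("3.45E-7", "8.76E-6"), 3)
print(subcube)
```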


Implementing DAL Services

• Overall Process
  – Determine what subclass of service to implement
    • do we return whole files, cutouts, extract spectra, etc.?
  – Select service technology
    • Java, dotNet/Mono, Ruby, etc.
  – Implement
    • Reference code or a template would be useful here
  – Test
    • Service verification tools
  – Register
    • As soon as you do this you are online!


Cone Search

• queryData operation
  – SQL “select” operation on a RDBMS
  – Transform output into VOTable format
    • a VOTable package can be useful here
• Issues
  – May need to assign UCDs to your catalog fields
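
The two steps above, a SQL select followed by a VOTable transform, can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the in-memory SQLite table and its column names, the choice to pre-select a declination band in SQL and apply the exact cone test in Python, and the `ID_MAIN` UCD assigned to the identifier column.

```python
import math
import sqlite3
import xml.etree.ElementTree as ET

def ang_sep(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def cone_search(conn, ra, dec, sr):
    """SQL pre-selects a declination band; Python applies the exact cone test."""
    cur = conn.execute(
        "SELECT id, ra_deg, dec_deg FROM catalog WHERE dec_deg BETWEEN ? AND ?",
        (dec - sr, dec + sr))
    return [row for row in cur if ang_sep(ra, dec, row[1], row[2]) <= sr]

# (column name, assigned UCD, VOTable datatype) for each output field.
FIELDS = [("id", "ID_MAIN", "char"),
          ("ra", "POS_EQ_RA_MAIN", "double"),
          ("dec", "POS_EQ_DEC_MAIN", "double")]

def to_votable(rows):
    """Serialize result rows as a minimal VOTable (TABLEDATA) string."""
    votable = ET.Element("VOTABLE", version="1.1")
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for name, ucd, dtype in FIELDS:
        attrs = {"name": name, "ucd": ucd, "datatype": dtype}
        if dtype == "char":
            attrs["arraysize"] = "*"
        ET.SubElement(table, "FIELD", attrs)
    data = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(data, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")

# Tiny hand-made catalog standing in for a real RDBMS table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (id TEXT, ra_deg REAL, dec_deg REAL)")
conn.executemany("INSERT INTO catalog VALUES (?, ?, ?)",
                 [("src1", 10.0, 20.0), ("src2", 10.05, 20.02), ("src3", 50.0, 20.0)])
hits = cone_search(conn, 10.0, 20.0, 0.1)
xml_text = to_votable(hits)
print(xml_text)
```

A production service would push more of the spatial test into the database, using an index, and would likely use a VOTable package rather than hand-built XML.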


Simple Image Access

• queryData operation
  – Select operation on a RDBMS
  – Compute SIA query response metadata
  – Transform output into VOTable format
• Issues
  – Computing the SIA query response metadata can be nontrivial
    • e.g., for a cutout or mosaic
    • don't forget you should return WCS information
  – Metadata generation
    • This is much easier if image metadata is cached in DBMS
    • For virtual data must compose access reference command
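
For a cutout service, the query response has to describe an image that does not exist yet, including its WCS and the access reference that will generate it. The sketch below illustrates the idea under heavy simplification: `cutout_record`, the cutout endpoint and its `id`/`ra`/`dec`/`npix` arguments, the plain `acref` and `wcs` keys, and the square, unrotated geometry are all hypothetical.

```python
from urllib.parse import urlencode

def cutout_record(image, ra, dec, size_deg,
                  base_url="http://example.org/cutout"):
    """Describe the cutout a service would generate from a cached image row.

    image holds cached metadata: id, scale (deg/pix), naxis (nx, ny).
    """
    npix = max(1, round(size_deg / image["scale"]))
    npix = min(npix, *image["naxis"])  # cannot exceed the parent image
    # The access reference is the command that generates the virtual data.
    acref = base_url + "?" + urlencode(
        {"id": image["id"], "ra": ra, "dec": dec, "npix": npix})
    return {
        "VOX:Image_Title": f"Cutout of {image['id']}",
        "POS_EQ_RA_MAIN": ra,
        "POS_EQ_DEC_MAIN": dec,
        "VOX:Image_Naxes": 2,
        "VOX:Image_Naxis": (npix, npix),
        "VOX:Image_Scale": image["scale"],  # a cutout leaves pixels unchanged
        "VOX:Image_Format": "image/fits",
        # Simplified WCS: reference pixel at the cutout center, no rotation.
        "wcs": {"CRPIX": ((npix + 1) / 2, (npix + 1) / 2),
                "CRVAL": (ra, dec),
                "CDELT": (-image["scale"], image["scale"])},
        "acref": acref,
    }

# Cached metadata for one hypothetical survey image.
image = {"id": "survey-0042", "scale": 0.0005, "naxis": (2048, 2048)}
rec = cutout_record(image, 180.0, -30.5, 0.1)
print(rec["acref"])
```

Caching `scale` and `naxis` in the database is what lets the service answer the query without opening the image file.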


Simple Image Access (cont’d)

• getData operation
  – Atlas, Pointed
    • only input is an access URL pointing to the file
    • return FITS file
  – Cutout, Mosaic
    • access URL is the command which generates the virtual data
    • may require significant, complex computation!
• getCapabilities
  – For SIA V1.0 this is FORMAT=metadata
  – Tells client service capabilities and any optional parameters


Implementing DAL Services

• Web Service Frameworks
  – LAMP - Linux, Apache, MySQL, Python/Perl/PHP etc.
    • Apache Web server, Tomcat, Java servlets
  – dotNET/Mono
    • Microsoft approach; SQL server, C#
  – Ruby on Rails
    • Trendy new alternative
• Virtual Data Generation
  – Backend may require significant computation
    • Re-use some science package (IRAF, IDL, AIPS, CASA, etc.)
    • Or at least CFITSIO, WCSTOOLS, and other libraries
