NVO Summer School, Aspen 9-Sep-2005 1

Data Access Layer

Doug Tody (NRAO)

US NATIONAL VIRTUAL OBSERVATORY

Data Access Layer

• What does it do?
  – Provides access to data
    • data discovery
    • mediation to a standard model
    • data retrieval
    • on-demand data generation
    • server-side computation (subsetting, filtering)

• What is it for?
  – Supports client data analysis
    • distributed, multiwavelength

• How does it work?
  – Object (dataset) oriented
    • catalog, image, spectrum, time series, SED, etc.
  – Services
    • cone search (also SkyNode), SIA, SSA

Cone Search

• Provides basic catalog access
  – Query by position and aperture (cone in space)
  – Query consists of base-URL (service endpoint) plus parameters
    • e.g., http://base-url?RA=12.0&DEC=0.0&SR=1.0
  – Catalog returned as a VOTable

• Advantages
  – Simple but powerful, provides standard interface
  – Easy to implement and use

• Limitations
  – Catalog metadata is not defined
  – No data model support

• Future
  – To be supplanted by basic SkyNode (Greene, Saturday)
  – Supports metadata discovery, SQL-like query syntax
  – We will continue to support the basic cone search query, however!
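
As a rough illustration of the request format above, the following Python sketch builds a Cone Search URL from a base URL plus RA, DEC and SR (search radius), all in decimal degrees. The helper name `cone_search_url` and the endpoint `http://example.org/cone` are hypothetical; real services publish their own base URLs in the Registry.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra: float, dec: float, sr: float) -> str:
    """Build a Cone Search request URL (hypothetical helper, a sketch only).

    ra, dec: cone center in decimal degrees (ICRS)
    sr:      search radius in decimal degrees
    """
    if not 0.0 <= ra < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if sr <= 0.0:
        raise ValueError("SR must be positive")
    query = urlencode({"RA": ra, "DEC": dec, "SR": sr})
    # The base URL may already end in '?' or '&', or may already carry
    # service-specific parameters; pick the separator accordingly.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + query
```

For example, `cone_search_url("http://example.org/cone", 12.0, 0.0, 1.0)` yields `http://example.org/cone?RA=12.0&DEC=0.0&SR=1.0`. The response would then be fetched with any HTTP client and parsed as a VOTable.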

Simple Image Access (SIA)

• Basic Usage, Highest Level
  – Client queries Registry to find interesting services
  – Each service is queried (in turn or simultaneously) for data
  – Client collates and analyzes results
  – Selected datasets are retrieved
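
The highest-level pattern (query several services simultaneously, then collate) can be sketched with Python's standard `concurrent.futures`. Here `query_all` is a hypothetical helper and `query_service` stands in for any per-service query function, for example an HTTP GET on an SIA URL followed by VOTable parsing. Failures are collected rather than aborting the whole search, on the assumption that in a distributed setting some services will be unavailable at any given time.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def query_all(
    services: List[str],
    query_service: Callable[[str], List[Row]],
    max_workers: int = 8,
) -> Tuple[List[Row], Dict[str, Exception]]:
    """Query every service concurrently and collate the results (sketch).

    Returns (rows, errors): all rows tagged with their originating service,
    and a map from each failing service to the exception it raised.
    """
    rows: List[Row] = []
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {svc: pool.submit(query_service, svc) for svc in services}
        # Collect in submission order so the collated output is deterministic.
        for svc, fut in futures.items():
            try:
                for row in fut.result():
                    # Record provenance so each row can be traced to its service.
                    rows.append({**row, "service": svc})
            except Exception as exc:  # one failing service must not abort the search
                errors[svc] = exc
    return rows, errors
```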

Simple Image Access (SIA)

• Basic Usage, Single Service
  – Query
    • find data of interest from a single service
    • http://base-url?POS=12.0,0.0&SIZE=0.2&FORMAT=image/fits
  – Query response
    • VOTable, one row per candidate dataset
    • "access reference" (a URL) points to data
  – Data selection
    • Performed by the client using query response metadata
  – Dataset retrieval
    • Retrieve actual datasets, if any
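
Data selection happens entirely on the client. Assuming the query response has already been parsed into one dictionary per row, keyed by UCD, a selection step might look like the sketch below. `VOX:Image_Format` is one of the SIA 1.0 image metadata UCDs; `VOX:Image_AccessReference` is assumed here to be the SIA 1.0 UCD for the access URL, so check it against the FIELD list a service actually returns. The helper `select_access_refs` is hypothetical.

```python
from typing import Dict, List

# UCDs tagging the format and access-URL columns (the access UCD is assumed).
FORMAT_UCD = "VOX:Image_Format"
ACCESS_UCD = "VOX:Image_AccessReference"


def select_access_refs(
    rows: List[Dict[str, str]], wanted_format: str = "image/fits"
) -> List[str]:
    """Return access URLs of rows whose format matches wanted_format (sketch)."""
    selected = []
    for row in rows:
        if row.get(FORMAT_UCD) != wanted_format:
            continue
        url = row.get(ACCESS_UCD)
        if not url:
            # Dataset is described but not online: nothing to retrieve.
            continue
        selected.append(url)
    return selected
```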

Service Capabilities

• Types of Services
  – Atlas: Precomputed survey image (entire image)
  – Pointed: Image from pointed observation (entire image)
  – Cutout: Cutout of existing image (pixels unchanged)
  – Mosaic: Reprojected image (pixels resampled)

• Virtual Data
  – Data model mediation
  – Subsetting, filtering, etc. on the fly
  – Possible to view same data in different ways

• Interface
  – RESTful interface currently (HTTP GET)
  – Document oriented (VOTable, FITS, JPEG, etc.)

Data Model

• SIA data model is the familiar "astronomical image"
  – Generally this means a 2D sky projection
  – Data array is logically a regular grid of pixels
  – Encoded as a FITS image, GIF/JPEG, etc.

• Standardized dataset metadata
  – Provenance
  – Image geometry
  – Scale
  – Format
  – Position, WCS
  – Time of observation
  – Spectral bandpass
  – Access information

Input Parameters

• Required parameters
  – POS: center of the region of interest (RA, Dec in decimal degrees, ICRS)
  – SIZE: width; or width, height (decimal degrees)
  – FORMAT: ALL, GRAPHIC, image/fits, image/jpeg, text/html, …

• Optional parameters
  – INTERSECT: values covers, enclosed, center, overlaps
  – VERB: table verbosity

• Service-defined parameters
  – used to further refine queries, but not yet standardized
    • e.g., BAND, SURVEY, etc.

• Image generation parameters
  – NAXIS, CFRAME, EQUINOX, CRPIX, CRVAL, CDELT, ROTANG, PROJ
    • used for cutout/mosaic services to specify the image to be generated
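
The required and optional parameters map directly onto an HTTP GET query string. The sketch below is a hypothetical helper, not part of any SIA library: it formats POS as `ra,dec`, accepts SIZE as one value or a (width, height) pair, and checks INTERSECT against the four values listed above. It upper-cases INTERSECT on the assumption that this matches the spelling in the SIA 1.0 specification. Commas and slashes are left unescaped to match the example URL; percent-encoding them would also be valid.

```python
from typing import Optional, Sequence, Union
from urllib.parse import urlencode

# Allowed INTERSECT values (upper-case spelling assumed).
INTERSECT_VALUES = {"COVERS", "ENCLOSED", "CENTER", "OVERLAPS"}


def sia_query_url(
    base_url: str,
    ra: float,
    dec: float,
    size: Union[float, Sequence[float]],
    fmt: str = "ALL",
    intersect: Optional[str] = None,
) -> str:
    """Build an SIA 1.0-style query URL (hypothetical helper, a sketch)."""
    if isinstance(size, (int, float)):
        size_str = str(size)
    else:
        if len(size) != 2:
            raise ValueError("SIZE is a width or a (width, height) pair")
        size_str = f"{size[0]},{size[1]}"
    params = {"POS": f"{ra},{dec}", "SIZE": size_str, "FORMAT": fmt}
    if intersect is not None:
        if intersect.upper() not in INTERSECT_VALUES:
            raise ValueError(f"unknown INTERSECT value: {intersect}")
        params["INTERSECT"] = intersect.upper()
    sep = "" if base_url.endswith(("?", "&")) else ("&" if "?" in base_url else "?")
    # Keep ',' and '/' literal so the URL reads like the example above.
    return base_url + sep + urlencode(params, safe=",/")
```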

Query Response

• Output is a VOTable
  – Must contain a RESOURCE element with type="results", containing the results of the query.

• The "results" resource contains a single table
  – Each row of the table describes a single data object which can be retrieved.

• The fields of the table describe the attributes of the dataset
  – These are the attributes of the SIA data model
  – In SIA 1.0, the UCD is used to identify the data model attribute
    • e.g., POS_EQ_RA_MAIN, VOX:Image_Scale, etc.
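
A client can locate the "results" resource and key each row by UCD using only the Python standard library. This is a minimal sketch that assumes TABLEDATA serialization (VOTable also allows binary and FITS encodings, which it does not handle); the function name `parse_results` is hypothetical. Namespace prefixes are stripped so the same code reads VOTable documents with or without a namespace declaration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def _local(tag: str) -> str:
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_results(votable_xml: str) -> List[Dict[str, str]]:
    """Return rows of the RESOURCE with type="results", keyed by UCD (sketch).

    Columns without a UCD fall back to their FIELD name.
    """
    root = ET.fromstring(votable_xml)
    for resource in root.iter():
        if _local(resource.tag) == "RESOURCE" and resource.get("type") == "results":
            break
    else:
        raise ValueError('no RESOURCE with type="results"')

    table = next((el for el in resource.iter() if _local(el.tag) == "TABLE"), None)
    if table is None:
        raise ValueError("results resource contains no TABLE")

    # FIELD elements are direct children of TABLE, in column order.
    keys = [f.get("ucd") or f.get("name", "") for f in table if _local(f.tag) == "FIELD"]
    rows = []
    for tr in table.iter():
        if _local(tr.tag) != "TR":
            continue
        cells = [td.text or "" for td in tr if _local(td.tag) == "TD"]
        rows.append(dict(zip(keys, cells)))
    return rows
```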

Query Response

• Image metadata
  – Describes the image object (required)

• Coordinate system metadata
  – Image WCS

• Spectral bandpass metadata
  – Prototype data model describing spectral bandpass of image

• Processing metadata
  – Tells whether the service modified the image data

• Access metadata
  – Tells client how to access the dataset (required)

• Resource-specific metadata
  – Additional optional service-defined metadata describing image

Image Metadata

UCD                   Description
VOX:Image_Title       Brief description of image
POS_EQ_RA_MAIN        RA (ICRS)
POS_EQ_DEC_MAIN       Dec (ICRS)
INST_ID               Instrument name
VOX:Image_MJDateObs   MJD of observation
VOX:Image_Naxes       Number of image axes
VOX:Image_Naxis       Length of each axis
VOX:Image_Scale       Image scale, deg/pix
VOX:Image_Format      Image file format
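
Because SIA 1.0 identifies each column by UCD, a client can map a parsed row onto a typed record. The sketch below uses the UCDs from the table above; the `ImageRecord` class, its field names, and the assumption that array-valued columns (such as `VOX:Image_Naxis`) arrive as whitespace-separated strings are illustrative choices. It also derives the angular extent of each axis as axis length times image scale.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ImageRecord:
    """Hypothetical typed view of one SIA 1.0 query-response row."""
    title: str
    ra: float            # degrees, ICRS
    dec: float           # degrees, ICRS
    naxes: int
    naxis: List[int]     # pixels along each axis
    scale: List[float]   # degrees per pixel along each axis
    fmt: str

    @classmethod
    def from_row(cls, row: Dict[str, str]) -> "ImageRecord":
        """Build a record from a row keyed by SIA 1.0 UCDs."""
        return cls(
            title=row.get("VOX:Image_Title", ""),
            ra=float(row["POS_EQ_RA_MAIN"]),
            dec=float(row["POS_EQ_DEC_MAIN"]),
            naxes=int(row["VOX:Image_Naxes"]),
            naxis=[int(v) for v in row["VOX:Image_Naxis"].split()],
            scale=[float(v) for v in row["VOX:Image_Scale"].split()],
            fmt=row["VOX:Image_Format"],
        )

    def extent_deg(self) -> List[float]:
        """Angular size of each axis in degrees (pixels times deg/pix)."""
        # A single scale value is assumed to apply to every axis.
        scales = self.scale * len(self.naxis) if len(self.scale) == 1 else self.scale
        return [abs(n * s) for n, s in zip(self.naxis, scales)]
```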

Image Retrieval

• Completely optional
  – Typically only a fraction of the available images are retrieved

• Query response
  – If an access reference is provided, the data can be retrieved
  – SIAP can also be used to describe data which is not online
  – The same data may be available in multiple formats

• Image retrieval
  – Very simple; access reference is a URL
  – Standard tools can be used to fetch the data
    • browser, wget, curl, I/O library, etc.
  – Data is often computed on-the-fly
  – All retrieval is synchronous (currently)
  – No provision for restricting access (currently)
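
Since an access reference is just a URL, retrieval needs nothing beyond a standard HTTP client. The minimal standard-library sketch below streams the response to disk in chunks, which matters when images are large or computed on the fly; the function name `fetch_dataset`, the 60-second timeout, and the 64 KiB chunk size are arbitrary choices. It makes one synchronous request, consistent with the current retrieval model.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_dataset(access_ref: str, dest: Path, timeout: float = 60.0) -> int:
    """Download one dataset from its access reference; return bytes written (sketch)."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(access_ref, timeout=timeout) as resp, open(dest, "wb") as out:
        # Copy in 64 KiB chunks rather than reading the whole image into memory.
        shutil.copyfileobj(resp, out, length=64 * 1024)
    return dest.stat().st_size
```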

Service Registration

Future Development

• SIA V1.1
  – Based on work done on SSA
  – Expanded query interface
    • no longer limited to positional queries
  – Much richer query response
    • generic dataset identification, characterization, etc.
    • metadata extension mechanism
  – Selected features
    • VOTable 1.1 with UCD 1+, GROUP, UTYPE
    • query response can be ordered by "score"
    • logical groupings of related query records
    • compression support
  – Versioning
    • required to make protocol upgrades manageable

Future Development

• Service verification
  – for testing at development time
  – when registered; level of compliance metric

• Grid capabilities
  – Data staging
    • asynchronous image generation (long running jobs)
    • batch generation of images (multiple images)
  – Data management
    • support for single sign-on authentication, authorization
    • network data caching, third party delivery (VOStore etc.)
  – Web service interface
    • resource metadata
    • service availability (etc.)

• ADQL integration
  – Capability to use query language for queries

Simple Spectral Access (SSA)

• What is it?
  – Provides access to 1D spectra, time series, SEDs
  – Tabular spectrophotometric data (photometry points)
  – Represents second-generation, data-model-based DAL interfaces

• Status
  – Draft V0.9 query interface reviewed in Kyoto (May 05)
  – Revisions in progress; draft PR targeted for Madrid (Oct 05)
  – Much work done on data models; however, they are still being revised
  – Some initial prototypes already exist (services, client apps)

• IVOA/Madrid discussions will be held immediately after ADASS and are open to all

Basic Usage

• The SSA specification may be complex, but basic usage is simple

• Simple query
  – POS, SIZE, FORMAT: like cone search, SIA
  – Possibly refined by spectral or time bandpass, etc.
  – Most metadata in query response is optional

• Data retrieval
  – Simple retrieval is again URL-based
  – Get back a dataset "document" (VOTable, FITS, JPEG, etc.)
  – In the simplest case this could be wavelength, flux as text (for Spectrum)
  – Pass-through of external data is permitted

• Data Analysis
  – Standard data model isolates the application from the quirks of external project data
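
In the simplest case a Spectrum comes back as two text columns, wavelength and flux. A parser for that case takes only a few lines. The helper `parse_text_spectrum` is hypothetical, and the whitespace-or-comma delimiter and the '#' comment convention are assumptions, since no particular text layout is fixed here.

```python
from typing import List, Tuple


def parse_text_spectrum(text: str) -> Tuple[List[float], List[float]]:
    """Parse 'wavelength flux' pairs, one per line (sketch).

    Blank lines and lines starting with '#' are skipped; columns may be
    separated by whitespace or a comma (CSV).
    """
    wavelengths: List[float] = []
    fluxes: List[float] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            raise ValueError(f"line {lineno}: expected wavelength and flux")
        wavelengths.append(float(parts[0]))
        fluxes.append(float(parts[1]))
    return wavelengths, fluxes
```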

Concepts - Dataset-oriented

• Data object type
  – Spectrum, TimeSeries, SED

• Dataset creation type
  – Atlas: Whole datasets, uniform survey data
  – Pointed: Whole datasets, variable instrumental data
  – Cutout: Subset, data samples are not modified
  – Resampled: Subset, data samples computed by service

• Dataset derivation
  – Observed: An observation
  – Composite: Combination of several observations
  – Simulated: Simulated observation made from real data
  – Synthetic: Data from a theoretical model

Data Models

• Data models used in SSA
  – Spectral data: Spectrum, TimeSeries, SED
  – Dataset: Generic dataset descriptor
  – Target: Astronomical target observed
  – Curation: Origin of data
  – Characterization: Physical characteristics of data
  – Provenance: Instrument which generated the data

• User-defined data models
  – Metadata extension mechanisms
    • additional data model attributes (table fields)
    • additional resources in VOTable, linked back to main table
  – Provide a mechanism to "subclass" dataset to tailor it for a given data collection

Spectral Data (SED)

[Figure: SED with labeled spectrum segment and photometry point]

Spectral/SED Data Model

Query Interface

• Mandatory query parameters
  – POS: RA, DEC (ICRS)
  – SIZE: diameter (decimal degrees)
  – TIME: date1,date2 (epoch in decimal years, UTC)
  – BAND: wave1,wave2 (meters in vacuum; source or observer frame)
  – FORMAT: VOTable, fits, xml, text, graphics, html, external
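
The SSA parameters extend the SIA pattern with range-valued TIME and BAND. The sketch below builds a query under the draft units listed above (POS and SIZE in decimal degrees, TIME in decimal years, BAND in meters). The helper `ssa_query_url` and the endpoint are hypothetical. Following the basic-usage description (POS, SIZE, FORMAT, possibly refined by spectral or time bandpass), it treats TIME and BAND as optional on the client side. Because the interface was still a draft, parameter spelling and syntax should be checked against the final specification.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def _range(lo: float, hi: float) -> str:
    """Format a 'low,high' range, rejecting inverted ranges."""
    if lo > hi:
        raise ValueError(f"empty range {lo},{hi}")
    return f"{lo},{hi}"


def ssa_query_url(
    base_url: str,
    ra: float,
    dec: float,
    size: float,
    time: Optional[Tuple[float, float]] = None,
    band: Optional[Tuple[float, float]] = None,
    fmt: str = "VOTable",
) -> str:
    """Build a draft-SSA query URL (hypothetical helper, a sketch)."""
    params = {"POS": f"{ra},{dec}", "SIZE": size}
    if time is not None:
        params["TIME"] = _range(*time)   # decimal years, UTC
    if band is not None:
        params["BAND"] = _range(*band)   # meters, vacuum wavelength
    params["FORMAT"] = fmt
    sep = "" if base_url.endswith(("?", "&")) else ("&" if "?" in base_url else "?")
    # Keep ',' literal so range values read as 'low,high'.
    return base_url + sep + urlencode(params, safe=",")
```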

Query Interface

• Recommended query parameters
  – APERTURE: approx. spatial resolution (decimal degrees)
  – SPECRES: spectral resolution (meters)
  – TOP: number of top-ranked records to return
  – OBJTYPE: mandatory if service returns multiple object types
  – COLLECTION: data collection identifier

Query Interface

• Optional parameters
  – CREATORID: creator-assigned dataset identifier (at most 1)
  – PUBID: publisher-assigned dataset identifier (at most N)
  – COMPRESS: enable compression (for both data _and_ queries?)
  – SNR: signal-to-noise ratio
  – REDSHIFT: redshift range (dlambda/lambda)
  – TARGETCLASS: star, galaxy, pulsar, PN, QSO, AGN, etc.

Query Response

• Classes of query metadata
  – Query metadata: Describes the query itself
  – Dataset metadata: Describes data object; object-specific
  – Target metadata: Astronomical target
  – Curation metadata: External identification of dataset
  – Characterization: Coverage, Accuracy, Frame, etc.
  – Instrument metadata: Service-defined; hard to standardize
  – Access metadata: Describes how to access the dataset

Query Response

• Query Metadata
  – Query.Score: How well object matches query
  – Query.LName: Logical name (identifier)
  – Query.LNameKey: Logical name key (id-ref)

• Example: LName="MyObj123" LNameKey="server,format"
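
Query.Score lets a client (or a service honoring TOP) order candidates by how well they match the query. The sketch below shows that ordering, assuming records are dictionaries keyed by the field names above and that a higher score means a better match (an assumption; the direction is not stated here). The helper `top_ranked` is hypothetical; records without a score sort last.

```python
from typing import Dict, List, Optional


def top_ranked(
    records: List[Dict[str, str]], top: Optional[int] = None
) -> List[Dict[str, str]]:
    """Sort by Query.Score (descending), optionally keeping only `top` records."""

    def key(rec: Dict[str, str]) -> float:
        score = rec.get("Query.Score")
        # Missing or empty scores become -infinity so they sort last.
        return float(score) if score not in (None, "") else float("-inf")

    # sorted() is stable, so equal scores keep their original order.
    ranked = sorted(records, key=key, reverse=True)
    return ranked if top is None else ranked[:top]
```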

Query Response

• Dataset Metadata
  – Dataset.Type: Spectrum, TimeSeries, SED, etc.
  – Dataset.DataModel: DM name, e.g., "SSA-V0.90"
  – Dataset.Title: Brief descriptive title of dataset
  – Dataset.SSA.NSamples: Total samples in dataset
  – Dataset.SSA.Aperture: Characteristic aperture diameter
  – Dataset.SSA.TimeAxis: TimeCoord axis (external data)
  – Dataset.SSA.SpectralAxis: SpectralCoord axis (external data)
  – Dataset.SSA.FluxAxis: Flux axis (external data)
  – Dataset.CreationType: atlas, pointed, cutout, resampled
  – Dataset.Derivation: observed, composite, simulated, synthetic

Query Response

• Target Metadata
  – Target.Name: Name of astronomical object
  – Target.Class: Target class (star, galaxy, QSO, etc.)
  – Target.SpectralClass: Spectral class (e.g., 'O', 'B', etc.)
  – Target.Redshift: Nominal redshift for object
  – Derived.VarAmpl: Variability amplitude (fraction 0-1)
  – Derived.SNR: Observed signal-to-noise ratio

Query Response

• Curation Metadata
  – Curation.Collection: Data collection name (identifier)
  – Curation.Creator: Creator identity (identifier)
  – Curation.CreatorID: Creator-assigned dataset identifier
  – Curation.PublisherID: Publisher-assigned dataset identifier
  – Curation.Date: Dataset creation date (ISO date string)
  – Curation.Version: Dataset version (within same ID)

Query Response

• Characterization 1 - Coverage
  – .Location.Spatial: Position (e.g., RA, DEC)
  – .Location.Time: Observation time characteristic value
  – .Location.Spectral: Spectral bandpass characteristic value
  – .Location.Spectral.BandID: Bandpass ID (band or filter name)
  – .Bounds.Spatial: Aperture footprint (polygon on sky)
  – .Bounds.Time: Low/High time values
  – .Bounds.Spectral: Low/High spectral values
  – .Bounds.Flux: Limiting flux, saturation limit (Jansky)
  – .Fill.Spatial: Spatial sampling filling factor (0-1)
  – .Fill.Time: Time sampling filling factor (0-1)
  – .Fill.Spectral: Spectral sampling filling factor (0-1)
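
Coverage metadata supports client-side filtering before any data is retrieved. For instance, a dataset is a candidate for a spectral query only if its Bounds.Spectral interval overlaps the requested BAND range. The sketch assumes both intervals are closed, already in the same units (meters), and given as (low, high); the helper names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (low, high), same units for both arguments


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two closed intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def intersects(bounds: Interval, query: Interval) -> bool:
    """True if dataset bounds and query range share at least one point."""
    return max(bounds[0], query[0]) <= min(bounds[1], query[1])


def coverage_fraction(bounds: Interval, query: Interval) -> float:
    """Fraction of the query range covered by the dataset bounds."""
    width = query[1] - query[0]
    if width <= 0:
        # Degenerate (single-value) query: covered or not.
        return 1.0 if intersects(bounds, query) else 0.0
    return overlap(bounds, query) / width
```

Bounds give only the outer limits; a service-reported Fill.Spectral below 1 would indicate gaps inside the interval that this check cannot see.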

Query Response

• Characterization 2 - Accuracy
  – Accuracy.*.Calibrated: uncalibrated, relative, absolute
  – Accuracy.*.Resolution: Resolution of measured signal
  – Accuracy.*.StatErr: Statistical error (measured)
  – Accuracy.*.SysErr: Systematic error (estimated)

('*' = Spatial, Time, Spectral, Flux)

Query Response

• Characterization 3 - Reference Frames
  – Frame.Spatial.Type: Coordinate frame (default ICRS)
  – Frame.Spatial.Equinox: Coordinate system equinox (J2000)
  – Frame.Time.System: Timescale (TT)
  – Frame.Time.SIDim: SI factor and dimension
  – Frame.Spectral.SIDim: SI factor and dimension
  – Frame.Flux.SIDim: SI factor and dimension
  – Frame.Flux.UCD: UCD of flux value (flux type)

(These apply only to the query response.)
(SIDim metadata is still under construction.)

Query Response

• Instrument Metadata
  – Instrument.Name: Instrument name (identifier)
  – Instrument.Exposure: Total exposure time (seconds)
  – Instrument.<other>: Service-defined

• Notes
  – Optional; provided for instrumental data collections
  – In general, Collection, Bounds.Time, etc. are preferred
  – In general, Instrument metadata is service-defined
  – Use Observation model as a starting point

Query Response

• Access Metadata
  – Access.Reference: Data access URL
  – Access.Format: MIME type of returned dataset
  – Access.Size: Approximate dataset size (bytes)
  – Access.Server: Server endpoint URL

• Staging support goes here in the future
  – e.g., whether dataset access will require asynchronous staging
  – estimated cost to construct the dataset

Service Metadata

• Usage
  – Describe service type and capabilities
  – Characterize service (data resources served, coverage, etc.)
  – Describe interface (optional query parameters)

• Interface
  – Requires new service metadata query method
  – Returns resource metadata descriptor (XML)

• Format
  – Registry resource descriptor (XML)

Data Retrieval

• Based on GET as with SIA
  – Variety of formats available
  – Compression supported

• Data representation
  – Data model defines logical content of data
  – The same data object may be represented in various formats
  – Hence we need to specify both the data model and the file format

Data Retrieval

• Data models
  – SSA data model for fully-compliant data
  – Provider-defined data model for external data

• Data formats
  – VOTable (a container), native XML (direct serialization)
  – FITS binary table (another container; uses FITS spectral WCS)
  – Text, e.g., CSV
  – Graphics (JPEG etc.)
  – text/html (rendered into browser page)
