NVO Summer School, Aspen, 9-Sep-2005: Data Access Layer, Doug Tody (NRAO), US National Virtual Observatory

Post on 27-Mar-2015


TRANSCRIPT

  • Slide 1

NVO Summer School, Aspen, 9-Sep-2005
Data Access Layer
Doug Tody (NRAO)
US National Virtual Observatory

  • Slide 2

Data Access Layer

What does it do?
  Provides access to data
    - data discovery
    - mediation to a standard model
    - data retrieval
    - on-demand data generation
    - server-side computation (subsetting, filtering)

What is it for?
  Supports client data analysis
    - distributed, multiwavelength

How does it work?
  Object (dataset) oriented
    - catalog, image, spectrum, time series, SED, etc.
  Services
    - cone search (also SkyNode), SIA, SSA

  • Slide 3

Cone Search

  • Slide 4

Cone Search

Provides basic catalog access
  - Query by position and aperture (cone in space)
  - Query consists of base-URL (service endpoint) plus parameters
    e.g., http://base-url?RA=12.0&DEC=0.0&SR=1.0
  - Catalog returned as a VOTable

Advantages
  - Simple but powerful, provides standard interface
  - Easy to implement and use

Limitations
  - Catalog metadata is not defined
  - No data model support

Future
  - Supplanted by basic SkyNode (Greene, Saturday)
  - Supports metadata discovery, SQL-like syntactical queries
  - We will continue to support the basic cone search query however!
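As an illustration of the cone search query described on Slide 4, the following minimal Python sketch builds a query URL from a base URL plus the RA, DEC, and SR parameters. The function name `cone_search_url` is hypothetical, and `http://base-url` is the slide's placeholder, not a real service endpoint.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra, dec, sr):
    """Build a cone search URL (hypothetical helper, not part of any standard library).

    ra, dec: cone center in decimal degrees; sr: search radius in decimal degrees.
    """
    # Join with '?' unless the base URL already carries a query string.
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode({"RA": ra, "DEC": dec, "SR": sr})


# The slide's example: position (12.0, 0.0) with a 1 degree search radius.
print(cone_search_url("http://base-url", 12.0, 0.0, 1.0))
```

The result of fetching such a URL would be the VOTable catalog mentioned on the slide; the sketch stops at URL construction so that it runs without network access.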
  • Slide 5

Simple Image Access

  • Slide 6

Simple Image Access (SIA)

Basic Usage, Highest Level
  - Client queries Registry to find interesting services
  - Each service is queried (in turn or simultaneously) for data
  - Client collates and analyzes results
  - Selected datasets are retrieved

  • Slide 7

Simple Image Access (SIA)

Basic Usage, Single Service

Query
  - find data of interest from a single service
  - http://base-url?POS=12.0,0.0&SIZE=0.2&FORMAT=image/fits

Query response
  - VOTable, one row per candidate dataset
  - "access reference" (a URL) points to data

Data selection
  - Performed by the client using query response metadata

Dataset retrieval
  - Retrieve actual datasets, if any

  • Slide 8

Service Capabilities

Types of Services
  Atlas     Precomputed survey image (entire image)
  Pointed   Image from pointed observation (entire image)
  Cutout    Cutout existing image (pixels unchanged)
  Mosaic    Reprojected image (pixels resampled)

Virtual Data
  - Data model mediation
  - Subsetting, filtering, etc. on the fly
  - Possible to view same data in different ways

Interface
  - RESTful interface currently (HTTP GET)
  - Document oriented (VOTable, FITS, JPEG, etc.)

  • Slide 9

Data Model

SIA data model is the familiar "astronomical image"
  - Generally this means a 2D sky projection
  - Data array is logically a regular grid of pixels
  - Encoded as a FITS image, GIF/JPEG, etc.
Standardized dataset metadata
  - Provenance
  - Image geometry
  - Scale
  - Format
  - Position, WCS
  - Time of observation
  - Spectral bandpass
  - Access information

  • Slide 10

Input Parameters

Required parameters
  POS         center of ROI (ra, dec decimal degrees ICRS)
  SIZE        width; or width, height
  FORMAT      ALL, GRAPHIC, image/fits, image/jpeg, text/html,

Optional parameters
  INTERSECT   values: covers, enclosed, center, overlaps
  VERB        table verbosity

Service-defined parameters
  - used to further refine queries, but not yet standardized
  - e.g., BAND, SURVEY, etc.

Image generation parameters
  - NAXIS, CFRAME, EQUINOX, CRPIX, CRVAL, CDELT, ROTANG, PROJ
  - used for cutout/mosaic services to specify image to be generated

  • Slide 11

Query Response

Output is a VOTable
  - Must contain a RESOURCE element with tag="results", containing the results of the query.
  - The results resource contains a single table
  - Each row of the table describes a single data object which can be retrieved.

The fields of the table describe the attributes of the dataset
  - These are the attributes of the SIA data model
  - In SIA 1.0, the UCD is used to identify the data model attribute
    e.g., POS_EQ_RA_MAIN, VOX:Image_Scale, etc.
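To make the query and response slides concrete, here is a minimal Python sketch that builds an SIA query URL from the required POS, SIZE, and FORMAT parameters and then reads a query response table, keying each cell by its field UCD. The helper names `sia_query_url` and `rows_by_ucd` are hypothetical, and the embedded VOTable is a hand-written sample rather than output from a real service. The sample marks the results resource with `type="results"`, which is an assumption about how the slide's tag="results" is spelled in the XML; real responses usually also declare an XML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def sia_query_url(base_url, ra, dec, size, fmt="image/fits"):
    """Build an SIA query URL (hypothetical helper; base_url is a placeholder)."""
    params = {"POS": f"{ra},{dec}", "SIZE": size, "FORMAT": fmt}
    # Keep ',' and '/' literal so the URL matches the form shown on the slide.
    return base_url + "?" + urlencode(params, safe=",/")


# Hand-written sample response (an assumption, for illustration only).
SAMPLE_RESPONSE = """<VOTABLE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="ra" ucd="POS_EQ_RA_MAIN" datatype="double"/>
      <FIELD name="scale" ucd="VOX:Image_Scale" datatype="double"/>
      <DATA><TABLEDATA>
        <TR><TD>12.0</TD><TD>0.0003</TD></TR>
        <TR><TD>12.1</TD><TD>0.0003</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def rows_by_ucd(votable_xml):
    """Return one dict per table row, mapping field UCD -> cell text."""
    root = ET.fromstring(votable_xml)
    resource = root.find("RESOURCE[@type='results']")
    if resource is None:
        raise ValueError("no results RESOURCE found")
    table = resource.find("TABLE")
    ucds = [field.get("ucd") for field in table.findall("FIELD")]
    return [
        dict(zip(ucds, [td.text for td in tr.findall("TD")]))
        for tr in table.iter("TR")
    ]


print(sia_query_url("http://base-url", 12.0, 0.0, 0.2))
for row in rows_by_ucd(SAMPLE_RESPONSE):
    print(row["POS_EQ_RA_MAIN"], row["VOX:Image_Scale"])
```

Looking columns up by UCD rather than by column name mirrors the slide's point that, in SIA 1.0, the UCD identifies the data model attribute.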
  • Slide 12

Query Response

Image metadata
  - Describes the image object (required)
Coordinate system metadata
  - Image WCS
Spectral bandpass metadata
  - Prototype data model describing spectral bandpass of image
Processing metadata
  - Tells whether the service modified the image data
Access metadata
  - Tells client how to access the dataset (required)
Resource-specific metadata
  - Additional optional service-defined metadata describing image

  • Slide 13

Image Metadata

  VOX:Image_Title       Brief description of image
  POS_EQ_RA_MAIN        RA (ICRS)
  POS_EQ_DEC_MAIN       Dec (ICRS)
  INST_ID               Instrument name
  VOX:Image_MJDateObs   MJD of observation
  VOX:Image_Naxes       Number of image axes
  VOX:Image_Naxis       Length of each axis
  VOX:Image_Scale       Image scale, deg/pix
  VOX:Image_Format      Image file format

  • Slide 14

  • Slide 15

Image Retrieval

Completely optional
  - Typically only a fraction of the available images are retrieved

Query response
  - If an access reference is provided, the data can be retrieved
  - SIAP can also be used to describe data which is not online
  - The same data may be available in multiple formats

Image retrieval
  - Very simple; access reference is a URL
  - Standard tools can be used to fetch the data (browser, wget, curl, i/o library, etc.)
  - Data is often computed on-the-fly
  - All retrieval is synchronous (currently)
  - No provision for restricting access (currently)

  • Slide 16

Service Registration

  • Slide 17

Future Development

SIA V1.1
  - Based on work done on SSA
  - Expanded query interface
    - no longer limited to positional queries
  - Much richer query response
    - generic dataset identification, characterization, etc.
    - metadata extension mechanism

Selected features
  - VOTable 1.1 with UCD 1+, GROUP, UTYPE
  - query response can be ordered by "score"
  - logical groupings of related query records
  - compression support

Versioning
  - required to make protocol upgrades manageable

  • Slide 18

  • Slide 19

  • Slide 20

Future Development

Service verification
  - for testing at development time
  - when registered; level of compliance metric

Grid capabilities

Data staging
  - asynchronous image generation (long running jobs)
  - batch generation of images (multiple images)

Data management
  - support for single sign-on
  - authentication, authorization
  - network data caching, third party delivery (VOStore etc.)

Web service interface
  - resource metadata
  - service availability (etc.)

ADQL integration
  - Capability to use query language for queries

  • Slide 21

Simple Spectral Access

  • Slide 22

Simple Spectral Access (SSA)

What is it?
  - Provides access to 1D spectra, time series, SEDs
  - Tabular spectrophotometric data (photometry points)
  - Represents second generation, data model-based DAL interfaces

Status
  - Draft V0.9 query interface reviewed in Kyoto (May 05)
  - Revisions in progress; draft PR targeted for Madrid (Oct 05)
  - Much work on data models, however still being revised
  - Some initial prototypes already exist (services, client apps)
  - IVOA/Madrid discussions will be held immediately after the ADASS and are open to all

  • Slide 23

Basic Usage

SSA specification may be complex, but basic usage is simple

Simple query
  - POS, SIZE, FORMAT - like cone search, SIA
  - Possibly refined by spectral or time bandpass, etc.
  - Most metadata in query response is optional

Data retrieval
  - Simple retrieval is again URL-based
  - Get back a dataset "document" (VOTable, FITS, JPEG, etc.)
  - In simplest case could be wavelength, flux as text (for Spectrum)
  - Pass-through of external data is permitted

Data Analysis
  - Standard data model isolates application from quirks of external project data

  • Slide 24

Concepts - Dataset-oriented

Data object type
  - Spectrum, TimeSeries, SED

Dataset creation type
  Atlas       Whole datasets, uniform survey data
  Pointed     Whole datasets, variable instrumental data
  Cutout      Subset, data samples are not modified
  Resampled   Subset, data samples computed by service

Dataset derivation
  Observed    An observation
  Composite   Combination of several observations
  Simulated   Simulated observation made from real data
  Synthetic   Data from a theoretical model

  • Slide 25

Data Models

Data models used in SSA
  Spectral data      Spectrum, TimeSeries, SED
  Dataset            Generic dataset descriptor
  Target             Astronomical target observed
  Curation           Origin of data
  Characterization   Physical characteristics of data
  Provenance         Instrument which generated the data

User defined data models
  - Metadata extension mechanisms
    - additional data model attributes (table fields)
    - additional resources in VOTable, linked back to main table
  - Provide a mechanism to "subclass" dataset to tailor it for a given data collection

  • Slide 26

Spectral Data (SED)

Figure labels: spectrum segment; photometry point

  • Slide 27

Spectral/SED Data Model

  • Slide 28

  • Slide 29

Query Interface

Mandatory query parameters
  POS      RA, DEC (ICRS)
  SIZE     diameter (decimal degrees)
  TIME     date1,date2 (epoch in decimal years UTC)
  BAND     wave1,wave2 (meters in vacuum; source or observer)
  FORMAT   VOTable, fits, xml, text, graphics, html, external

  • Slide 30

Query Interface

Recommended query parameters
  APERTURE     approx spatial resolution (decimal degrees)
  SPECRES      spectral resolution (meters)
  TOP          number of top-ranked records
               to return
  OBJTYPE      mandatory if service returns multiple object types
  COLLECTION   data collection identifier

  • Slide 31

Query Interface

Optional parameters
  CREATORID     creator-assigned dataset identifier (at most 1)
  PUBID         publisher-assigned dataset identifier (at most N)
  COMPRESS      enable compression (for both data _and_ queries?)
  SNR           signal-to-noise ratio
  REDSHIFT      redshift range (dlambda/lambda)
  TARGETCLASS   star, galaxy, pulsar, PN, QSO, AGN, etc.

  • Slide 32

Query Response

Classes of query metadata
  Query metadata        Describes the query itself
  Dataset metadata      Describes data object; object-specific
  Target metadata       Astronomical target
  Curation metadata     External identification of dataset
  Characterization      Coverage, Accuracy, Frame, etc.
  Instrument metadata   Service-defined; hard to standardize
  Access metadata       Describes how to access the dataset

  • Slide 33

Query Response

Query Metadata
  Query.Score      How well object matches query
  Query.LName      Logical name (identifier)
  Query.LNameKey   Logical name key (id-ref)

Example: LName="MyObj123" LNameKey="server,format"

  • Slide 34

Query Response

Dataset Metadata
  Dataset.Type               Spectrum, TimeSeries, SED, etc.
  Dataset.DataModel          DM name, e.g., "SSA-V0.90"
  Dataset.Title              Brief descriptive title of dataset
  Dataset.SSA.NSamples       Total samples in dataset
  Dataset.SSA.Aperture       Characteristic aperture diameter
  Dataset.SSA.TimeAxis       TimeCoord axis (external data)
  Dataset.SSA.SpectralAxis   SpectralCoord axis (external data)
  Dataset.SSA.FluxAxis       Flux axis (external data)
  Dataset.CreationType       atlas, pointed, cutout, resampled
  Dataset.Derivation         observed, composite, simulated, synthetic

  • Slide 35

Query Response

Target Metadata
  Target.Name            Name of astronomical object
  Target.Class           Target class (star, galaxy, QSO, etc.)
  Target.SpectralClass   Spectral class (e.g., 'O', 'B', etc.)
  Target.Redshift        Nominal redshift for object
  Derived.VarAmpl        Variability amplitude (fraction 0-1)
  Derived.SNR            Observed signal to noise ratio

  • Slide 36

Query Response

Curation Metadata
  Curation.Collection    Data collection name (identifier)
  Curation.Creator       Creator identity (identifier)
  Curation.CreatorID     Creator-assigned dataset identifier
  Curation.PublisherID   Publisher-assigned dataset identifier
  Curation.Date          Dataset creation date (ISO date string)
  Curation.Version       Dataset version (within same ID)

  • Slide 37

Query Response

Characterization1 - Coverage
  .Location.Spatial           Position (e.g., RA, DEC)
  .Location.Time              Observation time characteristic value
  .Location.Spectral          Spectral bandpass characteristic value
  .Location.Spectral.BandID   Bandpass ID (band or filter name)
  .Bounds.Spatial             Aperture footprint (polygon on sky)
  .Bounds.Time                Low/High time values
  .Bounds.Spectral            Low/High spectral values
  .Bounds.Flux                Limiting flux, saturation limit (Jansky)
  .Fill.Spatial               Spatial sampling filling factor (0-1)
  .Fill.Time                  Time sampling filling factor (0-1)
  .Fill.Spectral              Spectral sampling filling factor (0-1)

  • Slide 38

Query Response

Characterization2 - Accuracy
  Accuracy.*.Calibrated   uncalibrated, relative, absolute
  Accuracy.*.Resolution   Resolution of measured signal
  Accuracy.*.StatErr      Statistical error (measured)
  Accuracy.*.SysErr       Systematic error (estimated)

('*' = Spatial, Time, Spectral, Flux)

  • Slide 39

Query Response

Characterization3 - Reference Frames
  Frame.Spatial.Type      Coordinate frame (default ICRS)
  Frame.Spatial.Equinox   Coordinate system equinox (J2000)
  Frame.Time.System       Timescale (TT)
  Frame.Time.SIDim        SI factor and dimension
  Frame.Spectral.SIDim    SI factor and dimension
  Frame.Flux.SIDim        SI factor and dimension
  Frame.Flux.UCD          UCD of flux value (flux type)

(These apply only to the query response)
(SIDim metadata still under construction)
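The '*' placeholder on the Accuracy slide stands for each of the four axes (Spatial, Time, Spectral, Flux). As a small illustration, this Python sketch expands the wildcard into the full list of attribute names. The function name `accuracy_attributes` is hypothetical, and the dotted names simply follow the slide's notation; how they are serialized in an actual query response is not specified here.

```python
from itertools import product

# Axes and properties exactly as listed on the Accuracy slide.
AXES = ("Spatial", "Time", "Spectral", "Flux")
PROPERTIES = ("Calibrated", "Resolution", "StatErr", "SysErr")


def accuracy_attributes():
    """Expand Accuracy.*.<property> over all four axes (hypothetical helper)."""
    return [f"Accuracy.{axis}.{prop}" for axis, prop in product(AXES, PROPERTIES)]


for name in accuracy_attributes():
    print(name)
```

The expansion yields sixteen names, four properties for each of the four axes.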
  • Slide 40

Query Response

Instrument Metadata
  Instrument.Name       Instrument name (identifier)
  Instrument.Exposure   Total exposure time (seconds)
  Instrument.           Service-defined

Notes
  - Optional; provided for instrumental data collections
  - In general, Collection, Bounds.Time, etc. are preferred
  - In general Instrument metadata is service-defined
  - Use Observation model as a starting point

  • Slide 41

Query Response

Access Metadata
  Access.Reference   Data access URL
  Access.Format      MIME type of returned dataset
  Access.Size        Approximate dataset size (bytes)
  Access.Server      Server endpoint URL

Staging support goes here in the future
  - e.g., will dataset access require asynchronous staging
  - estimated cost to construct dataset

  • Slide 42

Service Metadata

Usage
  - Describe service type and capabilities
  - Characterize service (data resources served, coverage, etc.)
  - Describe interface (optional query parameters)

Interface
  - Requires new service metadata query method
  - Returns resource metadata descriptor (XML)

Format
  - Registry resource descriptor (XML)

  • Slide 43

Data Retrieval

Based on GET as with SIA
  - Variety of formats available
  - Compression supported

Data representation
  - Data model defines logical content of data
  - The same data object may be represented in various formats
  - Hence we need to specify both the data model, and the file format

  • Slide 44

Data Retrieval

Data models
  - SSA data model for fully-compliant data
  - Provider-defined data model for external data

Data formats
  - VOTable (a container), native XML (direct serialization)
  - FITS binary table (another container; uses FITS spectral WCS)
  - Text, e.g., CSV
  - Graphics (JPEG etc.)
  - text/html (rendered into browser page)
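Because the same data object may be offered in several formats, a client typically chooses among query response records by their Access.Format value before issuing the GET on Access.Reference. The Python sketch below shows one way this might look. The helpers `pick_access` and `fetch`, the record dictionaries, the example.org URLs, and the specific MIME strings are all assumptions made for illustration; they are not taken from the SSA draft.

```python
from urllib.request import urlopen

# Hypothetical query response records: one dataset offered in two formats.
RECORDS = [
    {"Access.Reference": "http://example.org/spec1.fits",
     "Access.Format": "application/fits"},
    {"Access.Reference": "http://example.org/spec1.csv",
     "Access.Format": "text/csv"},
]


def pick_access(records, preferred_formats):
    """Return the first record matching the client's format preference order.

    Returns None when no record is available in any preferred format.
    """
    for mime in preferred_formats:
        for record in records:
            if record.get("Access.Format") == mime:
                return record
    return None


def fetch(record):
    """Retrieve the dataset bytes with a plain HTTP GET on the access reference."""
    with urlopen(record["Access.Reference"]) as response:
        return response.read()


# Prefer VOTable, fall back to CSV; only CSV is available in the sample records.
choice = pick_access(RECORDS, ["application/x-votable+xml", "text/csv"])
print(choice["Access.Reference"])
```

`fetch` is defined but not called, since the URLs are placeholders; with a real access reference it would return the dataset document in the chosen format.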