
Science Data Management Betty Stobie & Mark Dickinson

NOAO Users Committee, Tucson, June 2014 -- D3 1

Calibrated DECam stack (credit: F. Valdes)


Science Data Management (SDM)

•  SDM supports data-taking, data transport, reduction, archiving, distribution, and community use of data from NOAO telescopes and instruments (past, present & future)

•  Major activities include
   –  Data transport, capture, and storage infrastructure
   –  NOAO Science Archive
   –  Wide-field imager calibration pipelines
      •  Mosaic, NEWFIRM, DECam, ODI
   –  IRAF support and development
   –  Virtual Astronomical Observatory (VAO) support and development (ending during FY14)
   –  Mega-object catalog services (under development)


NOAO Science Archive: recent service upgrades

•  New login and security system replaced NVO Single Sign-On (SSO) as primary user authentication
   –  NVO SSO will be replaced by VAO SSO by year-end

•  New Archive Download Manager
   –  Simplifies and accelerates data downloads (coming soon, see detailed slides)
   –  Thanks to UC members for outside testing!

•  New file naming convention
   –  See backup slides

•  Working on major “under the hood” modernization
   –  Many subsystems use outmoded versions of software
   –  Essential to update to current standards in order to ensure that Archive and Portal can be maintained by a reduced staff in the future.


Wide-field imager pipelines: recent developments

•  Mosaic & NEWFIRM pipelines operational
   –  KPNO NEWFIRM data from 2008 – 2009 being reprocessed
   –  All calibrated images available in NOAO Science Archive

•  DECam Community Pipeline operational
   –  Current version: 3.0 (see backup slides for details)
   –  Excellent collaboration with DESDM team @ NCSA
   –  Fine-tuning continues

•  WIYN partial One Degree Imager (pODI) semi-operational
   –  Current version: 0.7.4 (see backup slides for details)
   –  Collaboration between WIYN, NOAO, and U. Indiana
   –  Data being manually processed, then archived at Indiana
   –  More work needed


IRAF

•  V2.16.1 patch released October 2013
•  ONEDSPEC and APEXTRACT development to enable use of spectra stored as FITS tables, an increasingly popular spectral format, as well as text files
•  As resources permit, some development continues
   –  Top community priorities: Python bindings, CL enhancements
   –  See backup slides for more details


Virtual Astronomical Observatory (VAO): NASA/NSF program ending in FY14

•  NOAO participant in US VAO project
•  Ongoing participation in IVOA (standards development)
•  US VAO project ending on 30 Sept 2014
•  Specific projects (see backup slides)
   –  VOClient (desktop tools released, further library development in progress)
   –  VOSpace (Virtual Machine (VM) instances in development)
   –  Table Access Protocol (TAP, client services to catalog tools)
   –  VAO tool pre-release testing
   –  VAO help desk


Towards catalog-based services

•  SDM has focused on file-based (pixel) services in the past
•  SDM must move towards catalog-based (object) services
•  Overall framework is NOAO Data Lab project (under development)
   –  Still in concept development phase (see Olsen presentation)
•  Services needed
   –  Query services for catalogs with 10s to 100s of millions of objects
   –  Collaborative workspaces – user-defined sharable access to centralized storage
   –  Analysis services – simple image cutouts through to full photometry, catalog science to limit transfers
   –  VM distributions to users – software (e.g. LSST stack), services (e.g. storage to sync w/ DL), analysis environments/APIs (Python, R, etc.)


Detail slides


Archive Operations Statistics

•  Automated capture of data from 35 instruments on 11 telescopes at 3 sites (CT, CP, KP)
   –  Operating since 2004
•  Data stored at 3 sites (Tucson, La Serena, KP)
•  NOAO Archive Operations
   –  Science Archive contents
      •  Total raw files: 4,718,238 (131.2 TB)
         –  Raw files in the last year: 816,104 (13% DECam)
      •  Total reduced files: 2,247,481 (93.1 TB)
         –  Reduced files in the last year: 619,789 (55% DECam)
      •  Rice compression using fpack
         –  Raw compression lossless, factor of 2.2
         –  Reduced compression lossy, factor of 6
   –  Survey Archive contents
      •  21 surveys, 85,000 files, 6 TB


NOAO Archive Download Manager

•  New option for data transfer instead of ftp or lftp.

•  Query, find, and stage data sets as usual.

•  Then, from the staging area, select “Launch Download Manager” instead of using command line ftp.


NOAO Archive Download Manager

•  Java webstart application
•  Allows parallel threaded downloads
•  Pause and resume
•  Robust against interruptions (user can restart after crash)
•  Shows download progress

Tested by internal & external users – thank you Users Committee members! Will be deployed soon in Archive v2.1.
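
The Download Manager itself is a Java application and its internals are not described on these slides. Purely as an illustration of two of the listed behaviours (parallel threaded downloads and restart after a crash), here is a minimal Python sketch. The function name `download_all`, the JSON manifest file, and the caller-supplied `fetch` callable are all assumptions made for this example, not part of the NOAO tool; pause/resume and progress display are left out.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(names, fetch, manifest_path, workers=4):
    """Fetch every name not yet recorded in the manifest; return the completed set.

    `fetch(name)` is a hypothetical caller-supplied function that transfers one
    file and raises an exception on failure.
    """
    done = set()
    if os.path.exists(manifest_path):
        # A previous (possibly interrupted) run left a record of finished files.
        with open(manifest_path) as fh:
            done = set(json.load(fh))

    pending = [n for n in names if n not in done]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, n): n for n in pending}
        for fut in as_completed(futures):
            if fut.exception() is not None:
                continue  # failed transfers stay pending for the next run
            done.add(futures[fut])
            # Write to a temporary file, then rename, so a crash mid-write
            # cannot leave a truncated manifest behind.
            tmp = manifest_path + ".tmp"
            with open(tmp, "w") as fh:
                json.dump(sorted(done), fh)
            os.replace(tmp, manifest_path)
    return done
```

Calling `download_all` again with the same manifest path skips files that already completed, which is one simple way to get "restart after crash" behaviour.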


New File Naming Convention

•  Raw file names are composed of multiple fields that define the origin of the data
   –  For example: c4d_140603_101536_ori.fits.fz represents a raw object image file (ori) taken with DECam on the CTIO 4m (c4d) on UTC date and time 2014-06-03, 10:15:36 (140603_101536).

•  Pipeline-reduced data products share the root name of the raw data from which they were derived
   –  For example, the associated reduced data products are:
      •  c4d_140603_101536_ooi_g_v1.fits.fz   object instcal image in g filter and file version 1
      •  c4d_140603_101536_ood_g_v1.fits.fz   object instcal dqmask
      •  c4d_140603_101536_oow_g_v1.fits.fz   object instcal weight map
      •  c4d_140603_101536_opi_g_v1.fits.fz   object projected image
      •  c4d_140603_101536_opd_g_v1.fits.fz   object projected dqmask
      •  c4d_140603_101536_opw_g_v1.fits.fz   object projected weight map
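
The naming scheme above can be split into its fields with a short script. The following Python sketch is inferred only from the examples on this slide: the regular expression, the letter-to-meaning tables, and the function names `parse_name` and `root_name` are assumptions for illustration, and the archive's full grammar may allow values not shown here.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the slide's examples only (an assumption, not a spec).
NAME_RE = re.compile(
    r"^(?P<prefix>[a-z0-9]{3})_"           # site/telescope/instrument, e.g. c4d
    r"(?P<date>\d{6})_(?P<time>\d{6})_"    # UTC yymmdd_hhmmss
    r"(?P<code>[a-z]{3})"                  # three-letter type code, e.g. ori, ooi
    r"(?:_(?P<filter>[A-Za-z0-9]+)_(?P<version>v\d+))?"  # reduced products only
    r"\.fits(?:\.fz)?$"
)

# Meanings of the second and third code letters as they appear in the examples.
PROC = {"r": "raw", "o": "instcal", "p": "projected"}
PROD = {"i": "image", "d": "dqmask", "w": "weight map"}


def parse_name(name):
    """Split an archive file name into its fields (sketch, see assumptions above)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    obs, proc, prod = m["code"]
    stamp = datetime.strptime(m["date"] + m["time"], "%y%m%d%H%M%S")
    return {
        "prefix": m["prefix"],
        "utc": stamp.replace(tzinfo=timezone.utc),
        "obstype": "object" if obs == "o" else obs,
        "proctype": PROC.get(proc, proc),
        "prodtype": PROD.get(prod, prod),
        "filter": m["filter"],     # None for raw files
        "version": m["version"],   # None for raw files
    }


def root_name(name):
    """Return the root shared by a raw file and the products derived from it."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return f"{m['prefix']}_{m['date']}_{m['time']}"
```

Because reduced products keep the raw file's root, grouping files by `root_name` is one way to associate each raw exposure with its instcal and projected products.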


DECam processing and archiving

Statistics:
•  Number of DECam proposals processed: 102
•  Number of data sets (multi-night contiguous blocks of data) processed (many 2x or more): 166
•  Number of observing nights processed: 330
•  Approx. number of raw images fed to pipeline (many 2x or more): 82,603
•  Number of products available to PIs: 392,122

Dark Energy Survey (DES) data products:
•  NOAO will archive reduced data products delivered by DES
•  First year survey data (after science verification) go public during Sept. 2014 – Feb. 2015
   –  Only “survey-worthy” data. NOAO will evaluate whether to process other “unworthy” data with CP.


DECam Community Pipeline (CP)

•  CP version 3.0 based on DES-DM 2.2.3 with NOAO updates/improvements
•  Two major distributions – 2.2.2, 2.2.3 – were delivered by DESDM-CP, installed at NOAO, and used on community data.
   –  Several interim individual component updates were provided by DESDM-CP
   –  Calibration file updates provided by DESDM-CP as available
•  NOAO (F. Valdes) modified or replaced parts of the workflow, esp.:
   –  Coadding (incl. multi-image cosmic ray removal); pupil and background subtraction; secondary dark sky flat fielding (aka illumination correction)
•  NOAO made minor tweaks to infrastructure to improve efficiency.
•  NOAO is satisfied with the quality of CP data products, and community user feedback shows strong interest in using CP data products.
   –  NOAO formally accepted the CP at the time the Dark Energy Survey started taking non-verification observations (as of 30 August 2013)
•  NOAO and DESDM-CP continue to have an excellent working relationship:
   –  Regular update meetings
   –  Feedback and new developments needed by DESDM offered to NOAO
   –  NOAO participation in calibration working groups and data management meetings


DECam dark sky illumination flat

Individual z’-band exposures were corrected for dome flat, star flat, pupil, and large scale background, and then scaled and stacked with object masking, rejection, and median spatial filtering to remove sources. Residual flatfield variations have RMS < 0.5% (but larger peak-to-peak amplitude), and indicate small but real systematics, some of unknown origin. This “secondary illumination correction” is now applied in the NOAO DECam community pipeline.
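
The "scaled and stacked with object masking" step can be illustrated with a toy example. The Python sketch below is not the Community Pipeline code: it assumes the exposures are already corrected as described, takes source masks as given, and omits the outlier rejection and median spatial filtering mentioned above. The function name `stack_illumination` and the plain nested-list image layout are choices made for this example only.

```python
from statistics import median


def stack_illumination(exposures, masks):
    """Median-combine sky-normalized exposures, ignoring masked (source) pixels.

    exposures, masks: lists of equally sized 2-D lists; a mask value of True
    marks a pixel contaminated by a source. Each exposure is assumed to have
    at least one unmasked pixel.
    """
    scaled = []
    for img, msk in zip(exposures, masks):
        # Scale each exposure by its median unmasked sky level so that
        # exposures with different sky brightness become comparable.
        sky = median(
            v for row, mrow in zip(img, msk) for v, m in zip(row, mrow) if not m
        )
        scaled.append([[v / sky for v in row] for row in img])

    ny, nx = len(exposures[0]), len(exposures[0][0])
    flat = []
    for y in range(ny):
        out_row = []
        for x in range(nx):
            vals = [s[y][x] for s, m in zip(scaled, masks) if not m[y][x]]
            # Where every exposure is masked, fall back to unity (no correction).
            out_row.append(median(vals) if vals else 1.0)
        flat.append(out_row)
    return flat
```

The result is a map of relative response around 1.0; dividing science frames by such a map is the general idea behind an illumination correction.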

Moving objects

Potentially hazardous near-Earth object, first detected by NEOWISE, later recovered with DECam. DECam’s wide field of view and faint limiting magnitude make it an exceptional tool for recovering NEOWISE asteroids with large orbital uncertainties and low albedos. Credits: Amy Mainzer (JPL/NEOWISE); Dark Energy Survey (Josh Friedman) for ToO observation; Frank Valdes for pipeline processing and moving object detection software.

One Degree Imager Portal, Pipeline & Archive (PPA)

•  ODI Pipeline, Portal & Archive (PPA) is a collaboration between WIYN, NOAO (esp. Rob Swaters), and U. Indiana Pervasive Technology Institute (PTI)

•  Pipeline (AuCaP) Version 0.7.4 deployed in December 2013:
   –  Some file system issues at PTI, so NOAO is currently processing user data locally (Tucson)
   –  Includes: overscan subtraction; bias + dark subtraction; dome & twilight flat fielding; astrometric solution; photometric characterization (vs. USNO-B, currently); reprojection & stacking; nonlinearity correction; saturation & persistence masking; masking of vignetted regions; generation of source catalogs

•  Under development:
   –  Level-2 pupil ghost subtraction from dome flats
   –  Improved background subtraction
   –  Dark sky flat-fielding

•  Planned development:
   –  Transient removal (eventually, 2-pass process in stacked data)
   –  Level-2 pupil ghost subtraction from science data
   –  Handling level-1 and level-3 pupil ghosts
   –  More accurate catalogs for photometric calibration

•  Longer term:
   –  Coherent guiding; binning; weight maps; bad data rejection; ODI-5x5


pODI AuCaP pipeline processing example

[Figure: two panels, "Full frame" and "Zoom". I-band (with fringe removal), stacked, texp = 9300 s]

IRAF development roadmap

•  IRAF development roadmap for 2014-2015 posted (October 2013) for community input (iraf.noao.edu/IRAF_Dev.pdf)

•  Envisioned as a means to modernize IRAF prior to FY16 NOAO fiscal restrictions.

•  Prioritized developments (as resources permit)
   1.  Spectral package enhancements (in process) – handle spectra stored in table format (wavelengths, fluxes, errors) in addition to traditional IRAF onedspec format
   2.  Python task interface – enable re-use of IRAF science tasks in Python environment, and/or development of IRAF tasks written in Python
   3.  CL language enhancements – improvements to enable more modular tasks
   4.  In-memory CL image operators – store and operate on images in memory, rather than via FITS I/O from disk.
   5.  Parallel execution – optimize use of multi-core/multi-CPU systems for parallel data processing

•  Demands for the Data Lab in FY15 will limit the effort available for IRAF development.


Virtual Astronomical Observatory (VAO): NASA/NSF program ending in FY14

•  VOClient
   –  Libraries and command-line tools for
      •  Data access (catalogs, images, spectra)
      •  Resource discovery (keyword search to find data collections)
      •  Desktop messaging (e.g. image/table load into the Topcat tool)
      •  Integrated into IRAF to provide VO functionality

•  VOSpace
   –  Cloud storage protocol used by Data Lab
   –  Capabilities to allow searching of data; Views for format conversion

•  TAP (Table Access Protocol)
   –  Protocol for accessing general table data
   –  ADQL variant of SQL for complex queries (table JOIN, etc.)
   –  Sync/Async job control
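
As an illustration of the TAP bullets, the Python sketch below builds a synchronous TAP query URL carrying an ADQL JOIN. The `REQUEST`, `LANG`, `QUERY`, and `FORMAT` parameters and the `/sync` resource come from the IVOA TAP 1.0 specification; the service address, the table names, and the column names are hypothetical placeholders, and the helper `tap_sync_url` is a name made up for this example.

```python
from urllib.parse import urlencode


def tap_sync_url(base_url, adql, fmt="votable"):
    """Build a GET URL for a synchronous TAP query (TAP 1.0 parameter names)."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "QUERY": adql,
        "FORMAT": fmt,
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Hypothetical tables and columns, chosen only to show an ADQL JOIN.
adql = (
    "SELECT TOP 10 o.ra, o.dec, p.mag_g "
    "FROM objects AS o JOIN photometry AS p ON o.id = p.object_id "
    "WHERE p.mag_g < 20"
)

# Placeholder service address, not a real NOAO endpoint.
url = tap_sync_url("https://tap.example.org/tap", adql)
print(url)
```

A synchronous request returns the result table directly in the response. For long-running queries, TAP also defines an `/async` resource where the client creates a job and polls it, which is what the "Sync/Async job control" bullet refers to.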
