



The GPIES Data Cruncher: An Automated Data Processing System for the Gemini Planet Imager Exoplanet Survey

Jason J. Wang, Pauline Arriaga, Marshall D. Perrin, Dmitry Savransky, James R. Graham, Christian Marois, Julien Rameau, Jean-Baptiste Ruffio, and the GPI Team

Summary:
• The Data Cruncher can automatically process all science and calibration data from the GPI Exoplanet Survey and more
• Sensitivity curves and multiple PSF subtraction products are produced within one hour after the data are available
• The Super Data Cruncher can also run on a supercomputing cluster and reprocess the entire campaign in a few hours

Acknowledgements: This research was supported in part by NASA NNX15AD95G, NASA NNX11AD21G, NSF AST-0909188, and the University of California LFRP-118057. The GPI project has been supported by Gemini Observatory, which is operated by AURA, Inc., under a cooperative agreement with the NSF on behalf of the Gemini partnership: the NSF (USA), the National Research Council (Canada), CONICYT (Chile), the Australian Research Council (Australia), MCTI (Brazil) and MINCYT (Argentina).

References:
Marois, C., Correia, C., Galicher, R., et al. 2014, Proc. SPIE, 9148.
Perrin, M. D., Maire, J., Ingraham, P., et al. 2014, Proc. SPIE, 9147.
Wang, J. J., Ruffio, J.-B., De Rosa, R. J., et al. 2015, Astrophysics Source Code Library, record ascl:1506.001.

    Crunchable Data

GPI Exoplanet Survey Science
• 1-hour H-band integral field spectroscopy planet search
• 10-minute H-band snapshot broadband imaging polarimetry
• 1-hour H-band deep broadband imaging polarimetry

GPIES Follow-up
• Multi-epoch deep follow-up observations in multiple bands

GPI Queue Programs
• All coronagraphic data taken for GPIES members' queue programs

Calibrations
• All calibration data taken by GPI (which are publicly available)

Super Data Cruncher
• Runs on NERSC's Edison supercomputer (5,576 nodes; 133,824 cores; 357 TB of RAM)
• Uses MPI for inter-node communication (see the sketch below)
• Fewer than 100 lines of code were needed to implement the Super Data Cruncher
• Reprocesses the entire campaign in a few hours

[Figure: Weak scaling of the Super Data Cruncher — runtime (hours, 0–3) versus number of datasets and nodes (0–30).]
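A work division along these lines takes very little mpi4py code, in the spirit of the "fewer than 100 lines" point above. This is a minimal sketch only: find_datasets and process_dataset are hypothetical stand-ins for the real Data Cruncher calls, and the actual Super Data Cruncher's work assignment is not shown on the poster.

```python
# Hypothetical sketch: distribute campaign datasets across MPI ranks.
from mpi4py import MPI

def find_datasets():
    """Hypothetical stand-in: list the campaign datasets to reprocess."""
    return [f"dataset_{i:04d}" for i in range(28)]

def process_dataset(name):
    """Hypothetical stand-in for a full single-dataset reduction."""
    print(f"processed {name}")

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Rank 0 builds the work list; every rank receives a copy via broadcast.
datasets = find_datasets() if rank == 0 else None
datasets = comm.bcast(datasets, root=0)

# Static round-robin split: rank r handles datasets r, r+size, r+2*size, ...
for name in datasets[rank::size]:
    process_dataset(name)

comm.Barrier()  # wait for every rank before declaring the reprocessing done
if rank == 0:
    print("campaign reprocessing complete")
```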

Reduced Data Products

    • All data products produced within ~1 hour of the data being available

    • All data are synced to Dropbox for accessibility

Products include:
• Datacubes: spectral cubes and polarimetry cubes
• PSF-subtracted images: pyKLIP (ADI+SDI), pyKLIP (ADI), pyKLIP (ADI+SDI with methane), and cADI
• Contrast curves
• Calibrations
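pyKLIP (Wang et al. 2015) is open source, so a PSF subtraction like the ADI+SDI product above can be run through its high-level interface. A minimal sketch, assuming GPI spectral datacubes on disk; the file pattern and KLIP parameters here are illustrative, not the GPIES campaign settings:

```python
import glob
import pyklip.instruments.GPI as GPI
import pyklip.parallelized as parallelized

# Load GPI spectral datacubes produced by the GPI DRP
# (the file pattern is an assumption about local naming).
filelist = glob.glob("reduced/*_spdc.fits")
dataset = GPI.GPIData(filelist)

# KLIP PSF subtraction using angular + spectral diversity (ADI+SDI);
# annuli/subsections/movement/numbasis are example values only.
parallelized.klip_dataset(dataset,
                          outputdir="klipped/",
                          fileprefix="example",
                          mode="ADI+SDI",
                          annuli=9,
                          subsections=4,
                          movement=1,
                          numbasis=[1, 5, 10, 20, 50])
```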

Architecture

Processing Controller
• High-level Python logic that controls dataflow through the various pipelines
• Uses queues to communicate between threads and monitors for synchronization (a sketch of this pattern follows below)

Network Interface
• Connects the Processing Controller to the Processing Backend over a Web Socket or MPI
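A minimal sketch of the queue-plus-threads pattern described above, using only the Python standard library; reduce_dataset is a hypothetical stand-in for dispatching work to a Processing Backend:

```python
import queue
import threading

work_queue = queue.Queue()

def reduce_dataset(dataset):
    """Hypothetical stand-in for sending a reduction command to a backend."""
    print(f"reducing {dataset}")

def worker():
    # Each worker pulls datasets off the shared queue until it sees the
    # shutdown sentinel (None).
    while True:
        dataset = work_queue.get()
        if dataset is None:
            work_queue.task_done()
            break
        reduce_dataset(dataset)
        work_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

# The Realtime Scanner or Reprocessor would enqueue datasets like this:
for dataset in ["S20150101S0001", "S20150101S0002"]:
    work_queue.put(dataset)

work_queue.join()          # synchronization point: all queued work is done
for _ in threads:
    work_queue.put(None)   # tell each worker to exit
for t in threads:
    t.join()
```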

Processing Backend
• Runs the data reduction pipelines: GPI DRP (Perrin et al. 2014), TLOCI (Marois et al. 2014), pyKLIP (Wang et al. 2015), and cADI (UdeM pipeline)

Realtime Scanner
• Queues new datasets for processing and updates the GPIES Wiki

Reprocessor
• Queries the database to find and process existing datasets on demand
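The Reprocessor's on-demand lookup might be expressed as in the PyMySQL sketch below; the driver choice, connection details, and the raw_files table and its columns are all assumptions for illustration, not the actual GPIES schema.

```python
# Hypothetical sketch of the Reprocessor's database lookup.
import pymysql

def find_existing_datasets(object_name):
    conn = pymysql.connect(host="localhost", user="gpies",
                           password="***", database="gpies")
    try:
        with conn.cursor() as cur:
            # Invented table: one row per raw file, keyed by target name.
            cur.execute(
                "SELECT filename, obs_date FROM raw_files"
                " WHERE object = %s ORDER BY obs_date",
                (object_name,))
            return cur.fetchall()
    finally:
        conn.close()

# Example: find every file on a target so it can be reprocessed on demand.
for filename, obs_date in find_existing_datasets("51 Eri"):
    print(obs_date, filename)
```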

• Written in Python, with some pipeline components written in IDL
• Highly modularized, multithreaded, and asynchronous

Data Flow

[Diagram: data taken at the summit are stored and synced via Dropbox, logged in a MySQL database, and quality checked. Arrows trace new files entering the system, commands sent to the pipelines, queries for data, checks for bad files, reduced data products being saved, and updates to the GPIES Wiki.]

    [email protected]