Distributed Pipeline Programming for Mosaics or Mario Tips’N’Tricks

Upload: carmella-gladys-payne

Post on 18-Jan-2016


Page 1: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Distributed Pipeline Programming for Mosaics

Or

Mario Tips’N’Tricks

Page 2: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

NOAO Mosaic Pipeline

Page 3: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Major Features and Goals

Data products for NOAO archive and NVO node
Data products for observers
Pipeline for NOAO and mosaic community
Basic CCD mosaic calibrations
Advanced time-domain data products
Real-time data quality assessment and monitoring
High performance, data parallel system
LSST testbed
Fairly generic pipeline infrastructure (NEWFIRM, …)
Automated operation
Thorough processing history and data documentation

Page 4: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

MARIO

Mosaic Automatic Reduction Infrastructure and Operations

(i.e. a pipeline)

Page 5: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Key Concepts (Tips’N’Tricks)

sub-pipelines - “meta pipeline programming”
indirect files
load balancing using trigger files
stay-alive module
parallelization of algorithms over mosaic
shared monitoring
network filenames
image processing language (CL)

Page 6: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

What is a pipeline?

collection of processing modules
connected by dependency rules
modules may run concurrently on different data objects

Infrastructure to manage processes
Infrastructure to manage dependencies
Infrastructure to monitor processes and processing

Page 7: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

OPUS

Operations Pipeline Unified System

Page 8: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Opus

Triggers (dependency rules)
  file, osf, time
Blackboard
  Polling
Monitors and Managers
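A file trigger combined with polling can be sketched as follows. This is only an illustration of the idea, not OPUS code: the `poll_once` helper, the `.trig` suffix, and the single trigger directory are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def poll_once(trigger_dir, seen, suffix=".trig"):
    """Return the trigger files that appeared since the previous poll.

    `seen` is a set of file names already handled; it is updated in place.
    Hypothetical helper, not an OPUS interface.
    """
    new = []
    for path in sorted(Path(trigger_dir).glob("*" + suffix)):
        if path.name not in seen:
            seen.add(path.name)
            new.append(path)
    return new


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        seen = set()
        assert poll_once(d, seen) == []           # nothing to do yet
        Path(d, "obj123.trig").write_text("obj123.fits\n")
        for trig in poll_once(d, seen):           # a module would run here
            print("triggered by", trig.name)
```

A real module would call such a function in a loop with a sleep between polls; the set of seen names plays the role of a very small blackboard.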

Page 9: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Distributed Pipeline Issues

data vs. functional parallelism
shared file system vs. local file system
heterogeneous vs. homogeneous processors
parasitic processing
push vs. pull
load balancing
master-worker vs. peer-to-peer

Page 10: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

MARIO Choices

data parallelism
local file system (w/ shared blackboard)
heterogeneous processors
push AND pull
load balancing by number of data objects
peer-to-peer
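Load balancing by number of data objects could look like the sketch below. The slides state only the balancing criterion; counting pending trigger files per node, and the `pending` and `pick_node` helpers, are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def pending(trigger_dir, suffix=".trig"):
    """Count the data objects waiting in one node's trigger directory."""
    return len(list(Path(trigger_dir).glob("*" + suffix)))


def pick_node(trigger_dirs):
    """Pick the node with the fewest pending data objects.

    `trigger_dirs` maps a node name to its trigger directory. Ties go to
    the alphabetically first node so that the choice is deterministic.
    """
    return min(sorted(trigger_dirs), key=lambda node: pending(trigger_dirs[node]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dirs = {}
        for node, count in (("archive2", 2), ("vmware", 1)):
            dirs[node] = Path(root, node)
            dirs[node].mkdir()
            for i in range(count):
                (dirs[node] / f"obj{i}.trig").write_text("")
        print("send next object to", pick_node(dirs))
```

Because every node can run the same check, no master is needed to make the decision.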

Page 11: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

MARIO Architecture Concept

Multiple CPUs but no dependency on N
Multiple types of sub-pipelines by function
  One for operations over all mosaic elements
  One for operations on individual elements
  One for cataloging
  One for image differencing
All types on all CPUs: no master!
Sub-pipelines triggered by files

Page 12: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

“Meta Pipeline Programming”

Build a pipeline out of sub-pipelines
Form a distributed web of sub-pipelines
Sub-pipelines play role of subroutines
Need equivalents of:
  objects
  call and return
  node assignment
  library of standard modules
    start, call, return, done, obs, run

Page 13: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

What is a sub-pipeline?

primarily operates on one type of object
operates on one node
data is maintained locally
multiple stages but limited functionality

Page 14: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Example of Sub-pipelines

[Diagram: sub-pipelines DTS, NGT, CAL, SCL, MEF, and SIF; CAL and MEF are labeled "multiextension", SCL and SIF are labeled "single images"]

Page 15: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Sub-pipelines

NGT: Night’s worth of data
  Group, Zero, Dome Flat, Objects, Done
CAL: Calibration sequence (MEF)
  Setup, Split, Done
SCL: Calibration sequence (SIF)
  Setup, CCDPROC, Combine, Done
MEF: Process objects (MEF)
  Setup, Split, Done
SIF: Process objects (SIF)
  Setup, CCDPROC, Done

Page 16: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Network of Sub-pipelines and CPUs

[Diagram: one pipeline spread over several CPUs, with MEF and SIF sub-pipelines on each CPU]

MEF: pipeline for operations over all mosaic extensions; e.g. crosstalk, global WCS correction

SIF: pipeline for single CCD images; e.g. ccdproc, masking

Page 17: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Example Processing Status

OBJECT NAME              PIPELINE  NODE        STAGES
anight1                  ngt       dhcp-4-152  cccw_
ct4m20030102T183424S     cal       dhcp-4-152  cccd_
ct4m20030102T183424S_01  scl       archive2    ccccd
ct4m20030102T183424S_02  scl       dhcp-4-152  ccccd
ct4m20030102T183424S_03  scl       archive2    ccccd
ct4m20030102T183424S_04  scl       dhcp-4-152  ccccd
ct4m20030102T191558S     cal       dhcp-4-152  cccd_
ct4m20030102T191558S_01  scl       archive2    ccccd
ct4m20030102T191558S_02  scl       vmware      ccccd
ct4m20030102T191558S_03  scl       archive2    ccccd
ct4m20030102T191558S_04  scl       dhcp-4-152  ccccd
ct4m20030103T084044      mef       dhcp-4-152  ccw__
ct4m20030103T084044_01   sif       archive2    ccd__
ct4m20030103T084044_02   sif       archive2    cp___
ct4m20030103T084044_03   sif       vmware      p____
ct4m20030103T084044_04   sif       archive2    _____
ct4m20030103T084307      mef       dhcp-4-152  cccd_
ct4m20030103T084307_01   sif       archive2    ccd__
ct4m20030103T084307_02   sif       vmware      ccd__
ct4m20030103T084307_03   sif       archive2    ccd__
ct4m20030103T084307_04   sif       archive2    ccd__

Page 18: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Calling a Sub-pipeline

Data is set up either locally or on target node
File with path for returned result written to target pipeline
File with paths of returned results written in calling pipeline
Trigger file written to target pipeline
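The four steps of a call can be sketched with plain files. In this sketch two local directories stand in for the calling and target pipelines on different nodes; the directory layout, the file suffixes, and the `call_subpipeline` name are all assumptions made for the example, not the MARIO layout.

```python
import tempfile
from pathlib import Path


def call_subpipeline(caller_dir, target_dir, name, data):
    """Hand one data object to a target sub-pipeline (hypothetical layout)."""
    caller_dir, target_dir = Path(caller_dir), Path(target_dir)
    for d in (caller_dir, target_dir / "data", target_dir / "return"):
        d.mkdir(parents=True, exist_ok=True)

    # 1. Set up the data for the target pipeline.
    data_path = target_dir / "data" / name
    data_path.write_text(data)
    # 2. Tell the target where the result should be returned.
    result_path = caller_dir / (name + ".result")
    (target_dir / "return" / name).write_text(str(result_path) + "\n")
    # 3. Record in the calling pipeline which result is expected.
    (caller_dir / (name + ".expect")).write_text(result_path.name + "\n")
    # 4. Write the trigger last, so the target starts with everything in place.
    trigger = target_dir / (name + ".trig")
    trigger.write_text(str(data_path) + "\n")
    return trigger


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        trig = call_subpipeline(Path(root, "A"), Path(root, "B"), "abc1", "pixels")
        print(trig.name, "->", trig.read_text().strip())
```

Writing the trigger last is a design choice of this sketch: the target cannot start before its data and return file exist.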

Page 19: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Returning Results

Return module in target pipeline looks for return file
Results are written to trigger file for calling pipeline specified in the return file
Calling pipeline triggers on return file
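A return module along these lines might look like the following sketch. The `return/` directory, the file names, and the `return_result` helper are hypothetical; the only behavior taken from the slide is that the return file names the file to which results are written.

```python
import tempfile
from pathlib import Path


def return_result(target_dir, name, result):
    """Deliver `result` to the destination named in the return file.

    Returns the destination path, or None if no return file exists for
    `name` (nobody asked for a result).
    """
    return_file = Path(target_dir) / "return" / name
    if not return_file.exists():
        return None
    dest = Path(return_file.read_text().strip())
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_text(result)  # this file doubles as the caller's trigger
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        a, b = Path(root, "A"), Path(root, "B")
        (b / "return").mkdir(parents=True)
        (b / "return" / "abc1").write_text(str(a / "abc1.result") + "\n")
        print(return_result(b, "abc1", "abc1_processed.fits\n"))
```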

Page 20: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Call/Return

A -> H_N!B/data/abc_N (derived from abc)
A -> H_N!B/return/abc_N [H!A/abc_N.btrig]
A -> H!A/abc.b [abc_1.btrig, abc_2.btrig, …]
A -> H_N!B/abc_N.btrig [H_N!B/data/abc_N]
H_N!B -> H!A/abc_N.btrig [results]
return checks H!A/abc.b for all done
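The final "all done" check can be sketched as a comparison between the list of expected trigger files and the files that have actually arrived. The function name and the one-name-per-line format of `abc.b` are assumptions for illustration; the slide does not give the file format.

```python
import tempfile
from pathlib import Path


def all_returned(caller_dir, name):
    """True once every trigger file listed in <name>.b exists in caller_dir."""
    caller_dir = Path(caller_dir)
    expected = (caller_dir / (name + ".b")).read_text().split()
    return all((caller_dir / entry).exists() for entry in expected)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "abc.b").write_text("abc1.btrig\nabc2.btrig\n")
        Path(d, "abc1.btrig").write_text("result 1\n")
        print(all_returned(d, "abc"))   # one piece is still outstanding
        Path(d, "abc2.btrig").write_text("result 2\n")
        print(all_returned(d, "abc"))   # every piece has come back
```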

Page 21: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Indirect Files

Distribute data files across a network
Move references and only move data as needed
Pipeline objects: standard form, variable content
Act as triggers and meta-data containers

[Figure: contents of anight1.ngttrig and anight1.list]
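Moving references rather than data can be illustrated by parsing a list of `node!path` network filenames and grouping the entries by node. The list contents below are invented for the example (the contents of anight1.list are not given in the slides), and both helper names are hypothetical.

```python
from collections import defaultdict


def split_network_name(name, local_host="localhost"):
    """Split 'host!path' into (host, path); a bare path is treated as local."""
    host, sep, path = name.partition("!")
    if not sep:
        return local_host, name
    return host, path


def group_by_host(list_text):
    """Group the references in an indirect (list) file by host."""
    groups = defaultdict(list)
    for line in list_text.splitlines():
        line = line.strip()
        if line:
            host, path = split_network_name(line)
            groups[host].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical list contents, one reference per line.
    example = "Host3!Obj123.1\nHost2!Obj123.2\nHost3!Obj456.1\n"
    print(group_by_host(example))
```

A module that receives such a list can decide per entry whether to process the data where it already lives or to copy it, which is the point of passing references around.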

Page 22: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

File Triggers

[Diagram labels: Pipeline; Data Directory (obj123.fits); Trigger Directory (obj123.trig, contains reference to data); Module (GO); Data Trigger (DRA, user, or pipeline module); DTS Process (Tape, Disk)]

Page 23: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Data Flow Networking: Example

[Diagram labels: Host0: Crosstalk; Host1: Obj456.1, Obj321.2; Host2: Obj567.2; Host3: Obj123, Obj123.1, Obj123.2; network filenames Host3!Obj123.1 and Host2!Obj123.2; Host4: DOWN]

Page 24: Distributed Pipeline Programming for Mosaics Or Mario Tips’N’Tricks

Data Parallel Modules

Some algorithms may need to be re-implemented specifically for a data parallel pipeline.

One type is where measurements are made across the mosaic for a global calibration.

Rather than requiring all pieces to be in one pipeline, arrange for measurements made in parallel to be collected for the global calibration and then apply the global calibration to the pieces in parallel.
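The measure, collect, and apply pattern can be sketched as below. The "calibration" here is a toy (subtracting a mosaic-wide mean level), threads in one process stand in for sub-pipelines on separate nodes, and all function names are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def measure(piece):
    """Per-piece measurement: partial sum and pixel count."""
    return sum(piece), len(piece)


def global_level(measurements):
    """Combine the partial measurements into one mosaic-wide mean."""
    total = sum(s for s, _ in measurements)
    count = sum(n for _, n in measurements)
    return total / count


def apply_level(piece, level):
    """Apply the global calibration to a single piece."""
    return [value - level for value in piece]


def calibrate(pieces):
    """Measure in parallel, combine once, then apply in parallel."""
    with ThreadPoolExecutor() as pool:
        measurements = list(pool.map(measure, pieces))
        level = global_level(measurements)
        return list(pool.map(lambda p: apply_level(p, level), pieces))


if __name__ == "__main__":
    print(calibrate([[1.0, 3.0], [5.0]]))
```

Only the small measurements travel to the collection step; the pixel data of each piece stays where it is processed.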