Pipeline Basics
Jared Crossley
NRAO
What is a data pipeline?
One or more programs that perform a task with reduced user interaction.
May be developed as an extension of a more general and more interactive software system.
Why use it?
Saves time
  Especially with large (repetitive) data sets
  Interactive data reduction may take a lot of time (even for an expert)
Consistency
Increased accessibility of a data reduction system
  You don’t have to be an “expert” to use a pipeline.
  A good learning tool -- with good documentation
Building a Pipeline: Start simple
Build a pipeline in layers.
  The lowest level of the pipeline should still be interactive.
  For example:
    Level 1: allow the user to specify the input parameters needed by the following tasks.
    Level 2: find the best default parameter values for most data sets.
  Given these default values, most data can be processed with little interaction.
Focus on a subset of input data.
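The two levels above can be sketched in Python. This is a minimal illustration, not code from any real pipeline: the class `PipelineParams`, its fields, the default values, and the task names are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineParams:
    """Hypothetical input parameters; names and defaults are illustrative only."""
    flag_threshold: float = 5.0
    image_size: int = 512
    do_selfcal: bool = False


def run_tasks(params: PipelineParams) -> list[str]:
    """Level 1: the caller supplies every parameter explicitly."""
    # Stand-ins for the real processing tasks; each just reports its inputs.
    return [
        f"edit(flag_threshold={params.flag_threshold})",
        f"calibrate(do_selfcal={params.do_selfcal})",
        f"image(image_size={params.image_size})",
    ]


def run_pipeline(**overrides) -> list[str]:
    """Level 2: start from the defaults and apply only the user's overrides."""
    params = replace(PipelineParams(), **overrides)
    return run_tasks(params)
```

Calling `run_pipeline()` with no arguments uses every default, while `run_pipeline(image_size=1024)` changes one value and leaves the rest alone. The interactive Level 1 entry point, `run_tasks`, stays available underneath.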
Building a Pipeline: continued
The pipeline will evolve with time
  Parameter dependencies will reveal themselves
  Data processing algorithms will become apparent to the user. When an algorithm is well defined, add it to the pipeline.
Acquire metadata when possible. This can be used to initialize parameters.
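As a rough sketch of initializing parameters from metadata, the Python below starts from defaults and adjusts them when metadata is present. The metadata keys (`n_channels`, `scan_length_min`), the parameter names, and the rules themselves are assumptions made for illustration; they are not taken from any real pipeline.

```python
# Hypothetical defaults; names and values are illustrative only.
DEFAULTS = {"mode": "continuum", "solution_interval_min": 10.0}


def params_from_metadata(metadata: dict, defaults: dict = DEFAULTS) -> dict:
    """Initialize parameters from whatever metadata is available."""
    params = dict(defaults)  # copy, so the defaults are never modified

    # Assumed rule: more than one channel means spectral line processing.
    n_channels = metadata.get("n_channels")
    if n_channels is not None and n_channels > 1:
        params["mode"] = "spectral_line"

    # Assumed rule: never use a solution interval longer than one scan.
    scan_length = metadata.get("scan_length_min")
    if scan_length is not None:
        params["solution_interval_min"] = min(
            params["solution_interval_min"], scan_length
        )
    return params
```

When a metadata item is missing, the corresponding default is used unchanged, so the pipeline still runs on data with incomplete headers.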
Areas of concern
1. How much control should the user be given?
  Depends on the target audience. Experts want more control than novices.
  A compromise is lots of controls, but most of them pre-set to good initial conditions.
Areas of concern
2. How many output diagnostics should the pipeline produce?
  Varies by processing goal and user preference.
  If possible, include a pipeline parameter that determines the amount of diagnostics.
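One possible shape for such a parameter is a single integer level, sketched below in Python. The level numbers and the diagnostic product names are hypothetical.

```python
# Hypothetical diagnostic products, keyed by the level at which they first appear.
DIAGNOSTICS_BY_LEVEL = {
    1: ["calibration_plots"],
    2: ["flag_summary", "residual_images"],
}


def diagnostics_for(level: int) -> list[str]:
    """Return every diagnostic enabled at `level` (0 means none)."""
    if level < 0:
        raise ValueError("diagnostic level must be non-negative")
    selected = []
    # Levels are cumulative: a higher level includes all lower ones.
    for threshold in sorted(DIAGNOSTICS_BY_LEVEL):
        if threshold <= level:
            selected.extend(DIAGNOSTICS_BY_LEVEL[threshold])
    return selected
```

A cumulative level keeps the interface to one number, which suits novices, while an expert can still request everything by choosing the highest level.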
More on Output
In addition to the primary output product, consider outputting calibrated data and log files.
  This allows advanced users to build upon what the pipeline has done.
  And this allows for quick “upgrades” to data products.
Validating Output
This job is necessarily interactive.
However, a pipeline can simplify the process by…
  Providing an easy way to view output, including diagnostics
  And an easy way to delete (or flag) unacceptable output.
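A minimal Python sketch of the bookkeeping behind such a validation step is shown here. The `OutputProduct` record, its fields, and the file names in the usage note are hypothetical; a real tool would also display the images and plots.

```python
from dataclasses import dataclass, field


@dataclass
class OutputProduct:
    """One pipeline output together with the diagnostics needed to judge it."""
    name: str
    diagnostics: list[str] = field(default_factory=list)
    flagged: bool = False
    reason: str = ""


def flag(products: list[OutputProduct], name: str, reason: str) -> OutputProduct:
    """Mark one product as unacceptable, recording why, without deleting it."""
    for product in products:
        if product.name == name:
            product.flagged = True
            product.reason = reason
            return product
    raise KeyError(name)


def accepted(products: list[OutputProduct]) -> list[str]:
    """Names of the products that passed validation."""
    return [product.name for product in products if not product.flagged]
```

For example, after `flag(products, "field2.fits", "strong sidelobes")`, the call `accepted(products)` no longer lists `field2.fits`. Flagging rather than deleting keeps the decision reversible.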
The VLA (AIPS) Pipeline
Description
The pipeline is a script (AIPS run file) that automates
  Editing,
  Calibration,
  And Imaging
of VLA continuum data. May also process spectral line data.
Emulates an AIPS task
  Takes input parameters
  Outputs images and calibration plots
Suggested default parameters contained in AIPS memo.
Demo: VLA (AIPS) Pipeline
To use the AIPS pipeline: load data into AIPS; split out different frequencies.
Demo: VLA (AIPS) Pipeline
Set the VLARUN input parameters.
[Screenshot labels: Flagging control; Pause during calibration; Diagnostic plots; Imaging control; Self-cal (fragile)]
Demo: VLA (AIPS) Pipeline
[Figure: Image output by pipeline (axes and wedge added)]
Demo of VLA Pipeline System: (Imaging the VLA Archive)
Description
The VLA Pipeline System is an extension of the AIPS pipeline.
Includes
1. Data acquisition, and preparation for processing
2. Data processing (AIPS pipeline)
3. Image finalization, and export
4. Archiving
5. Easy interactive data validation
Demo: VLA Pipeline
At a high level of pipeline automation, initial user interaction takes place only on the command line.
The user can query the raw data archive via a Perl script.
Demo: VLA Pipeline
Next, select data files for download and filling.
[Screenshot labels: Select files; Download]
Demo: VLA Pipeline
A Unix shell script waits to be called by cron.
[Screenshot labels: Start AIPS; Execute AIPS Pipeline]
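The cron-driven pattern can be sketched as follows. The script described on the slide is a Unix shell script that starts AIPS; this Python version only illustrates the general idea of a job that cron runs periodically and that processes whatever is new. The `.dat` file pattern, the `.done` marker convention, and all function names are assumptions.

```python
import tempfile
from pathlib import Path


def marker_for(path: Path) -> Path:
    """Marker file recording that `path` has already been processed."""
    return path.with_name(path.name + ".done")


def pending_files(incoming: Path, pattern: str = "*.dat") -> list[Path]:
    """Data files in `incoming` that have no '.done' marker beside them."""
    return sorted(
        path for path in incoming.glob(pattern)
        if not marker_for(path).exists()
    )


def process_pending(incoming: Path, process) -> list[str]:
    """Run `process` on each new file once; intended to be called by cron."""
    handled = []
    for path in pending_files(incoming):
        process(path)             # stand-in for starting the real pipeline
        marker_for(path).touch()  # later runs will skip this file
        handled.append(path.name)
    return handled


def demo() -> tuple[list[str], list[str]]:
    """Simulate two consecutive cron invocations on a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        incoming = Path(tmp)
        for name in ("b.dat", "a.dat"):
            (incoming / name).write_text("raw data")
        first = process_pending(incoming, process=lambda path: None)
        second = process_pending(incoming, process=lambda path: None)
    return first, second
```

Because each run skips files that already have a marker, the job is safe to schedule at a fixed interval: the second invocation in `demo()` finds nothing left to do.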
Demo: VLA Pipeline
After processing, the output is archived via scripts invoked by cron.
The data is now available online. The final step is image validation…
Demo: VLA Pipeline
A web-based tool is provided for validation.
Demo: VLA Pipeline
Images and diagnostics can be viewed together and flagged for removal.
For more info
About AIPS Pipeline (VLARUN):
  AIPS Memo 112, by L. Sjouwerman. http://www.aips.nrao.edu/aipsmemo.html
  VLARUN “online” documentation. From the AIPS prompt type explain VLARUN
About Pipeline System and NVAS:
  See the NVAS web page. http://www.aoc.nrao.edu/~vlbacald
  For data acquisition scripts, see J. Crossley’s web page. http://www.aoc.nrao.edu/~jcrossle/
About pipeline basics:
  See notes on J. Crossley’s web page.