Ten Years of Software Sustainability at The Infrared Processing and Analysis Center
G. Bruce Berriman and John Good
NASA Exoplanet Science Institute, Infrared Processing and Analysis Center, Caltech, USA
Ewa Deelman
Information Sciences Institute, University of Southern California, USA
Anastasia Alexov
Astronomical Institute Anton Pannekoek, Amsterdam, Netherlands
Presentation at AHM 2010, Cardiff, September 2010.
The Role of IPAC in Astronomy
http://www.ipac.caltech.edu
Long-term archive
Curation of data
Dissemination to the community
Size and Usage Have Grown
Archives contain data from 30 missions and projects
Space based, ground based and knowledge based
Archives Built on a Common Hardware And Software Architecture
85 million queries
3 TB/month downloaded
A Common Software Architecture
Application is usually a CGI program
Each component is a module with a standard interface that communicates with other components and fulfills one general function
Modules are stand-alone portable ANSI-C tools
Components plugged together & controlled by an executive library
Executive starts components as child services and parses return values
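The executive pattern described above (start a stand-alone component as a child, then parse what it returns) can be sketched as follows. This is a minimal illustration in Python rather than the ANSI-C of the actual system. The `[struct key=value, ...]` return line, the helper names `parse_return` and `run_component`, and the stand-in child process are all assumptions made for this sketch, not the real IPAC executive library.

```python
import re
import subprocess
import sys

# Hypothetical return format (an assumption for this sketch): a component
# prints one status line such as  [struct stat="OK", count=3]
FIELD = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^,\]\s]+))')


def parse_return(line):
    """Turn a '[struct key=value, ...]' line into a dict of strings."""
    line = line.strip()
    if not (line.startswith("[struct") and line.endswith("]")):
        raise ValueError(f"unrecognized component output: {line!r}")
    body = line[len("[struct"):-1]
    fields = {}
    for match in FIELD.finditer(body):
        quoted, bare = match.group(2), match.group(3)
        fields[match.group(1)] = quoted if quoted is not None else bare
    return fields


def run_component(argv):
    """Start one component as a child process and parse its last output line."""
    done = subprocess.run(argv, capture_output=True, text=True, check=False)
    lines = done.stdout.strip().splitlines()
    if not lines:
        raise RuntimeError(f"component produced no output (exit {done.returncode})")
    result = parse_return(lines[-1])
    if result.get("stat") != "OK":
        raise RuntimeError(result.get("msg", "component failed"))
    return result


# Stand-in for a compiled component: a child Python process that prints a status line.
fake_component = [sys.executable, "-c", 'print(\'[struct stat="OK", count=3]\')']
result = run_component(fake_component)
print(result)
```

Because each component only talks through its command line and a one-line return value, the executive needs no knowledge of a component's internals, which is what lets components be swapped or re-used across applications.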
Applications are generally simple web forms or Web services that search for data
The “smarts” are on the server side; optimize complex queries on large data sets
Component based architecture which enables strong re-use and adaptation
Optimized for astronomical spatial searches and complex, general queries regardless of wavelength and type of mission
All services are integrated into the Infrared Science Information System (ISIS)
Components are generic; minimize dependencies on third-party software or environments
Avoid shared memories or system calls
All database queries are performed in one module
300 KLOC
New projects automatically inherit functionality
Supports efficient development and controls maintenance costs
Engage Your Users!
Concerted program of user engagement to attract new users and build a user community
Method
User Surveys
End User Group (drawn from the community)
Exhibits and demos
Coffee pot conversations
Advertise in newsletters
Number of end-users has increased to 18,000
12% of peer-reviewed papers cited IPAC archives or data
Actively seek feedback, e.g.
Watch users as they try services; see where they get stuck
User Surveys ask respondents to write down their views rather than answer questions
Listen to the advice you don’t want to hear
Speed Is King In An Archive
Image data sets becoming very large: Spitzer Space Telescope will deliver over 100 million images, with varying footprints on the sky.
Searches for spatially extended images are slow: a scan of Spitzer images can take 2,000 s
… results pages are becoming more complex.
What matters more – fast access? Or interactivity? Speed won hands down.
R-tree Indexing
Uses hierarchically nested minimum bounding boxes
Performance scales as log(N)
Performance gain of x1000 over table scan
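To illustrate why nested minimum bounding boxes avoid a full table scan, here is a small R-tree sketch in Python. It is an illustration only, not the archive's implementation: the packing rule (sort by box center, group `fanout` entries per node) is a deliberately simple assumption, and the square "footprints" are made-up test data.

```python
from dataclasses import dataclass, field


def union(boxes):
    """Minimum bounding box (xmin, ymin, xmax, ymax) of a list of boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    box: tuple
    children: list = field(default_factory=list)  # empty for leaf entries
    item: object = None                           # payload for leaf entries


def build(entries, fanout=4):
    """Bulk-load a static tree from (box, item) pairs, bottom up."""
    if not entries or fanout < 2:
        raise ValueError("need at least one entry and fanout >= 2")
    level = [Node(box=box, item=item) for box, item in entries]
    while len(level) > 1:
        level.sort(key=lambda n: (n.box[0] + n.box[2], n.box[1] + n.box[3]))
        parents = []
        for start in range(0, len(level), fanout):
            group = level[start:start + fanout]
            parents.append(Node(box=union([n.box for n in group]), children=group))
        level = parents
    return level[0]


def search(root, query):
    """Return (matching items, number of nodes examined)."""
    hits, examined, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        examined += 1
        if not intersects(node.box, query):
            continue  # prune the whole subtree under this bounding box
        if node.children:
            stack.extend(node.children)
        else:
            hits.append(node.item)
    return hits, examined


# Made-up data: a 10 x 10 grid of unit-square image footprints.
footprints = [((i, j, i + 1, j + 1), (i, j)) for i in range(10) for j in range(10)]
tree = build(footprints)
query = (2.5, 2.5, 3.5, 3.5)
hits, examined = search(tree, query)
brute_force = [item for box, item in footprints if intersects(box, query)]
print(sorted(hits), "nodes examined:", examined)
```

Subtrees whose bounding box misses the query are discarded in one comparison, so a small query touches only a few branches instead of every record.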
Memory-mapped files
Parallelization / cluster processing
REST-based web services
Memory-mapped file: a segment of virtual memory is assigned a byte-for-byte correlation with part of a file.
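A minimal sketch of the memory-mapping idea, using Python's standard `mmap` module. The file name, the fixed-width record layout (`RECORD`) and the helper `read_record` are hypothetical, chosen only to show random access into a file without reading it sequentially.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-width record: (ra, dec) stored as two little-endian float64.
RECORD = struct.Struct("<dd")

# Write a small binary file to stand in for a large on-disk index.
path = os.path.join(tempfile.mkdtemp(), "positions.bin")
with open(path, "wb") as f:
    for k in range(1000):
        f.write(RECORD.pack(k * 0.25, -90.0 + k * 0.125))


def read_record(mapped, index):
    """Decode record `index` directly from the mapped bytes."""
    return RECORD.unpack_from(mapped, index * RECORD.size)


# Map the file into virtual memory; pages are loaded by the OS only when touched.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    n_records = len(mapped) // RECORD.size
    ra, dec = read_record(mapped, 500)

os.remove(path)
print(n_records, ra, dec)
```

The offset arithmetic replaces explicit seek-and-read calls, and the operating system's page cache decides which parts of the file are actually resident.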
Modernization of Scanpi
Written in 1983, Scanpi co-adds scans from the far-infrared IRAS survey.
15 papers per year on average by 2007.
Sensitivity gain of x5 over survey data products
Improve spatial resolution of extended or confused sources
User panel strongly recommended modernization because of its value in supporting interpretation of data from current IR missions Spitzer and Herschel.
But it was coughing up blood and was a classic legacy program
Written in F66, it had become a patchwork of scripts and bug fixes and was a maintenance nightmare.
Dependent modules for data compression etc. no longer supported.
Stranded on Solaris 2.8
Developer retiring
Scanpi Workflow
[Workflow diagram. Boxes shown: Input (source info); Get scans; Co-register scans; Co-add all scans; Background fitting; Source fitting; Output (results and files on Web). Re-usable components shown: plotting, background, table manipulation, bulk download, coordinate transformation.]
Rewritten from ground up in C
Developed as a workflow application that gives visibility into the processing steps
Calls existing components, reducing the code base to 21 KLOC, cf. 102 KLOC
1.25 FTE development cf. 0.5 FTE for maintenance
The Montage Image Mosaic Engine
Montage Workflow (http://montage.ipac.caltech.edu)
[Workflow diagram. Stages: Input -> Reprojection -> Background Rectification -> Co-addition -> Output. Boxes shown: Image1, Image2, Image3; Project (x3); Diff (x2); Fitplane (x2); BgModel; Background (x3); Add.]
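The boxes in the workflow diagram form a dependency graph, and one way to picture it is as a set of tasks ordered topologically. The sketch below is an illustration in Python, not Montage itself: the task names follow the diagram's boxes, while the choice of which images overlap (`overlaps`) and the exact dependency edges are assumptions made for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

images = ["Image1", "Image2", "Image3"]
# Assumption for this sketch: neighbouring images overlap, giving two Diff tasks.
overlaps = [("Image1", "Image2"), ("Image2", "Image3")]

# Map each task to the set of tasks that must finish before it can start.
deps = {}
for img in images:
    deps[f"Project({img})"] = set()
for a, b in overlaps:
    diff = f"Diff({a},{b})"
    deps[diff] = {f"Project({a})", f"Project({b})"}
    deps[f"Fitplane({a},{b})"] = {diff}
deps["BgModel"] = {f"Fitplane({a},{b})" for a, b in overlaps}
for img in images:
    deps[f"Background({img})"] = {"BgModel", f"Project({img})"}
deps["Add"] = {f"Background({img})" for img in images}

# Walk the graph level by level; tasks within one level are mutually independent.
sorter = TopologicalSorter(deps)
sorter.prepare()
stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

for number, tasks in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(tasks)}")
```

Tasks that land in the same stage have no dependencies on each other, which is the property that lets a workflow like this spread across cluster or grid nodes.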
Creates science-grade image mosaics
Scalable, modular design
ANSI-C code (300 MB) runs on all common *nix platforms – desktops, clusters, grids and supercomputers.
Processes 40 million 2MASS pixels in 32 min on 128 nodes of 1.2 GHz Linux cluster
How Is It Used?
Science Analysis
Support Production of Data Sets, Data Products and Preview Products
Incorporate into Workflows and Pipelines
Spitzer Space Telescope teams
Quality Assurance of data products
5,000 downloads by bona-fide astronomers
Users now contributing to the project
Scripts for generating mosaics
Python front ends
MPI version
Contributed Script (Dr. Inseok Song)
Development of Cyber Infrastructure
Task scheduling in distributed environments (performance focused)
Designing job schedulers for the grid
Designing fault tolerance techniques for job schedulers
Exploring issues of data provenance in scientific workflows
Exploring applicability of scientific applications running on Clouds
Developing high-performance workflow restructuring techniques
Developing application performance frameworks
Developing workflow orchestration techniques
Cost of running workflows on Amazon EC2 cloud
Best Practices for Software Sustainability
Design for sustainability, extensibility, re-use and portability
Build an engaged user community that encourages users to contribute to sustainability
Be careful about new technologies – do a cost-benefit analysis before adopting them
Use rigorous software engineering practices to ensure well-organized and well-documented code.
Control and manage your interfaces.
Make source code and test and validation data available