Perspectives on Scientific Software Recovery and Revitalization.

Bob Apthorpe

March 30, 2018

Acorvid Technical Services Corporation

Overview.

Characteristics of Scientific Software

Properties of scientific software include

• Numerically sophisticated
• Developed principally or exclusively by subject matter experts (SMEs)
• Favors accuracy ≥ efficiency > elegance
• Long lifespan


Common Themes in Older Scientific Software Projects

Many mature scientific software projects have a common lifecycle:

• Often for internal use only
• Resources allocated for development, not maintenance
• May be maintained by one person or a small group
• Software can outlive its creator
• Maintenance may fall to junior personnel unfamiliar with the code

Search Stack Overflow for the phrase "I recently inherited a FORTRAN application…"


Software Development Lifecycle

Source: https://www.gethow.org/importance-of-software-lifecycle-management

Reasons for Software Recovery

Inactive or retired software may need to be put back in service

• Legal or regulatory requirements
• Confirmation of prior results
• Renewed interest in subject matter

Examples include:

• Recovering fast reactor safety analysis codes to support GenIV reactor development (SPRAY, SOFIRE-II)
• Extending support code which generated aerosol particle size distributions for severe reactor accident code
• Needed chemical equilibrium analysis code built and tested to ASME NQA-1 standard ("nuclear grade")


Software Recovery Issues

Software recovery involves issues similar to those with legacy code:

• Original or cognizant developers may not be available
• Source code may not be machine-readable
• Limited test cases and documentation
• Original development environment may not exist
  • Different hardware
  • Different operating system
  • Language differences; proprietary extensions
  • Compilation flags
  • Missing dependencies (libraries)


Pragmatic Questions

When encountering new software, basic questions are:

• What does this do?
• How do I know it works?
• How do I make it work?
• How does it work?
• How do I change it without breaking it?

Relevant to a new user, developer, or manager


Case Study: Analysis of Flow in Pipe Networks.


Background

• Question on Stack Overflow about a missing library needed by a pipe network flow solver
• Source code published in [Jep74] and [Jep76] but not in machine-readable format
• Code was in fair to poor condition; missing proprietary library
• Short, tractable problem ideal for illustrating code recovery techniques
• As of March 2018, Google Scholar shows at least 330 citations to [Jep76]
• Code has historical significance [Orm08]


Problem Domain: Flow in Pipe Networks


Basic Theory i

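Bernoulli's equation:

\[ z_1 + \frac{p_1}{\rho g} + \frac{V_1^2}{2g} + \frac{E_1}{g} = z_2 + \frac{p_2}{\rho g} + \frac{V_2^2}{2g} + \frac{E_2}{g} \]

Newton's Law of Viscosity:

\[ \tau = \mu \frac{dv}{dy} \]

Darcy-Weisbach equation:

\[ \Delta p = f_D \frac{L}{D_H} \frac{\rho \langle v \rangle^2}{2} \]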
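As a quick numerical check of the Darcy-Weisbach relation, Δp = f_D (L/D_H)(ρ⟨v⟩²/2), the following minimal sketch evaluates the pressure drop for one pipe. The function names and the sample values (friction factor 0.02, 100 m of 0.1 m pipe, water at 2 m/s) are illustrative assumptions, not values from the slides.

```python
def darcy_weisbach_dp(f_d, length, d_h, rho, v_mean):
    """Pressure drop in Pa: f_D * (L / D_H) * rho * <v>^2 / 2 (SI units assumed)."""
    return f_d * (length / d_h) * rho * v_mean**2 / 2.0


def head_loss(dp, rho, g=9.80665):
    """Convert a pressure drop in Pa to head loss in metres of fluid."""
    return dp / (rho * g)


# Hypothetical example values, chosen only for illustration
dp = darcy_weisbach_dp(f_d=0.02, length=100.0, d_h=0.1, rho=1000.0, v_mean=2.0)
print(f"pressure drop: {dp:.1f} Pa, head loss: {head_loss(dp, 1000.0):.3f} m")
```

Dividing the pressure drop by ρg gives the head loss in the same units as the z and p/(ρg) terms of Bernoulli's equation.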


Basic Theory ii


Basic Theory iii

Conserve mass and energy

Assumptions:

• constant material properties
• isothermal
• incompressible flow
• continuity; Kirchhoff's laws
• momentum can be ignored

Reasonable approximation of steady-state single-phase incompressible flow

Suitable for design and analysis of water distribution systems
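To make the continuity and Kirchhoff-style loop conditions concrete, here is a minimal sketch for the smallest possible network: two pipes in parallel between the same pair of nodes. Continuity fixes q1 + q2 = q_total, and the head loss around the loop must sum to zero. The sketch uses a Hardy Cross style loop correction with a fixed friction factor; that choice, the function names, and the sample pipe data are assumptions for illustration and are not taken from Jeppson's programs.

```python
import math


def resistance(f_d, length, diameter, g=9.80665):
    """K in h = K * Q * |Q|, from Darcy-Weisbach with a constant friction factor."""
    return 8.0 * f_d * length / (math.pi**2 * g * diameter**5)


def split_parallel_flow(q_total, k1, k2, tol=1e-12, max_iter=100):
    """Split q_total between two parallel pipes so the loop head loss is zero."""
    q1 = q_total / 2.0  # initial guess; continuity holds by construction
    for _ in range(max_iter):
        q2 = q_total - q1
        # Head-loss imbalance around the loop: forward in pipe 1, back through pipe 2
        imbalance = k1 * q1 * abs(q1) - k2 * q2 * abs(q2)
        slope = 2.0 * (k1 * abs(q1) + k2 * abs(q2))
        dq = -imbalance / slope  # Hardy Cross loop flow correction
        q1 += dq
        if abs(dq) < tol:
            break
    return q1, q_total - q1


# Hypothetical pipes: same length and friction factor, different diameters
k_a = resistance(f_d=0.02, length=300.0, diameter=0.20)
k_b = resistance(f_d=0.02, length=300.0, diameter=0.15)
q_a, q_b = split_parallel_flow(0.05, k_a, k_b)
print(q_a, q_b)
```

Larger networks apply the same two conditions at every node and around every loop, which is what leads to the simultaneous equations a matrix solver is needed for.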


Empirical Correlations for Friction Factor
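One widely used correlation of this kind is the Colebrook equation for turbulent flow, 1/√f = −2 log10((ε/D)/3.7 + 2.51/(Re √f)), with f = 64/Re for laminar flow. The sketch below is illustrative only and is not taken from the slides or from Jeppson's code: the Swamee-Jain explicit formula is used as a starting guess, the laminar cutoff of Re < 2000 is a common rule of thumb, and all function names are hypothetical.

```python
import math


def swamee_jain(reynolds, rel_roughness):
    """Explicit approximation to the Colebrook friction factor."""
    return 0.25 / math.log10(rel_roughness / 3.7 + 5.74 / reynolds**0.9) ** 2


def colebrook(reynolds, rel_roughness, tol=1e-10, max_iter=50):
    """Solve the Colebrook equation by fixed-point iteration on x = 1/sqrt(f)."""
    x = 1.0 / math.sqrt(swamee_jain(reynolds, rel_roughness))
    for _ in range(max_iter):
        x_new = -2.0 * math.log10(rel_roughness / 3.7 + 2.51 * x / reynolds)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return 1.0 / x**2


def friction_factor(reynolds, rel_roughness):
    """Darcy friction factor; laminar below Re = 2000 (assumed cutoff)."""
    if reynolds <= 0.0:
        raise ValueError("Reynolds number must be positive")
    if reynolds < 2000.0:
        return 64.0 / reynolds
    return colebrook(reynolds, rel_roughness)


print(friction_factor(1.0e5, 1.0e-4))
```

Because the friction factor depends on the flow rate through the Reynolds number, a network solver has to re-evaluate it as the flow estimates change.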


Known Challenges

• Code was originally written for UNIVAC 1108
• Uses matrix solver GJR from proprietary Sperry MATH-PACK library
• May use proprietary extensions to FORTRAN IV
• OCR'd source code will be full of scanning artifacts
• Illegible source code in [Jep74]; [Jep76] is legible, however…
• No build instructions
• Few test cases
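For readers unfamiliar with what a routine like GJR would have provided, the following is a minimal sketch of a linear solver based on Gauss-Jordan reduction with partial pivoting. It assumes that GJR stands for Gauss-Jordan reduction; the actual MATH-PACK calling interface is not described here, and this is not the author's replacement (the conclusions indicate that one was built on LAPACK). The function name and signature are hypothetical.

```python
def gauss_jordan_solve(a, b):
    """Solve A x = b by Gauss-Jordan reduction with partial pivoting.

    a is a list of n rows of n numbers, b is a list of n numbers.
    The inputs are not modified.
    """
    n = len(a)
    # Augmented matrix [A | b], built from copies of the input rows
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the largest remaining entry in this column
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Eliminate this column from every other row (above and below)
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


print(gauss_jordan_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))
```

A small reference implementation like this can also serve as an independent check on whatever production solver replaces the missing library call.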


Source Code Legibility


Proprietary Libraries and Extensions


Source Code Errors


Ambiguous Code


Test Errors


Development Plan.

Recovery Phases

Refactoring effort split into three distinct phases:

• Resuscitate: Make code run correctly
• Update: Replace problematic coding constructs with modern equivalents
• Modularize: Extract common elements to modules shared among applications


Resuscitate: Goals

Primary Goal: Make code run correctly

• Convert OCR'd text to well-formed source code
• Set liberal compile and link options
• Set up test cases with sample input and expected output
• Create build scripts
• Put project under revision control
• Set up auto-documentation system (doxygen)
• Note sections of code which may be easy or difficult to modernize in next phase
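The "sample input and expected output" test cases above can be checked with a small comparison script. The sketch below assumes the recovered program writes plain-text results and compares them token by token against a stored reference, allowing a small tolerance on numeric fields, since a different compiler or platform rarely reproduces every digit. The function names and tolerances are illustrative assumptions.

```python
import math


def _as_float(token):
    """Return the token as a float, or None if it is not numeric."""
    try:
        # Fortran may print exponents with D instead of E (e.g. 1.5D+03)
        return float(token.replace("D", "E").replace("d", "e"))
    except ValueError:
        return None


def outputs_match(expected, actual, rel_tol=1e-6, abs_tol=1e-9):
    """Compare two program outputs; numbers may differ within tolerance."""
    exp_lines = expected.strip().splitlines()
    act_lines = actual.strip().splitlines()
    if len(exp_lines) != len(act_lines):
        return False
    for exp_line, act_line in zip(exp_lines, act_lines):
        exp_tokens, act_tokens = exp_line.split(), act_line.split()
        if len(exp_tokens) != len(act_tokens):
            return False
        for exp_tok, act_tok in zip(exp_tokens, act_tokens):
            exp_val, act_val = _as_float(exp_tok), _as_float(act_tok)
            if exp_val is not None and act_val is not None:
                if not math.isclose(exp_val, act_val, rel_tol=rel_tol, abs_tol=abs_tol):
                    return False
            elif exp_tok != act_tok:
                return False
    return True


print(outputs_match("PIPE 1 FLOW 1.500000E+00", "PIPE 1 FLOW 1.5000001D+00"))
```

In practice the reference text would be read from a file kept under revision control next to the matching input deck.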


Resuscitate: Deliverables

Large number of artifacts created in this phase:

• 'Working' executable code
• Well-formed, managed source code
  • Original code
  • Replacement for GJR from Sperry MATH-PACK
• Build scripts
• Test cases
• Documentation
  • Build
  • Test
  • Code

Deliverables should be correct but may not be complete or 'perfect'.

Goal is a Minimum Viable Product


Update: Goals i

Primary Goal: Replace problematic coding constructs with modern equivalents

• Convert code to indented free-format
• Replace DO/CONTINUE with DO/ENDDO
• Replace IF/GOTO with IF/ELSEIF/ELSE/ENDIF
• Replace bare GOTO with DO WHILE, CYCLE, or EXIT
• Replace common numeric literals with named constants
• Replace old-style conditional operators with modern equivalents (e.g. convert .GE. to >=)
• Force all variables to be declared by using IMPLICIT NONE
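Some of the simpler refactorings listed above are mechanical enough to script. As one example, the sketch below rewrites old-style relational operators (.GE., .LT., and so on) as their modern symbols, one source line at a time. It assumes fixed-form input where a C, c, *, or ! in column 1 marks a comment, and it makes no attempt to skip operators inside character or Hollerith strings, so its output still needs review. The function and table names are hypothetical.

```python
import re

# Old-style relational operators and their Fortran 90 equivalents
OPERATORS = {"LT": "<", "LE": "<=", "GT": ">", "GE": ">=", "EQ": "==", "NE": "/="}

# Absorb whitespace around the operator, but only when it follows a non-blank
# character, so leading indentation on a line is never consumed
_PATTERN = re.compile(r"(?<=\S)\s*\.(LT|LE|GT|GE|EQ|NE)\.\s*", re.IGNORECASE)


def modernize_relational(line):
    """Replace .GE. style operators with symbols on one source line."""
    if line[:1] in ("C", "c", "*", "!"):
        return line  # fixed-form comment line: leave untouched
    return _PATTERN.sub(lambda m: " " + OPERATORS[m.group(1).upper()] + " ", line)


print(modernize_relational("      IF (X.GE.Y) GO TO 10"))
```

The replacement can lengthen a line, so in fixed-form source it is safer to run this after conversion to free-format, where the column 72 limit no longer applies.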


Update: Goals ii

Note repetitive sections of code for isolation into shared modules in next phase

Defer complex refactoring until next phase

Structure becomes apparent as simple refactorings are applied


Update: Deliverables

Fewer new deliverables; more changes to existing work:

• 'Working' executable code
• Readable, modernized source code
• New shared utility routines
• Improved build scripts with common dependencies
• Updated documentation


Modularize: Goals

Goal: Extract common elements to modules shared among applications

• Add unit tests
• Create cohesive modules with sensible grouping of components


Modularize: Deliverables

More new deliverables and substantial changes to existing work:

• 'Working' executable code
• Source code separated into applications and common dependencies
• Unit tests on dependencies
• Improved build scripts with common dependencies
• Updated documentation


Summary.

Summary

• Full project detailed in [Apt18]
• Source archive available at https://bitbucket.org/apthorpe/jeppson_pipeflow


Conclusions and Insights i

• Source code recovery went about as well as anticipated
• OCR'd text was surprisingly usable
• Took several passes to disambiguate visually similar characters
• Did not anticipate errors in published code or test cases; errors exist everywhere!
• Phased approach worked well
• Maintained very clear focus on outcome and deliverables
• Simplified decision-making when new issues arose
• Short iterations limited 'scope creep' to manageable levels
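The passes needed to disambiguate visually similar characters can be partly automated by flagging suspicious lines for manual review. The following minimal sketch assumes fixed-form, upper-case FORTRAN IV source as produced by OCR; the specific checks (text past column 72, a non-numeric label field, the letter O inside a numeric token, lowercase letters) and the function name are illustrative assumptions rather than the procedure used in the project.

```python
import re

# A token that starts with digits and then contains the letter O, e.g. "1O0"
_DIGIT_THEN_O = re.compile(r"\b\d+O[\dO]*\b")
_LOWERCASE = re.compile(r"[a-z]")


def find_ocr_suspects(source):
    """Return (line number, reason) pairs for lines worth a manual look."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line[:1] in ("C", "*"):
            continue  # comment lines are free text; skip them
        if len(line) > 72:
            findings.append((lineno, "text beyond column 72"))
        label = line[:5].strip()
        if label and not label.isdigit():
            findings.append((lineno, "non-numeric statement label field"))
        if _DIGIT_THEN_O.search(line):
            findings.append((lineno, "letter O inside a numeric token"))
        if _LOWERCASE.search(line):
            findings.append((lineno, "lowercase letter in upper-case source"))
    return findings


sample = "      X = 1O0\n   1O CONTINUE\n      Y = 2.0\n"
for lineno, reason in find_ocr_suspects(sample):
    print(lineno, reason)
```

The script only reports candidates; each one still has to be compared against the printed listing, since a compiler will happily accept some misreadings as valid but different code.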


Conclusions and Insights ii

• Reconstruction of UNIVAC MATH-PACK routine was about as complex as expected due to oddness of LAPACK interfaces
• Build and test infrastructure reconstitution was valuable
• Transition from sh to make to CMake was worth the annoyance
• ftncheck is very usable as a simple cross-platform unit test framework; has quirks
• Still looking into CTest and CDash integration
• Still not sold on CMake; haven't found anything better, but system is baroque and unintuitive
• Jupyter Notebook invaluable for prototyping, benchmarking, and test data construction


Questions?.

Thank You!


References.

References i

[Apt18] Robert Apthorpe, Recovery of Jeppson pipe network analysis software, Tech. Report wp-20180209-jeppson, Acorvid Technical Services Corporation, March 2018, http://www.acorvid.com/wp-content/uploads/2018/03/wp-20180307-jeppson-1.pdf.

[Jep74] Roland W. Jeppson, Steady flow analysis of pipe networks: An instructional manual, Tech. Report 300, Utah Water Research Laboratory, September 1974, https://digitalcommons.usu.edu/water_rep/300/.

[Jep76] Roland W. Jeppson, Analysis of flow in pipe networks, Ann Arbor Scientific Publishers, Inc., Ann Arbor, MI, 1976.


References ii

[Orm08] Lindell E. Ormsbee, The history of water distribution network analysis: The computer age, Water Distribution Systems Analysis Symposium 2006, 2008, pp. 1-6, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.510.2637&rep=rep1&type=pdf.


For More Information i

• Stack Overflow question "Univac Math pack subroutines in old-school FORTRAN (pre-77)": https://stackoverflow.com/questions/48265245/univac-math-pack-subroutines-in-old-school-fortran-pre-77
• Citation count for [Jep76] via Google Scholar: https://scholar.google.com/scholar?cites=15501786215094582199&as_sdt=5,44&sciodt=0,44&hl=en
• Jeppson pipe flow code archive available at https://bitbucket.org/apthorpe/jeppson_pipeflow


For More Information ii

• Doxygen documentation generation tool, http://www.stack.nl/~dimitri/doxygen/
• findent source code reformatter, https://sourceforge.net/projects/findent/
• FLIBS, open Fortran utility libraries, https://sourceforge.net/projects/flibs/
• CMake build automation system, https://cmake.org/
