Posted on 10 May 2015

Scientific Software: sustainability, skills & sociology
Neil Chue Hong, N.ChueHong@software.ac.uk
Director, Software Sustainability Institute
US/IAEA Workshop on Software Sustainability for Safeguards Instrumentation, Vienna
www.software.ac.uk

The Software Sustainability Institute

A national facility for cultivating world-class research through software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software

Supported by EPSRC Grant EP/H043160/1


Anatomy of my talk


SOFTWARE is…
• everywhere
• hard to define
• long-lived

… are IMPORTANT
• context
• reasons
• people

Software is everywhere (even where you expect it)


Software is pervasive
• Factories
• Services
• Cinema
• Writing

Tamiflu binding to mutant influenza

A water-swap reaction coordinate for the calculation of absolute protein-ligand binding free energies. Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ. J. Chem. Phys. (2011) vol. 134, pp. 054114. http://dx.doi.org/10.1063/1.3519057

Favouring of disease risk alleles

Selection at pleiotropic loci underlies disease co-occurrence in human populations. Navarro, Haley, Karosas et al. Submitted to Nature Genetics

Behind every great piece of science…

    #go through each SNP of interest
    for(my $x = 0; $x < scalar @pos; $x++){
      #and then each downstream SNP of interest
      for(my $y = $x+1; $y < scalar @pos; $y++) {
        #if SNPs within our chosen distance (500kb) and both present in the haplotypes file
        if((!($trait[$x] eq $trait[$y])) && (abs($pos[$x] - $pos[$y]) <= 500000) && (exists($legArrayPos{$pos[$x]})) && (exists($legArrayPos{$pos[$y]}))) {
          my $snp1ArrayPos = "";
          my $snp2ArrayPos = "";
          my $snp1All = "";
          my $snp2All = "";

          #create output file for this SNP pair
          my $filename = "ConditionedResults2/$chr[$x].$pos[$x]-$pos[$y].EHH.GBR.2.txt";
          print "$filename\n";
          unless (-e $filename) {
            open(OUT, ">$filename");

            #####################CHANGE THESE IF NOT FOCUSING ON SECOND SNP#########################
            my $start = $pos[$y]-500000;
            if ($start < 1) { $start = 1; }
            my $end = $pos[$y]+500000;
            if ($end > $chrLengths{$chr[$x]}) { $end = $chrLengths{$chr[$x]}; }

Software is long-lived (and outlasts computational hardware)


Architectural Dominance


Image courtesy PDES Inc. Slide from Sean Barker, BAE SYSTEMS, DPC “Designed to Last”

Computational Chemistry - CASTEP

From the first implementation of a DFT algorithm to a completely new code to community-supported software

• Individual
• Group
• Consortium
• With industry
• Community
• Active

Software advances < hardware speedup
http://www.castep.org/


LOTAR: storing aeronautical models

Life of CAD System: 10 years

Time between CAD Versions: 6 months

Life of Product: 70 years +

[Figure: product lifecycle timeline (10 to 60+ years) showing Production, Modifications, Spares, Services and Legal Liability, with markers where the CAD system becomes obsolete and then forgotten]

Image courtesy PDES Inc. Slide from Sean Barker, BAE SYSTEMS, DPC “Designed to Last”


So we have to maintain it…

• “The modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment” (IEEE definition)
  – Corrective maintenance: fixing faults
  – Adaptive maintenance: adapting to changes in environment
  – Perfective maintenance: meeting new/different user requirements
  – Preventative maintenance: increasing maintainability


… because we cannot change this with process and practice alone …

• “Many of us have tried to discover ways to prevent code from becoming legacy. But … prevention is imperfect. Even the most disciplined development team, knowing the best principles, using the best patterns, and following the best practices will create messes from time to time. The rot still accumulates. It’s not enough to prevent the rot – you have to be able to reverse it.”


… so we work with what we have

• Identify change points
• Find test points
• Break dependencies
• Write tests
• Make changes and refactor

Testing, infrastructure, documentation are key
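The steps above (find test points, break dependencies, write tests, then refactor) can be illustrated with the window-clamping lines from the SNP script shown earlier. This is a minimal sketch, not code from the talk. `snp_window` is a hypothetical helper that pulls the start/end clamping out into its own function. Once isolated, the function can be pinned down with characterization tests, using Perl's core Test::More module, before any refactoring begins.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: the start/end clamping from the SNP script,
# extracted into a function so it can be tested in isolation.
sub snp_window {
    my ($pos, $chr_length, $flank) = @_;
    $flank //= 500_000;                          # 500kb default, as in the original script
    my $start = $pos - $flank;
    $start = 1 if $start < 1;                    # clamp to the first base of the chromosome
    my $end = $pos + $flank;
    $end = $chr_length if $end > $chr_length;    # clamp to the chromosome length
    return ($start, $end);
}

# Characterization tests: record what the code does today, before changing it.
is_deeply([snp_window(1_000_000, 5_000_000)], [500_000, 1_500_000], 'interior SNP');
is_deeply([snp_window(100, 5_000_000)],       [1, 500_100],         'clamped at chromosome start');
is_deeply([snp_window(4_900_000, 5_000_000)], [4_400_000, 5_000_000], 'clamped at chromosome end');

done_testing();
```

With these tests passing, the hard-coded 500000 and the "CHANGE THESE IF NOT FOCUSING ON SECOND SNP" edit point could become parameters without silently changing results.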


Software is hard to define (and thus hard to sustain)


What do we sustain:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?

Novel reuse of public sector data: http://www.mysociety.org

What do we sustain:
- Map?
- Software that creates map?

Sustaining Function or Form
What do we sustain:
- Function?
- Form?

Context is important (otherwise all you have is an object)


Comb badge, Museum of London

• Without context, objects have no meaning

What’s this item?

32 × 28 mm, lead alloy, late medieval (14th–15th century)

What about repositories?

re·pos·i·tor·y
/noun/ [ri-poz-i-tawr-ee]
• 1. a receptacle or place where things are deposited, stored, or offered for sale.
• 2. a burial place; sepulchre.


The Zombie Effect

• Software not always fully alive when you reanimate it!

• Complex set of dependencies
  – Significant Properties of Software
  – Purposes and benefits of software preservation

http://www.jisc.ac.uk/media/documents/programmes/preservation/significantpropertiesofsoftware-final.doc

http://softwarepreservation.jiscinvolve.org/wp/

Reasons are important (so you take the right approach)


Why are you considering software sustainability?

Purpose:
• Achieve legal compliance
• Create heritage value
• Enable continued access to data and services
• Encourage software reuse


How are you going to choose the right approach?

Approach:
• Preservation (techno-centric)
• Emulation (data-centric)
• Migration (functionality-centric)
• Transition (process-centric)
• Hibernation (knowledge-centric)


Preservation vs sustainability

Image courtesy of RBG Kew – not for reuse

Image courtesy of London Permaculture under CC-by-nc-sa license

Preservation?

Sustainability?


People are important (people are infrastructure too)


Sustainable Communities

• Cohesion and Identity: Creating a community

• Tolerance and Diversity: Smart growth through collaboration

• Efficient use of resources: Leveraging infrastructure

• Adaptability to change: Governing sustainably


Cultivate Contributors – R project

• Basics: Website, mailing list, code repository, issue resolution

• Remove barriers to participation, increase efficiency

• 1993: First public release; 2 devs
• 1995: Code open sourced; 3 devs
• 1996: r-testers list set up
• 1997: Lists split into r-announce, r-help and r-devel; public CVS; 11 devs
• 2000: CRAN split and mirror
• 2001: BioConductor
• 2003: Namespaces
• 2005: i18n, L10n
• 2007: R-Forge
• Today: BioConductor (33 core devs), R-Forge (532 projects, 1562 devs), CRAN (1400+ packages)

http://cran.r-project.org/doc/html/interface98-paper/paper_2.html


We under-appreciate training

• Basic training for kitchen chef: 3-4 years

• Head chef: 10 years

• Basic training for s/w engineer: 3-4 years

• Architect: 10 years

Photo by ZagatBuzz

• Training in S/W Dev in UG Physics: 140 hours
• Training in S/W Dev in UG Geography: 0 hours

Software Carpentry

• Lab skills for scientific computing
  – http://software-carpentry.org
  – International initiative to teach basics of software engineering to researchers
• The “why” more than the “how”
  – We ran 13 workshops in 2013 for 600+ learners

Incentives are important


Courtesy of James Howison and James Herbsleb, “Incentives and Integration In Scientific Software Production”

• Rewrite by original team: address fragility
• Fork to add specific functionality: maintained separately
• Optimised for hardware: facilitate hardware sales
• Exploit new techniques / architectures

And money isn’t everything

[Figure: funding/staffing over time for a physics experiment, through the phases “Experiment running”, “Analysis of data”, “New experiment design starts” and “Next expt. running”. Caption: Maintenance of software to process data from physics experiment]

So beware your bus factor
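One way to put a rough number on a bus factor is a greedy heuristic over who has worked on which files. The heuristic removes the person who covers the most files and repeats until most of the code has nobody left who knows it. This is a minimal sketch under stated assumptions, not a method from the talk. The `%authors_of` data and `bus_factor` function are hypothetical, and "more than half of the files orphaned" is an arbitrary threshold chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical example data: which people have worked on which files.
my %authors_of = (
    'analysis.pl' => ['alice'],
    'plot.pl'     => ['alice', 'bob'],
    'io.pl'       => ['alice'],
    'README'      => ['carol'],
);

# Greedy estimate: remove the person covering the most files until more
# than half of the files (assumed threshold) have no remaining contributor.
sub bus_factor {
    my ($authors_of) = @_;
    my @files = keys %$authors_of;
    return 0 unless @files;
    my %gone;
    my $removed = 0;
    while (1) {
        my %count;        # files still covered by each remaining person
        my $orphaned = 0; # files with no remaining contributor
        for my $file (@files) {
            my @alive = grep { !$gone{$_} } @{ $authors_of->{$file} };
            $orphaned++ unless @alive;
            $count{$_}++ for @alive;
        }
        return $removed if $orphaned > @files / 2;
        # Remove the person covering the most files (ties broken by name).
        my ($top) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
        $gone{$top} = 1;
        $removed++;
    }
}

print "Bus factor: ", bus_factor(\%authors_of), "\n";   # prints "Bus factor: 2"
```

In this toy data, losing alice alone orphans two of four files. Losing bob as well pushes the orphaned count past half.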


Summary of my talk


SOFTWARE is…
• everywhere
• hard to define
• long-lived

… are IMPORTANT
• context
• reasons
• people

Take home messages


No-one sets out to write unsustainable software

Software sustainability is important because it has to happen

People need the skills and incentives to maintain software through its lifetime

Work with us – www.software.ac.uk
