The ALMA Computing Project Update and Management Approach

Page 1: The ALMA Computing Project Update and Management Approach

ICALEPCS’2005 - Geneva

The ALMA Computing Project Update and Management Approach

Brian Glendenning (1), Gianni Raffi (2)

(1) National Radio Astronomy Observatory (NRAO), Socorro, NM, USA
(2) European Southern Observatory (ESO), Munich, Germany

Page 2: The ALMA Computing Project Update and Management Approach

ICALEPCS’2005 - Geneva The Alma Computing Project - B.Glendenning, G.Raffi

ALMA partner organizations

Page 3: The ALMA Computing Project Update and Management Approach

ALMA Project in Summary

• 64 x 12 m antennas, 30-950 GHz
  => Reality check: 50 antennas proposed for the time being
• Array configurations: 150 m to 14 km
• Near San Pedro de Atacama, Chile, at 5000 m
• EU and North America as equal partners
• Japan will add the Compact Array: 12 x 7 m + 4 x 12 m antennas and extra correlator, receivers
• 2 prototype antennas (in Socorro, NM)
• Construction phase 2003-2011
• Early Science foreseen for 2009

Page 4: The ALMA Computing Project Update and Management Approach

ALMA Antenna Configurations

Page 5: The ALMA Computing Project Update and Management Approach

ALMA Computing requirements

• Control of antennas and receivers
• Correlator control / data acquisition (input: 96 Gb/s per antenna, output to archive up to 64 MB/s)
• On-line Pipeline (quicklook, flagging, images), Off-line Data Reduction, Telescope Calibration
• Archiving (data rate >10 MB/s, 300 TB/year)
• Observing Preparation, Scheduling
  – Support for novice users, going from science intent to Scheduling Blocks
  – Dynamic scheduling to take advantage of weather
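As a rough consistency check on the archiving figures above, a sustained 10 MB/s over a full year comes to about 315 TB, which is in line with the quoted 300 TB/year. The sketch below assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a 365-day year of continuous operation; the slide states neither, and the function name is only illustrative.

```python
# Assumption: continuous operation over a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_volume_tb(rate_mb_per_s: float) -> float:
    """Convert a sustained rate in MB/s to TB/year (decimal units assumed)."""
    bytes_per_year = rate_mb_per_s * 1e6 * SECONDS_PER_YEAR
    return bytes_per_year / 1e12


if __name__ == "__main__":
    # Archive rate quoted on the slide.
    print(f"{yearly_volume_tb(10):.0f} TB/year at a sustained 10 MB/s")
    # Upper bound if the 64 MB/s correlator output were sustained all year.
    print(f"{yearly_volume_tb(64):.0f} TB/year at a sustained 64 MB/s")
```

The second figure is only an upper bound: the slide gives 64 MB/s as a maximum output rate, not as an average.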

Page 6: The ALMA Computing Project Update and Management Approach

Software Scope

• From the cradle…
  – Proposal Preparation
  – Proposal Review
  – Program Preparation
  – Dynamic Scheduling of Programs
  – Observation
  – Calibration & Imaging
  – Data Delivery & Archiving
• Afterlife:
  – Archival Research & VO Compliance

Page 7: The ALMA Computing Project Update and Management Approach

Trilateral Computing IPT Organisation

ALMA Management: B. Glendenning, G. Raffi, K. Tatematsu

• Science Software Requirements: R. Lucas
• High Level Analysis: J. Schwarz
• Software Engineering: M. Zamparelli
• Common Software: G. Chiozzi
• Executive: P. Grosbol
• Control: A. Farris
• Archiving: A. Wicenec
• Observation Preparation: A. Bridger
• Operations Support: M. Chavan
• Offline: J. McMullen
• Pipeline: L. Davis
• Telescope Calibration: R. Lucas
• Correlator: J. Pisano
• Integration: P. Sivera
• Scheduler: A. Farris
• ACA: M. Watanabe

Total bilateral staff now: 40 FTEs

Total trilateral staff now: 65 FTEs

Page 8: The ALMA Computing Project Update and Management Approach

ALMA Computing

• Large but extremely distributed team
• 40 Full Time Equivalents for the whole E2E software; total development effort to 2011 ~280 FTE-years
• The fundamental output of the CIPT will be a ~2M SLOC “end to end” software system running on over 200 computers on 4 continents.
  – (The 2M figure does not include comments, tests, documentation, or adopted/modified products like AIPS++, NGAS, ATM, etc.)
• Staff in 14 institutions in Europe, North America and Japan
• Japanese computing fully integrated. It includes:
  – Staff in Japan working on ACA: ~30 FTE-years
  – Staff and cash for developments in Europe, US: ~60 FTE-years

Page 9: The ALMA Computing Project Update and Management Approach

Software Architecture

[Diagram: ALMA software subsystems, external agents, and the data flows between them]

Subsystems: Observation Preparation, Scheduling, Executive, Control System, Correlator, Calibration Pipeline, Quick Look Pipeline, Data Reduction Pipeline, Archive, ALMA Common Software

External agents: Principal Investigator, Telescope Operator, Archive Researcher

Primary functional paths:
1. Create observing project
2. Store observing project
3. Get project definition
4. Dispatch scheduling block id
5. Execute scheduling block
5.1. Get SB
5.2. Setup correlator
5.3. Store raw data
5.4. Store meta-data
5.5a, 5.5b. Access raw data & meta-data
5.6. Store calibration results
5.7. Store quick-look results
6. Start data reduction
7.1. Get raw data & meta-data
7.2. Store science results
8. Notify PI
9. Get project data

Additional functions:
a., b. Monitor points
c. Alter schedule / override action
d. Notify of special condition
e. Start, Stop, Configure
f. Get science data
g. Breakpoint response
h. Store admin data

Legend: primary functional paths, additional functions, ALMA software subsystem, external agent, real-time
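The primary functional path of the architecture diagram (steps 1 to 9) can be sketched as a sequence of subsystems that hand data to each other through the Archive. This is only an illustration: the `Archive` class, its `put`/`get` methods, the key layout and the sample values are all hypothetical, and which subsystem performs each step is inferred from the step labels rather than stated on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """In-memory stand-in for the Archive subsystem (hypothetical API)."""
    store: dict = field(default_factory=dict)

    def put(self, key: str, value: object) -> None:
        self.store[key] = value

    def get(self, key: str) -> object:
        return self.store[key]


def run_primary_path(archive: Archive, project_id: str) -> list:
    """Walk steps 1-9 of the diagram and return a log of what happened."""
    log = []
    # Steps 1-2: Observation Preparation creates the project and stores it.
    archive.put(f"{project_id}/project", {"scheduling_blocks": ["SB-1"]})
    log.append("1-2 project created and stored")
    # Steps 3-4: Scheduling reads the definition and dispatches an SB id.
    sb_id = archive.get(f"{project_id}/project")["scheduling_blocks"][0]
    log.append(f"3-4 dispatched {sb_id}")
    # Step 5: the block is executed; raw data and meta-data are stored.
    archive.put(f"{project_id}/{sb_id}/raw", b"\x00\x01")
    archive.put(f"{project_id}/{sb_id}/meta", {"antennas": 2})
    log.append(f"5 executed {sb_id}")
    # Steps 6-7: the pipeline reads raw data and stores science results.
    raw = archive.get(f"{project_id}/{sb_id}/raw")
    archive.put(f"{project_id}/{sb_id}/science", {"n_bytes": len(raw)})
    log.append("6-7 data reduced")
    # Steps 8-9: the PI is notified and can retrieve the project data.
    log.append("8-9 PI notified")
    return log


if __name__ == "__main__":
    for entry in run_primary_path(Archive(), "P-001"):
        print(entry)
```

The point of the sketch is the coupling pattern: each stage only needs to know the archive keys, not the stage before it.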

Page 10: The ALMA Computing Project Update and Management Approach

AOS Network

[Network diagram. Labels: 1 Gb fibers from antenna pads; terminal PCs (diskless + RFI quiet); IP telephony; 16 CDP Beowulf nodes; CDP Master; CCC computer; SRST router; ARTM, GPS .. (diskless computers); 10 Gb fibers to OSF; Computer Room; Office Area; Correlator Room; Patch Panel Room; patch panels; structured copper cabling; multiplicities x 64 and x 250. Legend: fiber, copper, 10 Gb.]

Page 11: The ALMA Computing Project Update and Management Approach

ALMA software development process

• Software to be developed in two main phases: Array software by 2008, Observatory software by 2011
• Incremental synchronized development via six-monthly releases at FIXED dates allows adjusting priorities to status
  – We consider a fixed-date development pacing to be crucial in our distributed environment
• Monthly integration tags (end of month) and inter-subsystem interface freezes (middle of month)
• Releases every 6 months (alternating major/minor)
  – We believe development of an integrated system requires integrations from the beginning to avoid the well-known “integration hell” problem
• Non-regression tests plus user tests (test cases); goal: 20% of effort
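The fixed-date pacing described above can be sketched as a small calendar generator. Several details are assumptions made for illustration and are not given on the slide: "middle of month" is taken as the 15th, the first release in a sequence is taken to be a major one, and the example start date is arbitrary.

```python
import calendar
from datetime import date


def monthly_milestones(year: int, month: int) -> dict:
    """Interface freeze mid-month, integration tag on the last day."""
    last_day = calendar.monthrange(year, month)[1]
    return {
        "interface_freeze": date(year, month, 15),  # assumption: 15th
        "integration_tag": date(year, month, last_day),
    }


def release_plan(first: date, count: int) -> list:
    """Releases every 6 months at fixed dates, alternating major/minor.

    `first.day` should be 28 or lower so the day exists in every month.
    """
    plan = []
    for i in range(count):
        months = first.month - 1 + 6 * i
        year = first.year + months // 12
        month = months % 12 + 1
        kind = "major" if i % 2 == 0 else "minor"  # assumption: major first
        plan.append((date(year, month, first.day), kind))
    return plan


if __name__ == "__main__":
    for when, kind in release_plan(date(2005, 10, 1), 4):
        print(when.isoformat(), kind)
    print(monthly_milestones(2006, 2))
```

Because every date is computed from the calendar alone, no subsystem status feeds into the schedule; only the content of a release is adjusted, which is the property the slide emphasises.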

Page 12: The ALMA Computing Project Update and Management Approach

ALMA software approach

We have had requirements since the beginning:
• Science + Operation Requirements => Architecture
• We are tracking them (vs. features, tests, delivery time) using Telelogic’s DOORS

Prototypes were done (using ACS, see below):
• Software for prototype antennas, first correlator

Common infrastructure (software rather than rules):
• ALMA Common Software (ACS), started very early and now getting more and more stable
• Software engineering procedures, integration, tests

Page 13: The ALMA Computing Project Update and Management Approach

ACS Concepts

Component-Container
• Supports Separation of Concerns between technology and specific applications.
• Same idea as .NET, EJB, CCM

[Diagram: Client, Container, Component 1, Component 2, Component 3, ...]

ACS Entity objects
• Structured data, e.g. Scheduling Blocks, to be passed between components
• Defined & serialized with XML
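The component-container idea can be illustrated with a minimal, generic sketch. This is not the ACS API: the `Component`, `Container` and `Scheduler` classes, the method names, and the XML element names are all hypothetical, chosen only to show how a container owns lifecycle concerns while a component holds application logic and passes XML-serialized entity data.

```python
import xml.etree.ElementTree as ET


class Component:
    """Application logic lives here; the container drives the lifecycle."""

    def __init__(self) -> None:
        self.active = False

    def initialize(self) -> None:
        self.active = True

    def cleanup(self) -> None:
        self.active = False


class Scheduler(Component):
    """Hypothetical component that hands out scheduling blocks."""

    def next_block(self) -> str:
        # Entity object: structured data defined and serialized as XML.
        sb = ET.Element("SchedulingBlock", {"id": "SB-1"})
        ET.SubElement(sb, "Target").text = "placeholder-target"
        return ET.tostring(sb, encoding="unicode")


class Container:
    """Owns the technical concerns: registration, activation, shutdown."""

    def __init__(self) -> None:
        self._classes = {}
        self._active = {}

    def register(self, name: str, cls: type) -> None:
        self._classes[name] = cls

    def get_component(self, name: str) -> Component:
        # Activate lazily on first request, then reuse the same instance.
        if name not in self._active:
            component = self._classes[name]()
            component.initialize()
            self._active[name] = component
        return self._active[name]

    def shutdown(self) -> None:
        for component in self._active.values():
            component.cleanup()
        self._active.clear()


if __name__ == "__main__":
    container = Container()
    container.register("SCHEDULER", Scheduler)
    client_ref = container.get_component("SCHEDULER")
    print(client_ref.next_block())
    container.shutdown()
```

The client never constructs or tears down a component itself, so the hosting technology can change without touching `Scheduler`.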

Page 14: The ALMA Computing Project Update and Management Approach

ALMA Computing Project Management & Oversight

• Oversight
  – Yearly reviews
  – Assignment of “subsystem scientists”
  – Subsystem contact meetings
• Planning, Control
  – Plan the coming year in some detail (high-level requirements decomposed into granular features); place remaining features in a backlog, to be drawn in priority order
• Verify (trace) feature completion via end-user tests
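Drawing features from a backlog in priority order can be sketched with a priority queue. The `Feature` fields, the effort unit, and the rule of stopping at the first feature that no longer fits are assumptions for illustration; the slide only says that remaining features are drawn in priority order.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Feature:
    priority: int                          # assumption: lower is drawn first
    name: str = field(compare=False)
    effort: float = field(compare=False)   # hypothetical unit: FTE-months


def plan_release(backlog: list, capacity: float) -> list:
    """Pop features in priority order until the next one no longer fits.

    The backlog list is modified in place: planned features are removed.
    """
    heapq.heapify(backlog)
    planned = []
    while backlog and backlog[0].effort <= capacity:
        feature = heapq.heappop(backlog)
        capacity -= feature.effort
        planned.append(feature)
    return planned


if __name__ == "__main__":
    backlog = [
        Feature(2, "interface freeze tooling", 3),
        Feature(1, "quick-look display", 2),
        Feature(3, "archive query GUI", 4),
    ]
    for feature in plan_release(backlog, capacity=5):
        print(feature.priority, feature.name)
    print("left in backlog:", [f.name for f in backlog])
```

Stopping at the first feature that does not fit keeps strict priority order; a plan that skipped ahead to smaller, lower-priority features would be a different policy.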

Page 15: The ALMA Computing Project Update and Management Approach

Planning: R3 Master Test Plan

Page 16: The ALMA Computing Project Update and Management Approach

Computing Group Communications and Reporting

• Yearly Incremental Design Reviews, Review Plans revised every 6 months
• TWiki is used and useful for orderly discussions
• Contact meetings with subsystems and among subsystem leads
• Yearly subsystem leads meetings (design and interface discussions)
• People meet by working together at each other’s site
• Videoconferences are more troublesome than teleconferences

Page 17: The ALMA Computing Project Update and Management Approach

Tests will grade full or partial requirements. The SSR signs off on a requirement as ‘Adequate’ by grading requirements as shown in the example below.

[Table: grading example with columns Overall Grade and Test Grades]

Page 18: The ALMA Computing Project Update and Management Approach

Status

• Passed external PDR (2003) and CDR2 (’04) and internal CDR1 (’04), CDR3 (’05)
• Delivered R0-R3 releases (+ Rx.1 releases)
• Prototype control/correlator used with prototype antennas
• Every subsystem has a dedicated astronomer, who checks developed features twice per year (release validation).

Page 19: The ALMA Computing Project Update and Management Approach

Status (cont.)

• Most subsystems have substantial development with infrastructure in place, external interfaces defined and implemented, and some functionality.
  – Most subsystems have had external user tests
  – Integrated tests with simulated/elementary data have taken place
  – Internal testing of the system at the VLA site in early 2006

• Antenna evaluation required significant software, but was done essentially via scripting of control components

• ACA (Japanese compact array) and Observatory Support software still in early design

Page 20: The ALMA Computing Project Update and Management Approach

SLOC per Subsystem over Releases

[Bar chart: lines of code per subsystem for the series R2.0, R2.1 and Aug 2005. Y-axis: “Single lines of Code”, 0 to 250,000. X-axis: “Subsystems”. Bar labels (subsystem order not shown): 215,148; 111,422; 49,156; 24,753; 52,383; 16,537; 15,388; 28,828; 113,625; 159,644; 4,687; 42,783. Annotation: Test Interferometer Control Software prototype.]

• ~850 kSLOC in Oct. 2005
• In-kind contributions (NGAS, AIPS++, ATM) not included

Page 21: The ALMA Computing Project Update and Management Approach

Lessons learned

Geographical distribution with this size & pace is difficult (*):
– Computing subsystems mixed across continents (sometimes it was inevitable)
– Acceptance of common software (optimized for the system, not for everybody’s taste, and mandatory; in general OK) => Requires team spirit.
– Stability of interfaces among subsystems => No last-minute changes
– Difficulty of integration. Subsystems tend to give priority to their own development vs. stability of the system (but we are still in the early phases). => It takes two months to get an integrated system. Continuous integration remains a goal (dream?)
– When problems arise, finger-pointing at “the others” occurs too quickly.
– Some inefficiency has to be accepted (balanced by more discussion, better design)

We gave some thought to Agile development, but we are at the wrong end of the spectrum (vs. a small local team). At least: light documentation + some form of emergency “pair programming” at integration time.

(*) Not a statement against collaborations (typically among labs with different projects). We believe ours to be a very good example of a collaborative project (hopefully we will also have successful software to show at the end).

Page 22: The ALMA Computing Project Update and Management Approach

Prototype Antennas at the VLA Site (New Mexico)

Vertex/RSI Alcatel/EIE

Evaluated using prototype control software (with ACS)

Page 23: The ALMA Computing Project Update and Management Approach

First Operator GUI

Page 24: The ALMA Computing Project Update and Management Approach

ALMA Sites in Chile

[Map: Antenna Operations Site (AOS), Operation Support Facility (OSF), Santiago Central Office (SCO). Data link labels: 60 MB/s (peak), 6 MB/s (average).]

Page 25: The ALMA Computing Project Update and Management Approach

Earthwork for the OSF Technical Facilities

Page 26: The ALMA Computing Project Update and Management Approach

ALMA Operation Support Facility today

Page 27: The ALMA Computing Project Update and Management Approach

ALMA Operation Support Facility (2900 m, Atacama desert)

ALMA operated from here up to 2009

Page 28: The ALMA Computing Project Update and Management Approach

Antenna Operation Site Technical Building Concept

Page 29: The ALMA Computing Project Update and Management Approach

ALMA Santiago Office

Support operation from Santiago with:
• Final master archive
• Pipeline monitoring

ALMA Regional Centers in Europe, US, Japan
• Wide area network connectivity
• Copies of archive data
• Support of users in proposal preparation & final data reduction

Page 30: The ALMA Computing Project Update and Management Approach

ALMA Related Papers and Posters at ICALEPCS’2005

Sat.-Sun: ALMA Common Software (ACS) Workshop

http://almasw.hq.eso.org/almasw/bin/view/ACS/ACSWorkshop2005

WE1.4-4: Advanced Hardware Technology in ALMA Back End and Correlator, F. Biancat Marchet et al.

WE4A.2-5: A generic software interface simulator for ALMA common software, D. Fugate et al.

WE2.4-6: The ALMA Common Software ACS Status and Developments, G. Chiozzi et al.

WE3A.3-6: The ALMA Telescope Control System, A. Farris et al.

PO1.012-1: Development of the control system for the 40m radiotelescope of the OAN using the ALMA Common Software, P. de Vicente et al.

PO1.032-6: Transmitting huge amounts of data: design, implementation and performance of the bulk data transfer mechanism in ALMA ACS, P. Di Marcantonio et al.

PO2.067-4: ALMA Correlator Real-Time Data Processor, J. Pisano et al.

PO1.100-8: Migration from ACS 1.1 to ACS 4 at ANKA, I. Križnar et al.

Page 31: The ALMA Computing Project Update and Management Approach

ALMA Sites: Chajnantor +

www.alma.info