ICALEPCS’2005 - Geneva
The ALMA Computing Project: Update and Management Approach
Brian Glendenning (1), Gianni Raffi (2)
(1) National Radio Astronomy Observatory (NRAO), Socorro, NM, USA
(2) European Southern Observatory (ESO), Munich, Germany
ICALEPCS’2005 - Geneva The Alma Computing Project - B.Glendenning, G.Raffi
ALMA partner organizations
ALMA Project in Summary
• 64 x 12 m antennas, 30-950 GHz => Reality check: 50 antennas proposed for the time being
• Array configurations: 150 m to 14 km
• Near San Pedro de Atacama, Chile, at 5000 m
• EU and North America as equal partners; Japan will add the Compact Array: 12 x 7 m + 4 x 12 m antennas, plus an extra correlator and receivers
• 2 prototype antennas (in Socorro, NM)
• Construction phase 2003-2011
• Early Science foreseen for 2009
ALMA Antenna Configurations
ALMA Computing requirements
• Control of antennas and receivers
• Correlator control/data acquisition (input: 96 Gb/s per antenna; output to archive up to 64 MB/s)
• On-line pipeline (quick-look, flagging, images), off-line data reduction, telescope calibration
• Archiving (data rate >10 MB/s, ~300 TB/year)
• Observing preparation, scheduling
– Support for novice users: turning science intent into Scheduling Blocks
– Dynamic scheduling to take advantage of the weather
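The rates above can be sanity-checked with a little arithmetic. The sketch below assumes decimal units (1 MB = 1e6 bytes, 1 TB = 1e12 bytes, 1 Gb = 1e9 bits), a constant rate over a 365.25-day year, and the full 64 antennas; these are assumptions for illustration, not statements from the slides.

```python
# Back-of-envelope check of the data rates quoted above.
# Assumptions (not stated on the slides): decimal units and a constant
# rate sustained over a 365.25-day year.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s


def annual_volume_tb(rate_mb_per_s: float) -> float:
    """TB accumulated in one year at a constant rate given in MB/s."""
    return rate_mb_per_s * 1e6 * SECONDS_PER_YEAR / 1e12


def correlator_input_gbit_per_s(n_antennas: int, per_antenna_gbit: float = 96.0) -> float:
    """Aggregate correlator input in Gb/s, at 96 Gb/s per antenna."""
    return n_antennas * per_antenna_gbit


def reduction_factor(n_antennas: int, archive_mb_per_s: float = 64.0) -> float:
    """Ratio of correlator input (bits/s) to peak archive output (bits/s)."""
    input_bits = correlator_input_gbit_per_s(n_antennas) * 1e9
    output_bits = archive_mb_per_s * 1e6 * 8
    return input_bits / output_bits


if __name__ == "__main__":
    print(f"10 MB/s sustained -> {annual_volume_tb(10):.0f} TB/year")
    print(f"64 antennas -> {correlator_input_gbit_per_s(64):.0f} Gb/s into the correlator")
    print(f"input/output reduction ~ {reduction_factor(64):.0f}x")
```

A sustained 10 MB/s gives roughly 316 TB/year, in line with the ~300 TB/year quoted, and the correlator reduces its ~6 Tb/s input by a factor of about 12,000 before data reach the archive.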
Software Scope
• From the cradle…
– Proposal Preparation
– Proposal Review
– Program Preparation
– Dynamic Scheduling of Programs
– Observation
– Calibration & Imaging
– Data Delivery & Archiving
• Afterlife:
– Archival Research & VO Compliance
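Dynamic scheduling, listed both as a requirement and as part of the software scope, means choosing at run time which Scheduling Block (SB) to execute given the current conditions. The sketch below is a minimal illustration, not the ALMA scheduler. The fields `priority` and `max_pwv_mm` (worst precipitable water vapour an SB tolerates) and the tie-breaking rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SchedulingBlock:
    # Hypothetical fields, for illustration only.
    sb_id: str
    priority: int       # higher = more important
    max_pwv_mm: float   # worst precipitable water vapour the SB tolerates


def select_next(sbs: Sequence[SchedulingBlock], current_pwv_mm: float) -> Optional[SchedulingBlock]:
    """Pick the next SB to run under the current weather.

    Only SBs whose weather constraint is met are eligible. Among those,
    the highest priority wins; ties go to the SB with the strictest
    weather requirement, so good conditions are not spent on SBs that
    could also run in poorer weather.
    """
    eligible = [sb for sb in sbs if current_pwv_mm <= sb.max_pwv_mm]
    if not eligible:
        return None
    return max(eligible, key=lambda sb: (sb.priority, -sb.max_pwv_mm))


if __name__ == "__main__":
    queue = [
        SchedulingBlock("SB-1", 2, 4.0),
        SchedulingBlock("SB-2", 2, 1.0),
        SchedulingBlock("SB-3", 1, 0.5),
    ]
    for pwv in (0.8, 2.5, 5.0):
        chosen = select_next(queue, pwv)
        print(pwv, chosen.sb_id if chosen else "nothing observable")
```

With 0.8 mm of water vapour, SB-2 is preferred over the equally ranked SB-1 because it needs the better weather; at 2.5 mm only SB-1 qualifies, and at 5.0 mm nothing does.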
Trilateral Computing IPT Organisation
• ALMA Management: B. Glendenning, G. Raffi, K. Tatematsu
• Science Software Requirements: R. Lucas
• High Level Analysis: J. Schwarz
• Software Engineering: M. Zamparelli
• Common Software: G. Chiozzi
• Executive: P. Grosbol
• Control: A. Farris
• Archiving: A. Wicenec
• Observation Preparation: A. Bridger
• Operations Support: M. Chavan
• Offline: J. McMullen
• Pipeline: L. Davis
• Telescope Calibration: R. Lucas
• Correlator: J. Pisano
• Integration: P. Sivera
• Scheduler: A. Farris
• ACA: M. Watanabe
Total bilateral staff now: 40 FTEs
Total trilateral staff now: 65 FTEs
ALMA Computing
• Large but extremely distributed team
• 40 full-time equivalents (FTEs) for the whole end-to-end (E2E) software; total development effort to 2011 ~280 FTE-years
• The fundamental output of the Computing IPT (CIPT) will be a ~2M SLOC “end to end” software system running on over 200 computers on 4 continents
– (The 2M figure does not include comments, tests, documentation, or adopted/modified products like AIPS++, NGAS, ATM, etc.)
• Staff in 14 institutions in Europe, North America and Japan; Japanese computing is fully integrated. The total effort includes:
– Staff in Japan working on ACA: ~30 FTE-years
– Staff and cash for developments in Europe and the US: ~60 FTE-years
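Dividing the size target by the effort gives a rough sense of scale. This is a crude ratio: the 280 FTE-years also cover design, testing, integration and management, so it is not a measure of raw coding speed. The 220 working days per year is an assumption, not a slide figure.

```python
# Crude ratio from the slide's own figures (illustrative only).
TOTAL_SLOC = 2_000_000          # ~2M SLOC target (no comments, tests, docs)
TOTAL_EFFORT_FTE_YEARS = 280    # total development effort to 2011
WORKING_DAYS_PER_YEAR = 220     # assumption, not from the slides

sloc_per_fte_year = TOTAL_SLOC / TOTAL_EFFORT_FTE_YEARS
sloc_per_fte_day = sloc_per_fte_year / WORKING_DAYS_PER_YEAR

print(f"~{sloc_per_fte_year:,.0f} SLOC per FTE-year")
print(f"~{sloc_per_fte_day:.0f} SLOC per FTE working day")
```

The result is about 7,100 SLOC per FTE-year, or roughly 32 lines per working day, averaged over the whole effort.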
Software Architecture
[Figure: ALMA software subsystems (Observation Preparation, Scheduling, Executive, Control System, Correlator, Calibration Pipeline, Quick Look Pipeline, Data Reduction Pipeline, Archive, all built on ALMA Common Software) and external agents (Principal Investigator, Telescope Operator, Archive Researcher); the real-time part is marked separately.]
Primary functional paths:
1. Create observing project
2. Store observing project
3. Get project definition
4. Dispatch scheduling block id
5. Execute scheduling block
5.1. Get SB
5.2. Set up correlator
5.3. Store raw data
5.4. Store meta-data
5.5a/5.5b. Access raw data & meta-data
5.6. Store calibration results
5.7. Store quick-look results
6. Start data reduction
7.1. Get raw data & meta-data
7.2. Store science results
8. Notify PI
9. Get project data
Additional functions:
a./b. Monitor points
c. Alter schedule / override action
d. Notify of special condition
e. Start/stop, configure
f. Get science data
g. Breakpoint response
h. Store admin data
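In the numbered primary path, only a scheduling block *id* is dispatched (step 4), and the SB itself is fetched from the archive (5.1). Results are then stored back there (5.3, 5.4, 7.2). The toy sketch below mimics that archive-centred flow in a single process. All class and function names are hypothetical, and the "data" and "results" are placeholders; the real subsystems are distributed ACS components.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Archive:
    """Toy in-memory stand-in for the Archive subsystem (hypothetical)."""
    scheduling_blocks: Dict[str, dict] = field(default_factory=dict)
    raw_data: Dict[str, List[float]] = field(default_factory=dict)
    metadata: Dict[str, dict] = field(default_factory=dict)
    science_results: Dict[str, str] = field(default_factory=dict)


def store_project(archive: Archive, project_id: str, sbs: Dict[str, dict]) -> None:
    # Steps 1-2: the PI's observing project (its SBs) is stored in the archive.
    for sb_id, sb in sbs.items():
        archive.scheduling_blocks[sb_id] = {**sb, "project": project_id}


def execute_sb(archive: Archive, sb_id: str) -> None:
    # Step 5: given only the id, fetch the SB (5.1), then store raw
    # data (5.3) and meta-data (5.4) back into the archive.
    sb = archive.scheduling_blocks[sb_id]
    n_samples = sb["n_samples"]
    archive.raw_data[sb_id] = [float(i) for i in range(n_samples)]  # fake data
    archive.metadata[sb_id] = {"project": sb["project"], "n_samples": n_samples}


def reduce_sb(archive: Archive, sb_id: str) -> str:
    # Steps 6-7: read raw data + meta-data (7.1), store a science result (7.2).
    data = archive.raw_data[sb_id]
    meta = archive.metadata[sb_id]
    result = f"{meta['project']}/{sb_id}: {len(data)} samples reduced"
    archive.science_results[sb_id] = result
    return result


if __name__ == "__main__":
    archive = Archive()
    store_project(archive, "P1", {"SB-1": {"n_samples": 4}, "SB-2": {"n_samples": 2}})
    for sb_id in ["SB-1", "SB-2"]:  # step 4: Scheduling dispatches ids
        execute_sb(archive, sb_id)
        print(reduce_sb(archive, sb_id))
    # Step 8 (notify the PI) would follow once all results are stored.
```

Passing ids rather than full objects keeps each subsystem coupled only to the archive's interface, which matters for a team spread over several continents.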
AOS Network
[Figure: AOS network. 1 Gb fibers from the antenna pads (x64); correlator room with the CDP Master and 16 CDP Beowulf nodes; CCC computer; SRST router; 10 Gb fibers to the OSF; office area with diskless, RFI-quiet terminal PCs and IP telephony; diskless ARTM, GPS, etc. computers; patch panel room and structured copper cabling (fiber, copper and 10 Gb links shown).]
ALMA software development process
• Software to be developed in two main phases: Array software by 2008, Observatory software by 2011
• Incremental, synchronized development via 6-monthly releases at FIXED dates allows priorities to be adjusted to the current status
– We consider fixed-date development pacing to be crucial in our distributed environment
• Monthly integration tags (end of month) and inter-subsystem interface freezes (middle of month)
• Releases every 6 months (alternating major/minor)
– We believe development of an integrated system requires integration from the beginning, to avoid the well-known “integration hell” problem
• Non-regression tests + user tests (test cases) (goal: 20% of effort)
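The fixed-date pacing above can be written down as a calendar. The sketch below assumes some conventions that the slides do not fix: the interface freeze falls on the 15th, the integration tag on the last day of the month, and each 6-month release coincides with that month's tag. The 2006 start date is arbitrary.

```python
import calendar
from datetime import date
from typing import List, Tuple


def development_calendar(start_year: int, start_month: int, months: int,
                         first_release_major: bool = True) -> List[Tuple[date, str]]:
    """List fixed-date milestones for `months` consecutive months.

    Hypothetical conventions: interface freeze on the 15th ("middle of
    month"), integration tag on the last day of each month, and a
    release at the end of every 6th month, alternating major and minor.
    """
    events: List[Tuple[date, str]] = []
    year, month = start_year, start_month
    major = first_release_major
    for i in range(1, months + 1):
        last_day = calendar.monthrange(year, month)[1]
        events.append((date(year, month, 15), "interface freeze"))
        events.append((date(year, month, last_day), "integration tag"))
        if i % 6 == 0:
            kind = "major release" if major else "minor release"
            events.append((date(year, month, last_day), kind))
            major = not major
        # Advance one month, rolling over the year when needed.
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return events


if __name__ == "__main__":
    for when, what in development_calendar(2006, 1, 12):
        if "release" in what:
            print(when, what)
```

Because every date is computed rather than negotiated, all subsystems on all continents share the same freeze and tag deadlines.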
ALMA software approach
We have had requirements since the beginning:
• Science + Operations Requirements => Architecture => we track them (vs. features, tests, delivery time) using Telelogic’s DOORS
Prototypes were done (using ACS, see below):
• Software for the prototype antennas and the first correlator
Common infrastructure (software rather than rules):
• ALMA Common Software (ACS), started very early and now becoming more and more stable
• S/w engineering procedures, integration, tests
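Tracking requirements against features and tests amounts to maintaining links and looking for gaps. The sketch below is a hypothetical data model (plain dictionaries with made-up ids), not the DOORS API. It reports requirements that have no features yet and features that have no tests.

```python
from typing import Dict, List


def trace_gaps(req_to_features: Dict[str, List[str]],
               feature_to_tests: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Report traceability gaps (hypothetical model, not the DOORS API).

    Returns the requirements with no features and the features
    (reachable from some requirement) with no tests.
    """
    untraced_reqs = sorted(r for r, feats in req_to_features.items() if not feats)
    all_features = {f for feats in req_to_features.values() for f in feats}
    untested = sorted(f for f in all_features if not feature_to_tests.get(f))
    return {"requirements_without_features": untraced_reqs,
            "features_without_tests": untested}


if __name__ == "__main__":
    reqs = {"REQ-1": ["F-1", "F-2"], "REQ-2": [], "REQ-3": ["F-3"]}
    tests = {"F-1": ["T-10"], "F-2": [], "F-3": ["T-11", "T-12"]}
    print(trace_gaps(reqs, tests))
```

Here REQ-2 has no implementing feature yet, and F-2 is implemented but not covered by any test.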
ACS Concepts
• Component-Container: supports separation of concerns between technology and specific applications
– Same idea as .NET, EJB, CCM
[Figure: a client accessing Components 1, 2 and 3 hosted in a Container]
• ACS Entity objects: structured data (e.g. Scheduling Blocks) to be passed between components, defined and serialized with XML
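The two ACS ideas can be illustrated together in a small sketch. The container owns the technical side (creating, handing out and tearing down components by name), while a component holds only application logic. An entity object is serialized to XML to travel between components. Every class and method name below is hypothetical and is not the ACS API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass
class SchedulingBlockEntity:
    """Hypothetical entity object: structured data passed between components."""
    sb_id: str
    target: str
    frequency_ghz: float

    def to_xml(self) -> str:
        root = ET.Element("SchedulingBlock", id=self.sb_id)
        ET.SubElement(root, "Target").text = self.target
        ET.SubElement(root, "FrequencyGHz").text = str(self.frequency_ghz)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "SchedulingBlockEntity":
        root = ET.fromstring(text)
        return cls(root.get("id"), root.findtext("Target"),
                   float(root.findtext("FrequencyGHz")))


class Component:
    """Application side: knows nothing about how it is deployed or reached."""
    def initialize(self) -> None: ...
    def cleanup(self) -> None: ...


class Executor(Component):
    def __init__(self) -> None:
        self.executed: List[str] = []

    def execute_sb(self, sb_xml: str) -> str:
        sb = SchedulingBlockEntity.from_xml(sb_xml)
        self.executed.append(sb.sb_id)
        return f"observing {sb.target} at {sb.frequency_ghz} GHz"


class Container:
    """Technology side: manages component lifecycles by name."""
    def __init__(self) -> None:
        self._registry: Dict[str, Type[Component]] = {}
        self._active: Dict[str, Component] = {}

    def register(self, name: str, cls: Type[Component]) -> None:
        self._registry[name] = cls

    def get_component(self, name: str) -> Component:
        # Create and initialize on first request, then reuse the instance.
        if name not in self._active:
            comp = self._registry[name]()
            comp.initialize()
            self._active[name] = comp
        return self._active[name]

    def shutdown(self) -> None:
        for comp in self._active.values():
            comp.cleanup()
        self._active.clear()


if __name__ == "__main__":
    container = Container()
    container.register("EXECUTOR", Executor)
    executor = container.get_component("EXECUTOR")
    sb = SchedulingBlockEntity("SB-1", "Orion KL", 230.5)
    print(executor.execute_sb(sb.to_xml()))
    container.shutdown()
```

Swapping the container (for example, to add remote access or logging) would not touch `Executor`. That is the separation of concerns the slide describes.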
ALMA Computing Project Management & Oversight
• Oversight
– Yearly reviews
– Assignment of “subsystem scientists”
– Subsystem contact meetings
• Planning, control: plan the coming year in some detail (high-level requirements decomposed into granular features); place the remaining features in a backlog, to be drawn in priority order
• Verify (trace) feature completion via user end tests
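Drawing features "in priority order" from a backlog maps naturally onto a priority queue. The sketch below is a minimal illustration under assumed conventions: a lower number means higher priority, efforts are in FTE-months, and drawing stops at the first feature that no longer fits, so strict priority order is preserved. The feature names are made up.

```python
import heapq
from typing import List, Tuple


def plan_year(features: List[Tuple[int, str, float]],
              capacity_fte_months: float) -> Tuple[List[str], List[str]]:
    """Split features into a committed plan and a remaining backlog.

    `features` holds (priority, name, effort in FTE-months); lower
    priority numbers are drawn first (hypothetical convention). Drawing
    stops at the first feature that does not fit the remaining capacity.
    """
    heap = list(features)
    heapq.heapify(heap)
    planned: List[str] = []
    used = 0.0
    while heap and used + heap[0][2] <= capacity_fte_months:
        _, name, effort = heapq.heappop(heap)
        planned.append(name)
        used += effort
    backlog = [name for _, name, _ in sorted(heap)]
    return planned, backlog


if __name__ == "__main__":
    features = [
        (2, "SB editor", 6.0),
        (1, "Archive query", 4.0),
        (3, "Weather scheduler", 8.0),
        (2, "Correlator setup", 3.0),
    ]
    planned, backlog = plan_year(features, capacity_fte_months=13.0)
    print("planned:", planned)
    print("backlog:", backlog)
```

With 13 FTE-months available, the three highest-priority features are committed and "Weather scheduler" stays in the backlog for a later cycle.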
Planning: R3 Master Test Plan
Computing Group Communications and Reporting
• Yearly incremental design reviews; review plans revised every 6 months
• TWiki is used, and useful, for orderly discussions
• Contact meetings with subsystems and among subsystem leads
• Yearly subsystem leads meetings (design and interface discussions)
• People meet by working together at each other’s sites
• Videoconferences are more troublesome than telecons
Tests will grade whether requirements are fully or partially met. SSR (Science Software Requirements) signs off on a requirement as ‘Adequate’ by grading it, as shown in the example below.
[Example: grading table with columns Overall Grade and Test Grades]
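The actual grading scale and the rule that combines test grades into an overall grade are not given here. The sketch therefore assumes a three-level ordinal scale and a conservative rule: a requirement is only as good as its worst test, and it is marked ‘Adequate’ only when every test grades it as fully met. Both the scale and the rule are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List


class Grade(IntEnum):
    # Hypothetical ordinal scale; the real ALMA scale is not given here.
    NONE = 0
    PARTIAL = 1
    FULL = 2


def overall_grade(test_grades: List[Grade]) -> Grade:
    """Assumed rule: the overall grade is the worst test grade."""
    if not test_grades:
        return Grade.NONE
    return min(test_grades)


def sign_off(grades_by_req: Dict[str, List[Grade]]) -> Dict[str, str]:
    """Label each requirement 'Adequate' or 'Not yet adequate' (assumed rule)."""
    return {req: "Adequate" if overall_grade(g) == Grade.FULL else "Not yet adequate"
            for req, g in grades_by_req.items()}


if __name__ == "__main__":
    print(sign_off({
        "REQ-1": [Grade.FULL, Grade.FULL],
        "REQ-2": [Grade.FULL, Grade.PARTIAL],
        "REQ-3": [],
    }))
```

A less strict rule, such as averaging the test grades, would be a one-line change to `overall_grade`.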
Status
• Passed external PDR (2003) and CDR2 (2004), and internal CDR1 (2004) and CDR3 (2005)
• Delivered releases R0-R3 (plus Rx.1 releases)
• Prototype control/correlator software used with the prototype antennas
• Every subsystem has a dedicated astronomer, who checks developed features twice per year (release validation)
Status (cont.)
• Most subsystems have substantial development, with infrastructure in place, external interfaces defined and implemented, and some functionality
– Most subsystems have had external user tests
– Integrated tests with simulated/elementary data have taken place
– Internal testing of the system at the VLA site in early 2006
• Antenna evaluation required significant software, but was done essentially via scripting of control components
• ACA (Japanese compact array) and Observatory Support software still in early design
SLOC per Subsystem over Releases
[Figure: bar chart of single lines of code per subsystem for releases R2.0 and R2.1 (Aug 2005), including the Test Interferometer Control Software prototype; ~850 kSLOC in total as of Oct. 2005. In-kind contributions (NGAS, AIPS++, ATM) not included.]
Lessons learned
Geographical distribution at this size and pace is difficult (*):
– Computing subsystems mixed across continents (sometimes this was inevitable)
– Acceptance of common software (optimized for the system, not for everybody’s taste, and mandatory; in general OK) => requires team spirit
– Stability of interfaces among subsystems => no last-minute changes
– Difficulty of integration: subsystems tend to give priority to their own development over the stability of the system (but we are still in the early phases) => it takes two months to get an integrated system. Continuous integration remains a goal (dream?)
– When problems arise, finger-pointing at “the others” happens too quickly
– Some inefficiency has to be accepted (balanced by more discussion and better design)
We gave some thought to Agile development, but we are at the wrong end of the spectrum (vs. a small local team).
At least: light documentation + some form of emergency “pair programming” at integration time.
(*) Not a statement against collaborations (typically among labs with different projects). We believe we are a very good example of a collaborative project (hopefully we will also have successful software to show at the end).
Prototype Antennas at the VLA Site (New Mexico)
Vertex/RSI and Alcatel/EIE
Evaluated using prototype control software (with ACS)
First Operator GUI
ALMA Sites in Chile
[Figure: Antenna Operations Site (AOS), Operation Support Facility (OSF) and Santiago Central Office (SCO); data rates shown: 60 MB/s (peak), 6 MB/s (average)]
Earthwork for the OSF Technical Facilities
ALMA Operation Support Facility today
ALMA Operation Support Facility (2900 m – Atacama desert)
ALMA operated from here up to 2009
Antenna Operation Site Technical Building Concept
ALMA Santiago Office
Support operation from Santiago with:
• Final master archive
• Pipeline monitoring
ALMA Regional Centers in Europe, US, Japan:
• Wide area network connectivity
• Copies of archive data
• Support of users in proposal preparation & final data reduction
ALMA Related Papers and Posters at ICALEPCS’2005
Sat.-Sun: ALMA Common Software (ACS) Workshop
http://almasw.hq.eso.org/almasw/bin/view/ACS/ACSWorkshop2005
WE1.4-4: Advanced Hardware Technology in ALMA Back End and Correlator, F. Biancat Marchet et al.
WE4A.2-5: A generic software interface simulator for ALMA common software, D. Fugate et al.
WE2.4-6: The ALMA Common Software ACS Status and Developments, G. Chiozzi et al.
WE3A.3-6: The ALMA Telescope Control System, A. Farris et al.
PO1.012-1: Development of the control system for the 40m radiotelescope of the OAN using the Alma Common Software, P. de Vicente et al.
PO1.032-6: Transmitting huge amounts of data: design, implementation and performance of the bulk data transfer mechanism in ALMA ACS, P. Di Marcantonio et al.
PO2.067-4: ALMA Correlator Real-Time Data Processor, J. Pisano et al.
PO1.100-8: Migration from ACS 1.1 to ACS 4 at ANKA, I. Križnar et al.
ALMA Sites: Chajnantor +
www.alma.info