NA61/NA49 virtualisation: status and plans Dag Toppe Larsen CERN 08.10.2012


08.10.2012 NA61/NA49 meeting, CERN

Outline
- Quick reminder of CernVM and installation
- Tasks
- Each task in detail
- Roadmap
- Input needed

CernVM
- CernVM is a Linux distribution
  - Designed specifically for virtual machines (VMs)
  - Based on SLC (currently SLC5)
  - Compressed image size ~300 MB
  - Both 32-bit and 64-bit versions
- Additional software
  - “Standard” software via the Conary package manager
  - Experiment software via CVMFS
- Contextualisation: images are adapted to experiment requirements during boot
- Data preservation: all images are permanently preserved

CVMFS
- Distributed read-only file system for CernVM (i.e. it plays the same role as AFS does for LXPLUS)
- Can also be used by “real” machines (e.g. LXPLUS, grid)
- Files are compressed and distributed via HTTP
  - Global availability
  - Central server, site replication via standard HTTP proxies
- Files are decompressed and cached on the (CernVM) computer
  - Can run without Internet access if all needed files are cached
- Mainly for experiment software, but also other “static” data (e.g. calibration data)
- Each experiment has a repository to store all versions of its software
- Common software (e.g. ROOT) is available from the SFT repository

Data preservation
- As technology evolves, it is no longer possible to run legacy software on modern platforms
- Must be preserved and accessible:
  - Experiment data
  - Experiment software
  - Operating environment (operating system, libraries, compilers, hardware)
- Just preserving data and software is not enough
- Virtualisation may preserve the operating environment

CernVM data preservation
- “Solution”:
  - Experiment data stored on Castor
  - Experiment software versions stored on CVMFS
    - HTTP is a “lasting” technology
  - Operating environments stored as CernVM image versions
- Thus, a legacy version of CernVM can be started as a VM, running a legacy version of the experiment software
- Forward-looking approach (we start preserving now)

CernVM for development
- CernVM makes it possible to run the production version of the legacy software/Shine on a laptop without a local install
- It is also possible to compile Shine from SVN on CernVM “out of the box” once the proper NA61 environment is set up

CernVM installation on laptop
- Install a hypervisor of your choice, e.g. VirtualBox: https://www.virtualbox.org/
- Download a matching CernVM desktop image: http://cernvm.cern.ch/portal/downloads
- Open http://<ipaddress>:8004 in your web browser (user=admin, password=password)
- Select the NA61 and PH-SFT software repositories
- Reboot
- You are now ready to use NA61 software in CernVM on your laptop!
- More information: http://cernvm.cern.ch/portal/cvmconfiguration, https://twiki.cern.ch/twiki/bin/viewauth/NA61/NewOFInstallation (CernVM section)

Tasks
- Make experiment software available
- Facilitate batch processing
- Validate outputs
- On-demand virtual clusters
- Reference cloud cluster
- Data (re)production scripts
- Production reconstruction
- Data production web interface

Make experiment software available
- NA61/NA49 software must be available on CVMFS for CernVM to process data
- NA61
  - Legacy software chain installed
    - Changes to be fed back to SVN
  - SHINE software installed
    - ROOT and other dependencies provided via CVMFS
    - SVN checkout compiles “out of the box”
    - Using the 32-bit CernVM image
- NA49
  - Software has been installed

Facilitate batch processing
- LXPLUS uses the PBS batch system
- CernVM uses the Condor batch system
- “Philosophical” differences
  - PBS has one job script per job
  - Condor has a common job description file with parameters for each job
- Existing PBS scripts have been ported to Condor
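
To illustrate the "common job description file with parameters for each job" idea, here is a minimal Python sketch that writes a single Condor submit description covering several jobs. It is not the ported NA61 script: the executable name and the chunk labels are hypothetical placeholders. The keywords `universe`, `executable`, `arguments`, `output`, `error`, `log` and `queue` are standard Condor submit commands.

```python
from typing import Iterable


def condor_submit_description(executable: str, chunks: Iterable[str]) -> str:
    """Build one Condor submit description that queues one job per chunk."""
    # Settings shared by every job are written once at the top.
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "log = jobs.log",
    ]
    # Per-job parameters are set just before each "queue" statement.
    for chunk in chunks:
        lines += [
            f"arguments = {chunk}",
            f"output = {chunk}.out",
            f"error = {chunk}.err",
            "queue",
        ]
    return "\n".join(lines) + "\n"
```

Calling `condor_submit_description("reconstruct.sh", ["chunk-001", "chunk-002"])` returns text that could be saved to a file and passed to `condor_submit`. With PBS, the same two jobs would typically need two separate job scripts.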

Output validation – status
- Run 8688 has been processed on both CernVM/CVMFS and LXPLUS/AFS, using software version v2r7g
- According to analysis by Grzegorz, there are relatively small discrepancies
  - This is despite the gap TPC not running on CernVM/CVMFS, even though the same set-up file is used and it works on LXBATCH/CVMFS
- When the bug has been found, the CernVM/CVMFS, LXBATCH/CVMFS and LXBATCH/AFS comparison should be repeated

On-demand virtual clusters
- A cluster may need VMs of different configurations, depending on the type of jobs
  - Memory, CernVM version, experiment software, etc.
- Thus, there is a need for dynamic creation/destruction of virtual clusters
- Created a command-line script for creating virtual clusters
  - Later to be controlled by the data production web interface
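
As a rough illustration of why the cluster has to be created dynamically, the Python sketch below groups jobs by the VM configuration they need and counts how many VMs of each kind to start. It is not the command-line script mentioned above: the `VMConfig` fields mirror the slide (memory, CernVM version, experiment software), while the class, the function name and the `jobs_per_vm` parameter are assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VMConfig:
    """Hypothetical description of one kind of virtual machine."""
    memory_mb: int
    cernvm_version: str
    software_version: str


def plan_cluster(job_requirements: Iterable[VMConfig], jobs_per_vm: int = 1) -> Counter:
    """Count how many VMs of each configuration a batch of jobs needs."""
    jobs = Counter(job_requirements)
    # Ceiling division: a partly filled VM still has to be started.
    return Counter({cfg: -(-n // jobs_per_vm) for cfg, n in jobs.items()})
```

The resulting counts could then be handed to whatever tool actually starts and destroys the VMs. Once the jobs are finished, the same plan says which VMs can be removed.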

Test production reconstruction
- To run on the private cloud and LXCLOUD
  - Currently, the private cloud has more resources
  - LXCLOUD is the final target, so it is important to test on it
- Data can currently be processed “by hand”
- Have tested the (re)production scripts; some modifications are needed
- Output should be compared with and validated against the output from normal LXBATCH production
- Once this is successful, request more LXCLOUD resources

Private reference cloud cluster
- The virtual machines require a cluster of physical hosts
- A reference cloud cluster has been created
  - Private cloud
  - Currently 24 cores
  - Set-up may be replicated at other sites wishing to provide cloud/CernVM resources

Cloud cluster
- The virtual machines require a cluster of physical hosts
- An LXCLOUD cloud cluster has been created
  - Provided by CERN IT
  - New service, currently “experimental”
  - Currently allocated 4 virtual machines
  - May be expanded to include more VMs
  - Will push for this once the complete processing chain is ready

Data processing web interface
- A web interface for processing of the data is to be created
  - Interface to the bookkeeping system to extract runs/chunks belonging to reactions
  - List all existing raw/processed data with status (e.g. software versions used for processing)
  - Easy selection of data for (re)processing with selected OS and software version
  - A virtual on-demand cluster is created
  - After processing, data are written back to Castor
- Using the EC2 interface for cloud management
  - Allows for great flexibility of processing site

Data processing scripts
- Created a script for submitting a reaction for processing
  - Input:
    - Reaction name
    - Software version
    - Global key
    - (CernVM version)
  - Needs some “tuning” (e.g. better creation of set-up files from the global key)
  - Needs some improvement of the job description files (include SHOE formats, PSD data)
- Created a script for resubmitting failed jobs
  - Failed jobs are identified from:
    - Non-existing/empty/small output DSPACK, SHOE, ROOT files
    - Failed/exited/terminated chunks/events
  - After resubmitting a fixed number of times (3?), give up
  - Mostly working OK, but with a small number of false positives (short runs with only 1 or 2 “empty” events)
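
The resubmission logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual NA61 script: the size threshold is an invented value, `submit` is a hypothetical callable that is assumed to block until the job has finished, and only the output-file check is shown (not the failed/exited/terminated chunk check).

```python
from pathlib import Path
from typing import Callable, Iterable, Sequence

MIN_SIZE_BYTES = 1024  # assumed threshold for a "small" output file
MAX_RESUBMITS = 3      # the slide gives this as "3?"


def output_looks_failed(files: Iterable[Path], min_size: int = MIN_SIZE_BYTES) -> bool:
    """True if any expected output file is missing, empty or smaller than min_size."""
    return any((not f.is_file()) or f.stat().st_size < min_size for f in files)


def process_with_resubmits(
    submit: Callable[[], None],
    files: Sequence[Path],
    max_resubmits: int = MAX_RESUBMITS,
) -> bool:
    """Run a job, resubmitting while its output looks failed.

    Returns True on success and False once max_resubmits is exhausted.
    """
    submit()
    resubmits = 0
    while output_looks_failed(files):
        if resubmits == max_resubmits:
            return False  # give up
        submit()
        resubmits += 1
    return True
```

A pure size check like this reproduces the false positives noted on the slide: a short run with only 1 or 2 "empty" events legitimately produces small files and would be resubmitted until the limit is reached.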

Data processing web interface & scripts
- The data processing web interface is a front-end to the data processing scripts
  - Reaction list from the bookkeeping system
  - Reaction run list from the bookkeeping system
  - Software list from the CVMFS directory tree
  - Global key list from a local database?
- User selects data and parameters, and clicks “process”
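
As an example of how the software list could be taken from the CVMFS directory tree, the Python sketch below returns the names of the directories found directly under a given root. It assumes one directory per software version. The function name and the layout are assumptions for this example, and the root path is left as a parameter because the slides do not give it.

```python
from pathlib import Path


def list_software_versions(software_root: Path) -> list:
    """Return the sorted names of the version directories under software_root."""
    # Plain files (e.g. a README) are ignored; only directories count as versions.
    return sorted(p.name for p in software_root.iterdir() if p.is_dir())
```

On a machine with CVMFS mounted, `software_root` would be some directory below `/cvmfs`. The web interface could offer the returned names in a selection list.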

Roadmap

Task                          | Status/done                             | Remaining                    | Expected
NA61 software installation    | OK                                      | Gap TPC not running          | November?
NA49 software installation    | OK                                      | Data validation              | November?
Facilitate batch system       | OK                                      | OK                           | November?
Validate outputs              | In progress                             | Rerun after fixing gap TPC   | November?
On-demand virtual cluster     | OK                                      | OK                           | OK
Production reconstruction     | Cluster ready                           | Some improvements to scripts | October
Reference cloud cluster       | OK                                      | Documentation                | November
Data processing web interface | Created scripts for data (re)processing | Create web interface         | November

Next steps
- Parallel task 1
  - Understand the source of the gap TPC not running
  - Rerun validation
- Parallel task 2
  - Finalise data processing scripts
  - Run large-scale processing using scripts from the command line
  - Request larger LXCLOUD
  - Transfer to NA61
- Parallel task 3
  - Create web interface
  - Test web interface

Input needed
- NA49 validation
- NA61 gap TPC
- Please keep virtualisation (CernVM/CVMFS) in mind when making plans ...