2015-10-17 atlas@home wenjing wu andrej filipčič david cameron eric lancon claire adam bourdarios...
TRANSCRIPT
23/4/20
ATLAS@home
Wenjing Wu, Andrej Filipčič, David Cameron,
Eric Lancon, Claire Adam Bourdarios
& others
ATLAS: Elementary Particle Physics
• One of the biggest experiments at CERN
• Trying to understand the origin of mass, which completes the Standard Model
• In 2012, ATLAS and CMS discovered the Higgs boson
Data processing flow in ATLAS
Why ATLAS@home
• It's free! Well, almost.
• Public outreach – volunteers want to know more about the project they participate in
• Good for ATLAS visibility
• Can add significant computing power to WLCG
• A brief history
  – Started end of 2013, as a test instance at IHEP, Beijing
  – Migrated to CERN and officially launched June 2014
  – Has been running continuously since
ATLAS@home
• Goal: run ATLAS simulation jobs on volunteer computers.
• Challenges:
  – Big ATLAS software base, ~10 GB, and very platform dependent; runs on Scientific Linux
  – Volunteer computing resources should be integrated into the current Grid computing infrastructure. In other words, all the volunteer computers should appear as a WLCG site, and jobs are submitted from PanDA (the ATLAS Grid computing portal).
  – Grid computing relies heavily on personal credentials, but these credentials must not be placed on volunteer computers
Solutions
• Use VirtualBox + vmwrapper to virtualize volunteer hosts
• Use the network file system CVMFS to distribute the ATLAS software; since CVMFS supports on-demand file caching, it helps to reduce the image size.
• To avoid placing credentials on the volunteer hosts, an ARC CE is introduced in the architecture together with BOINC
  – ARC CE is Grid middleware; it interacts with the ATLAS central Grid services and manages different LRMS (Local Resource Management Systems), such as Condor and PBS, via specific LRMS plugins
  – A BOINC plugin is developed to forward "Grid jobs" to the BOINC server and convert the job results back into Grid format.
Architecture
ATLAS Workload Management System
BOINC ARC plugin (1)
• Converts an ARC CE job into a BOINC job
• The plugin includes:
  – Submit/scan/cancel job
  – Information provider (total CPUs, CPU usage, job status)
• Submit
  – ARC CE job: all input files into one tar.gz file
  – Copy the input file from the ARC CE session directory into the BOINC internal directory
  – Set up the BOINC environment and call the BOINC command to generate a job based on job templates/input files
  – Write the job id back to the ARC CE job control directory
  – Upon job completion, BOINC services put the desired output files back into the ARC CE session directory
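The submit steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the actual plugin code: the template file names and the `ATLAS` app name are assumptions, and `bin/create_work` is the standard BOINC server command for generating a workunit.

```python
import tarfile
from pathlib import Path


def pack_inputs(session_dir: Path, tarball: Path) -> None:
    """Pack all files from the ARC CE session directory into one tar.gz,
    as the submit step does before handing the job to BOINC."""
    with tarfile.open(tarball, "w:gz") as tar:
        for f in sorted(session_dir.iterdir()):
            tar.add(f, arcname=f.name)


def create_work_cmd(job_id: str, tarball_name: str) -> list:
    """Build the BOINC create_work invocation (run from the BOINC
    project directory). Template names here are hypothetical."""
    return ["bin/create_work",
            "--appname", "ATLAS",
            "--wu_name", job_id,
            "--wu_template", "templates/ATLAS_in",
            "--result_template", "templates/ATLAS_out",
            tarball_name]
```

The real plugin additionally records the BOINC job id in the ARC CE control directory so the scan and cancel operations can find the job later.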
BOINC ARC CE plugin (2)
• Scan
  – Scan the job diag file (in the session directory), get the exit code, upload output files to the designated SE, and update the ARC CE job status.
• Cancel
  – Cancel a BOINC job
• Information provider
  – Query the BOINC DB for the total CPU number, CPU usage, and the status of each job
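The core of the scan step is reading the exit code out of the job's diag file. A hedged sketch: ARC diag files are key=value lines, and `exitcode=N` is the convention assumed here.

```python
def parse_exit_code(diag_text: str):
    """Extract the exit code from an ARC CE job .diag file.

    The diag file is a series of key=value lines; the scan step looks
    for 'exitcode=N'. Returns None if the line is missing or malformed.
    """
    for line in diag_text.splitlines():
        if line.startswith("exitcode="):
            try:
                return int(line.split("=", 1)[1])
            except ValueError:
                return None
    return None
```

On a non-None result, the plugin would then upload the output files to the SE and mark the ARC CE job finished or failed accordingly.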
Current Status
• Gained CPU hours: 103,355
• Daily resource: 3% of Grid computing
Current Status: the Whole ATLAS Computing
ATLAS jobs
• Full ATLAS simulation jobs
  – 10 events/job initially
  – Now 100 events/job
• A typical ATLAS simulation job
  – 40–80 MB input data
  – 10–30 MB output data
  – On average, 92 minutes CPU time, 114 minutes elapsed time
• CPU efficiency lower than on the Grid
  – Slow home network → significant initialization time
  – CPUs not available all the time
• Jobs ran in SLC5 64-bit; upgraded to SLC6 (µCernVM)
• Virtualization on Windows, Linux, Mac
• Any kind of job could run on ATLAS@home
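For context, the average per-job numbers above imply a CPU efficiency of roughly 81%; a quick check:

```python
# Average per-job figures quoted on this slide.
cpu_minutes = 92
elapsed_minutes = 114

# CPU efficiency = CPU time / wall-clock (elapsed) time.
efficiency = cpu_minutes / elapsed_minutes
print(f"CPU efficiency: {efficiency:.0%}")  # prints "CPU efficiency: 81%"
```

The gap between CPU and elapsed time is where the slow home-network initialization and intermittent CPU availability show up.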
How Grid People see ATLAS@home
• Volunteers want to earn credits for their contribution; they want their PCs to work optimally
  – This is true for the Grid sites as well, at least it should be
  – But volunteers are better shifters than we are
• Different to what we are used to:
  – On the Grid: jobs are failing, please fix the sites!
  – On BOINC: jobs suck, please fix your code!
• ATLAS@home is the first BOINC project with massive I/O demands, even for less intensive jobs
  – Server infrastructure needs to be carefully planned to cope with a high load
• Credentials must not be passed to PCs
• Jobs can stay in execution mode for a long time, depending on the volunteer computer's preferences, so this is not suitable for high-priority tasks
ATLAS outreach
• Outreach website: https://atlasphysathome.web.cern.ch/
• Feedback mailing list: [email protected]
Future Effort (1)
• Customize the VM image to reduce the network traffic and speed up the initialization
• Optimize the file transfers, server load, and job efficiency on the PCs
• Test and migrate to the LHC@home infrastructure
• Test whether BOINC can replace the small Grid sites
• Investigate the use of BOINC on local batch clusters to run ATLAS jobs
• Investigate running various workflows (longer jobs, multi-core jobs) on virtual machines
Future Effort (2)
• Provide an event display, and possibly a screen saver, that would let people see what they are running.
Acknowledgements
• David and Rom for all the support and suggestions.
• CERN IT for providing server and storage resources for ATLAS@home, and for working on integrating ATLAS@home with LHC@home.