2015-10-17 ATLAS@home Wenjing Wu, Andrej Filipčič, David Cameron, Eric Lancon, Claire Adam Bourdarios & others

Page 1: 2015-10-17 ATLAS@home Wenjing Wu Andrej Filipčič David Cameron Eric Lancon Claire Adam Bourdarios & others

ATLAS@home

Wenjing Wu, Andrej Filipčič, David Cameron, Eric Lancon, Claire Adam Bourdarios & others

Page 2:

ATLAS: Elementary Particle Physics

• One of the biggest experiments at CERN
• Trying to understand the origin of mass, which completes the Standard Model
• 2012: ATLAS and CMS discovered the Higgs boson

Page 5:

[Figure: data processing flow in ATLAS]

Page 10:

Why ATLAS@home

• It's free! Well, almost.
• Public outreach: volunteers want to know more about the project they participate in
• Good for ATLAS visibility
• Can add significant computing power to WLCG
• A brief history
– Started at the end of 2013 on a test instance at IHEP, Beijing
– Migrated to CERN and officially launched in June 2014
– are continuously running.

Page 11:

ATLAS@home

• Goal: to run ATLAS simulation jobs on volunteer computers.
• Challenges:
– Big ATLAS software base (~10 GB) that is very platform dependent and runs on Scientific Linux
– Volunteer computing resources should be integrated into the current Grid computing infrastructure. In other words, all the volunteer computers should appear as a WLCG site, and jobs are submitted from PanDA (the ATLAS Grid computing portal).
– Grid computing relies heavily on personal credentials, but these credentials should not be put on volunteer computers

Page 12:

Solutions

• Use VirtualBox + vmwrapper to virtualize volunteer hosts
• Use the network file system CVMFS to distribute ATLAS software; because CVMFS supports on-demand file caching, it helps to reduce the image size.
• To avoid placing credentials on the volunteer hosts, ARC CE is introduced into the architecture together with BOINC
– ARC CE is grid middleware; it interacts with the ATLAS central grid services and manages different LRMSs (Local Resource Management Systems), such as Condor and PBS, through LRMS-specific plugins
– A BOINC plugin was developed to forward "Grid jobs" to the BOINC server and to convert the job results into Grid format.
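
The on-demand caching idea mentioned above can be illustrated with a toy read-through cache. This is only a sketch of the general principle, not how CVMFS is implemented: the class OnDemandCache, the REMOTE dictionary and all file names are hypothetical. The point it shows is that only the files a job actually opens are ever transferred and stored locally, which is why the VM image does not need to contain the whole software base.

```python
import os
import tempfile
from typing import Callable


class OnDemandCache:
    """Toy read-through cache: a file is fetched only the first time it is read."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes]) -> None:
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical function that downloads one file
        self.fetch_count = 0

    def _local_path(self, name: str) -> str:
        # Flatten the repository path into a single cache file name.
        return os.path.join(self.cache_dir, name.strip("/").replace("/", "__"))

    def read(self, name: str) -> bytes:
        path = self._local_path(name)
        if not os.path.exists(path):
            # Cache miss: fetch from the remote repository and keep a local copy.
            data = self.fetch(name)
            self.fetch_count += 1
            with open(path, "wb") as handle:
                handle.write(data)
        with open(path, "rb") as handle:
            return handle.read()


# Hypothetical stand-in for a remote software repository.
REMOTE = {
    "/sw/lib/libsim.so": b"simulation library",
    "/sw/lib/libreco.so": b"reconstruction library",
    "/sw/data/geometry.db": b"detector geometry",
}

with tempfile.TemporaryDirectory() as cache_dir:
    cache = OnDemandCache(cache_dir, fetch=REMOTE.__getitem__)
    cache.read("/sw/lib/libsim.so")
    cache.read("/sw/lib/libsim.so")  # second read is served from the local cache
    fetched_once = cache.fetch_count
    cached_files = len(os.listdir(cache_dir))
```

In this toy run the repository holds three files, but only the one that was read ends up in the local cache directory.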

Page 13:

Architecture

[Figure: architecture diagram, including the ATLAS Workload Management System]

Page 14:

BOINC ARC plugin (1)

• Converts an ARC CE job into a BOINC job
• The plugin includes:
– Submit/scan/cancel job
– Information provider (total CPUs, CPU usage, job status)
• Submit
– ARC CE job: all input files are packed into one tar.gz file
– Copy the input file from the ARC CE session directory into the BOINC internal directory
– Set up the BOINC environment and call the BOINC command to generate a job based on job templates/input files
– Write the job ID back to the ARC CE job control directory
– Upon job completion, BOINC services put the desired output files back into the ARC CE session directory
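
The submit steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual plugin: the function submit_job, the directory layout, the archive name and the ".boinc_id" file name are all hypothetical, and the BOINC job-creation command (which the slides do not name) is replaced by a caller-supplied function.

```python
import os
import shutil
import tarfile
import tempfile
from typing import Callable, Iterable


def submit_job(
    session_dir: str,
    control_dir: str,
    boinc_input_dir: str,
    arc_job_id: str,
    input_names: Iterable[str],
    create_boinc_job: Callable[[str], str],
) -> str:
    """Sketch of the submit step; every path and file name here is hypothetical."""
    # 1. Pack all input files of the ARC CE job into one tar.gz file.
    archive_path = os.path.join(session_dir, f"{arc_job_id}_input.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in input_names:
            archive.add(os.path.join(session_dir, name), arcname=name)

    # 2. Copy the archive from the session directory into the BOINC directory.
    staged_path = shutil.copy(archive_path, boinc_input_dir)

    # 3. Generate the BOINC job (stand-in for the real BOINC command).
    boinc_job_id = create_boinc_job(staged_path)

    # 4. Write the job ID back to the ARC CE job control directory.
    id_file = os.path.join(control_dir, f"{arc_job_id}.boinc_id")
    with open(id_file, "w") as handle:
        handle.write(boinc_job_id + "\n")
    return boinc_job_id


def fake_create_boinc_job(staged_path: str) -> str:
    # Hypothetical replacement for the BOINC job-creation call.
    return "wu_" + os.path.basename(staged_path).split("_input")[0]


with tempfile.TemporaryDirectory() as root:
    session, control, boinc = (
        os.path.join(root, d) for d in ("session", "control", "boinc")
    )
    for directory in (session, control, boinc):
        os.makedirs(directory)
    for name in ("job.sh", "events.in"):
        with open(os.path.join(session, name), "w") as handle:
            handle.write("placeholder\n")

    job_id = submit_job(
        session, control, boinc, "arc123",
        ["job.sh", "events.in"], fake_create_boinc_job,
    )
    with tarfile.open(os.path.join(boinc, "arc123_input.tar.gz")) as archive:
        staged_members = sorted(archive.getnames())
    with open(os.path.join(control, "arc123.boinc_id")) as handle:
        recorded_id = handle.read().strip()
```

Passing the job-creation step in as a function keeps the sketch runnable without a BOINC server; a real plugin would invoke the BOINC tooling at that point.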

Page 15:

BOINC ARC CE plugin (2)

• Scan
– Scan the job diag file (in the session directory), get the exit code, upload output files to the designated SE, update the ARC CE job status.
• Cancel
– Cancel a BOINC job
• Information provider
– Query the BOINC DB for the total number of CPUs, CPU usage, and the status of each job
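
The first part of the scan step, reading the exit code from the job diag file, might look like the sketch below. It assumes the diag file consists of simple key=value lines with an "exitcode" entry; that layout, the sample content, and the state names RUNNING/FINISHED/FAILED are assumptions for illustration, not taken from the slides. Uploading to the SE and updating the ARC CE status are not shown.

```python
from typing import Dict


def parse_diag(text: str) -> Dict[str, str]:
    """Parse key=value lines; the file layout is an assumption, not a spec."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        fields[key.strip().lower()] = value.strip()
    return fields


def job_state_from_diag(text: str) -> str:
    """Map the exit code to a hypothetical job state name."""
    exit_code = parse_diag(text).get("exitcode")
    if exit_code is None:
        return "RUNNING"  # no exit code recorded yet
    return "FINISHED" if int(exit_code) == 0 else "FAILED"


# Hypothetical diag content for a job that completed successfully.
SAMPLE_DIAG = "nodename=volunteer-host\nWallTime=6840s\nexitcode=0\n"

state_ok = job_state_from_diag(SAMPLE_DIAG)
state_failed = job_state_from_diag("exitcode=1\n")
state_pending = job_state_from_diag("nodename=volunteer-host\n")
```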

Page 16:

Current Status

Gained CPU hours: 103,355
Daily resource: 3% of grid computing

Page 17:

Current Status:

Page 18:

The Whole ATLAS Computing

Page 19:

ATLAS jobs

• Full ATLAS simulation jobs
– 10 events/job initially
– Now 100 events/job
• A typical ATLAS simulation job
– 40~80 MB input data
– 10~30 MB output data
– On average, 92 minutes CPU time, 114 minutes elapsed time
• CPU efficiency lower than on the grid
– Slow home network → significant initialization time
– CPUs not available all the time
• Jobs ran in a 64-bit SLC5 environment, now upgraded to SLC6 (µCernVM)
• Virtualization on Windows, Linux, Mac
• ANY kind of job could run on ATLAS@HOME
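
As a quick check of the figures quoted above for a typical job, the CPU efficiency is simply CPU time divided by elapsed time. The variable names below are illustrative only.

```python
# Average figures for a typical ATLAS@home simulation job, from the slide.
cpu_minutes = 92.0
elapsed_minutes = 114.0

# CPU efficiency: fraction of the elapsed time spent actually computing.
cpu_efficiency = cpu_minutes / elapsed_minutes
# Time per job not spent on the CPU (download, initialization, waiting).
idle_minutes = elapsed_minutes - cpu_minutes

print(f"CPU efficiency: {cpu_efficiency:.1%}")          # CPU efficiency: 80.7%
print(f"Non-CPU time per job: {idle_minutes:.0f} min")  # Non-CPU time per job: 22 min
```

So roughly a fifth of each job's elapsed time is lost to the network and availability effects listed above.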


Page 20:

How Grid People see ATLAS@home

• Volunteers want to earn credits for their contribution, and they want their PCs to work optimally
– This is true for the grid sites as well, or at least it should be
– But volunteers are better shifters than we are
• Different from what we are used to:
– On the grid: jobs are failing, please fix the sites!
– On BOINC: jobs suck, please fix your code!
• ATLAS@HOME is the first BOINC project with massive I/O demands, even for less intensive jobs
– Server infrastructure needs to be carefully planned to cope with a high load
• Credentials must not be passed to PCs
• Jobs can stay in execution for a long time, depending on the volunteer's computing preferences, so the platform is not suitable for high-priority tasks

Page 21:

ATLAS outreach

• Outreach website: https://atlasphysathome.web.cern.ch/
• Feedback mailing list: [email protected]

Page 22:

Future Effort (1)

• Customize the VM image to reduce the network traffic and speed up the initialization
• Optimize the file transfers, server load and job efficiency on the PCs
• Test and migrate to the LHC@home infrastructure
• Test whether BOINC can replace the small Grid sites
• Investigate the use of BOINC on local batch clusters to run ATLAS jobs
• Investigate running various workflows (longer jobs, multi-core jobs) on virtual machines

Page 23:

Future Effort (2)

• Provide an event display and possibly a screen saver that would let people see what they are running.

Page 24:

Acknowledgements

• David and Rom for all their support and suggestions.
• CERN IT for providing server and storage resources for ATLAS@home and for working on integrating ATLAS@home with LHC@home