Page 1: Open Science Grid: More compute power Alan De  Smet  chtc@cs.wisc

Open Science Grid: More compute power

Alan De Smet chtc@cs.wisc.edu

chtc.cs.wisc.edu

CHTC Cores In Use

1,500 (CPU days each day, averaged over one month)

OSG Cores In Use

60,000 (CPU days each day, averaged over one month)

Open Science Grid

CHTC and OSG usage

(CPU days each day)

Challenges Solved

We worry about all of this.

You don’t have to.

› Authentication: X.509 certificates, certificate authorities, VOMS

› Interface: Globus, GridFTP, Grid universe

› Validation: Linux distribution, glibc version, basic libraries

Using OSG

› Before

universe = vanilla

executable = myjob

log = myjob.log

queue

Using OSG

› After

universe = vanilla

executable = myjob

log = myjob.log

+WantGlidein = true

queue

Challenge: Opportunistic

› OSG computers go away without notice

› Solutions

    Condor restarts automatically

    Sub-hour jobs

    Self-checkpointing

    Automated checkpointing

        • Condor’s standard universe

        • DMTCP: http://dmtcp.sourceforge.net/
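
Self-checkpointing means the job itself saves its progress as it goes and picks up from the saved point when it is restarted. The following is a minimal Python sketch of that idea, assuming the work is a list of independent items. The function name run_with_checkpoint, the JSON checkpoint layout, and the checkpoint file path are illustrative assumptions, not part of Condor or of any CHTC tool.

```python
import json
import os
import tempfile


def run_with_checkpoint(items, checkpoint_path, work):
    """Apply work() to each item, resuming from checkpoint_path if it exists."""
    state = {"next_index": 0, "results": []}
    if os.path.exists(checkpoint_path):
        # A previous run was interrupted: reload its saved progress.
        with open(checkpoint_path) as f:
            state = json.load(f)

    for i in range(state["next_index"], len(items)):
        state["results"].append(work(items[i]))
        state["next_index"] = i + 1

        # Write to a temporary file and rename it into place, so an
        # interruption during the write never leaves a truncated checkpoint.
        directory = os.path.dirname(os.path.abspath(checkpoint_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["results"]
```

If the job is killed partway through and later started again with the same checkpoint file, the items already recorded in the checkpoint are not recomputed. How the checkpoint file is preserved across a restart depends on how the job is submitted and is not shown here.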

Challenge: Local Software

› Bare-bones Linux systems

› Solution

    Bring everything with you

    CHTC-provided MATLAB and R packages

        • RunDagEnv/mkdag

Challenge: Erratic Failures

› Complex systems fail sometimes

› Solution

    Expect failures and automatically retry

    DAGMan for retries

    DAGMan POST scripts to detect problems

        • RunDagEnv/mkdag
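
A DAGMan POST script runs after a node's job finishes, and a non-zero exit status from the script marks the node as failed, so that a RETRY setting on the node can run it again. The following is a minimal Python sketch of the checking logic such a script might contain. The function names, the argument order, and the specific checks (the job's exit code is zero and the output file exists and is non-empty) are assumptions chosen for illustration. This is not the RunDagEnv/mkdag implementation.

```python
import os


def check_job(return_code, output_path):
    """Return 0 if the job looks successful, 1 if the node should count as failed."""
    if return_code != 0:
        # The job itself reported a failure.
        return 1
    if not os.path.isfile(output_path) or os.path.getsize(output_path) == 0:
        # The job exited cleanly but produced no usable output.
        return 1
    return 0


def main(argv):
    """Hypothetical command line: <script> <job exit code> <output file>."""
    return check_job(int(argv[1]), argv[2])
```

A script built on this sketch would end by calling sys.exit(main(sys.argv)), with the job's exit code and the expected output file name passed in as arguments from the DAG file.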

Challenge: Bandwidth

› Solutions

    Only send what you need

    Store large, shared files in our web cache

    Read small amounts of data on the fly

        • Condor’s standard universe

        • Parrot: http://www.cse.nd.edu/~ccl/software/parrot/