European Condor Week 2006 - CDF experience with Condor glide-ins and GCB - Igor Sfiligoi 1
European Condor Week 2006
Using Condor Glide-Ins and GCB to run on the Grid
The CDF experience
Elliot Lipeles, Matthew Norman, Frank Würthwein, University of California, San Diego
Subir Sarkar, Igor Sfiligoi, INFN
Traditional use of Condor in CDF
Single Point of Submission system
● CAF daemons accept and authenticate user jobs, and handle output
● The Condor schedd does the real work
[Diagram: CDF Analysis Farm (CAF). Users send jobs through a legacy protocol to the submitter and monitoring daemons on the CAF portal; the portal also runs the schedd, collector and negotiator, while the startds run on the worker nodes (WN).]
Had to move to the Grid
[Diagram: “The Grid”]
● HEP is moving to the Grid
● Nobody wants to finance dedicated nodes anymore
● Need to move
● Want to preserve the user interface
Why not plain Condor-G?
Works, but...
● No central matchmaking
● No control over priorities
● Site selection problem
● Black holes can eat most of the user jobs
[Diagram: the CAF portal (submitter daemons and schedd) submits user jobs through Globus straight into the batch queues of several Grid pools, where each user job occupies a batch slot.]
CDF decided to go with Condor glide-ins
[Diagram: the CAF portal now also hosts a collector/negotiator, a Glidekeeper, a glide-in schedd and monitors; glide-ins are submitted through Globus to the batch queues of the Grid pools, and each glide-in in a batch slot runs a user job.]
Glide-ins solve all the problems
What is the Glidekeeper
● Just a simple script
– Checks the number of jobs and the number of glide-ins
– Makes sure there is always at least one glide-in per CE when there are idle jobs in the queue
● Does not need to be complex
– Just blindly submit to all the CEs
– But could be made as complex as needed
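The Glidekeeper policy above fits in a few lines of shell. The sketch below is hypothetical (it is not CDF's script) and covers only the decision: it takes the number of idle user jobs plus one CE=count pair per CE, where count is the number of glide-ins queued or running there, and prints the CEs that should get a new glide-in. The function name and argument format are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical Glidekeeper decision step (a sketch, not CDF's script).
# Usage: ces_needing_glidein IDLE_JOBS CE1=COUNT1 [CE2=COUNT2 ...]
#   IDLE_JOBS   number of idle user jobs in the main schedd
#   CEn=COUNTn  glide-ins currently queued or running at that CE
# Prints, one per line, every CE that should get a new glide-in.
ces_needing_glidein() {
  local idle_jobs=$1 pair ce count
  shift
  # Nothing waiting: no new glide-ins needed anywhere.
  if [ "$idle_jobs" -le 0 ]; then
    return 0
  fi
  for pair in "$@"; do
    ce=${pair%=*}      # everything before the last '='
    count=${pair##*=}  # everything after the last '='
    # Blind policy: keep at least one glide-in per CE.
    if [ "$count" -lt 1 ]; then
      echo "$ce"
    fi
  done
}
```

In a deployment the counts would come from querying the main schedd and the glide-in schedd (for example with condor_q), and each printed CE would get one condor_submit of the glide-in submit file; that wiring is left out so the decision step can be tested on its own.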
What is a glide-in
condor_config
DAEMON_LIST = MASTER, STARTD
NEGOTIATOR_HOST = $(HEAD_NODE)
COLLECTOR_HOST = $(HEAD_NODE)
TMOUT = 288000
MaxJobRetirementTime = $(TMOUT)
SHUTDOWN_GRACEFUL_TIMEOUT = $(TMOUT)
# How long will it wait in an unclaimed state
STARTD_NOCLAIM_SHUTDOWN = 1200
HEAD_NODE = cdfhead.fnal.gov
glidein_startup.sh
validate_node
local_config
./condor_master -r $mins -dyn -f
● Glide-ins are properly configured startds
– The same old binaries are used
● CDF uses a startup script to validate the node before starting the startd
– Prevents job failures
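To make the startup script more concrete, here is a hypothetical sketch of its two helper steps. The free-disk check in validate_node is only an example of the kind of test that can stop a broken node from becoming a black hole; CDF's actual checks are not listed on the slide, and the arguments both functions take are assumptions. local_config writes the configuration shown above for a given head node. Assumes a POSIX shell with df and awk.

```shell
#!/bin/sh
# Hypothetical helpers for glidein_startup.sh (a sketch; the real CDF
# checks are not shown on the slide).

# validate_node MIN_FREE_KB
# Example check: fail if the working directory has too little free disk.
validate_node() {
  local min_free_kb=$1 free_kb
  # POSIX df output: line 2, column 4 is the available space in KB.
  free_kb=$(df -Pk . | awk 'NR==2 {print $4}')
  case $free_kb in
    ''|*[!0-9]*)
      echo "validate_node: cannot read free disk space" >&2
      return 1 ;;
  esac
  if [ "$free_kb" -lt "$min_free_kb" ]; then
    echo "validate_node: ${free_kb} KB free, ${min_free_kb} KB needed" >&2
    return 1
  fi
  return 0
}

# local_config HEAD_NODE CONFIG_FILE
# Writes the startd-only configuration from the slide for HEAD_NODE.
local_config() {
  # \$ keeps $(...) literal, so Condor (not the shell) expands the macros.
  cat > "$2" <<EOF
DAEMON_LIST = MASTER, STARTD
HEAD_NODE = $1
NEGOTIATOR_HOST = \$(HEAD_NODE)
COLLECTOR_HOST = \$(HEAD_NODE)
TMOUT = 288000
MaxJobRetirementTime = \$(TMOUT)
SHUTDOWN_GRACEFUL_TIMEOUT = \$(TMOUT)
# How long will it wait in an unclaimed state
STARTD_NOCLAIM_SHUTDOWN = 1200
EOF
}
```

The generated file would then be selected by exporting CONDOR_CONFIG with its path before condor_master is started.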
CDF has been using glide-ins for a year now
CNAF one-year history plot: note over a thousand running jobs for long periods of time.
SDSC six-month history plot
● CNAF (Bologna, Italy), the first GlideCAF
● SDSC (San Diego, CA, USA)
● Fermilab (Batavia, IL, USA)
● IN2P3 (Lyon, France)
● MIT (Boston, MA, USA)
Problems found along the road
● File delivery
● Firewalls
● Security concerns
Condor binaries delivery
glidein.submit
Universe = Globus
GlobusScheduler = mysite/jobmanager-mybatch
Executable = glidein_startup.sh
transfer_input_files = condor.tgz
Queue
Only the executable is guaranteed to be transferred on EGEE; other files need gLite tools. (Works fine on OSG.)
Cannot use the Condor-G transfer mechanism!
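One way to live with the EGEE/OSG difference is to generate the submit file per CE and leave out transfer_input_files wherever it cannot be trusted. The helper below, including its osg/egee switch, is a hypothetical sketch; only the submit keys come from the slide. On EGEE the executable then has to fetch condor.tgz by some other means.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): write a glide-in submit
# description for one CE.
# write_glidein_submit CE GRID_FLAVOR OUTPUT_FILE
#   CE           e.g. mysite/jobmanager-mybatch
#   GRID_FLAVOR  "osg" or "egee" (assumed labels)
write_glidein_submit() {
  local ce=$1 flavor=$2 out=$3
  {
    echo "Universe = globus"
    echo "GlobusScheduler = ${ce}"
    echo "Executable = glidein_startup.sh"
    # Only rely on Condor-G input-file transfer where it works (OSG);
    # on EGEE only the executable is guaranteed to arrive.
    if [ "$flavor" = "osg" ]; then
      echo "transfer_input_files = condor.tgz"
    fi
    echo "Queue"
  } > "$out"
}
```

The generated file would then be handed to condor_submit.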
HTTPd for file transfer
glidein_startup.sh
validate_node
local_config
tar -xzf condor.tgz
./condor_master -r $retmins -dyn -f
glidein_startup.sh
validate_node
wget http://cdfhead.fnal.gov/condor.tgz
echo "$knownSHA1  condor.tgz" | sha1sum -c -
if [ $? -eq 0 ]; then
  tar -xzf condor.tgz
  local_config
  ./condor_master -r $retmins -dyn -f
fi
[Diagram: GlideCAF portal with collector/negotiator, main schedd, Glidekeeper, glide-in schedd, submitter daemons and an HTTPd, steps (1) to (4). The HTTPd serves the files.]
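The verify-before-unpack step can be factored into a small function. This is a sketch assuming GNU coreutils sha1sum; the function names are hypothetical, and the expected hash is passed as a parameter instead of being hard-coded in the script the way knownSHA1 is.

```shell
#!/bin/sh
# Sketch of the "check the SHA1, then unpack" step (hypothetical names).

# verify_sha1 EXPECTED_SHA1 FILE: exit status 0 only if FILE matches.
verify_sha1() {
  # sha1sum -c reads "<hash>  <file>" lines; "-" means read them from stdin.
  echo "$1  $2" | sha1sum -c --status -
}

# unpack_if_valid EXPECTED_SHA1 TARBALL DEST_DIR
unpack_if_valid() {
  if verify_sha1 "$1" "$2"; then
    mkdir -p "$3" && tar -xzf "$2" -C "$3"
  else
    echo "checksum mismatch for $2, refusing to unpack" >&2
    return 1
  fi
}
```

Checking the hash before unpacking means a truncated or tampered download, whether it comes straight from the HTTPd or through a cache, is never executed.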
Add HTTP Proxy to reduce load
[Diagram: the same GlideCAF portal (collector/negotiator, main schedd, Glidekeeper, glide-in schedd, submitter daemons, HTTPd), steps (1) to (4).]
glidein_startup.sh
validate_node
env http_proxy=proxy1.fnal.gov wget http://cdfhead.fnal.gov/condor.tgz
echo "$knownSHA1  condor.tgz" | sha1sum -c -
if [ $? -eq 0 ]; then
  tar -xzf condor.tgz
  local_config
  ./condor_master -r $retmins -dyn -f
fi
[HTTP proxy: contacts the HTTPd only once]
Problems found along the road
● File delivery
● Firewalls
● Security concerns
Working with remote Grid sites - Dumb
[Diagram: collector, main schedd and glide-in schedd on one side of the WAN; startds in available batch slots at remote Grid sites, started via Globus, on the other, one of them behind a firewall/NAT; UDP traffic crosses the WAN.]
● UDP packets are often lost over the WAN
● Firewalls and NATs make it impossible to reach the startds behind them
Working with remote Grid sites - Smart
[Diagram: same layout, but the traffic across the WAN uses TCP, and the startd behind the firewall/NAT opens an outgoing TCP connection to a GCB.]
● UDP is fast
● TCP traffic is reliable
● Use GCB to bridge firewalls
Generic Connection Brokering (GCB)
● Condor proxy service
– Fully integrated in Condor
– Just a configuration parameter
● For scalability, use multiple instances
– Point different glide-ins to different GCBs
condor_config
DAEMON_LIST = MASTER, STARTD
NEGOTIATOR_HOST = $(HEAD_NODE)
COLLECTOR_HOST = $(HEAD_NODE)
TMOUT = 288000
MaxJobRetirementTime = $(TMOUT)
SHUTDOWN_GRACEFUL_TIMEOUT = $(TMOUT)
# How long will it wait in an unclaimed state
STARTD_NOCLAIM_SHUTDOWN = 1200
HEAD_NODE = cdfhead.fnal.gov
# GCB configuration
NET_REMAP_INAGENT = cdfgcb1.fnal.gov
NET_REMAP_ENABLE = true
NET_REMAP_SERVICE = GCB
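Pointing different glide-ins at different GCBs can be done at the moment the startup script writes its configuration. The sketch below is hypothetical: it picks a broker by hashing the worker-node name with cksum, so a given node always lands on the same GCB, and prints the three NET_REMAP settings shown above. Broker names other than cdfgcb1.fnal.gov are placeholders.

```shell
#!/bin/sh
# Hypothetical: spread glide-ins over several GCB brokers (a sketch).

# pick_gcb NODE_NAME BROKER [BROKER ...]
# Prints one broker chosen from a CRC of NODE_NAME: the same node always
# gets the same GCB, while different nodes spread across all of them.
pick_gcb() {
  local node=$1 crc idx i broker
  shift
  if [ "$#" -eq 0 ]; then
    echo "pick_gcb: no brokers given" >&2
    return 1
  fi
  # cksum prints "<CRC> <byte count>"; keep the CRC as a stable hash.
  crc=$(printf '%s' "$node" | cksum | cut -d' ' -f1)
  idx=$(( crc % $# + 1 ))
  i=1
  for broker in "$@"; do
    if [ "$i" -eq "$idx" ]; then
      echo "$broker"
      return 0
    fi
    i=$(( i + 1 ))
  done
}

# gcb_config BROKER: the GCB settings from the slide's condor_config.
gcb_config() {
  echo "NET_REMAP_ENABLE = true"
  echo "NET_REMAP_SERVICE = GCB"
  echo "NET_REMAP_INAGENT = $1"
}
```

The startup script's local_config step could append the gcb_config output for the chosen broker to the glide-in's condor_config. A deterministic hash spreads nodes without any coordination between glide-ins and keeps a given node's GCB reproducible when debugging; a random pick would spread load equally well.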
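How GCB works
[Diagram: schedd and collector on one side; the startd in a Grid pool (started via Globus) behind a firewall/NAT; the GCB broker on a public network in between, connected by TCP.]
1) The startd establishes a persistent TCP connection to a GCB
2) The startd advertises itself and the GCB connection to the collector
3) The schedd sends its own address to the GCB
4) The GCB forwards the schedd address to the startd
5) The startd establishes a TCP connection to the schedd
6) A job can be sent to the startd
● GCB must be on a public network
● GCB is just a proxy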
The use of GCB at CDF
● Using a pre-production version of GCB
– Waiting for Condor v6.7.20
● North American CAF just opened to users
– Condor collector and schedds at Fermilab
– Using 2 GCBs, both at Fermilab
– Gliding into Fermilab, MIT and San Diego
– The remaining US and Canadian Grid sites to follow
Problems found along the road
● File delivery
● Firewalls
● Security concerns
Security in the pull model
● Real user known only after the glide-in starts
– Cannot use the user proxy when submitting to the CE
– Real user proxy delivered by Condor
  ● User job can use it for further authentication
● All glide-ins use a single service proxy
– Site admin does not know about the real user
– All jobs potentially run under the same UID
  ● No system protection between users
● Glide-in and job run under the same UID
– User can steal the service proxy
Security problem at a glance
[Diagram: the GlideCAF layout as before; on one worker node, two glide-ins submitted with the service proxy ProxyX run user jobs carrying the user proxies ProxyA and ProxyB, both under the same UID, uidX.]
Use gLExec to authenticate on the WN
● Authenticate on the WN before starting the user job
– Can change UID on the fly
● Grid admin knows who is running
[Diagram: the glide-in, submitted through Globus with ProxyX, runs in a batch slot as uidX; it hands the user's ProxyA to gLExec, which starts the user job as uidA with ProxyA.]
Status of gLExec
● Current gLExec works only on the CE
– Cannot use it as-is
● Work in progress to make it usable for glide-ins
– See later talk
● The Condor team promised to make use of gLExec once available
Summary
● Glide-ins and GCB took CDF to the Grid
– Without leaving the Condor environment
● Using the pull model made our life easy
– No need to select a site in advance
– Matchmaking done at a global level
– Policies kept in CDF hands
● No need for any add-on at Grid sites
– Will use what is found
● gLExec and a local Squid are desirable