INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org glexec deployment models local credentials and grid identity mapping in the presence of complex schedulers David Groep NIKHEF

Upload: melinda-miles

Post on 19-Jan-2016


INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

glexec deployment models

local credentials and grid identity mapping in the presence of complex schedulers

David Groep

NIKHEF

glexec deployment models, LCG Operations W/S June 19-20, 2006

What is glexec?

glexec: a thin layer to change Unix credentials, based on grid identity and attribute information

you can think of it as:
• ‘a replacement for the gatekeeper’
• ‘a griddy version of Apache’s suexec(8)’
• ‘a program wrapper around LCAS, LCMAPS or GUMS’

What glexec does

Input
1. a certificate chain, possibly with VOMS extensions
2. a user program name & arguments to run

Action
1. check authorization (LCAS, GUMS)
   • user credentials, proper VOMS attributes, executable name
2. acquire local credentials
   – local (uid, gid) pair, possibly across a cluster
3. enforce the local credential on the process

Result
1. user program is run with the mapped credentials
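
The three actions above can be sketched as a dry run in Python. This is only an illustration under stated assumptions: the policy tables, DNs, VOMS attributes, uid/gid numbers and function names are all hypothetical, and a real deployment would obtain the decisions from LCAS/GUMS and LCMAPS rather than from in-memory tables. The last step only describes what a privileged wrapper would do; it does not change credentials.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


class Denied(Exception):
    """Raised when the (hypothetical) site policy refuses the request."""


@dataclass(frozen=True)
class Request:
    subject_dn: str            # identity taken from the certificate chain
    fqans: Tuple[str, ...]     # VOMS attributes, possibly empty
    executable: str            # user program to run
    args: Tuple[str, ...] = ()


# Hypothetical site policy, standing in for LCAS/GUMS and LCMAPS answers.
BANNED_DNS: FrozenSet[str] = frozenset({"/O=Example/CN=Banned User"})
ALLOWED_EXECUTABLES: FrozenSet[str] = frozenset({"/bin/echo", "/usr/bin/env"})
ACCOUNT_FOR_FQAN: Dict[str, Tuple[int, int]] = {
    "/examplevo/Role=production": (41001, 4100),
    "/examplevo": (41002, 4100),
}


def authorize(req: Request) -> None:
    """Action 1: check user credentials, VOMS attributes and executable name."""
    if req.subject_dn in BANNED_DNS:
        raise Denied("user is banned: " + req.subject_dn)
    if not any(fqan in ACCOUNT_FOR_FQAN for fqan in req.fqans):
        raise Denied("no acceptable VOMS attribute presented")
    if req.executable not in ALLOWED_EXECUTABLES:
        raise Denied("executable not permitted: " + req.executable)


def acquire_local_credentials(req: Request) -> Tuple[int, int]:
    """Action 2: map the grid identity to a local (uid, gid) pair."""
    for fqan in req.fqans:  # the first attribute with a mapping wins
        if fqan in ACCOUNT_FOR_FQAN:
            return ACCOUNT_FOR_FQAN[fqan]
    raise Denied("no mapping available")


def plan_enforcement(req: Request) -> dict:
    """Action 3 as a dry run: describe what a privileged wrapper would do.

    A real setuid wrapper would drop the group id before the user id
    (setgid(2), then setuid(2)) and only then exec the user program.
    """
    authorize(req)
    uid, gid = acquire_local_credentials(req)
    return {"gid": gid, "uid": uid, "argv": [req.executable, *req.args]}
```

Calling `plan_enforcement` with an acceptable request returns the mapped ids and the argument vector; any refusal surfaces as `Denied`, so the caller never reaches the enforcement step without a positive decision.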

Why was glexec devised?

• gatekeeper and other schedulers are complex, and need not be run with root privileges all the time
  – take an example from Apache httpd, where user CGI scripts can be run under their own identity, but without the web server itself having to run as root
  – to accomplish this, a small program is needed with setuid(2) power: ‘suexec(8)’
• variety in grid job submission systems is increasing
  – need a common way of obtaining and enforcing site policy and credential mapping
  – without the need to modify each and every system
  – as such, glexec in this deployment mode is an alternative to having authorization and mapping call-outs in each system

glexec traditional deployments

There are three ‘traditional’ deployment models; glexec has a role in two of them

1. direct per-user job submission to a ‘gatekeeper’ running with root privileges (GT2GK, today’s model)

2. a non-privileged dedicated CE or scheduler, accepting authenticated user jobs and submitting to the batch system

3. on-demand CE, submitted by VO or user to a front-end system, that then receives user jobs and submits these to the batch system

[Figure legend: submitting user’s identity & job; VO identity/process or VO placeholder manager; site managed and trusted services]

Job submission today (GT2 GK)

• Deployment model without glexec (‘mode GT2GK’)
  – jobs are submitted with an identity (hopefully the original user’s one) to the site Gatekeeper running as root
  – one job manager is run for each user on the head node
  – with the user’s (uid, gid) as set by the gatekeeper

glexec in a one-per-site mode

• Deployment model with a CE ‘service’
  – running in a non-privileged account or
  – with a CE run (maybe one per VO) on a single front-end per site

examples
• CREAM
• GT4 WS-GRAM

glexec with an on-demand CE

• Deployment model with on-demand CEs (‘mode on-demand CEs’)
  – the user or the VO start their own scheduler on a front-end system
  – all these on-demand schedulers are resource-limited by a site-managed master scheduler (via a GT2GK or Condor)
  – the on-demand schedulers eat jobs for their VO or user
  – and set the proper identity before the job gets submitted to the site batch system

glexec with on-demand CE

• Deployment model with on-demand CEs (‘mode on-demand for VOs’ with native interface)

glexec with an on-demand CE

• Deployment model with on-demand CEs (‘mode on-demand for VOs’ with legacy interface)

Traditional model summary

• In all three models, the submission of the user job to the batch system is done with the original job owner’s mapped (uid, gid) identity

• grid-to-local identity mapping is done only on the front-end system (CE)

• batch system accounting provides per-user records
• inspection of Unix processes on worker nodes is per-user

Pilot jobs

A pilot job is basically just
• a small script which downloads a real job
• from a repository once it starts executing, hence
• it is not committed to any particular task, or perhaps even a particular user, until that point.
• If there are no tasks waiting the pilot job exits immediately.
• In principle, if the time limits on the queue are long enough a single pilot job could run more than one real job, although I'm not sure if anyone is actually doing that at the moment.

(thanks to Stephen Burke, on LCG-ROLLOUT)
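
The behaviour described in the quote can be sketched as a small loop. This is a minimal illustration, not any VO's actual framework: the `TaskRepository` class, the `run_pilot` function and the in-memory queue are hypothetical stand-ins for a VO's central task repository, and tasks are modelled as plain Python callables.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Optional

Task = Callable[[], str]


class TaskRepository:
    """Hypothetical in-memory stand-in for the VO's central task queue."""

    def __init__(self, tasks: List[Task]) -> None:
        self._tasks: Deque[Task] = deque(tasks)

    def fetch(self) -> Optional[Task]:
        """Hand out the next waiting task, or None if nothing is waiting."""
        return self._tasks.popleft() if self._tasks else None


def run_pilot(repo: TaskRepository, time_limit_s: float,
              clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Pull and run tasks until the queue is empty or the slot's time is used up."""
    deadline = clock() + time_limit_s
    results: List[str] = []
    while clock() < deadline:
        task = repo.fetch()   # late binding: the task is only chosen now
        if task is None:
            break             # no tasks waiting: the pilot exits immediately
        results.append(task())
    return results
```

With an empty repository the pilot returns at once; with a long enough time limit the same pilot runs several tasks in sequence, which is the "more than one real job" case mentioned in the quote.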

From the VO side

Background: some large VOs develop and prefer to use their own scheduling & job management framework

• late binding of jobs to job slots
  – first establishing an overlay network
  – subsequent scheduling and starting of jobs is faster
• hide details between the various grid flavours
• implement VO priorities
• full use of allocated slots, up to max wall clock time

but these VOs will need their ‘own’ scheduler
  – some of them do have it already,
  – but then others don’t and most never will, so the use of pilots should not be the only (or even the default) way of doing things

Situation today

• ‘VO-type’ pilot jobs submitted as if regular user jobs
  – run with the identity of one or a few individuals from a VO
  – obtain jobs from any user (within the VO) and run that payload on the WN allocated
  – site ‘sees’ only a single identity, not the true owner of the workload
  – no effective mechanisms today can deny this use model

• note that this does not apply to the regular ‘per-user’ pilot jobs

Issues

Issues that drove the original glexec-on-WN scenario:

• VO supplied pilot jobs must observe and honour
  – the same policies the site uses for normal job execution
  preferably
  – without requiring alternate mechanisms to describe the policies
  – be continuously in synch with the site policies

again, ‘per-user’ pilot jobs satisfy these rules by design

Pieces of the solution

Three pieces that go together:

• glexec on the worker-node deployment
  – mechanism for pilot jobs to submit themselves and their payload to site policy control
  – give incontrovertible evidence of who is running on which node at any one time
      • needed at selected sites for regulatory compliance
      • ability to nail individual culprits
    by requiring the VO to present a valid delegation from each user
  – VO should want this
      • to keep user jobs from interfering with each other
      • honouring site ban lists for individuals may help in not banning the entire VO in case of an incident

Pieces of the solution

• glexec on the worker-node deployment
• way to keep the pilot job submitters to their word
  – system-level auditing of the pilot jobs, to see they are not doing the user job by themselves or evading the controls
  – relies on advanced auditing features of the OS (from EAL3+)
  – but auditing data on the WN is useful for incident investigations only
• internal accounting should be done by the VO
  – the regular site accounting mechanisms are via the batch system, and will see the pilot job identity
  – the site can easily show from those logs the usage by the pilot job (for which wall-clock-time accounting should be used)
  – making a site do accounting based on glexec jobs is non-standard, requires effort, may be intrusive, and messes up normal accounting
  – ‘a VO capable of writing their own submission framework, ought to be able to write their own accounting system as well …’

glexec on WN deployment model

• VO submits a pilot job to the batch system
  – the VO ‘pilot job’ submitter is responsible for the pilot behaviour; this might be a specific role in the VO, or a locally registered ‘badged’ user at each site
• Pilot job is subject to normal site policies for jobs
• Pilot job obtains the true user job, and presents the user credentials and the job (executable name) to the site (glexec) to request a decision

[Figure legend: submitting user’s identity & job; VO identity/process or VO placeholder manager; site managed and trusted services]

VO pilot job on the node

Note: proper uid change by Gatekeeper or Condor-C/BLAHP on head node should remain default

• On success: the site will set the uid/gid of the new user’s job
• On failure: glexec will return with an error, and the pilot job can terminate or obtain another job
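
How a pilot might react to the two outcomes can be sketched as follows. This is a hedged illustration only: `Payload`, `SiteRunner` and `process_payloads` are hypothetical names, the site wrapper is replaced by an injected callable, and the sketch assumes that any non-zero status means the payload was refused or failed. It does not model the real glexec command line or its exit codes.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Payload(NamedTuple):
    user_proxy: str          # path to the payload owner's delegated credential (hypothetical)
    argv: Tuple[str, ...]    # executable name and arguments


# Stand-in for handing the payload to the site wrapper; returns an exit status.
# Assumption of this sketch: 0 means the payload ran, anything else is a failure.
SiteRunner = Callable[[Payload], int]


def process_payloads(payloads: Iterable[Payload], run_at_site: SiteRunner,
                     terminate_on_failure: bool = False) -> List[Tuple[Payload, int]]:
    """Present each payload to the site and record the outcome."""
    outcomes: List[Tuple[Payload, int]] = []
    for payload in payloads:
        status = run_at_site(payload)
        outcomes.append((payload, status))
        if status != 0 and terminate_on_failure:
            break            # on failure: the pilot terminates ...
        # ... or, by default, simply moves on to obtain another job
    return outcomes
```

The `terminate_on_failure` switch captures the two options on the slide: stop the pilot at the first failure, or keep the slot and try the next payload.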

What is needed in this model?

1. Agreement on the three ingredients
   • deployment of glexec on the WN to do setuid
   • detailed auditing on the head node and the WNs
   • site accounting done at the VO (i.e. pilot job) level
2. glexec
   • needs feature enhancements compared to single-CE version
   • see status of glexec on the next slide
3. Inspection of the audit logs
   • detect abuse patterns in the system-call auditing logs
4. Grid job logging capabilities
   • glexec will log (uid, user/system/real time usage) via syslog
   • credential mapping framework (LCMAPS) will log mapping (also via syslog)
   • centralisation of glexec mappings, e.g. via JobRepository
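
As an illustration of item 4, the sketch below measures user, system and real time around a piece of work and formats one log line. The `key=value` layout and the function names are assumptions made for this example; they are not glexec's actual log format. The line could then be sent to syslog, for instance through a handler from Python's standard `logging.handlers` module.

```python
import os
import time
from typing import Callable, Tuple


def measure(work: Callable[[], None]) -> Tuple[float, float, float]:
    """Return (user, system, real) seconds consumed while `work` ran in this process."""
    cpu_before, wall_before = os.times(), time.monotonic()
    work()
    cpu_after, wall_after = os.times(), time.monotonic()
    return (cpu_after.user - cpu_before.user,
            cpu_after.system - cpu_before.system,
            wall_after - wall_before)


def usage_record(uid: int, user_s: float, system_s: float, real_s: float) -> str:
    """Format one record; the layout is hypothetical, chosen for readability."""
    return f"uid={uid} user={user_s:.2f} system={system_s:.2f} real={real_s:.2f}"
```

Keeping measurement and formatting separate makes it straightforward to swap the output for whatever record layout the site's log collection expects.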

Status today

• Status of ‘glexec’ today
  – implementation ready & tested, based off the Apache HTTP suexec code base
  – uses LCAS and LCMAPS for enforcement and mapping in their library-based implementation
  – new modules have been added
      • LCAS: RSL (executable path) constraints
      • validation of cert chain and proxy lifetime
  – restrictions
      • policy should be located on local posix-accessible file systems
      • policy transport should be ‘trustworthy’
• Needed specifically for the glexec-on-WN model
  – make the credential acquisition process (LCAS/LCMAPS) work with a site-central policy engine
      • enforcement will have to stay local
  – changeover to standard callouts for both is needed
  – needs more site-sysadmin configuration capabilities

Needed components, procedures

• Auditing the VO placeholder job/scheduler on the WN
  – compare the number of ‘fork-execs’ done by the placeholder with the number of glexec invocations
      • a discrepancy means the VO is cheating on you
  – check the VO placeholder job is not using too much CPU
      • the CPU-time / Walltime should be close to zero
• credential mapping auditing/logging
  – ‘JobRepository’ fits the bill
      • schema allows for recording and retrieving all aspects of credential mapping
      • records both user identity and any VO attributes
      • retains the credential mapping for each ‘job’ or glexec invocation
  – JR is part of the stack, but not widely deployed yet
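
The two placeholder checks above can be written down as a small function. This is a sketch under assumptions: the counts are taken as already extracted from the system-call audit log and the glexec log, the `PlaceholderAudit` record and `audit_findings` function are hypothetical, and the 5% CPU threshold is an arbitrary stand-in for "close to zero".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlaceholderAudit:
    fork_execs: int           # program launches by the placeholder, from the audit log
    glexec_invocations: int   # invocations logged by glexec for the same placeholder
    cpu_seconds: float        # CPU time consumed by the placeholder itself
    wall_seconds: float       # wall-clock time the placeholder occupied the slot


def audit_findings(audit: PlaceholderAudit, max_cpu_ratio: float = 0.05) -> List[str]:
    """Return a list of human-readable findings; an empty list means nothing suspicious."""
    findings: List[str] = []
    if audit.fork_execs != audit.glexec_invocations:
        findings.append(
            f"{audit.fork_execs} fork-execs but {audit.glexec_invocations} glexec invocations"
        )
    if audit.wall_seconds > 0 and audit.cpu_seconds / audit.wall_seconds > max_cpu_ratio:
        findings.append("placeholder CPU-time / walltime is not close to zero")
    return findings
```

A placeholder that starts five programs but calls glexec only three times, or that burns a large share of its wall-clock time as CPU, would show up with one finding each.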

Notes and alternatives

• glexec, like any site-managed ingress point, trusts the submitter not to have mixed up the user credentials and the jobs
  – we trust the RB today to do this correctly, and RBs are unknown quantities to the receiving site
• a longer term solution is to have the job request signed by the submitting user
  – since the description is modified by intermediaries (brokers), the signature can only be to the original content, and the site would have to evaluate whether the job received matches the signed JDL
  – or use an inheritance model for the job description, and treat the job like you would, e.g., a CIM entity

Summary

• Realize that some VOs are doing ‘pilot’ jobs today
  – there is no effective enforcement against this
  – some sites may simply not care yet, whilst others have hard requirements on auditability and regulatory compliance
• The glexec-on-WN model gives the VOs tools to comply with site requirements
  – at least makes it ‘better’ than it is today
  – but you, as a site, will miss that warm and fuzzy feeling of trust
• a glexec-on-WN is always replaceable by the ‘null operation’ for sites that don’t care or don’t want it
  – but realize this is for just one of the glexec deployment models