Grid Security 1

Upload: miles-raymond-joseph

Post on 29-Dec-2015


Page 1: Grid Security 1. Grid security is a crucial component Need for secure communication  Authenticated ( verify entities are who they claim to be -> use

Grid Security


Grid security is a crucial component
• Need for secure communication between grid elements
  – Authenticated (verify entities are who they claim to be -> use certificates and CAs)
  – Confidential: only the invited parties can understand the conversation (use encryption)
• Need to support security across organizational boundaries
  – No centrally managed security system
• Need to support “single sign-on” for users of the grid
  – Delegation of credentials for computations that involve multiple resources and/or sites
• Allowing or denying access to services based on policies (authorization)


Identity & Authentication

• Each entity should have an identity
• Authenticate: establish identity
  – Is the entity who it claims to be?
  – Examples: driving license, username/password
• Stops masquerading impostors


Authorization

• Establishing rights
• What can a certain identity do?
  – Example: Are you allowed to be on this flight? As a passenger? As the pilot?
  – Example: Unix read/write/execute permissions
• Must authenticate first
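As a small illustration of the Unix read/write/execute example, the sketch below decodes a permission mode into the rights held by the owner, the group and everyone else. It uses only Python's standard `stat` module; the helper name `rights` and the sample mode `0o640` are assumptions made for this example.

```python
import stat

# Permission bits for each class of user, in (read, write, execute) order.
_BITS = {
    "owner": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
    "group": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
    "other": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
}


def rights(mode: int, who: str) -> dict:
    """Return what the given class of user ("owner", "group", "other") may do."""
    r, w, x = _BITS[who]
    return {
        "read": bool(mode & r),
        "write": bool(mode & w),
        "execute": bool(mode & x),
    }


mode = 0o640  # hypothetical file mode: owner rw, group r, others nothing
print(stat.filemode(stat.S_IFREG | mode))  # -rw-r-----
for who in _BITS:
    print(who, rights(mode, who))
```

The operating system can only apply these bits after it knows which user is asking, which is the "must authenticate first" point.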


Single Sign-on

• Important for complex applications that need to use Grid resources
  – Enables easy coordination of varied resources
  – Enables automation of process
  – Allows remote processes and resources to act on user’s behalf
  – Authentication and Delegation


Asymmetric Encryption

• Encryption and decryption functions that use a key pair are called asymmetric
  – Keys are mathematically linked


Public and Private Keys

• With asymmetric encryption each user can be assigned a key pair: a private and a public key
  – Private key is known only to owner
  – Public key is given away to the world

• A message encrypted with the public key can be decrypted only with the private key

• Message Privacy
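To make the key-pair idea concrete, here is a minimal sketch of RSA-style encryption with deliberately tiny numbers. It is a toy under stated assumptions: the primes, the exponent and the function names are chosen for illustration only, and real systems use keys thousands of bits long plus padding.

```python
# Toy RSA with tiny primes. Illustrative only: never use this for real security.
p, q = 61, 53                  # two secret primes
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)            # given away to the world
private_key = (d, n)           # known only to the owner


def encrypt(message: int, key: tuple) -> int:
    """Encrypt an integer message (smaller than n) with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple) -> int:
    """Recover the message with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext != message, recovered == message)
```

The two exponents are "mathematically linked" through `e * d % phi == 1`, which is why only the holder of `d` can undo what anyone can do with `e`.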


Public Key Infrastructure (PKI)

• PKI allows you to know that a given public key belongs to a given user
• PKI builds off of asymmetric encryption:
  – Each entity has two keys: public and private
  – The private key is known only to the entity
• The public key is given to the world encapsulated in an X.509 certificate
• GSI is based on PKI


Certificates

• Central concept in GSI authentication
• Every user, resource and service on the Grid is identified via a certificate
• Contains:
  – Subject name (identifies entity)
  – Corresponding public key
  – Identity of the CA that has signed the cert (to certify that the public key and the identity both belong to the subject)
  – The digital signature of the CA
• GSI certs are encoded in the X.509 certificate format
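The four certificate fields listed above can be sketched as a small data structure, with a toy CA that signs and verifies it. This is not the X.509 encoding or any Globus API: the class, the helper names, the CA name and the tiny RSA numbers are all hypothetical and exist only to show how a CA signature ties a subject to a public key.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Certificate:
    subject: str        # identifies the entity
    public_key: tuple   # the entity's public key, here a toy (e, n) pair
    issuer: str         # identity of the CA that signed the certificate
    signature: int      # the CA's digital signature over the fields above


# Hypothetical toy CA key pair (tiny RSA numbers, for illustration only).
CA_NAME = "/O=Example/CN=Toy CA"
CA_PUBLIC = (17, 3233)
CA_PRIVATE = (2753, 3233)


def _digest(subject: str, public_key: tuple, issuer: str, modulus: int) -> int:
    """Hash the certificate fields and reduce the hash to fit the toy modulus."""
    data = f"{subject}|{public_key}|{issuer}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus


def issue(subject: str, public_key: tuple) -> Certificate:
    """The CA signs the digest of the fields with its private key."""
    d, n = CA_PRIVATE
    signature = pow(_digest(subject, public_key, CA_NAME, n), d, n)
    return Certificate(subject, public_key, CA_NAME, signature)


def verify(cert: Certificate, ca_public: tuple) -> bool:
    """Anyone holding the CA's public key can check the signature."""
    e, n = ca_public
    expected = _digest(cert.subject, cert.public_key, cert.issuer, n)
    return pow(cert.signature, e, n) == expected


cert = issue("/O=Example/CN=Alice", (7, 143))
forged = replace(cert, subject="/O=Example/CN=Mallory")
print(verify(cert, CA_PUBLIC), verify(forged, CA_PUBLIC))
```

Changing the subject without re-signing should make verification fail, apart from the rare digest collisions that such a small toy modulus allows.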


Globus Security:

GSI is a set of tools, libraries and protocols used in Globus to allow users and applications to securely access resources.

• Based on PKI
• Uses Secure Sockets Layer (SSL) for authentication and message protection
  – Encryption
  – Signature
• Adds features needed for single sign-on
  – Proxy credentials
  – Delegation


Authorization - Gridmap

• Gridmap is a list of mappings from allowed DNs (distinguished names) to local user names
• Controlled by the administrator
• Open read access

"/C=US/O=Globus/O=ANL/OU=MCS/CN=Ben Clifford" benc
"/C=US/O=Globus/O=ANL/OU=MCS/CN=MikeWilde" wilde

(stored in the file /etc/grid-security/grid-mapfile)
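A gridmap lookup can be sketched in a few lines. The sketch assumes the simple form shown on the slide, one quoted DN followed by one local user name per line; the function names are hypothetical, and real grid-mapfiles may contain variations that this parser rejects.

```python
import shlex


def parse_gridmap(text: str) -> dict:
    """Map each quoted DN to its local user name."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = shlex.split(line)  # honours the quotes around the DN
        if len(fields) != 2:
            raise ValueError(f"unexpected grid-mapfile line: {raw!r}")
        dn, user = fields
        mapping[dn] = user
    return mapping


def local_user(mapping: dict, dn: str):
    """Return the local account for an authenticated DN, or None if not listed."""
    return mapping.get(dn)


SAMPLE = '''
"/C=US/O=Globus/O=ANL/OU=MCS/CN=Ben Clifford" benc
"/C=US/O=Globus/O=ANL/OU=MCS/CN=MikeWilde" wilde
'''

gridmap = parse_gridmap(SAMPLE)
print(local_user(gridmap, "/C=US/O=Globus/O=ANL/OU=MCS/CN=Ben Clifford"))  # benc
print(local_user(gridmap, "/C=US/O=Example/CN=Unknown"))                   # None
```

A DN that is not in the list maps to no account, which is how the gridmap denies access.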


GSI: Credentials

• In the GSI system each user has a set of credentials they use to prove their identity on the grid
  – Consists of an X.509 certificate and private key
• Long-term private key is kept encrypted with a pass phrase
  – Good for security, inconvenient for repeated usage
  – Do not lose this phrase!


GSI: Proxy Credentials

• Proxy credentials are short-lived credentials created by the user
  – Proxy is signed with the private key of the user's certificate
• Short-term binding of the user’s identity to an alternate private key
• Same effective identity as the certificate


GSI: Proxy Credentials

• Stored unencrypted for easy repeated access
• Chain of trust
  – Trust CA -> Trust User Certificate -> Trust Proxy
• Key aspects:
  – Generate proxies with short lifetime
  – Set appropriate permissions on proxy file
  – Destroy when done
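The chain of trust and the short proxy lifetime can be sketched as a simple check. This is a simplified model, not GSI's actual validation: it compares issuer and subject names and expiry times only, and omits the signature checks a real implementation performs. The class, function and DN strings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Credential:
    subject: str
    issuer: str
    not_after: datetime  # expiry time


def chain_is_trusted(chain: list, trusted_cas: set, now: datetime) -> bool:
    """chain is ordered leaf first: proxy, then user certificate."""
    if not chain:
        return False
    # Every credential in the chain must still be valid.
    if any(now >= cred.not_after for cred in chain):
        return False
    # Each credential must have been issued by the next one in the chain.
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False
    # The last credential must have been issued by a CA we trust.
    return chain[-1].issuer in trusted_cas


issued = datetime(2015, 1, 1, tzinfo=timezone.utc)
user_cert = Credential("/O=Example/CN=Alice", "/O=Example/CN=Toy CA",
                       issued + timedelta(days=365))
proxy = Credential("/O=Example/CN=Alice/CN=proxy", user_cert.subject,
                   issued + timedelta(hours=12))
trusted = {"/O=Example/CN=Toy CA"}

print(chain_is_trusted([proxy, user_cert], trusted, issued + timedelta(hours=1)))   # True
print(chain_is_trusted([proxy, user_cert], trusted, issued + timedelta(hours=13)))  # False
```

Because the proxy key is stored unencrypted, the short lifetime limits how long a stolen proxy remains useful.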


Grid Security - in practice - steps:
• Get certificate from relevant CA
  – DOEGrids in our case
• Request to be authorized for resources
  – Meaning you will be added to the OSGEDU VOMS
• Generate proxy as needed
  – Using grid-proxy-init
• Run clients
  – Authenticate
  – Authorize
  – Delegate as required
• Numerous resources, different CAs, numerous credentials


National Grid Cyberinfrastructure


Grid Resources in the US

The OSG

• Origins:
  – National Grid (iVDGL, GriPhyN, PPDG) and LHC Software & Computing Projects
• Current Compute Resources:
  – 61 Open Science Grid sites
  – Connected via Inet2, NLR.... from 10 Gbps – 622 Mbps
  – Compute & Storage Elements
  – All are Linux clusters
  – Most are shared: campus grids, local non-grid users
  – More than 10,000 CPUs
  – A lot of opportunistic usage
  – Total computing capacity difficult to estimate
  – Same with Storage

The TeraGrid

• Origins:
  – National Super Computing Centers, funded by the National Science Foundation
• Current Compute Resources:
  – 14 TeraGrid sites
  – Connected via dedicated multi-Gbps links
  – Mix of Architectures: ia64, ia32 LINUX; Cray XT3; Alpha: True 64; SGI SMPs
  – Resources are dedicated but Grid users share with local users
  – 1000s of CPUs, > 40 TeraFlops
  – 100s of TeraBytes


Open Science Grid (OSG) provides shared computing resources, benefiting a broad set of disciplines

OSG incorporates advanced networking and focuses on general services, operations, end-to-end performance

Composed of a large number (>50 and growing) of shared computing facilities, or “sites”

http://www.opensciencegrid.org/

A consortium of universities and national laboratories, building a sustainable grid infrastructure for science.


www.opensciencegrid.org

Diverse job mix

Open Science Grid
• 70 sites (25,000 CPUs) & growing
• 400 to >1000 concurrent jobs
• Many applications + CS experiments; includes long-running production operations
• Up since October 2003


OSG Snapshot

• 96 resources across production & integration infrastructures
• 20 Virtual Organizations + 6 operations
  – Includes 25% non-physics
• ~20,000 CPUs (from 30 to 4000)
• ~6 PB tapes
• ~4 PB shared disk

Snapshot of Jobs on OSG

• Sustaining through OSG submissions:
  – 3,000-4,000 simultaneous jobs
  – ~10K jobs/day
  – ~50K CPU hours/day
  – Peak test jobs of 15K a day
• Using production & research networks


To efficiently use a Grid, you must locate and monitor its resources.
• Check the availability of different grid sites
• Discover different grid services
• Check the status of “jobs”
• Make better scheduling decisions with information maintained on the “health” of sites


OSG - VORS (VO Resource Selector)


OSG - Monitoring - MonALISA


OSG Middleware

Applications
• User Science Codes and Interfaces
• VO Middleware
  – HEP: Data and workflow management etc
  – Biology: Portals, databases etc
  – Astrophysics: Data replication etc

Infrastructure
• OSG Release Cache: OSG specific configurations, utilities etc.
• Virtual Data Toolkit (VDT): core technologies + software needed by stakeholders; many components shared with EGEE
• Core grid technology distributions: Condor, Globus, myproxy; shared with TeraGrid and others
• Existing Operating, Batch systems and Utilities


What’s included in VDT?

• GRAM: Allow job submissions
• GridFTP: Allow file transfers
• CEMon/GIP: Publish site information
• Some authorization mechanism
  – grid-mapfile: file that lists authorized users
  – GUMS: service that maps users
• Other pieces of software


TeraGrid provides vast resources via a number of huge computing facilities. It is described as the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research (750 teraflops of computing capability and more than 30 petabytes of storage).


The TeraGrid

• An open scientific discovery infrastructure combining leadership-class resources at 11 partner sites to create an integrated, persistent computational resource
• 750 teraflops of computing capability and more than 30 petabytes of online and archival data storage
• Resource Providers:
  – Currently NCSA, SDSC, PSC, Indiana, Purdue, ORNL, TACC, UC/ANL
  – Systems (resources, services) support, user support
  – Provide access to resources via policies, software, and mechanisms coordinated by and provided through the Grid Infrastructure Group (GIG) at the University of Chicago


Science Gateways

A new initiative for the TeraGrid

• Increasing investment by communities in their own cyberinfrastructure, but heterogeneous:
  – Resources
  – Users – from expert to K-12
  – Software stacks, policies
• Science Gateways
  – Provide “TeraGrid Inside” capabilities
  – Leverage community investment
• Three common forms:
  – Web-based portals
  – Application programs running on users' machines but accessing services in TeraGrid
  – Coordinated access points enabling users to move seamlessly between TeraGrid and other grids


For More Info

Open Science Grid: http://www.opensciencegrid.org

TeraGrid: http://www.teragrid.org


Conclusion: Why Grids?

• New approaches to inquiry based on
  – Deep analysis of huge quantities of data
  – Interdisciplinary collaboration
  – Large-scale simulation and analysis
  – Smart instrumentation
• Dynamically assemble the resources to tackle a new scale of problem
• Enabled by access to resources & services without regard for location & other barriers


Grids: Because science needs community …

• Teams organized around common goals
  – People, resources, software, data, instruments…
• With diverse membership & capabilities
  – Expertise in multiple areas required
• And geographic and political distribution
  – No location/organization possesses all required skills and resources
• Must adapt as a function of the situation
  – Adjust membership, reallocate responsibilities, renegotiate resources


Science grid application examples


Example: OSG CHARMM Application

Project and slides are the work of:
• Ana Damjanovic (JHU, NIH)
• JHU: Petar Maksimovic, Bertrand Garcia-Moreno
• NIH: Tim Miller, Bernard Brooks
• OSG: Torre Wenaus and team


CHARMM and MD simulations

CHARMM is one of the most widely used programs for computational modeling, simulations and analysis of biological (macro)molecules

The widest use of CHARMM is for molecular dynamics (MD) simulations:

• Atoms are described explicitly
• Interactions between atoms are described with an empirical force field
  – electrostatic, van der Waals
  – bond vibrations, angle bending, dihedrals …
• Newton’s equations are solved to describe the time evolution of the system
  – timestep: 1-2 fs
  – typical simulation times: 10-100 ns
• CHARMM has a variety of tools for analysis of MD trajectories
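To show what "solving Newton's equations with a timestep" means in practice, here is a minimal sketch using the velocity Verlet integrator on a single particle on a spring. It is not CHARMM code: the one-dimensional system, the reduced units and all names are assumptions for illustration. The second helper counts how many timesteps the quoted simulation lengths imply.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps of size dt."""
    a = force(x) / mass
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # update position
        a_new = force(x) / mass           # force at the new position
        v += 0.5 * (a + a_new) * dt       # update velocity with the averaged force
        a = a_new
    return x, v


K, M = 1.0, 1.0                           # spring constant and mass (reduced units)


def spring_force(x):
    return -K * x


def energy(x, v):
    return 0.5 * M * v * v + 0.5 * K * x * x


x_end, v_end = velocity_verlet(1.0, 0.0, spring_force, M, dt=0.01, n_steps=1000)
print(x_end, math.cos(10.0))              # numerical vs exact position at t = 10
print(energy(1.0, 0.0), energy(x_end, v_end))


def steps_needed(sim_ns: int, dt_fs: int) -> int:
    """Number of timesteps for a simulation of sim_ns nanoseconds."""
    return sim_ns * 1_000_000 // dt_fs    # 1 ns = 1,000,000 fs


print(steps_needed(10, 2), steps_needed(100, 1))  # 5000000 100000000
```

With a 1-2 fs timestep, a 10-100 ns simulation therefore needs between five million and one hundred million force evaluations, which is why these runs are so expensive.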


Hydration of the protein interior

staphylococcal nuclease

• Interior water molecules can play key roles in many biochemical processes such as proton transfer, or catalysis.

• The precise locations and numbers of water molecules in protein interiors are not always known from experiments.

• Knowing their location is important for understanding how proteins work, but also for drug design

Using traditional HPC resources we performed 10 X 10 ns long MD simulations.

Not enough statistics!

Crystallographic structures obtained at different temperatures disagree in the number of observed water molecules. How many water molecules are in the protein interior?

Two conformations, each with a different hydration pattern

Use OSG to run lots of simulations with different initial velocities.


Running jobs on the OSG

• Manual submission and running of a large number of jobs can be time-consuming and lead to errors

• OSG personnel worked with scientists to get this scheme running

• PanDA was developed for ATLAS, and is being evolved into OSG-WMS


Accomplishments with use of OSG

staphylococcal nuclease

Longer runs are needed to answer “how many water molecules are in the protein?”

Parallel CHARMM is key to running longer simulations (as simulation time is reduced). OSG is exploring and testing ways to execute parallel MPI applications.

2 initial structures X 2 methods, each 40 X 2 ns long simulations. Total: 160 X 2 ns. Total usage: 120K CPU hours

Interior water molecules can influence protein conformations

Different answers obtained if simulations started with and without interior water molecules


Plans for near term future

• Hydration of interior of staphylococcal nuclease:
  – answer previous question (80 X 8 ns = 153K cpu hours)
  – test new method for conformational search, SGLD (80 X 8 ns = 153K cpu hours)

• Conformational rearrangements and effects of mutations in AraC protein.
  – when sugar arabinose is present, “arm”, gene expression on
  – when sugar arabinose is absent, “arm”, gene expression off

Computational requirements:

• Wild type with and without arabinose (50 X 10 ns)

• F15V with and without arabinose (50 X 10 ns)

Total: 100 X 10 ns = 240K cpu hours

Additional test simulations (100 X 25 X 1 ns)

Total: 2500 X 1ns = 600K cpu hours

Provide feedback to experiments
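The CPU-hour figures quoted for the completed OSG runs (160 X 2 ns, 120K CPU hours) and for the plans on this slide can be compared by working out the cost per simulated nanosecond. The sketch below only restates that arithmetic; the labels and the function name are made up for this example.

```python
# (number of simulations, ns per simulation, quoted CPU hours)
RUNS = {
    "completed OSG runs": (160, 2, 120_000),
    "nuclease hydration plan": (80, 8, 153_000),
    "AraC production runs": (100, 10, 240_000),
    "AraC test simulations": (2500, 1, 600_000),
}


def cpu_hours_per_ns(n_sims: int, ns_each: int, cpu_hours: int) -> float:
    """CPU hours spent per nanosecond of simulated time."""
    return cpu_hours / (n_sims * ns_each)


for label, (n_sims, ns_each, cpu_hours) in RUNS.items():
    total_ns = n_sims * ns_each
    rate = cpu_hours_per_ns(n_sims, ns_each, cpu_hours)
    print(f"{label}: {total_ns} ns total, {rate:.1f} CPU hours per ns")
```

The three planned estimates all work out to about 240 CPU hours per simulated nanosecond, while the completed runs cost 375 CPU hours per nanosecond.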


Long term needs

Study conformational rearrangements in other proteins

• Conformational rearrangements are at the core of key biochemical processes such as regulation of enzymatic and genetic activity.

• Understanding of such processes is pharmaceutically relevant.

• Such processes usually occur on timescales of µs and longer, and are not readily sampled in MD simulations.

• Experimentally, little is known about the mechanisms of these processes.

• Poorly explored territory, will be “hot” for the next 10 years

• Use brute force approach

• Test performance of different methods

• Test different force fields

• Study effects of mutations -> provide feedback to experiments


Impact on scientific community

• CHARMM on OSG is still in the testing and development stage, so the user pool is small

• Potential is great - there are 1,800 registered CHARMM users

• Bring OSG computing to hundreds of other CHARMM users
  – give resources to small groups
  – do more science by harvesting unused OSG cycles
  – provide job management software

• “Recipe” used here developed for CHARMM, but can be easily extended to other MD programs (AMBER, NAMD, GROMACS...)


Genome Analysis and Database Update

• Runs across TeraGrid and OSG. Used VDL and Pegasus workflow & provenance.
• Scans public DNA and protein databases for new and newly updated genomes of different organisms and runs BLAST, Blocks, Chisel. 1200 users of resulting DB.
• On OSG at the peak used >600 CPUs, 17,000 jobs a week.


PUMA: Analysis of Metabolism

PUMA Knowledge Base

Information about proteins analyzed against ~2 million gene sequences

Analysis on Grid

Involves millions of BLAST, BLOCKS, and other processes

Natalia Maltsev et al.
http://compbio.mcs.anl.gov/puma2


Sample Engagement: Kuhlman Lab

Sr. Rosetta Researcher and his team, little CS/IT expertise, no grid expertise. Quickly up and running with large scale jobs across OSG, >250k CPU hours

Using OSG to design proteins that adopt specific three dimensional structures and bind and regulate target proteins important in cell biology and pathogenesis. These designed proteins are used in experiments with living cells to detect when and where the target proteins are activated in the cells


Rosetta on OSG

Each protein design requires about 5,000 CPU hours, distributed across 1,000 individual compute jobs.


Workflow Motivation: example from Neuroscience

• Large fMRI datasets
  – 90,000 volumes / study
  – 100s of studies
• Wide range of analyses
  – Testing, production runs
  – Data mining
  – Ensemble, parameter studies


A typical workflow pattern in fMRI image analysis runs many filtering apps.

[Figure: workflow graph. Four anatomy images (3a, 4a, 5a, 6a, each a .h/.i pair) and a reference (ref.h, ref.i) feed align_warp/1, 3, 5, 7, producing 3a.w to 6a.w; reslice/2, 4, 6, 8 produce the .s.h/.s.i pairs; softmean/9 produces atlas.h and atlas.i; slicer/10, 12, 14 produce atlas_x.ppm, atlas_y.ppm, atlas_z.ppm; convert/11, 13, 15 produce atlas_x.jpg, atlas_y.jpg, atlas_z.jpg.]

Workflow courtesy James Dobson, Dartmouth Brain Imaging Center
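The workflow pattern above can be sketched as a dependency graph whose stages are found by topological sorting. The step names come from the figure, but the edges between them are an assumption read off the step numbering and file names, and the scheduling helper is hypothetical; it is not how any particular workflow system runs the jobs. The sketch needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (assumed from the figure).
deps = {}
for i in range(4):
    warp = f"align_warp/{2 * i + 1}"
    reslice = f"reslice/{2 * i + 2}"
    deps[warp] = set()                    # needs only input images
    deps[reslice] = {warp}                # reslices the warped image
deps["softmean/9"] = {f"reslice/{n}" for n in (2, 4, 6, 8)}
for slicer, convert in ((10, 11), (12, 13), (14, 15)):
    deps[f"slicer/{slicer}"] = {"softmean/9"}
    deps[f"convert/{convert}"] = {f"slicer/{slicer}"}


def stages(graph: dict) -> list:
    """Group steps into waves; all steps in one wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(stages(deps), start=1):
    print(number, wave)
```

Every step within a wave is independent of the others, so on a grid each wave can be submitted as a batch of parallel jobs.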


What’s next ?

Take the self-paced course:

opensciencegrid.org/OnlineGridCourse

Contact us:

[email protected]

Info for all aspects of Grid usage is at OSG Web site:

opensciencegrid.org


Acknowledgements (OSG, Globus, Condor teams and associates)

Gabrielle Allen, LSU CCT

Rachana Ananthakrishnan, ANL/Globus

Ben Clifford, UChicago CI

Jaime Frey, UWisconsin/Condor

Alain Roy, UWisconsin/Condor

Alex Sim, BNL
