grid infrastructure
Post on 02-Jan-2016
Eddie.Aronovich@cs.tau.ac.il
Grid Infrastructure
What is it ?
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 3
SERVERS
Clients
IT all about IT
Hardware utilization
SOA & Web services
• Decompose processing into services
• Each service works independently
• Main components:
– Universal Description, Discovery and Integration (UDDI)
– Simple Object Access Protocol (SOAP)
– Web Services Description Language (WSDL)
• SOAP and WSDL are W3C standards; UDDI is an OASIS standard
THE WORLD NEEDS ONLY FIVE COMPUTERS (Thomas J. Watson)
• Google grid
• Microsoft's live.com
• Yahoo!
• Amazon.com
• eBay
• Salesforce.com
Well, that's O(5) ;)
Greg Matter (http://blogs.sun.com/Gregp/entry/the_world_needs_only_five)
Scaling
• Scale-up
– Add more resources within the system
– Does not require changes in the applications
– Limited extension
– Single point of failure
• Scale-out
– Add more systems
– Architecture dependent (needs code changes)
– Economical
• How to?
– Split the operation into groups
– Perform each group on a different machine
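The "split into groups, perform each group on a different machine" idea can be sketched in a few lines. This is only an illustration: the threads stand in for separate machines, and the names `split_into_groups`, `process_group` and `scale_out` are made up for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(items, n_groups):
    """Split items into n_groups contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional item.
        end = start + size + (1 if i < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def process_group(group):
    """Stand-in for the work one machine would do on its group."""
    return sum(x * x for x in group)


def scale_out(items, n_workers):
    """Run each group on its own worker and combine the partial results."""
    groups = split_into_groups(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(process_group, groups))
    return sum(partials)


if __name__ == "__main__":
    print(scale_out(list(range(1000)), 4))
```

The application has to be written so that the groups are independent, which is the "needs code changes" cost of scale-out.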
How fast can parallelization be ?
• Let:
– α be the proportion of the process that cannot be parallelized
– P be the number of processors
– S be the system speedup
Amdahl's law:
S = 1 / (α + (1 − α) / P)
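A small worked example of the formula, as a sketch (the function name `amdahl_speedup` is chosen here for illustration). Note that as P grows, S approaches 1/α, so the serial part bounds the achievable speedup.

```python
def amdahl_speedup(alpha, p):
    """Speedup S = 1 / (alpha + (1 - alpha) / p).

    alpha: proportion of the process that cannot be parallelized (0..1)
    p:     number of processors (>= 1)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / p)


if __name__ == "__main__":
    # With 10% serial work, adding processors gives diminishing returns.
    for p in (1, 10, 100, 1000):
        print(p, round(amdahl_speedup(0.1, p), 2))
```

With α = 0.1 and P = 10 the speedup is 1 / 0.19, roughly 5.26, and no number of processors can push it to 10.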
Cluster types
• High availability
– Active-Active
– Active-Passive
– Heartbeat
• Load balancing cluster
– Round robin (weighted/non-weighted)
– System status aware (session, CPU load, etc.)
• Compute cluster
– Queuing system (Condor, Hadoop, OpenPBS, LSF, etc.)
– Single system image (ScaleMP, SSI, Mosix, nomad, etc.)
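A minimal sketch of weighted round robin, using the simplest possible scheme: each server is repeated in the rotation as many times as its weight. The server names are hypothetical, and real load balancers usually interleave the choices more smoothly.

```python
import itertools


def weighted_round_robin(servers):
    """Return an endless iterator over server names.

    servers: dict mapping server name -> positive integer weight.
    A server with weight w appears w times per rotation.
    """
    rotation = [name for name, weight in servers.items() for _ in range(weight)]
    if not rotation:
        raise ValueError("at least one server with positive weight is required")
    return itertools.cycle(rotation)


if __name__ == "__main__":
    picker = weighted_round_robin({"node-a": 2, "node-b": 1})
    print([next(picker) for _ in range(6)])
```

Giving every server weight 1 yields the non-weighted variant.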
Condor script
#################
# Sample script #
#################
Executable = /bin/hostname
when_to_transfer_output = ON_EXIT_OR_EVICT
Log = {file name}.log
Error = err.$(Process)
Output = out.$(Process)
Requirements = substr(Machine,0,4)=="dopp" && ARCH=="X86_64"
Arguments = +-u
notification = Complete
Universe = VANILLA
Queue 10
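In the script above, Queue 10 creates ten jobs, and Condor replaces $(Process) with the job's index, counting from 0. The helper below (a hypothetical name, not part of Condor) only illustrates which file names result.

```python
def expand_process_macro(template, queue_count):
    """List the names a template such as 'out.$(Process)' expands to
    when the submit file ends with 'Queue <queue_count>'."""
    return [template.replace("$(Process)", str(i)) for i in range(queue_count)]


if __name__ == "__main__":
    print(expand_process_macro("out.$(Process)", 10))
    print(expand_process_macro("err.$(Process)", 10))
```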
From a single PC to a Grid
• Enterprise grid: mutualization of resources in a company (farm of PCs)
• Volunteer computing: CPU cycles made available by PC owners. Examples: SETI@home, Africa@home
• Grid infrastructure: Internet + disk and storage resources + services for information management (data collection, transfer and analysis). Example: EGEE
Batch to On-Line scale
[Diagram: a scale running from batch to on-line, labelled: dedicated resources; PBS, Torque; gLite & Globus; utility computing; (Condor), Hadoop]
Key Cloud Services Attributes
• Off-site, third-party provider
• Access via Internet
• Minimal/no IT skills required to “implement”
• Provisioning: self-service requesting; near real-time deployment; dynamic & fine-grained scaling
• Fine-grained usage-based pricing model
• UI: browser and successors
• Web services APIs as system interface
• Shared resources/common versions
Source: IDC, Sep 2008
What is “Grid”
What is Grid Computing ?
The definition is not widely agreed upon
Foster & Kesselman:
• Computing resources are not administered centrally.
• Open standards are used.
• Non-trivial quality of service is achieved.
Other definitions
• "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." (Plaszczak/Wellner)
• "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements" (Buyya)
• "a service for sharing computer power and data storage capacity over the Internet." (CERN)
Virtual Organization
• What’s a VO?
– People in different organisations seeking to cooperate and share resources across their organisational boundaries
• Why establish a Grid?
– Share data
– Pool computers
– Collaborate
• The initial vision: “The Grid”
• The present reality: many “grids”
• Each grid is an infrastructure enabling one or more “virtual organisations” to share computing resources
[Diagram: Institutes A to F and two virtual organisations, VO1 and VO2, each spanning several institutes]
The Grid Metaphor
GRID MIDDLEWARE
• Visualising
• Workstation
• Mobile access
• Supercomputer, PC-cluster
• Data storage, sensors, experiments
• Internet, networks
Stand alone computer
Hardware
Operating system
Application
Stand alone computer
Hardware
Operating system
Network stack
Application
Stand alone computer
Hardware
Operating system
Network stack
Grid Middleware
Application
Middleware components – The batch approach
[Diagram. Components: “User interface”, Resource Broker, Information Service, Replica Catalogue, Logging & Book-keeping, Authorization & Authentication, Storage Element, Computing Element. Arrow labels: job submit event, job query, job status, input “sandbox”, input “sandbox” + broker info, output “sandbox”, publish, SE & CE info, DataSets info.]
[Diagram: UI; RB node (Network Server, Workload Manager, Job Controller - CondorG); Replica Location Server; Information Service; Computing Element and Storage Element, with CE and SE characteristics & status. Job status: submitted.]
UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs)
edg-job-submit myjob.jdl

myjob.jdl:
JobType = "Normal";
Executable = "$(CMS)/exe/sum.exe";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" && other.GlueHostOperatingSystemRelease == "Red Hat 7.3" && other.GlueCEPolicyMaxCPUTime > 10000;
Rank = other.GlueCEStateFreeCPUs;

Job status: submitted
Job Description Language (JDL): specifies job characteristics and requirements
NS (Network Server): network daemon responsible for accepting incoming requests
[Diagram: the Input Sandbox files are held in RB storage. Job status: submitted, waiting.]
Job submission
WM (Workload Manager): acts to satisfy the request
[Job status: submitted, waiting.]
Match-Maker/Broker: where must this job be executed?
Matchmaker: responsible for finding the “best” CE for a job
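A rough sketch of what matchmaking amounts to for the sample JDL: filter the Computing Elements by the Requirements expression, then pick the one with the highest Rank. This is not the actual broker code; the CE records, their "id" field and the host names are hypothetical, and only the Glue attribute names come from the JDL.

```python
def requirements(ce):
    """Mirror of the Requirements expression in the sample JDL."""
    return (
        ce["GlueHostOperatingSystemName"] == "linux"
        and ce["GlueHostOperatingSystemRelease"] == "Red Hat 7.3"
        and ce["GlueCEPolicyMaxCPUTime"] > 10000
    )


def rank(ce):
    """Mirror of the Rank expression: prefer the CE with most free CPUs."""
    return ce["GlueCEStateFreeCPUs"]


def choose_ce(computing_elements, requirements, rank):
    """Return the best-ranked CE that satisfies the requirements, or None."""
    candidates = [ce for ce in computing_elements if requirements(ce)]
    return max(candidates, key=rank) if candidates else None


def make_ce(ce_id, release, max_cpu_time, free_cpus):
    """Build a hypothetical CE record with the attributes the JDL refers to."""
    return {
        "id": ce_id,
        "GlueHostOperatingSystemName": "linux",
        "GlueHostOperatingSystemRelease": release,
        "GlueCEPolicyMaxCPUTime": max_cpu_time,
        "GlueCEStateFreeCPUs": free_cpus,
    }


# Hypothetical information-system snapshot.
SAMPLE_CES = [
    make_ce("ce-a.example.org", "Red Hat 7.3", 20000, 4),
    make_ce("ce-b.example.org", "Red Hat 7.3", 50000, 12),
    make_ce("ce-c.example.org", "Red Hat 9", 50000, 40),    # wrong OS release
    make_ce("ce-d.example.org", "Red Hat 7.3", 5000, 100),  # CPU time limit too low
]

if __name__ == "__main__":
    print(choose_ce(SAMPLE_CES, requirements, rank)["id"])
```

In this snapshot the CEs with the most free CPUs are rejected by the requirements, so the choice falls on ce-b.example.org.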
Match-Maker/Broker: where (on which SEs) are the needed data? What is the status of the Grid?
Match-Maker/Broker: CE choice
Job Adapter: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)
Job Controller: responsible for the actual job management operations (done via CondorG)
[Job status: submitted, waiting, ready.]
[Job status: submitted, waiting, ready, scheduled.]
“Compute element” – reminder!
Homogeneous set of worker nodes
Grid gate node
Local resource management system: Condor / PBS / LSF master
Globus gatekeeper
Job request
Info system (I.S.)
Logging
gridmapfile
“Grid enabled” data transfers/accesses; Input Sandbox files
[Job status: submitted, waiting, ready, scheduled, running.]
Output Sandbox files
[Job status: submitted, waiting, ready, scheduled, running, done.]
edg-job-get-output <dg-job-id>
Output Sandbox files
[Job status: submitted, waiting, ready, scheduled, running, done, cleared.]
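The job states shown across the submission slides form a simple linear progression. A minimal sketch of that progression follows; it covers only the states named on the slides, and failure paths are deliberately left out.

```python
# States in the order they appear on the slides.
JOB_STATES = [
    "submitted",
    "waiting",
    "ready",
    "scheduled",
    "running",
    "done",
    "cleared",
]


def next_state(state):
    """Return the state that follows `state`, or raise if it is final or unknown."""
    index = JOB_STATES.index(state)  # raises ValueError for unknown states
    if index == len(JOB_STATES) - 1:
        raise ValueError(f"{state!r} is the final state")
    return JOB_STATES[index + 1]


if __name__ == "__main__":
    state = JOB_STATES[0]
    while state != JOB_STATES[-1]:
        print(state, "->", next_state(state))
        state = next_state(state)
```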
Job monitoring
[Diagram: UI; Logging & Bookkeeping; RB node (Network Server, Workload Manager, Job Controller - CondorG, Log Monitor); Computing Element. Labels: log of job events, job status.]
LM (Log Monitor): parses the CondorG log file (where CondorG logs info about jobs) and notifies LB
LB (Logging & Bookkeeping): receives and stores job events; processes the corresponding job status
edg-job-status <dg-job-id>
edg-job-get-logging-info <dg-job-id>
Grid Operation and Security by Eddie Aronovich, Mar 2008 44
Approaches to Security: 1
The Poor Security House
Approaches to Security: 2
The Paranoid Security House
Approaches to Security: 3
The Realistic Security House
Mapping certificate to local user
• Each site uses its local accounting system
• A pool of local user accounts is dedicated to the Grid
• Each Grid user is mapped to a local account using the gridmap file or VOMS
• Mapping can implement local policy on external users
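A sketch of the gridmap-file lookup, assuming the usual layout of one entry per line: a quoted certificate subject followed by a local account name. The subjects and account names below are hypothetical, and VOMS-based mapping is not covered.

```python
import shlex

# Hypothetical entries: "certificate subject" local_account
SAMPLE_GRIDMAP = '''\
# comment lines and blank lines are ignored

"/C=IL/O=Example Grid/CN=Alice Example" grid001
"/C=IL/O=Example Grid/CN=Bob Example" grid002
'''


def parse_gridmap(text):
    """Parse gridmap-style text into a dict: certificate subject -> local account."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # handles the quoted subject with spaces
        if len(parts) != 2:
            raise ValueError(f"malformed gridmap line: {raw_line!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping


def local_account(subject, mapping):
    """Return the local account for a subject, or None if the site does not map it."""
    return mapping.get(subject)


if __name__ == "__main__":
    mapping = parse_gridmap(SAMPLE_GRIDMAP)
    print(local_account("/C=IL/O=Example Grid/CN=Alice Example", mapping))
    print(local_account("/C=XX/O=Elsewhere/CN=Unknown User", mapping))
```

Returning None for an unmapped subject is one way a site can apply local policy to external users: no mapping means no access.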
Certificate Request
• User generates a public/private key pair. The private key is kept encrypted on the local disk.
• User sends the public key to the CA along with proof of identity (the certificate request: public key + ID).
• CA confirms the identity, signs the certificate and sends it back to the user.
Slide based on a presentation given by Carl Kesselman at GGF Summer School 2004
Inside the Certificate
• Standard (X.509) defined format.
• User identification (e.g. full name).
• User’s public key.
• A “signature” from a CA, created by encoding a unique string (a hash) generated from the user’s identification, the user’s public key and the name of the CA. The signature is encoded using the CA’s private key. This has the effect of:
– Proving that the certificate came from the CA.
– Vouching for the user’s identification.
– Vouching for the binding of the user’s public key to their identification.
Certificate fields: Name, Issuer: CA, Public Key, Signature
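The hash-then-sign idea can be tried out with a toy example. Everything here is an assumption made for illustration: the key is a textbook RSA key with a tiny modulus (n = 61 × 53), the field values are made up, and a real X.509 signature uses a proper encoding and large keys. It is not secure.

```python
import hashlib

# Toy textbook RSA key for the CA (n = 61 * 53). Insecure, illustration only.
CA_N, CA_E, CA_D = 3233, 17, 2753


def digest(identity, public_key, issuer):
    """Hash the certificate fields and reduce the hash to fit the toy modulus."""
    data = "|".join([identity, public_key, issuer]).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def ca_sign(identity, public_key, issuer):
    """The CA 'encodes' the hash with its private exponent."""
    return pow(digest(identity, public_key, issuer), CA_D, CA_N)


def verify(identity, public_key, issuer, signature):
    """Anyone holding the CA's public key can recompute and compare the hash."""
    return pow(signature, CA_E, CA_N) == digest(identity, public_key, issuer)


if __name__ == "__main__":
    fields = ("Alice Example", "alice-public-key", "Example CA")
    signature = ca_sign(*fields)
    print(verify(*fields, signature))                # genuine signature
    print(verify(*fields, (signature + 1) % CA_N))   # altered signature
```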
Mutual Authentication
A sends their certificate to B.
B verifies the CA signature in A’s certificate.
B sends A a challenge string.
A encrypts the challenge string with their private key.
A sends the encrypted challenge to B.
B uses A’s public key to decrypt the challenge.
B compares the decrypted string with the original challenge.
If they match, B has verified A’s identity and A cannot repudiate it.
[Diagram: A → B: A’s certificate; B: verify CA signature; B → A: random phrase; A: encrypt with A’s private key; A → B: encrypted phrase; B: decrypt with A’s public key and compare with the original phrase]
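The challenge-response part of the exchange can be sketched with the same kind of toy key. The assumptions are: a textbook RSA key with a tiny modulus standing in for A's key pair, an integer standing in for the random phrase, and no certificate check (the step where B verifies the CA signature is omitted).

```python
import secrets

# Toy textbook RSA key for A (n = 61 * 53). Insecure, illustration only.
A_N, A_E, A_D = 3233, 17, 2753


def respond(challenge, private_exponent, modulus=A_N):
    """A's side: encrypt the challenge with the private key."""
    return pow(challenge, private_exponent, modulus)


def verify(challenge, response, public_exponent=A_E, modulus=A_N):
    """B's side: decrypt with A's public key and compare with the original."""
    return pow(response, public_exponent, modulus) == challenge


def authenticate(private_exponent, challenge=None):
    """Run one challenge-response round; True means B accepts the peer as A."""
    if challenge is None:
        # B picks a random challenge in the range [2, A_N - 2].
        challenge = 2 + secrets.randbelow(A_N - 3)
    response = respond(challenge, private_exponent)
    return verify(challenge, response)


if __name__ == "__main__":
    print(authenticate(A_D))          # holder of the real private key
    print(authenticate(A_D + 1, 42))  # impostor with a wrong key
```

For the authentication to be mutual, the same round is repeated with the roles of A and B swapped.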
Proxy certificate
• Avoid re-entering the passphrase by creating a proxy
• Proxy consists of a new certificate and a private key
• Proxy certificate contains the owner's identity (modified)
• Remote party receives the proxy's certificate (signed by the owner) and the owner's certificate
• Proxy certificate has a limited lifetime
• Chain of trust from the CA to the proxy through the owner
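The chain of trust and the limited lifetime can be modelled without any cryptography. In this sketch the `Cert` record, the subject names and the "/CN=proxy" suffix are illustrative assumptions; a real verifier would also check every signature in the chain.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Cert:
    subject: str        # who the certificate identifies
    issuer: str         # who signed it
    not_after: datetime  # end of validity


def chain_is_valid(chain, trusted_ca, now):
    """Check a chain ordered proxy first, owner certificate last.

    Every certificate must be unexpired, each one must be issued by the
    next one in the chain, and the last one must be issued by the trusted CA.
    """
    if not chain:
        return False
    if any(now > cert.not_after for cert in chain):
        return False
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False
    return chain[-1].issuer == trusted_ca


if __name__ == "__main__":
    now = datetime(2009, 1, 15, 12, 0)
    owner = Cert("/CN=Alice Example", "Example CA", datetime(2009, 12, 31))
    proxy = Cert("/CN=Alice Example/CN=proxy", "/CN=Alice Example",
                 now + timedelta(hours=12))
    print(chain_is_valid([proxy, owner], "Example CA", now))
    print(chain_is_valid([proxy, owner], "Example CA", now + timedelta(hours=13)))
```

The second check fails because the proxy has expired, even though the owner's certificate is still valid.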
Grids in Europe
www.eu-egi.eu
Prof. Dieter KRANZLMUELLER, EGEE 08, Istanbul, Turkey
To be continued