Grid Computing
ECI, July 2005
Living in an Exponential World
- Moore’s Law: transistor count x2 in 18 months
- Storage density x2 in 12 months
- Online data x10 in 12 months (current = 10 PB)
- Telescope to generate > 10 PB by 2008
- Network speed x2 in 9 months
  - 1986-2000: CPU x500, network x340,000
  - 2001-2010: CPU x60, network x4,000
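As a rough check on these figures, the sketch below compounds the quoted doubling times over 1986-2000. The function name `growth_factor` and the 14-year (168-month) window are illustrative assumptions, not part of the original slides.

```python
def growth_factor(doubling_months: float, elapsed_months: float) -> float:
    """Compound growth of a quantity that doubles every `doubling_months`."""
    return 2 ** (elapsed_months / doubling_months)


# 1986-2000 is taken here as 14 years = 168 months (an assumption).
ELAPSED_MONTHS = 14 * 12

cpu = growth_factor(18, ELAPSED_MONTHS)     # transistor count: x2 in 18 months
network = growth_factor(9, ELAPSED_MONTHS)  # network speed: x2 in 9 months

print(f"CPU growth over 14 years:     about x{cpu:,.0f}")
print(f"Network growth over 14 years: about x{network:,.0f}")
```

Both results fall in the same order of magnitude as the slide's x500 and x340,000, which suggests the 1986-2000 figures follow from the doubling rates.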
What is a Grid (informal)
Three key criteria:
- Coordinates resources not under centralized control
- Using standard, open, general-purpose protocols and interfaces
- To deliver non-trivial quality of service
What is not a Grid?
- A cluster, a network-attached storage device, a scientific instrument, a network (though these are important components)
So…
We’ve got:
- Fast computers (but not fast enough…)
- Bigger storage (but not big enough…)
- Fast networks (well, not speedy enough…)
And we want to:
- Solve big computational problems…
In that case:
- How about joining resources together?
That’s GRID!
Why “Grid”?
Analogy with the Power Grid
Service with known characteristics:
- Stable voltage (~220 V)
- Contracted power
- Pay for the installed capacity and the power consumed
- Standard sockets, outlets, devices
- Available 24/7 (usually…)
And in Computers
“Computer Grid” similar to “Power Grid”:
- Special socket to get connected
- Pay a subscription and for the power consumed
- If you need more, contract more
Definitions of Grid
- A paradigm/infrastructure that enables the sharing, selection, & aggregation of geographically distributed resources to solve large-scale problems/applications
- Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
  - Computers, software, catalogue data and databases, special devices/instruments, people
Grid and the Hype
[Figure: the classic hype curve, with a "HERE!" marker showing where Grid currently sits]
Types of Grids
Grid systems can be classified depending on their usage:
- Computational Grid
  - Distributed Supercomputing
  - High Throughput
- Data Grid
- Services Grid
  - On Demand
  - Collaborative
  - Multimedia
Types of Grids
- Computational Grids
  - Distributed Supercomputing: grand challenge apps
  - High-Throughput: parametric modeling, independent tasks
- Data Grids
  - Data mining, analysis, data processing
- Service Grids
  - Collaborative: connects users, apps and devices
  - Multimedia: real-time multimedia, virtual reality
  - On Demand: aggregate more resources if required
A Typical Grid Computing Environment
[Figure: an Application, Grid Resource Brokers, the Grid Information Service, a database, and resources R1 to R6 … RN]
How it Really Happens (A Simplified View)
- Users work with client applications
- Application services organize VOs & enable access to other services
- Collective services aggregate &/or virtualize resources
- Resources implement standard access & management interfaces
[Figure: Web Browser, Data Viewer Tool, Chat Tool, Simulation Tool, Telepresence Monitor, Web Portal, Registration Service, Credential Repository, Certificate Authority, Data Catalog, Compute Servers (x2), Database services (x3), Cameras (x2)]
How it Really Happens (without Grid Software)
[Figure: Web Browser, Data Viewer Tool, Chat Tool, Simulation Tool, Telepresence Monitor, Web Portal, Registration Service, Credential Repository, Certificate Authority, Data Catalog, Compute Servers (x2), Database services (x3), Cameras (x2); parts of the diagram are marked A to E]
Component counts by source:
  Application Developer   10
  Off the Shelf           12
  Globus Toolkit           0
  Grid Community           0
How it Really Happens (with Grid Software)
[Figure: Web Browser, Data Viewer Tool, Chat Tool, Simulation Tool, Telepresence Monitor, CHEF, Globus Index Service, MyProxy, Certificate Authority, Globus MCS/RLS, Compute Servers (x2) behind Globus GRAM, Database services (x3) behind OGSA-DAI, Cameras (x2)]
Component counts by source:
  Application Developer    2
  Off the Shelf            9
  Globus Toolkit           5
  Grid Community           3
Grid Characteristics
  Entities/Issues                     Characteristics
  Users, Resources, Owners            Geographically distributed
  Users, Resources, Applications      Heterogeneous
  Resource availability/capability    Varies with time
  Policies and strategies             Heterogeneous & decentralised
  QoS requirements                    Heterogeneous
  Cost / Price                        Varies: different resources, users, time
* Resource Management
* Application Construction
Why is it Complex?
- Size (nodes, providers, consumers)
- Heterogeneity of resources
- Heterogeneity of fabric management
  - Systems, policies
- Heterogeneity of applications
  - Type, requirements, patterns
- Geographic distribution, varying time zones
- Non-secure and unreliable environment
Layered Grid Architecture
[Figure: layers from top to bottom]
- APPLICATIONS
  - Applications and Portals: Scientific, Engineering, Collaboration, Prob. Solving Env., Web-enabled Apps, …
- USER LEVEL MIDDLEWARE
  - Development Environments and Tools: Languages/Compilers, Libraries, Debuggers, Monitors, Web tools, …
  - Resource Management, Selection, and Aggregation (BROKERS)
- CORE MIDDLEWARE
  - Distributed Resources Coupling Services: Security, Information, Data, Process, Trading, QoS, …
- SECURITY LAYER
- FABRIC
  - Local Resource Managers: Operating Systems, Queuing Systems, Libraries & App Kernels, Internet Protocols, …
  - Networked Resources across Organizations: Computers, Networks, Storage Systems, Data Sources, Scientific Instruments, …
Resource/Service Integration as a Fundamental Challenge
- Discovery
  - Many sources of data, services, computation
  - Registries (R) organize services of interest to a community
- Access
  - Data integration activities may require access to, & exploration/analysis of, data at many locations
  - Exploration & analysis may involve complex, multi-step workflows
- Resource management (RM) is needed to ensure progress & arbitrate competing demands
- Security & policy must underlie access & management decisions (security services, policy services)
Grid Middleware Technologies
- Globus: Argonne National Lab and ISI
- Gridbus: University of Melbourne
- Unicore: Germany
- Legion: University of Virginia
[Figure: layers from top to bottom: Middleware Clients and Portals; User-Level Middleware; Low-Level Middleware; Fabric Access Management. Globus, Gridbus, Legion, UNICORE and 3rd-party solutions are placed against these layers]
The Globus Toolkit
[Figure: layers from top to bottom]
- Applications
- Third Party User-Level Middleware
- Globus
  - Grid Resource Management (GRAM, GASS)
  - Grid Information Services (MDS)
  - Grid Data Management (GridFTP, Replica Catalog)
- GSI Security Layer
- Grid Resources and Local Services
Globus Toolkit Services
- Security (GSI)
  - PKI-based security (authentication) service
- Job submission and management (GRAM)
  - Uniform job submission
- Information services (MDS)
  - LDAP-based information service
- Remote file management (GASS)
  - Remote storage access service
- Remote data catalogue and management tools
Security
- Resources and users belong to organizations
- An authentication infrastructure is needed
- Both users and owners should be protected from each other
- Ensure security and privacy of:
  - Data
  - Code
  - Messages
Grid Security Infrastructure (GSI)
GSI is:
- PKI (CAs and certificates) for credentials
- SSL/TLS for authentication and message protection
- Proxies and delegation (GSI extensions) for secure single sign-on
Simple job submission
- globus-job-run provides a simple RSH-compatible interface
    % grid-proxy-init
    Enter PEM pass phrase: *****
    % globus-job-run host program [args]
- Authentication test
    % globusrun -a -r hostname
- Running a job on a remote node
    % globusrun hostname <executable>
    globus-job-run belle.anu.edu.au /bin/dat
Authorization
- GSI handles authentication, but not authorization
- Authorization issues:
  - Management of authorization on a multi-organization grid is still an interesting problem
  - Mapping resources to users does not scale well
  - Large communities that share resources...
Globus Resource Access Manager
- Resource Specification Language (RSL)
- GRAM allows programs to be started on remote resources
- A layered architecture allows app-specific resource brokers and co-allocators to be defined as services
Resource Management Architecture
[Figure: the Application passes RSL to a Broker, which performs RSL specialization using queries & info from the Information Service. The resulting ground RSL goes to a Co-allocator, which sends simple ground RSL to GRAM instances, each in front of a local resource manager (LSF, EASY-LL, NQE)]
GRAM Components
[Figure: GRAM components on either side of a site boundary]
- Client
  - MDS client API calls to locate resources (MDS: Grid Index Info Server)
  - MDS client API calls to get resource info (MDS: Grid Resource Info Server)
  - GRAM client API calls to request resource allocation and process creation
  - GRAM client API state change callbacks
- Gatekeeper (Globus Security Infrastructure): creates the Job Manager
- Job Manager: parses the request with the RSL Library, asks the Local Resource Manager to allocate & create processes, then monitors & controls them
- Local Resource Manager (e.g., PBS, Condor, or OS-fork()): runs the processes
- MDS: Grid Resource Info Server: queries the current status of the resource
A simple run
- Interactive run/output:
    > globus-job-run belle.anu.edu.au /bin/date
    Mon May 3 15:05:42 EST 2004
    > globusrun -o -r belle.anu.edu.au "&(executable=/bin/date)"
    Sun May 22 17:27:22 EST 2005
- Batch commands:
    > globusrun -b -r belle.anu.edu.au "&(executable=/bin/date)(stdout=MyOutputFile)"
    > gsincftpget belle.anu.edu.au . MyOutputFile
    (Pull output file to local directory)
Resource Specification Language (RSL)
- Common notation for information exchange
- Provides two types of information:
  - Resource requirements: machine type, number of nodes, memory, etc.
  - Job configuration: directory, executable, args, environment
- API provided for manipulating RSL
RSL Syntax
- Elementary form: parenthesis clauses
  - (attribute op value [ value … ] )
- Operators supported:
  - <, <=, =, >=, >, !=
- Some supported attributes:
  - executable, arguments, environment, stdin, stdout, stderr
- Unknown attributes are passed through
  - May be handled by subsequent tools
Constraints: “&”
    globusrun -o -r belle.anu.edu.au "&(executable=/bin/date)"
For example:
    & (count>=5) (count<=10) (max_time=240) (memory>=64) (executable=myprog)
“Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
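The conjunction example can be assembled programmatically. The following is a minimal sketch in Python; the helper names `clause` and `conjunction` are hypothetical and are not part of the Globus RSL API mentioned in these slides. It only formats strings and does not quote values that contain spaces or special characters.

```python
# Operators listed on the "RSL Syntax" slide.
ALLOWED_OPS = {"<", "<=", "=", ">=", ">", "!="}


def clause(attribute: str, op: str, value) -> str:
    """Format one elementary RSL clause: (attribute op value)."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported RSL operator: {op!r}")
    return f"({attribute}{op}{value})"


def conjunction(*clauses: str) -> str:
    """Join clauses under '&', meaning all constraints must hold."""
    return "&" + "".join(clauses)


rsl = conjunction(
    clause("count", ">=", 5),
    clause("count", "<=", 10),
    clause("max_time", "=", 240),
    clause("memory", ">=", 64),
    clause("executable", "=", "myprog"),
)
print(rsl)
```

The printed string is the slide's example without the optional spaces between clauses.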
Running job as batch job
    globusrun -b -r belle.anu.edu.au '&(executable=/bin/date)(stdout=filename)'
It prints a "handle" that you can use to interrogate the job while it is running:
    https://belle.anu.edu.au:4029/288/1116418550/
Check job status:
    > globusrun -status https://belle.anu.edu.au:4029/288/1116418550/
Terminate job execution:
    > globusrun -kill https://belle.anu.edu.au:4029/288/1116418550/
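A script that drives this submit/status/kill cycle might build the three command lines as argument lists. The sketch below only constructs and prints them; it does not run anything, since that needs a Globus installation and a valid proxy. The function names are hypothetical; the flags are the ones shown on this slide.

```python
import shlex
from typing import List


def submit_batch(host: str, rsl: str) -> List[str]:
    """Command line for a batch submission; globusrun prints a job handle."""
    return ["globusrun", "-b", "-r", host, rsl]


def job_status(handle: str) -> List[str]:
    """Command line to check the status of a submitted job."""
    return ["globusrun", "-status", handle]


def job_kill(handle: str) -> List[str]:
    """Command line to terminate a submitted job."""
    return ["globusrun", "-kill", handle]


HANDLE = "https://belle.anu.edu.au:4029/288/1116418550/"  # handle from the slide

submit_cmd = submit_batch("belle.anu.edu.au", "&(executable=/bin/date)(stdout=filename)")
for cmd in (submit_cmd, job_status(HANDLE), job_kill(HANDLE)):
    # shlex.join adds shell quoting, so the RSL string survives copy and paste.
    print(shlex.join(cmd))
```

Passing the list form to `subprocess.run` avoids shell quoting problems with the `&` and parentheses in RSL.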
Disjunction: “|”
For example:
    & (executable=myprog)
      ( | (&(count=5)(memory>=64))
          (&(count=10)(memory>=32)))
Create 5 instances of myprog on a machine that has at least 64 MB of memory, or 10 instances on a machine with at least 32 MB of memory
Multirequest: “+”
A multi-request allows us to specify multiple resource needs, for example:
    + (& (count=5)(memory>=64) (executable=p1))
      (&(network=atm) (executable=p2))
- Execute 5 instances of p1 on a machine with at least 64 MB of memory
- Execute p2 on a machine with an ATM connection
Multirequests are central to co-allocation
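To show how the three RSL operators nest, here is a small Python sketch that rebuilds the disjunction and multirequest examples as strings. The helpers (`clause`, `group`, `all_of`, `any_of`, `multi`) are hypothetical names for illustration and are not the RSL manipulation API that Globus provides.

```python
def clause(attribute: str, op: str, value) -> str:
    """One elementary clause: (attribute op value)."""
    return f"({attribute}{op}{value})"


def group(expression: str) -> str:
    """Wrap a compound expression in parentheses so it can be nested."""
    return f"({expression})"


def all_of(*parts: str) -> str:
    """Conjunction '&': every part must be satisfied."""
    return "&" + "".join(parts)


def any_of(*parts: str) -> str:
    """Disjunction '|': any one part may be satisfied."""
    return "|" + "".join(parts)


def multi(*requests: str) -> str:
    """Multirequest '+': each request is a separate resource need."""
    return "+" + "".join(group(r) for r in requests)


disjunction = all_of(
    clause("executable", "=", "myprog"),
    group(any_of(
        group(all_of(clause("count", "=", 5), clause("memory", ">=", 64))),
        group(all_of(clause("count", "=", 10), clause("memory", ">=", 32))),
    )),
)

multirequest = multi(
    all_of(clause("count", "=", 5), clause("memory", ">=", 64), clause("executable", "=", "p1")),
    all_of(clause("network", "=", "atm"), clause("executable", "=", "p2")),
)

print(disjunction)
print(multirequest)
```

Each printed string equals the corresponding slide example with the optional whitespace removed.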
Job Submission Interfaces
- Command-line programs for job submission
  - globus-job-run: interactive jobs
  - globus-job-submit: batch/offline jobs
  - globusrun: flexible scripting infrastructure
- Other high-level interfaces
  - General purpose: Nimrod-G, Condor-G, Gridbus Broker, PBS, etc.
  - Application specific: web portals
globus-job-run
- For running interactive jobs
- Additional functionality beyond rsh
  - Ex: Run a 2-process job with executable staging
      globus-job-run -: host -np 2 -s myprog arg1 arg2
  - Ex: Run 5 processes across 2 hosts
      globus-job-run \
        -: host1 -np 2 -s myprog.linux arg1 \
        -: host2 -np 3 -s myprog.aix arg2
- For the list of arguments, run:
      globus-job-run -help
globus-job-submit
- For running batch/offline jobs
  - globus-job-submit: submit job
    - Same interface as globus-job-run
    - Returns immediately
  - globus-job-status: check job status
  - globus-job-cancel: cancel job
  - globus-job-get-output: get job stdout/err
  - globus-job-clean: clean up after job
Resource Brokers
[Figure: requests are refined step by step as they pass through brokers]
- “Run a distributed interactive simulation involving 100,000 entities” goes to a DIS-specific broker
- “Supercomputers providing 100 GFLOPS, 100 GB, < 100 msec latency” goes to a supercomputer resource broker, which consults the Information Service
- “80 nodes on Argonne SP, 256 nodes on CIT Exemplar, 300 nodes on NCSA O2000” goes to a simultaneous-start co-allocator
- “Run SF-Express on 80 nodes”, “Run SF-Express on 256 nodes” and “Run SF-Express on 300 nodes” go to the Argonne, CIT and NCSA Resource Managers
- Other entry points:
  - “Perform a parameter study involving 10,000 separate trials” goes to a parameter-study-specific broker
  - “Create a shared virtual space with participants X, Y, and Z” goes to a collaborative-environment-specific resource broker
Remote I/O and Data Access
- Tell GRAM to pull the executable from a remote location
- Access files from a remote location
- stdin/stdout/stderr from a remote location
What is GASS?
- GASS file access API
  - Replace open/close with globus_gass_open/close; read/write calls can then proceed directly
- RSL extensions
  - URLs used to name executables, stdout, stderr
- Remote cache management utility
- Low-level APIs for specialized behaviors
GASS File Naming
- URL encoding of resource names:
    https://quad.mcs.anl.gov:9991/~bester/myjob
    (protocol, server address, file name)
- Other examples:
    https://pitcairn.mcs.anl.gov/tmp/input_dataset.1
    https://pitcairn.mcs.anl.gov:2222/./output_data
    http://www.globus.org/~bester/input_dataset.2
- Supports http & https
- Supports ftp & gsiftp
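The three parts of a GASS file name (protocol, server address, file name) can be pulled apart with a generic URL parser. This is a sketch using Python's standard `urllib.parse`, not the GASS library itself; the function name `split_gass_url` is an assumption for illustration.

```python
from urllib.parse import urlsplit


def split_gass_url(url: str):
    """Return (protocol, server address, file name) for a GASS-style URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


EXAMPLES = [
    "https://quad.mcs.anl.gov:9991/~bester/myjob",
    "https://pitcairn.mcs.anl.gov/tmp/input_dataset.1",
    "http://www.globus.org/~bester/input_dataset.2",
]

for url in EXAMPLES:
    protocol, server, filename = split_gass_url(url)
    print(f"protocol={protocol}  server={server}  file={filename}")
```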
Example GASS Applications
- On-demand, transparent loading of data sets
- Caching of data sets
- Automatic staging of code and data to remote supercomputers
- (Near) real-time logging of application output to remote server
GASS File Access API
- Minimum changes to application
- globus_gass_open(), globus_gass_close()
  - Same as open(), close() but use URLs instead of filenames
  - Caches URL in case of multiple opens
  - Return descriptors to files in local cache or sockets to remote server
GASS File Access API (cont)
- Support for different access patterns
  - Read-only (from local cache)
  - Write-only (to local cache)
  - Read-write (to/from local cache)
  - Write-only, append (to remote server)
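GRAM & GASS
1. Derive contact string
2. Build RSL string
3. Start up GASS server
4. Submit the request
5. Return output
[Figure: globus-job-run turns the host name into a contact string (1) and the command-line args into an RSL string (2), starts a GASS server (3), and submits the request (4) through the gatekeeper to the jobmanager, which runs the program. The program's stdout returns through the GASS server (5)]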
Example: A Simple Broker
- Select machines based on availability
  - Use MDS queries to get current host loads
  - Look at output and figure out what machines to use
- Generate RSL based on selection
  - globus-job-run -dumprsl can assist
- Execute globusrun, feeding it the RSL generated in the previous step
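The broker steps above can be sketched as follows. This is an illustration only: the host names and load values are invented stand-ins for the output of MDS queries, the function names are hypothetical, and the commands are built but not executed.

```python
# Hypothetical host loads, standing in for the results of MDS queries.
HOST_LOADS = {
    "hostA.example.org": 0.9,
    "hostB.example.org": 0.2,
    "hostC.example.org": 1.7,
}


def pick_hosts(loads: dict, how_many: int) -> list:
    """Step 1: choose the least-loaded hosts."""
    return sorted(loads, key=loads.get)[:how_many]


def make_rsl(executable: str, count: int) -> str:
    """Step 2: generate an RSL string for the selection."""
    return f"&(executable={executable})(count={count})"


def plan(loads: dict, executable: str, hosts_wanted: int, count_per_host: int) -> list:
    """Step 3: one globusrun command line per selected host."""
    rsl = make_rsl(executable, count_per_host)
    return [["globusrun", "-o", "-r", host, rsl] for host in pick_hosts(loads, hosts_wanted)]


commands = plan(HOST_LOADS, "/bin/date", hosts_wanted=2, count_per_host=1)
for cmd in commands:
    print(cmd)
```

A real broker would refresh the loads just before submission, since directory information is only a hint about current state.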
GRAM Components
[Figure: GRAM components diagram, shown again: Client, Gatekeeper (Globus Security Infrastructure), Job Manager, RSL Library, Local Resource Manager (e.g., PBS, Condor, or OS-fork()), processes, MDS Grid Index Info Server and MDS Grid Resource Info Server, on either side of the site boundary]
MDS: Monitoring and Discovery Service
- General information infrastructure
  - Locate and determine characteristics of resources
- Locate resources
  - Where are resources with required architecture, installed software, available capacity, network bandwidth, etc.?
- Determine resource characteristics
  - What are the physical characteristics, connectivity, capabilities of a resource?
Examples of Useful Information
- Characteristics of a compute resource
  - IP address, software available, system administrator, networks connected to, OS version, load
- Characteristics of a network
  - Bandwidth and latency, protocols, logical topology
- Characteristics of the Globus infrastructure
  - Hosts, resource managers
MDS
- Stores information in a distributed directory
  - Directory stored in a collection of servers
  - Each server optimized for a particular function
- Directory can be updated by
  - Information providers and tools
  - Applications (i.e., users)
  - Backend tools which generate info on demand
- Information dynamically available to
  - Tools
  - Applications
Directory Service Functions
- White Pages
  - Look up the IP number, amount of memory, etc., associated with a particular machine
- Yellow Pages
  - Find all the computers of a particular class or with a particular property
- Temporary inconsistencies may be okay
  - A distributed system may be imprecise about the state of a resource, until you actually use it
  - Information is often used as “hints”
  - Information itself can contain a TTL, etc.
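The difference between white-pages and yellow-pages lookups, and the role of a TTL, can be illustrated with a toy in-memory directory. This is a sketch under stated assumptions: MDS itself is LDAP-based, and the `Record` and `Directory` classes, attribute names and values here are invented for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    host: str
    attrs: dict
    ttl: float  # seconds for which the record may be trusted
    stamp: float = field(default_factory=time.time)  # when it was published

    def is_fresh(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return now - self.stamp <= self.ttl


class Directory:
    def __init__(self):
        self._records = {}

    def publish(self, record: Record) -> None:
        self._records[record.host] = record

    def white_pages(self, host: str) -> dict:
        """Look up the attributes of one named machine."""
        return self._records[host].attrs

    def yellow_pages(self, predicate, now: Optional[float] = None) -> list:
        """Find all machines whose fresh records satisfy a property."""
        return sorted(
            r.host for r in self._records.values()
            if r.is_fresh(now) and predicate(r.attrs)
        )


directory = Directory()
directory.publish(Record("node1", {"arch": "x86", "memory_mb": 128}, ttl=60, stamp=0.0))
directory.publish(Record("node2", {"arch": "sparc", "memory_mb": 512}, ttl=60, stamp=0.0))
directory.publish(Record("node3", {"arch": "x86", "memory_mb": 64}, ttl=10, stamp=0.0))

print(directory.white_pages("node2"))
print(directory.yellow_pages(lambda a: a["arch"] == "x86", now=30.0))
```

At `now=30.0` the record for `node3` has outlived its 10-second TTL, so the yellow-pages query skips it even though its architecture matches.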
GRAM Components
[Figure: GRAM components diagram, shown again: Client, Gatekeeper (Globus Security Infrastructure), Job Manager, RSL Library, Local Resource Manager, processes, MDS Grid Index Info Server and MDS Grid Resource Info Server, on either side of the site boundary]
What users want?
- Grid Consumers
  - Execute jobs for solving problems of varying size and complexity
  - Benefit by selecting and aggregating resources wisely
  - Trade off timeframe and cost: minimize expenses
- Grid Providers
  - Contribute (“idle”) resources for consumer jobs
  - Benefit by maximizing resource utilization
  - Trade off local requirements & market opportunity: maximize return on investment
What’s Wrong with Cluster Methods?
- They use a centralised policy that needs
  - complete state information
  - a common fabric management policy, or a decentralised consensus-based policy
- Too many heterogeneous parameters
  - define a system-wide performance matrix?
  - define a common fabric management policy?
- “Distributed computational economy”
  - proved successful in human economies
  - can leverage proven economic principles/techniques
  - can regulate demand and supply
  - offers incentive (money?) for being part of the grid!
Grid Economy: “Incentive” as a Design Parameter
- Grids aim at exploiting synergies that result from cooperation of autonomous distributed entities:
  - Creation of Virtual Organisations/Enterprises
  - Resource sharing
  - Aggregation of resources on demand
- For this cooperation to be sustainable, all participants need to have an (economic) incentive.
- Therefore, “incentive” mechanisms should be considered one of the key design parameters of Grid computing.
Gridbus Architecture Layer
[Figure: layers from top to bottom: Applications; WorkFlow and Application Programming Interface; Gridbus Grid Service Broker; Adapter Layer (Alchemi Actuator, Globus Actuator, Unicore Actuator, …); Alchemi, Globus, Unicore, …; Grid Resources and Local Services. Grid Trading and Banking Services and Grid Economy and Allocation appear alongside the stack]
Gridbus and Complementary Grid Technologies
[Figure: layered stack, from Grid Fabric Hardware and Grid Fabric Software through Core Grid Middleware and User-Level Middleware (Grid Tools) up to Grid Applications, with a Grid Economy column. Labelled components include:]
- Grid Applications: Science, Commerce, Engineering, Collaboratories, Portals, …
- Grid Brokers: Nimrod-G, Gridbus Data Broker
- Tools: X-Parameter Sweep Lang., Workflow, Workflow Engine, MPI, ExcellGrid, Gridscape, GRIDSIM
- Middleware: Globus, Unicore, Alchemi, NorduGrid, XGrid, …
- Grid Economy: GridBank, Grid Exchange & Federation, Grid Market Directory, Grid Storage Economy
- Fabric software: PBS, Condor, SGE, Tomcat, Libra, JVM, .NET, PDB, CDB
- Operating systems: AIX, Solaris, Windows, Linux, IRIX, OSF1, Mac
- Worldwide Grid
Putting them All Together: On Demand Assembly of Services
[Figure: numbered interactions (1 to 12) among Application Code, Visual Application Composer, Explore data, Data Catalogue, ASP Catalogue, Data Source (instruments/distributed sources), Data Replicator (GDMP), Grid Resource Broker, Grid Info Service, Grid Market Directory, Gridbus GridBank, and Grid Service Providers (GSPs, e.g., UofM, VPAC, IBM, CERN, and an accounting-service GSP) offering PEs, CPUs or data through Grid Services (Globus, Alchemi), GTS and cluster schedulers. Labelled flows include Job, Results, Results + Cost Info, and Bill]
Grid Brokers
- Perform a parameter sweep (bag of tasks), utilizing distributed resources, within “T” hours or earlier and at a cost not exceeding $M.
- Three options:
  - Use pure Globus commands
  - Build your own distributed app & scheduler
  - Use Nimrod-G / Gridbus (Resource Broker)
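To make the deadline and budget constraint concrete, the sketch below plans a bag of identical tasks by filling the cheapest resources first, up to what each can finish within the deadline. It is a simplified illustration, not the scheduling algorithm used by Nimrod-G or the Gridbus broker; the resource names, prices and throughputs are invented.

```python
import math
from typing import NamedTuple


class Resource(NamedTuple):
    name: str
    cost_per_job: float   # price charged per task (hypothetical units)
    jobs_per_hour: float  # sustained throughput of the resource


def plan_sweep(jobs: int, resources, deadline_hours: float, budget: float):
    """Assign `jobs` tasks cheapest-first, respecting the deadline and budget."""
    remaining = jobs
    allocation = {}
    total_cost = 0.0
    for res in sorted(resources, key=lambda r: r.cost_per_job):
        if remaining == 0:
            break
        # How many tasks this resource can finish before the deadline.
        capacity = math.floor(res.jobs_per_hour * deadline_hours)
        take = min(capacity, remaining)
        if take:
            allocation[res.name] = take
            total_cost += take * res.cost_per_job
            remaining -= take
    if remaining > 0:
        raise ValueError("deadline cannot be met with the available resources")
    if total_cost > budget:
        raise ValueError("cheapest feasible plan exceeds the budget")
    return allocation, total_cost


RESOURCES = [
    Resource("fast", cost_per_job=5.0, jobs_per_hour=50),
    Resource("cheap", cost_per_job=1.0, jobs_per_hour=10),
    Resource("mid", cost_per_job=2.0, jobs_per_hour=20),
]

allocation, cost = plan_sweep(jobs=100, resources=RESOURCES, deadline_hours=2, budget=350)
print(allocation, cost)
```

Because all tasks are treated as identical, filling the cheapest capacity first gives the lowest total cost that still meets the deadline; tightening the deadline pushes more work onto the expensive resource.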
Remote Execution Steps
- Choose Resource
- Transfer Input Files
- Set Environment
- Start Process
- Pass Arguments
- Monitor Progress
- Read/Write Intermediate Files
- Transfer Output Files
[Figure labels: GRID; Summary View, Job View, Event View]
Grid Service Broker (GSB)
- Scheduling task farming (Data Grid apps) with static or dynamic parameter sweeps
- Employs computational economy for selection of services, depending on quality, cost, and availability, and on users' requirements (deadline, budget)
- A single window to manage & control experiment
- Programmable task farming engine
- Resource discovery and resource trading
- Transportation of data & sharing of results
- Accounting
Example Grid Schedulers
- Nimrod-G (Monash University)
  - Computational Grid & economic based
- Condor-G (University of Wisconsin)
  - Computational Grid & system centric
- Gridbus Broker (Melbourne University)
  - Data Grid & economic based
Key Steps in Grid Scheduling
Phase I - Resource Discovery
  1. Authorization Filtering
  2. Application Definition
  3. Min. Requirement Filtering
Phase II - Resource Selection
  4. Information Gathering
  5. System Selection
Phase III - Job Execution
  6. Advance Reservation
  7. Job Submission
  8. Preparation Tasks
  9. Monitoring Progress
  10. Job Completion
  11. Clean-up Tasks
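Phases I and II can be sketched as two small filtering and selection functions. This is a toy illustration: the site names, user names, specs and loads are invented, and picking the least-loaded candidate is just one possible selection rule.

```python
def discover(resources, user, requirements):
    """Phase I: authorization filtering, then minimum-requirement filtering."""
    authorized = [r for r in resources if user in r["allowed_users"]]
    return [
        r for r in authorized
        if all(r["specs"].get(key, 0) >= minimum for key, minimum in requirements.items())
    ]


def select(candidates, loads):
    """Phase II: use gathered dynamic information (loads) to pick one system."""
    if not candidates:
        return None
    return min(candidates, key=lambda r: loads[r["name"]])["name"]


# Hypothetical static descriptions of four sites.
RESOURCES = [
    {"name": "siteA", "allowed_users": {"alice", "bob"}, "specs": {"nodes": 16, "memory_mb": 256}},
    {"name": "siteB", "allowed_users": {"alice"}, "specs": {"nodes": 64, "memory_mb": 512}},
    {"name": "siteC", "allowed_users": {"bob"}, "specs": {"nodes": 128, "memory_mb": 1024}},
    {"name": "siteD", "allowed_users": {"alice"}, "specs": {"nodes": 4, "memory_mb": 64}},
]
REQUIREMENTS = {"nodes": 8, "memory_mb": 128}  # step 2: application definition
LOADS = {"siteA": 0.7, "siteB": 0.3, "siteC": 0.1, "siteD": 0.0}  # step 4: gathered info

candidates = discover(RESOURCES, "alice", REQUIREMENTS)
chosen = select(candidates, LOADS)
print([r["name"] for r in candidates], chosen)
```

Here `siteC` is dropped by authorization filtering and `siteD` by the minimum requirements, so the lightly loaded `siteD` never reaches the selection step.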