Grid Services

Posted on 21-Jan-2016


Presented by

Karan Bhatia


Hype Curve


Overview

• Grid Computing Background

– Definition

– Opportunities

– Markets

• Technical Challenges

– Security Infrastructure

– Resource Management

– Service Interoperability

• Summary


Grid Computing is …

• “Co-ordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” [Foster, Kesselman, Tuecke]

– Co-ordinated - multiple resources working in concert, e.g. disk & CPU, or instruments & database.

– Resources - compute cycles, databases, files, application services, instruments.

– Problem solving - focus on solving scientific problems

– Dynamic - environments that are changing in unpredictable ways

– Virtual Organization - resources spanning multiple organizations and administrative domains, security domains, and technical domains


Grid Computing is … (Industry)

• “about finding distributed, underutilized compute resources (systems, desktops, storage) and provisioning those resources to users or applications requiring them.” [The Grid Report, Clabby Analytics]

– Distributed - all the resources lying around in departments or server rooms.

– Underutilized - typical utilization of “big iron” is 5 to 10%. Organizations save money by increasing utilization versus purchasing new resources.

– Resources - servers and server cycles, applications, data resources

– Provisioning - predict and schedule resource use depending on load.


Types of Grids…

• Compute Grids

– SETI@home, Entropia, United Devices, Condor

• Data Grids

– Storage Resource Broker (SRB), Avaki, BIRN, GEON

• Collaboration Grids

– Instrumentation (telescience), applications

• Enterprise Grids

– Majority of commercial interest

• Partner Grids

– B2B, Academic/Govt Grids

• Service Grids

– “Utility” computing, “On Demand”, pervasive, autonomic, etc.

A Grid is …

• “the next generation Internet,”

• “all about free cycles ala SETI@HOME,”

• “a distributed object system,”

• “a new programming model,”

• “a replacement for high performance computing,”

Example… TeleScience Grid

[Figure: imaging instruments, computational resources, large-scale databases, data acquisition and analysis, and advanced visualization]

Grid Resources - Networks


Grid Resources - Compute


Top500.org

Another Grid Example … Google

• Queries

– 150 M queries/day (2000/s)

– 100 countries

– 3.3 B documents

• Hardware

– 15,000 Linux systems in 6 data centers

– 15 Tflop/s and 1000 TB total capacity

– 40-80 1U/2U servers/cabinet

– 100 Mb/s Ethernet switches/cabinet with gigabit uplinks

– Growth from 4000 systems (18 M queries/day)
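The figures above can be sanity-checked with a few lines of arithmetic. This sketch assumes load is spread evenly over the day and storage evenly over the systems, which real data centers do not guarantee. It shows that 150 M queries/day averages to a little over 1,700 queries/s (so the quoted 2000/s is presumably a rounded or peak figure), and that 1000 TB over 15,000 systems is roughly 67 GB per system.

```python
# Back-of-the-envelope arithmetic on the Google figures quoted above.
# Assumes uniform load over the day and storage spread evenly across systems.
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 150e6
systems = 15_000
total_capacity_tb = 1000
earlier_queries_per_day = 18e6
earlier_systems = 4000

avg_qps = queries_per_day / SECONDS_PER_DAY               # average query rate
tb_per_system = total_capacity_tb / systems               # storage per system
query_growth = queries_per_day / earlier_queries_per_day  # growth in daily queries
system_growth = systems / earlier_systems                 # growth in hardware

print(f"average rate: {avg_qps:.0f} queries/s")                               # 1736
print(f"storage per system: {tb_per_system * 1000:.1f} GB")                   # 66.7
print(f"queries grew {query_growth:.1f}x, systems grew {system_growth:.2f}x")  # 8.3x, 3.75x
```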


Grid Resources - Data

• SDSC Resources

– HPSS:

• SDSC's central long-term data storage system

• One of the world's largest IBM High Performance Storage System (HPSS) units

• Currently holds more than a petabyte (a million gigabytes) of data in approximately 21 million files

• Has the capacity to store six petabytes of data; files are added at an average rate of 10,000 gigabytes per month

– Storage-Area Network (SAN):

• A 72-processor Sun Microsystems SunFire 15K high-end server and 11 Brocade switches (1,400 ports)

• 225,000 gigabytes of networked disk storage for data-oriented applications

• 1 TB of data = $2500
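A similar back-of-the-envelope pass over the SDSC numbers is sketched below. It treats “more than a petabyte” as exactly 1 PB (decimal units, 1 PB = 1,000,000 GB), assumes the quoted growth rate stays constant, and reads “1 TB of data = $2500” as a per-terabyte disk price. Because the archive actually holds more than 1 PB, the computed average file size is a lower bound and the time to fill is an upper bound.

```python
# Rough arithmetic on the SDSC storage figures quoted above.
# Assumptions: decimal units (1 PB = 1,000,000 GB), "more than a petabyte"
# taken as exactly 1 PB, a constant growth rate, and $2500 read as a per-TB price.
stored_gb = 1_000_000
files = 21_000_000
capacity_gb = 6_000_000
growth_gb_per_month = 10_000
san_gb = 225_000
dollars_per_tb = 2500

avg_file_mb = stored_gb / files * 1000                             # mean file size
months_to_fill = (capacity_gb - stored_gb) / growth_gb_per_month   # headroom / growth rate
san_cost = san_gb / 1000 * dollars_per_tb                          # SAN disk at that price

print(f"average file size: about {avg_file_mb:.0f} MB")            # about 48 MB
print(f"months to fill remaining capacity: {months_to_fill:.0f}")  # 500 (~42 years)
print(f"SAN disk at $2500/TB: ${san_cost:,.0f}")                   # $562,500
```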


Protein Data Bank (PDB)


Putting it all together… TeraGrid


Grid Market


Grid Companies

• IBM

– “On Demand” solutions

• Sun Microsystems

– N1 initiative

• Oracle

– 10g

• Dell

• HP

– “Utility” computing

• Platform Computing

– LSF, metaclustering

• United Devices

– Desktop grids

• DataSynapse

• Akamai

• Google?

• Sony Online Entertainment?

• Where’s Microsoft?

Grid Organizations

• Global Grid Forum (GGF)

• Organization for the Advancement of Structured Information Standards (OASIS)

• Distributed Management Task Force (DMTF)

• World Wide Web Consortium (W3C)

• Globus Alliance

• NSF Middleware Initiative (NMI)

• NASA IPG

• DOE Science Grid

• EU DataGrid

• NSF TeraGrid


Technical Challenges for Grid Computing


Challenges: Security

• Grids traverse organizational boundaries

– Different administrative domains have different authentication mechanisms

– Resources have different use agreements and sharing priorities

• Single sign-on

– Multiple passwords are difficult to manage

• Rights delegation

• Trust

– Authentication of users

– Authorization of users

– Resource access

Security

• Public Key Infrastructure

– Public key A.public

– Private key A.private

• Supports Encryption

– Message to B:

• m’ = F(m, B.public), send m’ to B

• B receives m’, m = F’(m’, B.private)

• Digital Signatures

– Signed message to B:

• m’ = (m, F(m, A.private))

– Receiver uses A.public to verify that m’ is from A and has not been tampered with
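The roles of the two keys can be checked with toy “textbook RSA”, where F and F’ are both modular exponentiation: for encryption the sender uses the receiver's public key, and for a signature the signer uses its own private key. This is a sketch only; the primes, exponents, and message value are illustrative assumptions, and real systems use large keys, padding, and a hash of the message rather than the message itself.

```python
# Toy "textbook RSA" illustrating the slide's notation.
# Tiny primes and no padding: insecure, for illustration only.
from typing import NamedTuple


class KeyPair(NamedTuple):
    n: int        # modulus, published together with the public exponent
    public: int   # public exponent  (X.public)
    private: int  # private exponent (X.private)


def make_keypair(p: int, q: int, e: int) -> KeyPair:
    """Build a key pair from two primes p, q and a public exponent e."""
    phi = (p - 1) * (q - 1)
    return KeyPair(p * q, e, pow(e, -1, phi))  # modular inverse needs Python 3.8+


def F(m: int, exponent: int, n: int) -> int:
    """In textbook RSA, F and F' are the same operation: m^k mod n."""
    return pow(m, exponent, n)


A = make_keypair(89, 97, 5)    # hypothetical sender key pair
B = make_keypair(61, 53, 17)   # hypothetical receiver key pair

# Encryption: A encrypts with B.public; only B.private can undo it.
m = 65
m_enc = F(m, B.public, B.n)
recovered = F(m_enc, B.private, B.n)

# Signature: A signs with A.private; anyone holding A.public can check it.
signature = F(m, A.private, A.n)


def verify(message: int, sig: int, signer: KeyPair) -> bool:
    return F(sig, signer.public, signer.n) == message


print(recovered == m)               # True
print(verify(m, signature, A))      # True
print(verify(m + 1, signature, A))  # False: a tampered message is rejected
```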


Grid Security Infrastructure (GSI)

• A central concept in GSI authentication is the certificate.

• Every user and service on the Grid is identified via a certificate, a text file containing the following information:

– a subject name identifying the person or object that the certificate represents,

– the public key belonging to the subject,

– the identity of a Certificate Authority (CA) that has signed the certificate to certify that the public key and the identity both belong to the subject,

– the digital signature of the named CA.
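As a sketch of how the four fields fit together, the code below has a hypothetical CA sign a certificate's subject name, public key, and issuer, and then checks that signature with the CA's public key. The toy RSA keys (built from small well-known primes), the SHA-256-then-reduce digest, and the “subject|key|issuer” encoding are all assumptions; real GSI certificates are X.509 certificates with proper key sizes and encodings.

```python
# Sketch: the four certificate fields listed above, with a CA signature.
# Assumptions: toy RSA from small well-known primes (insecure), a SHA-256
# digest reduced mod n, and a hypothetical "subject|key|issuer" encoding.
import hashlib
from dataclasses import dataclass, replace


def make_key(p: int, q: int, e: int = 65537) -> tuple[int, int, int]:
    """Return a toy RSA key (n, e, d)."""
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


def digest(text: str, n: int) -> int:
    """SHA-256 of the text, reduced into the modulus so it can be signed."""
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % n


@dataclass(frozen=True)
class Certificate:
    subject: str                  # person or object the certificate represents
    public_key: tuple[int, int]   # (n, e) belonging to the subject
    issuer: str                   # identity of the CA that signed it
    signature: int = 0            # digital signature of the named CA

    def signed_content(self) -> str:
        return f"{self.subject}|{self.public_key}|{self.issuer}"


def issue(subject: str, public_key: tuple[int, int], ca_name: str, ca_key) -> Certificate:
    """The CA signs subject, public key and issuer with its private exponent d."""
    n, _, d = ca_key
    unsigned = Certificate(subject, public_key, ca_name)
    return replace(unsigned, signature=pow(digest(unsigned.signed_content(), n), d, n))


def verify(cert: Certificate, ca_key) -> bool:
    """Anyone holding the CA's public (n, e) can check the signature."""
    n, e, _ = ca_key
    return pow(cert.signature, e, n) == digest(cert.signed_content(), n)


ca_key = make_key(1_000_000_007, 998_244_353)     # hypothetical CA key pair
user_key = make_key(2_147_483_647, 998_244_353)   # hypothetical user key pair
cert = issue("/O=Grid/CN=Jane Doe", user_key[:2], "/O=Grid/CN=Example CA", ca_key)

print(verify(cert, ca_key))                                         # True
print(verify(replace(cert, subject="/O=Grid/CN=Mallory"), ca_key))  # False: altered subject
```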


Proxy Certificate

• A proxy consists of a new certificate with a new public and private key.

• The new certificate contains the owner's identity, modified slightly to indicate that it is a proxy.

• The new certificate is signed by the owner rather than a CA.

– Strictly speaking this is not a self-signed certificate: it is signed with the owner's long-term private key, so it chains back to the owner's CA-signed certificate.

• The certificate also includes a time after which the proxy should no longer be accepted by others.

• Proxies have limited lifetimes in order to minimize the security vulnerability.

• Because the proxy isn't valid for very long, it doesn't have to be kept quite as secure as the owner's private key.
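The acceptance rules for a proxy can be sketched the same way: the proxy carries a new key pair, an identity derived from the owner's, an expiry time, and a signature made with the owner's long-term private key. The “/CN=proxy” suffix, the toy RSA keys, and the signed-content format are assumptions for illustration, and a real verifier would also validate the owner's own certificate against its CA, which is omitted here.

```python
# Sketch: accepting a proxy certificate as described above.
# Assumptions: toy RSA (insecure), a "/CN=proxy" suffix as the modified
# identity, and a hypothetical "subject|key|expiry" signed-content format.
import hashlib
import time
from dataclasses import dataclass, replace


def make_key(p: int, q: int, e: int = 65537) -> tuple[int, int, int]:
    """Return a toy RSA key (n, e, d)."""
    return p * q, e, pow(e, -1, (p - 1) * (q - 1))


def digest(text: str, n: int) -> int:
    return int.from_bytes(hashlib.sha256(text.encode()).digest(), "big") % n


@dataclass(frozen=True)
class ProxyCert:
    subject: str                  # owner's identity, modified to mark a proxy
    public_key: tuple[int, int]   # public half of the NEW proxy key pair
    not_after: float              # after this time the proxy must be refused
    signature: int = 0            # made with the OWNER's private key, not a CA's

    def body(self) -> str:
        return f"{self.subject}|{self.public_key}|{self.not_after}"


def make_proxy(owner: str, owner_key, lifetime_s: float, now: float):
    """Create a proxy key pair and a short-lived certificate signed by the owner."""
    proxy_key = make_key(2_147_483_647, 998_244_353)  # real proxies use a fresh random key pair
    n, _, d = owner_key
    cert = ProxyCert(owner + "/CN=proxy", proxy_key[:2], now + lifetime_s)
    return replace(cert, signature=pow(digest(cert.body(), n), d, n)), proxy_key


def accept(cert: ProxyCert, owner: str, owner_key, now: float) -> bool:
    """Accept only a proxy that names its owner, carries the owner's signature, and is unexpired."""
    n, e, _ = owner_key
    return (cert.subject == owner + "/CN=proxy"
            and pow(cert.signature, e, n) == digest(cert.body(), n)
            and now < cert.not_after)


owner = "/O=Grid/CN=Jane Doe"
owner_key = make_key(1_000_000_007, 998_244_353)   # hypothetical long-term key pair
t0 = time.time()
proxy, proxy_key = make_proxy(owner, owner_key, lifetime_s=12 * 3600, now=t0)

print(accept(proxy, owner, owner_key, now=t0 + 60))         # True: fresh proxy
print(accept(proxy, owner, owner_key, now=t0 + 13 * 3600))  # False: expired
```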


Mutual Authentication


Additional Challenges

• Certificate Management

– MyProxy

• Role-based Access Control

– CAS, VOM

• Authorization services

• Integration with applications & Portals

Challenges: Resource Management

• Resources loosely coupled

– Higher network latencies

– Planned and unplanned disruptions

• How to provide QoS guarantees?

• Case Study: Entropia Desktop Grids

– Additional trust/security issues

Entropia 1: GIMPS

• Over 1.5 billion CPU hours served

• 300,000+ machines, over 4 years operational

• Every PC and hardware config imaginable (proc, memory, disk, etc.)

• Every networking hookup imaginable

• Found 35th, 36th, 37th, 38th, and 39th Mersenne Primes

Entropia 2: FightAIDS@home

• Sept 2000 launch

• Internet-based

• 54,657 total machines

• 10,770,506 total hours of computation

• Peak of 27,881 billion calculations/sec

Entropia 3: DCGrid

• Enterprise focus

– Tremendous resources available in the enterprise

– Complements other HPC resources

• Computing Platform

– Arbitrary application (open scheduling model)

– Security, unobtrusiveness, manageability guaranteed

• Focus on

– Pharmaceuticals, Chemicals, and Materials

– Financial Services

DCGrid Architecture


Server vs. Desktop Grids

• Server environment

– Fixed IP, always connected

– Always-on operation

– Moderate number of systems (10’s – 100’s)

– Dedicated use, trusted systems

• Desktop environment

– Dynamic, temporary IP, intermittent connection

– Off evenings, off weekends, off lunch

– Large numbers of systems (100’s – 1000’s - ?)

– Shared resources, potentially untrusted users

• These differences give rise to desktop Grid challenges


Typical PC-Grid Environment

[Figure: x-axis Time (hours), spanning hours 552 to 720; y-axis scale 0 to 700]

PC-Grid Challenges

• Provide a stable compute environment for apps

– Isolate app from variable desktop environment

• Operate in environment of dynamic use

– Unobtrusiveness and fault tolerance are key!

• Provide simple application integration

– Support ANY application without modification

• Provide centralized management console

– Zero additional management costs

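[Figure: Entropia system architecture in three layers, Job Management (Job Manager), Resource Scheduling (Subjob Scheduler), and Physical Node Management (Node Manager), linking the end-user to the Entropia clients; arrows are labeled computation, resource, and resource description, with numbered workflow steps]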

Stable Compute Environment

• Entropia Proprietary Sandbox

– Binary-level protection

– System virtualization (registry, file system, network)

• Open Scheduling Infrastructure

– Intelligent scheduling (match resources to subjob requirements)

– Manage subjob redundancy/fault tolerance
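One way to picture “match resources to subjob requirements” plus redundancy is a greedy placement: each subjob goes to more than one of the least-loaded clients that meet its memory and disk needs, and a failed client's subjobs are moved elsewhere. This is not Entropia's actual scheduler; the client attributes, the least-loaded policy, and the redundancy factor of 2 are assumptions.

```python
# Sketch: requirement matching with redundant placement and failover.
# Not Entropia's algorithm; attributes, policy and redundancy are assumptions.
from dataclasses import dataclass, field


@dataclass
class Client:
    name: str
    memory_mb: int
    disk_mb: int
    assigned: list[str] = field(default_factory=list)   # subjobs placed here


@dataclass
class Subjob:
    name: str
    min_memory_mb: int
    min_disk_mb: int


def eligible(c: Client, job: Subjob) -> bool:
    """A client matches if it meets every resource requirement of the subjob."""
    return c.memory_mb >= job.min_memory_mb and c.disk_mb >= job.min_disk_mb


def schedule(subjobs: list[Subjob], clients: list[Client], redundancy: int = 2) -> dict[str, list[str]]:
    """Place each subjob on `redundancy` distinct eligible clients, least-loaded first."""
    placement: dict[str, list[str]] = {}
    for job in subjobs:
        candidates = sorted((c for c in clients if eligible(c, job)), key=lambda c: len(c.assigned))
        if len(candidates) < redundancy:
            raise RuntimeError(f"not enough eligible clients for {job.name}")
        chosen = candidates[:redundancy]
        for c in chosen:
            c.assigned.append(job.name)
        placement[job.name] = [c.name for c in chosen]
    return placement


def on_client_failure(failed: str, placement: dict[str, list[str]],
                      clients: list[Client], subjobs: list[Subjob]) -> None:
    """Re-place each copy held by the failed client on another eligible client, if any."""
    jobs = {j.name: j for j in subjobs}
    for job_name, holders in placement.items():
        if failed not in holders:
            continue
        holders.remove(failed)
        spares = sorted((c for c in clients
                         if c.name != failed and c.name not in holders and eligible(c, jobs[job_name])),
                        key=lambda c: len(c.assigned))
        if spares:
            spares[0].assigned.append(job_name)
            holders.append(spares[0].name)


clients = [Client("pc1", 512, 2000), Client("pc2", 256, 1000),
           Client("pc3", 1024, 4000), Client("pc4", 512, 500)]
subjobs = [Subjob("s1", 256, 800), Subjob("s2", 512, 1500)]

placement = schedule(subjobs, clients)
print(placement)   # {'s1': ['pc1', 'pc2'], 's2': ['pc3', 'pc1']}
on_client_failure("pc1", placement, clients, subjobs)
print(placement)   # {'s1': ['pc2', 'pc3'], 's2': ['pc3']}
```

In this run s2 drops to a single copy after pc1 fails, because no remaining client meets its requirements; a real scheduler would have to decide whether to wait, relax the requirement, or flag the job.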


Manage Dynamic Use

• PC primary use must be respected!

• Entropia Proprietary Sandbox

– Guaranteed to run at idle priority

– Limit application capability

– Monitor page faults, network access

• Management

– Provide time-of-use windows

– Different levels of unobtrusiveness

• Gathers 95+% of cycles
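Time-of-use windows amount to a per-machine policy consulted before doing any grid work. The sketch below assumes a hypothetical policy format (weekday mapped to hour ranges) and an “owner active” signal supplied by some idle detector; neither is Entropia's actual configuration.

```python
# Sketch of time-of-use windows; the policy format is a hypothetical example.
from datetime import datetime

# Hours are [start, end) on a 24-hour clock; datetime.weekday(): Monday == 0.
WEEKDAY_WINDOWS = [(0, 7), (12, 13), (19, 24)]   # nights, lunch, evenings
WEEKEND_WINDOWS = [(0, 24)]                      # all day
POLICY = {day: WEEKDAY_WINDOWS for day in range(5)} | {5: WEEKEND_WINDOWS, 6: WEEKEND_WINDOWS}


def may_run(now: datetime, owner_active: bool) -> bool:
    """Allow grid work only inside a configured window and never while the owner is active."""
    if owner_active:
        return False
    return any(start <= now.hour < end for start, end in POLICY[now.weekday()])


print(may_run(datetime(2024, 1, 1, 10, 30), owner_active=False))  # False: Monday mid-morning
print(may_run(datetime(2024, 1, 1, 12, 15), owner_active=False))  # True: Monday lunch
print(may_run(datetime(2024, 1, 6, 10, 30), owner_active=False))  # True: Saturday
print(may_run(datetime(2024, 1, 6, 10, 30), owner_active=True))   # False: owner at keyboard
```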


Application Integration

• Support any Win32 binary

– Language neutral (C, C++, Fortran, Java, C#, etc.)

– Compiler/library neutral

[Figure: the Open Grid Platform connecting clients (Client1, Client2, …) and applications (App A, App B, App C); labels include qsub, qstat, …, Application Preparation Tools, and Run Applications]

Manageability


Application Performance

[Figure: four panels of application throughput versus number of clients: sequences per hour (Entropia compared with 1-CPU SGI and 1-CPU Sun, with a linear fit for Entropia), throughput in packets per hour, compounds per hour (GOLD, AUTODOCK), and compounds per hour (DOCK); HMMER also appears among the labels]

Scheduling Performance

[Figure: Job 14 Nodes (94 clients); client ID plotted against time (secs), 0 to 21,600 s]

Challenges: Service Interoperability

• Trying to force homogeneity on users is futile. Everyone has their own preferences, sometimes even dogma.

• The Internet provides the model…


Typical Application

[Figure: a typical Grid application in four layers: users work with client applications; application services organize VOs and enable access to other services; collective services aggregate and/or virtualize resources; resources implement standard access and management interfaces. Components shown: web browser, web portal, data viewer tool, chat tool, simulation tool, telepresence monitor, certificate authority, credential repository, registration service, data catalog, compute servers, database services, and cameras]

Typical Application

• Implementations are provided by a mix of

– Application-specific code

– “Off the shelf” tools and services

– Tools and services from the Globus Toolkit

– Tools and services from the Grid community (compatible with GT)

• Glued together by…

– Application development

– System integration

How it Really Happens (without the Grid)

[Figure: the typical-application components again, counted by who provides them: Application Developer 9, Off the Shelf 13, Globus Toolkit 0, Grid Community 0]

How it Really Happens (with the Grid)

[Figure: the same application built partly from Grid components, including Globus MCS/RLS, MyProxy, a portal with portlets, the Globus Index Service, Globus GRAM, and Globus DAI; components counted by who provides them: Application Developer 2, Off the Shelf 9, Globus Toolkit 4, Grid Community 4]

Theory -> Practice


What You Get in the Globus Toolkit

• OGSI (3.x)/WSRF (4.x) Core Implementation

– Used to develop and run OGSA-compliant Grid Services (Java, C/C++)

• Basic Grid Services

– Popular among current Grid users, common interfaces to the most typical services; includes both OGSA and non-OGSA implementations

• Developer APIs

– C/C++ libraries and Java classes for building Grid-aware applications and tools

• Tools and Examples

– Useful tools and examples based on the developer APIs

Components in Globus Toolkit 3.0

[Figure: components grouped into Security, Data Management, WS Core, Resource Management, and Information Services: GSI, WS-Security, RFT (OGSI), RLS, WU GridFTP, Java WS Core (OGSI), OGSI C Bindings, MDS2, WS-Index (OGSI), Pre-WS GRAM, WS GRAM (OGSI)]

Components in Globus Toolkit 3.2

[Figure: components grouped into Security, Data Management, WS Core, Resource Management, and Information Services: GSI, WS-Security, CAS (OGSI), SimpleCA, RFT (OGSI), RLS, OGSA-DAI, WU GridFTP, XIO, Java WS Core (OGSI), OGSI C Bindings, OGSI Python Bindings (contributed), pyGlobus (contributed), MDS2, WS-Index (OGSI), Pre-WS GRAM, WS GRAM (OGSI)]

Planned Components in GT 4.0

[Figure: components grouped into Security, Data Management, WS Core, Resource Management, and Information Services: GSI, WS-Security, CAS (WSRF), SimpleCA, Authz Framework, RFT (WSRF), RLS, OGSA-DAI, New GridFTP, XIO, Java WS Core (WSRF), C WS Core (WSRF), MDS2, WS-Index (WSRF), Pre-WS GRAM, WS-GRAM (WSRF), CSF (contribution), pyGlobus (contributed)]

Grid and Web Services Convergence

The definition of WSRF means that the Grid and Web services communities can move forward on a common base.

Grid Services Example

• (from the Sotomayor tutorial)

• MathService API:

– add(int x)

– subtract(int x)

– getValue()
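The MathService's behaviour fits in a few lines of plain Python. The point the sketch makes is that the service is stateful: each call changes a value that persists between calls, and separate instances keep separate values. The initial value of 0 is an assumption, and there is no Web-service or OGSI plumbing here.

```python
# Plain-Python sketch of the MathService API: no Web-service or OGSI plumbing.
# The initial value of 0 is an assumption.
class MathService:
    def __init__(self) -> None:
        self._value = 0            # state that persists between calls

    def add(self, x: int) -> None:
        self._value += x

    def subtract(self, x: int) -> None:
        self._value -= x

    def getValue(self) -> int:     # name mirrors the WSDL operation
        return self._value


svc = MathService()
svc.add(10)
svc.subtract(3)
other = MathService()              # a second instance with its own state
other.add(5)
print(svc.getValue(), other.getValue())  # 7 5
```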

Note 1: How is this different from Web Services? CORBA? COM/DCOM?

Note 2: This is too simple! What about co-ordination/workflows, personalization, presentation, security?

OGSI (or what is a grid service?)

• Using web service infrastructure

– MathService is defined by WSDL (like an IDL)

<?xml version="1.0" encoding="UTF-8"?>
...
<types>
  <xsd:schema targetNamespace="http://www.gt3tutorial.org/namespaces/0.2/core/gwsdl/Math"
              attributeFormDefault="qualified"
              elementFormDefault="qualified"
              xmlns="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="add">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="value" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    <xsd:element name="addResponse">
      <xsd:complexType/>
    </xsd:element>
    ...
</types>

<message name="AddInputMessage">
  <part name="parameters" element="tns:add"/>
</message>
<message name="AddOutputMessage">
  <part name="parameters" element="tns:addResponse"/>
</message>
...

<gwsdl:portType name="MathPortType" extends="ogsi:GridService">
  <operation name="add">
    <input message="tns:AddInputMessage"/>
    <output message="tns:AddOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="subtract">
    <input message="tns:SubtractInputMessage"/>
    <output message="tns:SubtractOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
  <operation name="getValue">
    <input message="tns:GetValueInputMessage"/>
    <output message="tns:GetValueOutputMessage"/>
    <fault name="Fault" message="ogsi:FaultMessage"/>
  </operation>
</gwsdl:portType>

</definitions>

Basic Concepts: The GridService PortType

• A “grid service” is a web service that implements the GridService PortType

<portType name="GridService">
  <operation name="setServiceData"> [snip] </operation>
  <operation name="destroy"> [snip] </operation>
  <operation name="requestTerminationAfter"> [snip] </operation>
  <operation name="requestTerminationBefore"> [snip] </operation>
  <operation name="findServiceData"> [snip] </operation>
</portType>

<gwsdl:portType name="GridService">
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="constant"
                  name="interface" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false" mutability="mutable"
                  name="serviceDataName" nillable="false" type="xsd:QName"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false" mutability="mutable"
                  name="factoryLocator" nillable="true" type="ogsi:LocatorType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="0" modifiable="false" mutability="extendable"
                  name="gridServiceHandle" nillable="false" type="ogsi:HandleType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="mutable"
                  name="gridServiceReference" nillable="false" type="ogsi:ReferenceType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="static"
                  name="findServiceDataExtensibility" nillable="false" type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="unbounded" minOccurs="1" modifiable="false" mutability="static"
                  name="setServiceDataExtensibility" nillable="false" type="ogsi:OperationExtensibilityType"/>
  <sd:serviceData maxOccurs="1" minOccurs="1" modifiable="false" mutability="mutable"
                  name="terminationTime" nillable="false" type="ogsi:TerminationTimeType"/>
  <sd:staticServiceDataValues>
    <ogsi:findServiceDataExtensibility inputElement="ogsi:queryByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:setByServiceDataNames"/>
    <ogsi:setServiceDataExtensibility inputElement="ogsi:deleteByServiceDataNames"/>
  </sd:staticServiceDataValues>
</gwsdl:portType>

GridService PortType

• FindServiceData()

• QueryByServiceDataNames()

• GetServiceData()

• SetByServiceDataNames()

• DeleteByServiceDataNames()

• RequestTerminationAfter()

• RequestTerminationBefore()

• Destroy()

Capabilities of a Grid Service

• 2-level naming (GSH vs. GSR)

• Factories

• Lifetime management

• Service Data Elements

• Event Notification

• ServiceGroups

GSH versus GSR

• A GSH (Grid Service Handle) is a unique name for a Grid Service Instance

• A GSR (Grid Service Reference) is a possibly temporary mechanism for accessing the Grid Service Instance
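Two-level naming can be pictured as a lookup table: clients hold the permanent GSH and resolve it to whatever GSR is currently valid, so the instance can move, or its reference can expire, without clients changing the name they use. The resolver class, the expiry field, and the URLs are hypothetical.

```python
# Sketch of two-level naming: a permanent handle (GSH) resolves to the
# currently valid reference (GSR). Resolver, fields and URLs are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class GSR:
    endpoint: str        # where the instance can be reached right now
    expires_at: float    # a reference may only be valid for a while


class HandleResolver:
    """Maps permanent Grid Service Handles to their current references."""

    def __init__(self) -> None:
        self._table: dict[str, GSR] = {}

    def bind(self, gsh: str, gsr: GSR) -> None:
        """Record (or replace) the reference for a handle, e.g. after a move."""
        self._table[gsh] = gsr

    def resolve(self, gsh: str, now: float) -> GSR:
        gsr = self._table.get(gsh)
        if gsr is None or now >= gsr.expires_at:
            raise LookupError(f"no valid reference for {gsh}")
        return gsr


resolver = HandleResolver()
handle = "http://grid.example.org/services/MathService/instance-42"   # hypothetical GSH
resolver.bind(handle, GSR("http://hostA.example.org:8080/instance-42", expires_at=100.0))
print(resolver.resolve(handle, now=10.0).endpoint)    # http://hostA.example.org:8080/instance-42

# The instance moves: the handle stays the same, the reference changes.
resolver.bind(handle, GSR("http://hostB.example.org:8080/instance-42", expires_at=200.0))
print(resolver.resolve(handle, now=150.0).endpoint)   # http://hostB.example.org:8080/instance-42
```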

Factories

• Create new instances of services dynamically

• Individualized Instances

• lifetime management techniques
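A factory and soft-state lifetime management can be sketched together: the factory hands out instances with their own handles and an initial termination time, clients can ask for a later or earlier termination time, and instances whose time has passed are swept away. The method names echo the GridService operations listed earlier, but the max/min semantics, the default lifetime, and the sweep() helper are assumptions for illustration.

```python
# Sketch: a factory creating individual instances with soft-state lifetimes.
# Semantics of the termination requests and the sweep() helper are assumptions.
import itertools


class Instance:
    def __init__(self, handle: str, termination_time: float) -> None:
        self.handle = handle
        self.termination_time = termination_time
        self.value = 0                           # per-instance state

    def request_termination_after(self, t: float) -> None:
        """Ask to live at least until t (extend, never shorten)."""
        self.termination_time = max(self.termination_time, t)

    def request_termination_before(self, t: float) -> None:
        """Ask to be gone no later than t (shorten, never extend)."""
        self.termination_time = min(self.termination_time, t)


class Factory:
    def __init__(self, base: str, default_lifetime: float) -> None:
        self.base = base
        self.default_lifetime = default_lifetime
        self.instances: dict[str, Instance] = {}
        self._ids = itertools.count(1)

    def create(self, now: float) -> Instance:
        inst = Instance(f"{self.base}/instance-{next(self._ids)}", now + self.default_lifetime)
        self.instances[inst.handle] = inst
        return inst

    def destroy(self, handle: str) -> None:
        self.instances.pop(handle, None)         # explicit destruction

    def sweep(self, now: float) -> list[str]:
        """Soft state: remove instances whose termination time has passed."""
        expired = [h for h, i in self.instances.items() if now >= i.termination_time]
        for h in expired:
            del self.instances[h]
        return expired


factory = Factory("http://grid.example.org/MathFactory", default_lifetime=60.0)
a = factory.create(now=0.0)
b = factory.create(now=0.0)
a.request_termination_after(300.0)     # the client keeps a alive longer
expired = factory.sweep(now=120.0)     # b's 60 s lifetime has run out
print(expired)                 # ['http://grid.example.org/MathFactory/instance-2']
print(list(factory.instances)) # ['http://grid.example.org/MathFactory/instance-1']
```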

Service Data Elements

• Generalized State

– useful for describing capability

– Get/Set model similar to JavaBeans properties

• Can specify initial values in WSDL

• Integrated with Notification mechanism
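Service data elements behave much like bean properties with extra metadata. In the sketch, each element has an initial value and a modifiable flag (mirroring the modifiable attribute in the GWSDL above), and every successful set triggers registered callbacks, standing in for the notification integration. The class names and the callback mechanism are hypothetical.

```python
# Sketch of service data elements with get/set access, initial values and
# change callbacks. Class names and the callback mechanism are hypothetical.
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SDE:
    value: Any
    modifiable: bool = False   # may clients change it via setServiceData?


@dataclass
class ServiceData:
    elements: dict[str, SDE] = field(default_factory=dict)
    listeners: list[Callable[[str, Any], None]] = field(default_factory=list)

    def get(self, name: str) -> Any:
        return self.elements[name].value

    def set(self, name: str, value: Any) -> None:
        sde = self.elements[name]
        if not sde.modifiable:
            raise PermissionError(f"{name} is not modifiable")
        sde.value = value
        for notify in self.listeners:   # hook for the notification mechanism
            notify(name, value)


# Initial values, as might be declared in WSDL (element names are hypothetical).
sd = ServiceData({"interface": SDE("MathPortType"),
                  "value": SDE(0, modifiable=True)})
changes: list[tuple[str, Any]] = []
sd.listeners.append(lambda name, v: changes.append((name, v)))

sd.set("value", 42)
print(sd.get("value"), changes)   # 42 [('value', 42)]
try:
    sd.set("interface", "Other")
except PermissionError as err:
    print(err)                    # interface is not modifiable
```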

Service Data Elements: GridService

• Interface

• ServiceDataName

• FactoryLocator

• GridServiceHandle

• GridServiceReference

• TerminationTime

Notifications

• Source

– implements NotificationSourcePortType

– sends a notification message (XML element) to sinks

• Sink

– implements NotificationSinkPortType

– sends a notification subscription request to the source

– causes a Grid service instance of portType NotificationSubscription to be created
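The source/sink/subscription relationship can be sketched in plain Python: a sink subscribes with a source, the source records a subscription object for that request, and later pushes an XML element to every sink subscribed to the relevant topic. The class names echo the portTypes above, but the topic field and the in-process delivery are assumptions; this is not the OGSI wire protocol.

```python
# Sketch of source/sink notification with per-request subscription objects.
# Topic field and in-process delivery are assumptions, not the OGSI protocol.
import xml.etree.ElementTree as ET
from dataclasses import dataclass


class NotificationSink:
    def __init__(self, name: str) -> None:
        self.name = name
        self.received: list[str] = []

    def deliver(self, message: ET.Element) -> None:
        self.received.append(ET.tostring(message, encoding="unicode"))


@dataclass
class NotificationSubscription:
    sink: NotificationSink
    topic: str             # e.g. the name of a service data element


class NotificationSource:
    def __init__(self) -> None:
        self.subscriptions: list[NotificationSubscription] = []

    def subscribe(self, sink: NotificationSink, topic: str) -> NotificationSubscription:
        sub = NotificationSubscription(sink, topic)   # one subscription object per request
        self.subscriptions.append(sub)
        return sub

    def notify(self, topic: str, value: str) -> None:
        msg = ET.Element("notification", topic=topic)  # the message is an XML element
        msg.text = value
        for sub in self.subscriptions:
            if sub.topic == topic:
                sub.sink.deliver(msg)


source = NotificationSource()
sink = NotificationSink("monitor")
source.subscribe(sink, "value")
source.notify("value", "42")
source.notify("status", "idle")    # no subscriber for this topic
print(sink.received)               # ['<notification topic="value">42</notification>']
```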

ServiceGroups

• A grid service that maintains information about other grid services

• Can be used to implement a classic registry model

• Can be used for dataset replication

• A grid service can belong to more than one ServiceGroup

• Membership in a ServiceGroup can be homogeneous or heterogeneous

• ServiceGroup portTypes are optional
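A ServiceGroup used as a classic registry can be sketched as a collection of member handles tagged with their portType, where one service may belong to several groups and a group may be homogeneous or mixed. The class, method names, and handles are hypothetical.

```python
# Sketch of ServiceGroups as registries of member services.
# Class, method names and handles are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ServiceGroup:
    name: str
    members: dict[str, str] = field(default_factory=dict)   # handle -> portType

    def add(self, handle: str, port_type: str) -> None:
        self.members[handle] = port_type

    def remove(self, handle: str) -> None:
        self.members.pop(handle, None)

    def find(self, port_type: str) -> list[str]:
        """Classic registry lookup: all members implementing a given portType."""
        return [h for h, pt in self.members.items() if pt == port_type]

    def is_homogeneous(self) -> bool:
        return len(set(self.members.values())) <= 1


registry = ServiceGroup("vo-registry")
math_pool = ServiceGroup("math-pool")
registry.add("gsh:math-1", "MathPortType")
registry.add("gsh:db-1", "DataPortType")
math_pool.add("gsh:math-1", "MathPortType")    # the same service in a second group

print(registry.find("MathPortType"))                          # ['gsh:math-1']
print(registry.is_homogeneous(), math_pool.is_homogeneous())  # False True
```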

Grid Services: Summary

• Extends Web Services to support transient services

– WSDL 1.2 expected to include extensions

• Requires support for factories, lifetime management, soft-state management, and notifications

• Java implementation pretty solid

– Security implementation still shaky

Other Challenges

• Developing user interfaces

• Data Management

• Scheduling/co-scheduling of resources

• Failure management

• Application development

• Performance

• Many others…


What I hope you got from this talk

• Grid Computing is about

– Co-ordinated use of different resources

– Provisioning resources for increased utilization

– Scaling to large numbers of resources, services, and users

• Many systems being built

• Many applications being developed
