www.Grid.org.il

Distributed Data Management for Compute Grid

Presented by Michael Di Stefano

Founder of Integrasoft, L.L.C.; Author of “Distributed Data Management for Grid Computing”

Meeting: Tuesday, September 13th, 2005

Slide - 2 -

Agenda

Data Management - The Next Grid Problem

Evolution in Compute Topology

Objectives of Data Management

New Topology – New Data Management Techniques

New Techniques, New Research, Emergence of Standards

Slide - 3 -

Two Components of The Grid

Compute GRID

The Grid Operating System - provides the core services for grid computing

– Physical Resource Accounting

– Process Task Queues

– Management of Task/Resource Execution

Data GRID

Data Management System of Grid - Manages all aspects

– Enterprise Data

– Data Scheduling

– Replication

– Availability

– Legacy Access

Slide - 4 -

Compute Grids

Roll your own Compute Grid

Free Versions of Compute Grids

Product and Supported Compute Grids

Slide - 5 -

Data Grids

Data Grid Engine - Movement of Bits and Bytes

FTP

Sockets

Middleware (messaging)

Caches

Applications Perspective

Multiple Data Characteristics

Quality of Service

Data Management not Bit/Byte Movement

Slide - 6 -

Evolution in Computing

Mainframe → Mini → Client/Server

Slide - 7 -

15 Years of Distributed Computing Evolution

Sockets

CORBA

Messaging

Internet

Application Servers

Tight Bindings

Loose Coupling

Publish / Subscribe

Grid Topology

Emerging from the “Evolutionary Mist”

Client/Server

© Integrasoft, L.L.C. 2005

Slide - 8 -

Evolution

Distributed Data Management for Grid Computing

Copyright John Wiley and Sons 2005

Slide - 9 -

The Grid Topology

Client / Server

Physical: CPU, Peripherals, Execution Threads

Operational: Operating System

Close Proximity (Mother Board)

Compute Grid

Physical: Physical Nodes

Operational: Resource/Node Management, Inventory of Work/Tasks, Resource Inventory, Matching of Task to Resource

Diverse CPU Families, Diverse Geography, Diverse Network Bandwidth

Slide - 10 -

Application on the Grid

Multiple Data Sources and Destinations

Client Information

Portfolio Information

Market Data

Quality of Service Levels

Application in its entirety

Application components

Speed of Access

Query

Updates (Transactional, Optimistic)

Slide - 11 -

How QoS is Delivered Today

Relational Databases

– SQL Query

– Transactional Updates

– Stored Procedures

Middleware

– Queuing

– Various delivery modes

– Publish and Subscribe

– Easy Programmatic API

Other

– Object Databases

– Object Relational

Data flow and movement are optimized.

Designed to meet Application QoS for Client/Server Topology

Slide - 12 -

Application Today in Client/Server

Threads

RAM

Connection Pools

Tailored Middleware

Business Application

Server Machine

Slide - 13 -

What Happens in a Grid

Business Application

Server Machine

Compute Grid

Slide - 14 -

The Data Access Funnel

Distributed Data Management for Grid Computing

Copyright John Wiley and Sons 2005

Slide - 15 -

Data Grid Eliminates the Funnel

Distributed Data Management for Grid Computing

Copyright John Wiley and Sons 2005

Slide - 16 -

Goals of Data Management in Grid

The Big 3 Goals of Data Management in Grid

Optimize Data Affinity

– Minimize Data Movement

– Optimize the resources of the Network

Maintain Business Application QoS for Data Management

Integrate Legacy Systems into the Grid

Slide - 17 -

How to Achieve the Goals of the Data Grid

What the Architect/Developer must Address

How many copies or “Replicas” of data are needed in the Data Grid?

How fine is the granularity of my “Data Atoms” to be replicated?

How best to “Distribute” Data Atoms across the Data Grid?

What level of “Synchronization” is required?

How to “logically group” data along business lines?

How to “Integrate” and “Operate” legacy data sources?

How to manage “Events” in the Data Grid?

How to synchronize data sources external to the Data Grid?
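
One way to make the replica and distribution questions concrete is a small placement function. The sketch below is only an illustration, not a method from the presentation: the function name place_replicas, the node names, the atom key format, and the hash-then-wrap placement rule are all assumptions chosen for brevity.

```python
import hashlib


def place_replicas(atom_key: str, nodes: list[str], replicas: int) -> list[str]:
    """Return the nodes that should hold copies of one data atom (hypothetical rule)."""
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    # A stable hash means every caller computes the same placement for a given key.
    digest = hashlib.sha256(atom_key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Take consecutive nodes, wrapping around, so the copies land on distinct nodes.
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Hypothetical node names and atom key, for illustration only.
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas("portfolio:1234", nodes, replicas=2)
```

Raising replicas trades more data movement and synchronization work for a better chance that a task finds its data locally; a finer atom granularity spreads load more evenly but increases the number of keys to track.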

Slide - 18 -

Data Management in Grid

Granularity of Data Atoms

Replication

Distribution

Logical Data Groupings (Data Regions)

Synchronization

– InterRegion

– IntraRegion

– External Data Sources

Events

Integration with Legacy Systems

Nothing to do with the mechanics of the bits and bytes

These are Data Management Issues

Slide - 19 -

Data Management is NOT Caching

Distributed Data Management for Grid Computing

Copyright John Wiley and Sons 2005

Moves the bits and bytes

– Cache

– Grid FTP

– Others

Data Management to deliver the Business Application’s QoS given the “compute topology”

Slide - 20 -

Engines of a Data Grid

Cache

Java-based engines such as JCache, JavaSpaces, …

Various C++ Caches

Recycled Object Data Base Technology

FTP

Grid FTP

Meta Data Services

File Systems

NFS

Distributed File Systems

Slide - 21 -

Right Tool for the Job

Business Applications require specific QoS levels from the Data Grid

Complex Analysis of Large Data Sets

Dependency on small, fast-moving data sets

Large Static Data Sets

…….

Slide - 22 -

Business Drivers Fueling Grid

Slide - 23 -

Business Drivers Fueling Grid

Distributed Data Management for Grid Computing

Copyright John Wiley and Sons 2005

Slide - 24 -

Limited Patience of Business

Slide - 25 -

Business Perspective

No Data Management Tools

Difficult Custom Code

Long Time to Delivery

No Reuse

Increased Complexity

Improved Performance

Financial ROI

Grid fails Widespread Acceptance

Slide - 26 -

Business Perspective

Financial ROI

With Data Management for Grid

Easy to use/understand

Reuse

Effort on business

Increased Complexity

Improved Performance

Fast Time to Market

Ease of Migration to Grid

Changes Data Centers

Slide - 27 -

Data Management in Grid

Granularity of Data Atoms

Replication

Distribution

Data Regions

Synchronization

Integration with Legacy Systems

If Distributed Data Management is not addressed, Grid will fail to gain wide acceptance.

Slide - 28 -

Measuring QoS to Determine Data Grid

Distributed Data Management for Grid Computing

Copyright John Wiley and Sons 2005

Slide - 29 -

Measuring QoS to Determine Data Grid

Distributed Data Management for Grid Computing

Copyright John Wiley and Sons 2005

Application QoS( Work(), Data(), Time(), Geography(), Query() )

Where:

Work( batch/atomic, sync/async )

Data( overall size, atomic size, transient, query )

Time( RealTime, Non-RealTime, Near-RealTime )

Geography( Topology, Bandwidth )

Query( Basic, Complex )
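
The QoS function above can be written down as a plain data structure so that an application's requirements are recorded in one place. This is a minimal sketch under stated assumptions: the class and field names, the units (bytes, Mbps), and the atom_count helper are illustrative choices, not definitions from the presentation or the book.

```python
from dataclasses import dataclass
from enum import Enum


class Timeliness(Enum):
    REAL_TIME = "RealTime"
    NEAR_REAL_TIME = "Near-RealTime"
    NON_REAL_TIME = "Non-RealTime"


class QueryKind(Enum):
    BASIC = "Basic"
    COMPLEX = "Complex"


@dataclass(frozen=True)
class Work:
    batch: bool        # True = batch, False = atomic
    synchronous: bool  # True = sync, False = async


@dataclass(frozen=True)
class Data:
    overall_size_bytes: int  # assumed unit: bytes
    atomic_size_bytes: int   # assumed unit: bytes
    transient: bool
    queried: bool


@dataclass(frozen=True)
class Geography:
    topology: str          # free-text label, e.g. "two data centers"
    bandwidth_mbps: float  # assumed unit: megabits per second


@dataclass(frozen=True)
class ApplicationQoS:
    work: Work
    data: Data
    time: Timeliness
    geography: Geography
    query: QueryKind

    def atom_count(self) -> int:
        """Hypothetical helper: atoms implied by the two sizes (ceiling division)."""
        return -(-self.data.overall_size_bytes // self.data.atomic_size_bytes)


# Example profile with made-up figures.
profile = ApplicationQoS(
    work=Work(batch=True, synchronous=False),
    data=Data(overall_size_bytes=10_000, atomic_size_bytes=4_096,
              transient=False, queried=True),
    time=Timeliness.NEAR_REAL_TIME,
    geography=Geography(topology="two data centers", bandwidth_mbps=100.0),
    query=QueryKind.COMPLEX,
)
```

Keeping the profile immutable makes it safe to pass between a scheduler and a data grid without either side changing the stated requirements.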

Slide - 30 -

Objective of Data Grid - Data Affinity

Low cost of CPU

Data size is determined by application

Network bandwidth is limited

Data and Work need to be co-located

Virtual Centrally Managed Data Base

Physically Distributed

Slide - 31 -

How to Achieve Data Affinity

Locate data and work close together to minimize data movement across the network

Reactive: Data Grid distributes data in anticipation of where work will be assigned, using the Distributed Data Management policies of Regionalization, Replication, Distribution, and Synchronization

Proactive: Routing of Task to Data. Compute Grid Task Scheduler queries Data Locality Information from Data Grid
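
Routing a task to its data can be sketched as a scheduler lookup against locality information. The example below is an assumption-laden illustration, not an API of any grid product: the route_task function, the shape of the locality map (node name to the set of atom keys held there), and the tie-breaking rule are all hypothetical.

```python
def route_task(task_atoms: set[str],
               locality: dict[str, set[str]],
               idle_nodes: list[str]) -> str:
    """Pick the idle node that already holds the most atoms the task needs."""
    if not idle_nodes:
        raise ValueError("no idle nodes available")
    # For each idle node, count how many of the task's atoms are already local.
    local_hits = {
        node: len(task_atoms & locality.get(node, set()))
        for node in idle_nodes
    }
    # max() returns the first maximal element, so idle_nodes order breaks ties.
    return max(idle_nodes, key=lambda node: local_hits[node])


# Hypothetical locality information as a Data Grid might report it.
locality = {
    "node-a": {"client:7", "portfolio:7"},
    "node-b": {"market:IBM"},
    "node-c": set(),
}
chosen = route_task(
    {"client:7", "portfolio:7", "market:IBM"},
    locality,
    idle_nodes=["node-b", "node-a", "node-c"],
)
# chosen is "node-a": two of the three atoms are already there.
```

Only the one atom missing from the chosen node has to cross the network, which is the point of co-locating data and work.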

Slide - 32 -

Distributed Data Management

Data Regions

Replication

Distribution

Synchronization

Load and Store

Event

Slide - 33 -

Distributed Data Management Policies

Distributed Data Management for Grid Computing

Copyright John Wiley and Sons 2005

Slide - 34 -

Advanced Topics in Distributed Data Management

Natural Attraction Forces of Data Bodies Within a Data Grid to Describe Efficient Data Distribution Patterns

White Paper

Michael Di Stefano, September 2004

Distributed Data Management for Grid Computing

Copyright John Wiley and Sons 2005

Slide - 36 -

Purchasing Information

Please visit www.integrasoftware.com to purchase your copy of “Distributed Data Management for Grid Computing” and receive a 15% discount.