www.Grid.org.il
Distributed Data Management for Compute Grid
Presented by Michael Di Stefano
Founder of Integrasoft; author of Distributed Data Management for Grid Computing (John Wiley and Sons, 2005)
Meeting: Tuesday, September 13th, 2005
Slide - 2 -
Agenda
Data Management - The Next Grid Problem
Evolution in Compute Topology
Objectives of Data Management
New Topology – New Data Management Techniques
New Techniques, New Research, Emergence of Standards
Slide - 3 -
Two Components of The Grid
Compute GRID
The Grid Operating System - provides the core services for grid computing
– Physical Resource Accounting
– Process Task Queues
– Management of Task/Resource Execution
Data GRID
The Data Management System of the Grid - manages all aspects of:
– Enterprise Data
– Data Scheduling
– Replication
– Availability
– Legacy Access
Slide - 4 -
Compute Grids
Roll your own Compute Grid
Free Versions of Compute Grids
Product and Supported Compute Grids
Slide - 5 -
Data Grids
Data Grid Engine - Movement of Bits and Bytes
FTP
Sockets
Middleware (messaging)
Caches
Application's Perspective
Multiple Data Characteristics
Quality of Service
Data Management, not Bit/Byte Movement
Slide - 7 -
15 Years of Distributed Computing Evolution
Sockets
CORBA
Messaging
Internet
Application Servers
Tight Bindings
Loose Coupling
Publish / Subscribe
Grid Topology, emerging from the “Evolutionary Mist”
Client/Server
© Integrasoft, L.L.C. 2005
Slide - 8 -
Evolution
Distributed Data Management for Grid Computing, Copyright John Wiley and Sons 2005
Slide - 9 -
The Grid Topology
Client / Server
– Physical: CPU, Peripherals, Execution Threads
– Operational: Operating System
– Close Proximity (Mother Board)
Compute Grid
– Physical: Physical Nodes
– Operational: Operating System (Resource/Node Management, Inventory of Work/Tasks, Resource Inventory, Matching of Task to Resource)
– Diverse CPU Families, Diverse Geography, Diverse Network Bandwidth
Slide - 10 -
Application on the Grid
Multiple Data Sources and Destinations
– Client Information
– Portfolio Information
– Market Data
Quality of Service Levels
– Application in its entirety
– Application components
– Speed of Access
– Query
– Updates (Transactional, Optimistic)
Slide - 11 -
How QoS is Delivered Today
Relational Databases
– SQL Query
– Transactional Updates
– Stored Procedures
Middleware
– Queuing
– Various delivery modes
– Publish and Subscribe
– Easy Programmatic API
Other
– Object Databases
– Object Relational
Data flow and movement are optimized.
Designed to meet application QoS for the Client/Server topology
Slide - 12 -
Application Today in Client/Server
Business Application on a Server Machine
– Threads
– RAM
– Connection Pools
– Tailored Middleware
Slide - 13 -
What Happens in a Grid
Business Application
Server Machine
Compute Grid
Slide - 14 -
The Data Access Funnel
Slide - 15 -
Data Grid Eliminates the Funnel
Slide - 16 -
Goals of Data Management in the Grid
The Big 3 Goals of Data Management in the Grid
Optimize Data Affinity
– Minimize Data Movement
– Optimize Use of Network Resources
Maintain Business Application QoS for Data Management
Integrate Legacy Systems into the Grid
Slide - 17 -
How to Achieve the Goals of the Data Grid
What the Architect/Developer Must Address
How many copies or “Replicas” of data are needed in the Data Grid?
How fine-grained should the replicated “Data Atoms” be?
How best to “Distribute” Data Atoms across the Data Grid?
What level of “Synchronization” is required?
How to “logically group” data along business lines?
How to “Integrate” and “Operate” legacy data sources?
How to manage “Events” in the Data Grid?
How to synchronize data sources external to the Data Grid?
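The replica and distribution questions above can be made concrete with a small placement function. This is a minimal sketch, assuming rendezvous (highest-random-weight) hashing, a technique the slides do not prescribe; the node names, the atom key, and the replica count are hypothetical.

```python
import hashlib


def _weight(node: str, atom_key: str) -> int:
    """Deterministic pseudo-random weight for a (node, data atom) pair."""
    digest = hashlib.sha256(f"{node}:{atom_key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place_atom(atom_key: str, nodes: list[str], replicas: int) -> list[str]:
    """Return the nodes that should hold a copy of `atom_key`, highest weight first.

    Rendezvous hashing: every node scores the atom and the top `replicas`
    scorers hold a copy. Removing a node only relocates atoms it held.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    ranked = sorted(nodes, key=lambda n: _weight(n, atom_key), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    grid_nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical nodes
    # Hypothetical data atom (one portfolio record), replicated twice.
    print(place_atom("portfolio:1042", grid_nodes, replicas=2))
```

Removing a node that does not hold an atom leaves that atom's placement unchanged, which limits data movement when grid membership changes; the granularity question is answered by what the caller chooses as an atom key.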
Slide - 18 -
Data Management in Grid
Granularity of Data Atoms
Replication
Distribution
Logical Data Groupings (Data Regions)
Synchronization
– Inter-Region
– Intra-Region
– External Data Sources
Events
Integration with Legacy Systems
Nothing to do with the mechanics of the bits and bytes: these are Data Management issues
Slide - 19 -
Data Management is NOT Caching
Caching moves the bits and bytes: Cache, Grid FTP, others
Data Management delivers the Business Application’s QoS given the “compute topology”
Slide - 20 -
Engines of a Data Grid
Cache
Java-based engines such as JCache, JavaSpaces, …
Various C++ Caches
Recycled Object Database Technology
FTP
Grid FTP
Meta Data Services
File Systems
NFS
Distributed File Systems
Slide - 21 -
Right Tool for the Job
Business Applications require specific QoS levels from the Data Grid
Complex Analysis of Large Data Sets
Dependence on small, fast-moving data sets
Large Static Data Sets
…….
Slide - 23 -
Business Drivers Fueling Grid
Slide - 25 -
Business Perspective
No Data Management Tools
– Difficult Custom Code
– Long Time to Delivery
– No Reuse
Increased Complexity, Improved Performance
Financial ROI
Grid fails to achieve widespread acceptance
Slide - 26 -
Business Perspective
Financial ROI
With Data Management for Grid
– Easy to use/understand
– Reuse
– Effort focused on the business
Increased Complexity, Improved Performance
Fast Time to Market
Ease of Migration to Grid
Changes Data Centers
Slide - 27 -
Data Management in Grid
Granularity of Data Atoms
Replication
Distribution
Data Regions
Synchronization
Integration with Legacy Systems
If Distributed Data Management is not addressed, the Grid will fail to achieve wide acceptance.
Slide - 28 -
Measuring QoS to Determine Data Grid
Slide - 29 -
Measuring QoS to Determine Data Grid
Application QoS( Work(), Data(), Time(), Geography(), Query() )
Where:
Work( batch/atomic, sync/async )
Data( overall size, atomic size, transient, query )
Time( RealTime, Non-RealTime, Near-RealTime )
Geography( Topology, Bandwidth )
Query( Basic, Complex )
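The QoS function above can be captured as a plain data structure, which makes it possible to reason about which of the engines listed under "Engines of a Data Grid" fits a given application. This is a minimal sketch, assuming Python dataclasses; the field units, the thresholds, and the `suggest_engine` rule are hypothetical illustrations, not the author's method.

```python
from dataclasses import dataclass
from enum import Enum


class Timing(Enum):
    REAL_TIME = "RealTime"
    NEAR_REAL_TIME = "Near-RealTime"
    NON_REAL_TIME = "Non-RealTime"


class QueryLevel(Enum):
    BASIC = "Basic"
    COMPLEX = "Complex"


@dataclass(frozen=True)
class Work:
    batch: bool        # batch (True) vs atomic (False)
    synchronous: bool  # sync (True) vs async (False)


@dataclass(frozen=True)
class Data:
    overall_size_mb: float
    atomic_size_kb: float
    transient: bool
    queried: bool      # whether the data is queried in place


@dataclass(frozen=True)
class Geography:
    topology: str          # free-form label, e.g. "LAN" or "WAN"
    bandwidth_mbps: float


@dataclass(frozen=True)
class ApplicationQoS:
    work: Work
    data: Data
    time: Timing
    geography: Geography
    query: QueryLevel


# Hypothetical thresholds, chosen only for illustration.
SMALL_ATOM_KB = 64
LARGE_DATASET_MB = 10_000


def suggest_engine(qos: ApplicationQoS) -> str:
    """Map a QoS profile to an engine family (hypothetical rule)."""
    if qos.time is not Timing.NON_REAL_TIME and qos.data.atomic_size_kb <= SMALL_ATOM_KB:
        return "cache"  # small, fast-moving data
    if qos.data.overall_size_mb >= LARGE_DATASET_MB and not qos.data.transient:
        return "gridftp"  # large, static data sets
    return "distributed file system"
```

The rule deliberately ignores most fields; the point is that once the QoS parameters are explicit, the choice of engine becomes a reviewable decision rather than an implicit one.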
Slide - 30 -
Objective of Data Grid - Data Affinity
Low cost of CPU
Data size is determined by application
Network bandwidth is limited
Data and Work need to be co-located
Virtually: a Centrally Managed Database
Physically: Distributed
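The affinity argument is easy to check with arithmetic. The sketch below compares the time spent moving data with the time spent computing, and shows how the data access funnel scales when many nodes pull through a single server link. The 1 GB working set, 100 Mbit/s link, 30 s of computation per task, and 50 nodes are hypothetical figures.

```python
def transfer_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Time to move `size_bytes` over a link (ignores latency and protocol overhead)."""
    return size_bytes * 8 / bandwidth_bits_per_s


def funnel_seconds(nodes: int, size_bytes: float, server_bandwidth_bits_per_s: float) -> float:
    """Time for `nodes` workers to each pull `size_bytes` through one shared server link."""
    return transfer_seconds(nodes * size_bytes, server_bandwidth_bits_per_s)


if __name__ == "__main__":
    GB = 1e9
    link = 100e6    # hypothetical 100 Mbit/s link
    compute = 30.0  # hypothetical seconds of CPU work per task
    move = transfer_seconds(1 * GB, link)
    print(f"move 1 GB: {move:.0f} s vs compute: {compute:.0f} s")
    print(f"50 nodes through one server: {funnel_seconds(50, 1 * GB, link):.0f} s")
```

With these assumed figures, moving 1 GB takes 80 s, longer than the 30 s of computation it feeds, and 50 nodes pulling 1 GB each through one server link need 4,000 s. Cheap CPUs do not help when the data has to travel first.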
Slide - 31 -
How to Achieve Data Affinity
Locate data and work close together to minimize data movement across the network
Reactive: The Data Grid distributes data in anticipation of where work will be assigned, using the Distributed Data Management policies of
– Regionalization
– Replication
– Distribution
– Synchronization
Proactive: The task is routed to the data; the Compute Grid task scheduler queries data locality information from the Data Grid.
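Routing the task to the data can be sketched as a scheduler that asks a locality directory which nodes already hold a task's data atoms. `LocalityDirectory` and `route_task` are hypothetical names used only for illustration; a real Compute Grid scheduler and Data Grid would expose their own interfaces.

```python
from collections import Counter


class LocalityDirectory:
    """Hypothetical Data Grid locality service: which nodes hold which data atoms."""

    def __init__(self) -> None:
        self._holders: dict[str, set[str]] = {}

    def record(self, atom: str, node: str) -> None:
        self._holders.setdefault(atom, set()).add(node)

    def holders(self, atom: str) -> set[str]:
        return self._holders.get(atom, set())


def route_task(required_atoms: list[str], directory: LocalityDirectory,
               load: dict[str, int]) -> str:
    """Pick the available node that already holds the most required atoms.

    Ties go to the less-loaded node, then to the node name for determinism.
    Nodes absent from `load` are treated as unavailable to the Compute Grid.
    """
    if not load:
        raise ValueError("no available nodes")
    local_hits = Counter()
    for atom in required_atoms:
        for node in directory.holders(atom):
            if node in load:
                local_hits[node] += 1
    return min(load, key=lambda n: (-local_hits[n], load[n], n))
```

The strict ordering (locality first, then load) is a design choice for the sketch; a weighted score would let a heavily loaded node lose to a less loaded one that must fetch some of its data.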
Slide - 32 -
Distributed Data Management
Data Regions
Replication
Distribution
Synchronization
Load and Store
Event
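Three of the policies listed above (the Data Region grouping itself, Load and Store, and Events) can be sketched as a small in-memory region that reads through to an external source on a miss, writes through on update, and notifies subscribers. `DataRegion`, `DictStore`, `BackingStore`, and the event names are hypothetical; replication, distribution, and synchronization between regions are deliberately left out.

```python
from __future__ import annotations

from typing import Callable, Protocol


class BackingStore(Protocol):
    """Hypothetical interface to an external or legacy data source."""
    def load(self, key: str) -> object | None: ...
    def store(self, key: str, value: object) -> None: ...


class DictStore:
    """In-memory stand-in for a legacy database, for illustration only."""
    def __init__(self) -> None:
        self.rows: dict[str, object] = {}

    def load(self, key: str) -> object | None:
        return self.rows.get(key)

    def store(self, key: str, value: object) -> None:
        self.rows[key] = value


Listener = Callable[[str, str, object], None]  # (event name, key, value)


class DataRegion:
    """A logical grouping of data with read-through load, write-through store, and events."""

    def __init__(self, name: str, backing: BackingStore) -> None:
        self.name = name
        self._backing = backing
        self._entries: dict[str, object] = {}
        self._listeners: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _emit(self, event: str, key: str, value: object) -> None:
        for listener in self._listeners:
            listener(event, key, value)

    def get(self, key: str) -> object | None:
        if key in self._entries:
            return self._entries[key]
        value = self._backing.load(key)  # Load: read through on a miss
        if value is not None:
            self._entries[key] = value
            self._emit("loaded", key, value)
        return value

    def put(self, key: str, value: object) -> None:
        self._entries[key] = value
        self._backing.store(key, value)  # Store: write through synchronously
        self._emit("updated", key, value)
```

Writing through synchronously keeps the external source consistent at the cost of update latency; an asynchronous write-behind policy would trade that consistency for speed.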
Slide - 33 -
Distributed Data Management Policies
Slide - 34 -
Advanced Topics in Distributed Data Management
White Paper: “Natural Attraction Forces of Data Bodies Within a Data Grid to Describe Efficient Data Distribution Patterns”
Michael Di Stefano September 2004