Copyright © 2007, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.


Best Practices for Setting Up Computer Hardware in a Grid Environment

Tom Keefer, Performance Analyst, SAS

Cheryl Doninger, R&D Director, SAS


Recipe for Success

review different grid architectures

• different OSs, network connectivity, storage solutions

show scalable throughput and sustained I/O as the number of grid nodes increases

create reference architectures of successful grid configurations to help answer your questions

SAS Grid Computing lots of SAS users


What is Grid Computing?

“Grid computing integrates, virtualizes, and manages resources (software and hardware) to provide a much larger, powerful distributed computing infrastructure.”


Benefits of SAS on a Grid

increases scalability

increases availability

facilitates provisioning

increases flexibility

reduces costs

= Virtual Data Center


Running SAS on a Grid

SAS Grid Manager

Distributed Enterprise Scheduling

• Distribute jobs within workflows to a range of hosts.

• Automatically find and use the best available resource for each job.

Workload Balancing

• Distribute workloads to a shared pool of resources.

• Automatically find and use the best available resource.

Parallelized Workload Balancing

• Distribute parallelized SAS workloads to a shared pool of resources.

• Automatically find and use the best available resource.
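The idea of "find and use the best available resource" can be illustrated with a small sketch. It assumes that "best" means the host with the most free job slots, which is a simplification chosen for illustration and not a description of how SAS Grid Manager or Platform LSF actually rank hosts. The `Host` class, host names, and slot counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str          # hypothetical host name
    total_slots: int   # job slots configured on the host
    used_slots: int    # job slots currently occupied

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.used_slots


def pick_host(hosts):
    """Return the host with the most free slots, or None if all are full."""
    candidates = [h for h in hosts if h.free_slots > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.free_slots)


def dispatch(jobs, hosts):
    """Assign each job to the currently best host; hold back jobs that do not fit."""
    placement, pending = {}, []
    for job in jobs:
        host = pick_host(hosts)
        if host is None:
            pending.append(job)
            continue
        host.used_slots += 1  # the slot stays occupied for the rest of this sketch
        placement[job] = host.name
    return placement, pending
```

A real scheduler would also weigh CPU load, memory, queue priorities, and job completion; the sketch only shows the selection step.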


What products can leverage SAS Grid Manager?

SAS Grid Manager

Distributed Enterprise Scheduling

• SAS Data Integration Studio

• SAS Web Report Studio

• SAS Marketing Automation

• SAS Marketing Optimization

• Any SAS program

Workload Balancing

• Any SAS program (with wrapper), including stored processes and SAS Enterprise Guide programs

Parallelized Workload Balancing

• SAS Data Integration Studio

• SAS Enterprise Miner

• SAS Risk Dimensions

• Any SAS program (with modification)


SAS Grid Architecture Topology

[Diagram: a grid client (DIS or EM, SAS program, Platform LSF), a metadata server, Management Console (Grid Manager plug-in), a grid control machine, and grid nodes 1 through n. Component labels on the grid machines: Base SAS, SAS/CONNECT, SAS Workspace Server, SAS Grid Server, SAS Data Step Batch Server, Platform LSF, Platform Grid Management Service, Platform Process Mgr, SASApp. Numbered callouts 1 to 3 mark the steps of a request.]

Central File Server for:

• Job Deployment Directories

• Source and Target Data

• SAS Log files


Keys To Success – Areas To Focus

node configuration

• heterogeneous or homogeneous

number and type of processors

memory

storage/data access

no different than single server - just more systems.


Data Storage is The Key

sharable

throughput across the grid

scalable

locality of data

• input files

• output files

• temporary files

• external data access


Shared File System Testing Efforts

Operating System          File Sharing Technology
Red Hat Linux (RHEL 4)    EMC Celerra Multi-Path File System on iSCSI (MPFSi)
Red Hat Linux (RHEL 4)    Network Appliance (NFS)
Sun Solaris 10            Sun StorageTek QFS
Red Hat Linux (RHEL 4)*   Global File System (GFS)
Windows*                  PolyServe / HP Matrix
AIX*                      IBM General Parallel File System (GPFS)
HP-UX*                    Veritas Cluster File System (CFS)

*Efforts ongoing


Steps to Success With Grid

determine your system requirements

• what does your application do?

• data flow diagram

architect your system

test throughput outside of SAS first

• third party tools

• replicate your application’s behavior (I/O pattern)

single node SAS tests, then scale out
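As a rough illustration of testing throughput outside of SAS, the sketch below times a sequential write and read of one file in fixed-size blocks. It is not a substitute for the third-party tools mentioned above, and the function name, file size, and block size are assumptions to adjust until they resemble your application's I/O pattern.

```python
import os
import tempfile
import time


def sequential_throughput(directory, total_mb=64, block_kb=64):
    """Write then read one file in fixed-size blocks; return (write, read) MB/s."""
    block = os.urandom(block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to storage
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_kb * 1024):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)
    return total_mb / write_s, total_mb / read_s
```

Pointing `directory` at the shared file system mount and running the function on one node, then on several nodes at once, gives a first impression of how throughput holds up as nodes are added. The read figure is likely to be inflated by the operating system's file cache unless the file is much larger than memory.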


EMC MPFSi Architecture

[Diagram labels: Switch, “The Directory”, EMC Storage, Conversion, IP Traffic, Fibre Channel, /data, /work, /work]

Notes:

• NAS

• MPFSi client on nodes

• network “managers”

• leverage existing network


EMC MPFSi Discussion Points

based on previous “Highroad” product

SAS data integration benchmarking scenario

40 Linux grid nodes

• dual core, dual Ethernet per node for data

• up to 160 simultaneous SAS processes

performance tips:

• analyze throughput from node to storage – data flow!!

• watch placement of disk volumes for performance

• don’t allow non-grid activity on network

• separate client and admin network

• monitor director and data mover throughput


Network Appliance NFS Architecture

[Diagram labels: Network Switch, Linux Nodes, NetApp FAS6030 (network storage), /data, /work, ALL Ethernet]

Notes:

• NAS

• NFS client on nodes

• leverage existing network

• NFS everywhere


NetApp NFS Discussion Points

pure network file system implementation (NFS)

SAS data integration benchmarking scenario

10 Linux grid nodes

• quad core* - single Ethernet per node for data

performance tips:

• check throughput from node to storage – data flow!!!

• don’t allow non-grid activity on network

• separate client and admin network

• watch placement of disk volumes for performance

* important note: core to throughput per node ratio
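The core-to-throughput ratio can be made concrete with a little arithmetic. The sketch below compares the two node shapes described in these slides (quad core with a single Ethernet link, and dual core with dual Ethernet links). The link speed of 1 Gbit/s is an assumption, since the slides do not state it, and the results are theoretical peaks rather than measured throughput.

```python
def mb_per_sec_per_core(links, link_gbit, cores):
    """Theoretical peak data bandwidth available to each core on one node."""
    node_mb_per_sec = links * link_gbit * 1000 / 8  # 1 Gbit/s = 125 MB/s
    return node_mb_per_sec / cores


# Assumed 1 Gbit/s links; the node shapes come from the two benchmark slides.
quad_single = mb_per_sec_per_core(links=1, link_gbit=1.0, cores=4)
dual_dual = mb_per_sec_per_core(links=2, link_gbit=1.0, cores=2)

print(f"quad core, single Ethernet: {quad_single:.2f} MB/s per core")  # 31.25
print(f"dual core, dual Ethernet:   {dual_dual:.2f} MB/s per core")    # 125.00
```

Under that assumption, each core in the quad-core node has a quarter of the network bandwidth available to a core in the dual-core node, which is why adding cores without adding data paths can leave SAS processes waiting on I/O.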


Sun QFS Architecture

[Diagram labels: server nodes, FC Switch, Sun storage, /data, /work, fibre channel]

Notes:

• SAN

• QFS software on nodes

• QFS server “master”

• fibre channel – node to disk


Sun QFS Discussion Points

pure fibre channel (SAN)

SAS data integration benchmarking scenario

up to 4 Solaris server nodes

• 48 to 64 core grid nodes (144 total on grid)

• up to 180 simultaneous SAS processes

• up to 20 fibre channel connections per server

performance tips:

• check throughput from node to storage – data flow!!!

• watch placement of disk volumes for performance

• setup of QFS master server


Other Shared File System Technologies

SAN based – fibre channel

• Multi-Path File System (MPFS) NOT iSCSI

• IBM General Parallel File System (GPFS)

• PolyServe / HP Matrix

− only one available for Windows!!

• Linux Global File System (GFS)

• Veritas Cluster File System (CFS)

NAS - Ethernet

• NetApp with iSCSI

SAS is continuing its testing efforts with various partners.


Overall Best Practices for Shared File Systems

data flow diagram

• understand your application’s throughput requirements before you talk to a storage vendor

monitoring and management tools are a must!

test throughput OUTSIDE of SAS first!

some technologies have volume placement limitations!

• e.g. can you span all the arrays with a single volume?

analyze throughput per $ before you buy

availability, backups, future scalability...
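One simple way to analyze throughput per dollar is to divide the sustained throughput each candidate delivered in your own tests by its total cost and rank the results. The sketch below does only that. The option names and all figures are hypothetical placeholders, not measurements or vendor prices.

```python
def throughput_per_dollar(options):
    """Rank storage options by sustained MB/s delivered per dollar spent."""
    ranked = []
    for name, mb_per_sec, cost in options:
        ranked.append((name, mb_per_sec / cost))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical figures for illustration only: (name, sustained MB/s, total cost in $).
options = [
    ("option A", 800.0, 200_000.0),
    ("option B", 1500.0, 500_000.0),
]

for name, ratio in throughput_per_dollar(options):
    print(f"{name}: {ratio * 1000:.1f} MB/s per $1000")
```

The ratio is only one input. An option that ranks first here may still fall short of the application's absolute throughput requirement, or of the availability, backup, and scalability needs listed above.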


SAS Scalable Performance Data Server on a Grid

each server / grid node runs its own instance of SAS and SPDS Server

[Diagram: server / grid nodes attached to shared file systems (SAN or NAS) that hold the SPDS directories /spds/data1, /spds/data2, /spds/meta, and /spds/index]

bottom line: myspdslib.mysastable is available on any server!


SAS Really Scales in a Grid

scalable I/O throughput

lots of choices for OS, storage solution, etc.

our work will continue...


More to See and Do...

“A Throughput-Intensive Compute and Storage Grid Using SAS® Grid Manager”

• Somantak Chanda, American Express

• Tues 1:30-2:20, Northern Hemisphere E-2

SAS Grid demo booth #16

IT Intelligence for Grid Optimization – demo booth #53

Platform Computing – Alliance Café booth #87

various storage partners – Alliance Café


For More Information...

scalability website:

http://support.sas.com/rnd/scalability/grid

today’s presentation

http://support.sas.com/rnd/scalability/grid/gridpapers.html
