The Panasas® Parallel Storage Cluster

Page 1:

The Panasas® Parallel Storage Cluster

Acknowledgement: Some of the material presented is under copyright by Panasas Inc.

Page 2:

What Is It?

What Is The Panasas ActiveScale Storage Cluster?

•A complete hardware and software storage solution

•Implements:

oAn Asynchronous, Parallel, Object-based, POSIX-compliant Filesystem

oA Global Namespace

oStrict client cache coherency

Page 3:

Physically, How Is It Organized?

•A shelf is 4U high and contains slots for 11 blades

•0-3 DirectorBlades per shelf, with the remaining slots for StorageBlades

[Figure: a shelf with 1 DirectorBlade and 10 StorageBlades]

Page 4:

Terminology

Metadata – The information that describes the data contained in files

•Size, create time, modify time, location on disk, permissions (see the stat sketch after these definitions)

Block Based Filesystem – A filesystem in which the client accesses files based on their physical location on disk.

File Based Filesystem – A filesystem where a client requests a file by name.

Object Based Filesystem – In this case, the filename is abstracted into an identifier. We will discuss this later.

RAID – Multiple disks arranged into one logical disk, tuned for redundancy or speed.

JBOD – (Just a Bunch Of Disks) Multiple disks accessed directly rather than in an array.
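To make the metadata/data distinction concrete, here is a short Python sketch (not from the original slides; the file name is just a placeholder) that reads a file's POSIX metadata without touching any of its data blocks:

```python
import os
import stat
import time

# Create a small file so the example is self-contained.
with open("example.txt", "w") as f:
    f.write("hello\n")

info = os.stat("example.txt")    # metadata only: no data blocks are read here

print("size (bytes):", info.st_size)
print("modified:    ", time.ctime(info.st_mtime))
print("permissions: ", stat.filemode(info.st_mode))
```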

Page 5:

Direct Attached Storage (local filesystem)

Private storage for a host operating system

IDE connected internal hard drive

Serial ATA or SCSI attached drives

USB drives

Examples: ext3, reiserFS, NTFS, ufs, FAT32

This discussion is mostly about distributed file systems

Problems of scale require lots of storage computers working together

Page 6:

Network Attached Storage

File Server exports storage at the file level

NFS/CIFS are widely deployed

NFS is the only official file system standard

Scalability limited by server hardware

Moderate number of clients (10’s to 100’s)

Moderate amount of storage (few TB)

A nice model until it runs out of steam

“Islands of storage”

Bandwidth to a file limited by its server

NetApp (ONTAP 7.x), Sun, HP, SnapServer, EMC Celerra, StorEdge NAS, IBM TotalStorage NAS, whitebox Linux

[Figure: a single NAS head serving clients]

Page 7:

Clustered NAS

More scalable than single-headed NAS

Multiple NAS heads share back-end storage

"In-band" NAS head still limits performance and drives up cost

Two primary architectures:

Forward requests to "owner head"

Export NAS from a shared file system

NFS does not provide a good mechanism for dynamic load balancing

Clients permanently mount a particular head

GPFS, Isilon OneFS, IBRIX, Polyserve, NetApp GX, BlueArc, Exanet ExaStore, ONStor, Pillar Data, IBM/Transarc AFS, IBM DFS

[Figure: NAS heads in front of shared back-end storage]

Page 8:

Storage Area Network

Common management and provisioning for host storage

Block Devices (JBOD or RAID) accessible via iSCSI or FC network

Wire-speed/RAID-speed performance potential

Proprietary solutions for shared file systems

Scalability limited by block management on metadata server (e.g., 32 nodes)

NAS access provided by “file head” that re-exports the SAN file system

Asymmetric (pictured) or Symmetric implementations

[Figure: clients, SAN, storage, and metadata server(s)]

Page 9:

Object Based Storage Clusters

Block and file interfaces replaced with an object abstraction.

Block management pushed all the way out to the disks.

Allows for parallel and direct access to disks

Requires a non-standards-based client

Lustre, Panasas

[Figure: OSD clients, object (OSD) storage, and a metadata server; data moves directly between clients and the OSDs]
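As an illustration of the object abstraction, the following Python sketch replaces block addresses and pathnames with opaque object IDs; the class and method names are invented for this example and are not the Panasas or T10 OSD interface:

```python
import uuid

class ObjectStore:
    """Toy in-memory object store: data is addressed by object ID,
    and space/block management stays inside the store itself."""

    def __init__(self):
        self._objects = {}                  # object_id -> bytearray

    def create(self):
        oid = uuid.uuid4().hex              # opaque identifier, not a pathname
        self._objects[oid] = bytearray()
        return oid

    def write(self, oid, offset, data):
        buf = self._objects[oid]
        if len(buf) < offset + len(data):
            buf.extend(b"\x00" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data

    def read(self, oid, offset, length):
        return bytes(self._objects[oid][offset:offset + length])

# A filesystem built on top maps names to object IDs, while each
# store manages its own layout on disk.
store = ObjectStore()
oid = store.create()
store.write(oid, 0, b"hello object storage")
print(store.read(oid, 0, 5))
```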

Page 10:

pNFS: Standard Storage Clusters

pNFS is an extension to the Network File System v4 protocol standard

Allows for parallel and direct access

From Parallel Network File System clients

To Storage Devices over multiple storage protocols

Moves the Network File System server out of the data path

[Figure: pNFS clients, block (FC) / object (OSD) / file (NFS) storage, and an NFSv4.1 server; data moves directly between clients and storage]

Page 11:

RAID

Redundant Array of Independent Drives

Many physical disks bound together with hardware or software.

Multiple layouts to accommodate performance and fault tolerance requirements.

Used to create larger filesystems out of standard drive technology.
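As a small worked example of the redundancy idea, the XOR parity used by RAID-5-style layouts lets any single lost block be rebuilt from the survivors. This Python sketch is generic and is not tied to any particular product's layout:

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# Three data blocks striped across three disks, parity on a fourth.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# If the disk holding d1 fails, its contents are recoverable
# from the remaining data blocks plus the parity block.
rebuilt_d1 = xor_blocks([d0, d2, parity])
assert rebuilt_d1 == d1
print("recovered:", rebuilt_d1)
```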

Page 12:

Comparing Technology

How does an object-based, parallel filesystem compare to traditional storage solutions?

vs. Direct Attached Storage

oSeparate control and data paths. Metadata and data workloads are distributed.

oMultiple access points for redundancy and scalability

oNo need to balance expensive server resources between applications and storage access

vs. Network Attached Storage

oScalability and ease of management in very large installations

vs. Storage Area Networks

oClients access storage directly, no intermediary gateway

oAll communication is IP based, choose your infrastructure

Low cost, high bandwidth Gigabit or 10-Gigabit Ethernet

Higher cost, low latency Infiniband

Page 13:

Panasas Object-Based Storage Cluster

Consists of two primary components:

Object Storage Devices (OSD): StorageBlades

MetaData Manager: DirectorBlades

Directors implement file system semantics

Access control, cache consistency, user identity, etc.

Directors have rights to perform these object operations

Create, delete, create group, delete group

Get attributes and set attributes

Clone group, copy-on-write support for snapshots

Clients perform direct I/O with these object operations (sketched after this list)

Read, write

Get attributes, set (some) attributes
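A toy Python sketch of this division of labor follows (invented class and method names; this is not the DirectFLOW protocol): the director handles the namespace and hands back a map, and the client then moves data directly to and from the storage blades:

```python
class DirectorBlade:
    """Metadata path: namespace, access control, and object maps."""
    def __init__(self):
        self.maps = {}                        # filename -> list of object IDs

    def create(self, name):
        # permission, quota, and cache-consistency work would happen here
        self.maps[name] = [f"{name}#{i}" for i in range(3)]
        return self.maps[name]                # the map the client will use

    def lookup(self, name):
        return self.maps[name]


class StorageBlade:
    """Data path: stores objects; clients talk to it directly."""
    def __init__(self):
        self.objects = {}

    def write(self, oid, data):
        self.objects[oid] = data

    def read(self, oid):
        return self.objects[oid]


director = DirectorBlade()
blades = [StorageBlade() for _ in range(3)]

# Client: ask the director for the map once, then do direct I/O to the blades.
layout = director.create("results.dat")
payload = b"parallel object storage demo"
chunk = (len(payload) + len(layout) - 1) // len(layout)       # bytes per blade
for i, (blade, oid) in enumerate(zip(blades, layout)):
    blade.write(oid, payload[i * chunk:(i + 1) * chunk])      # director is off the data path

# Read path: map lookup at the director, data straight from the blades.
layout = director.lookup("results.dat")
print(b"".join(blade.read(oid) for blade, oid in zip(blades, layout)))
```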

Page 14:

Panasas StorageBlade (OSD)

Balanced storage device

CPU, SDRAM, GE NIC, and 2 spindles (2x2 TB SATA)

Commodity parts drive low cost

Performance scales with capacity

Single Seamless Namespace!

Page 15:

DirectFLOW Client

DirectFLOW client is a kernel-loadable filesystem module

Implements the standard Vnode interface

Uses native Panasas network protocols (RPC and iSCSI)

Caches data, directories, attributes, capabilities

Responds to callbacks for cache consistency (sketched below)

Does RAID I/O directly to StorageBlades w/ iSCSI/OSD
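A rough Python sketch of the callback idea (invented names, not the actual DirectFLOW wire protocol): the client caches what it has fetched, and a callback from the director invalidates the cached entry when the file changes elsewhere:

```python
class DirectorCallbacks:
    """Tracks which clients have cached which files and calls them back."""
    def __init__(self):
        self.watchers = {}                              # filename -> set of clients

    def register(self, client, name):
        self.watchers.setdefault(name, set()).add(client)

    def file_changed(self, name):
        for client in self.watchers.get(name, ()):      # tell every caching client
            client.invalidate(name)


class CachingClient:
    def __init__(self, director):
        self.director = director
        self.attr_cache = {}

    def get_attrs(self, fname):
        if fname not in self.attr_cache:                # cache miss: fetch and register
            self.attr_cache[fname] = {"size": 0}        # placeholder attributes
            self.director.register(self, fname)
        return self.attr_cache[fname]

    def invalidate(self, fname):
        self.attr_cache.pop(fname, None)                # next access refetches


director = DirectorCallbacks()
a = CachingClient(director)
a.get_attrs("data.bin")                # A now caches attributes for data.bin
director.file_changed("data.bin")      # e.g. another node wrote the file
print("data.bin" in a.attr_cache)      # False: A's cached entry was invalidated
```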

Page 16:

DirectorBlades

Metadata manager

Realm Control – admit blades, start/stop services, failover

File Manager – access control, cache consistency, file system semantics

Storage Manager – file virtualization (maps), recovery, reconstruction

Management console

Web-based GUI or Command Line Interface (CLI)

Status, charts, reporting

Storage management

Gateway function (NFS/CIFS) collocated on DirectorBlade

Fast processor and large main memory

Multiple DirectorBlades allow service replication for fault tolerance

Page 17:

Environment

AC Power

Each shelf has dual power supplies and a battery

Automatic graceful shutdown if you lose AC power

Masks brownouts and short (5-sec) power glitches

Thermal

800 watts in 4U!

Power supplies and batteries have fans that cool the shelf

Blades, power supplies, batteries, and network cards all monitor temperature

Warnings generated near the temperature limit

Unilateral blade shutdown if a blade gets very hot

Graceful shutdown of a whole shelf if multiple blades are hot (see the policy sketch below)
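A tiny Python sketch of the thermal policy described above; the thresholds and temperatures are invented for illustration and are not Panasas firmware values:

```python
WARN_C, BLADE_SHUTDOWN_C = 45, 55          # illustrative thresholds only

def thermal_policy(blade_temps_c):
    """Return the action for a shelf given per-blade temperatures in Celsius."""
    hot = [i for i, t in enumerate(blade_temps_c) if t >= BLADE_SHUTDOWN_C]
    warm = [i for i, t in enumerate(blade_temps_c) if WARN_C <= t < BLADE_SHUTDOWN_C]

    if len(hot) > 1:
        return f"graceful shelf shutdown (blades {hot} overheating)"
    if hot:
        return f"shut down blade {hot[0]} unilaterally"
    if warm:
        return f"warning: blades {warm} near temperature limit"
    return "normal operation"

print(thermal_policy([38, 40, 47, 39]))    # warning
print(thermal_policy([38, 58, 41, 39]))    # single blade shutdown
print(thermal_policy([57, 58, 41, 39]))    # shelf shutdown
```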

Page 18:

Bladesets and Volumes

A bladeset is a storage (OSD) failure domain

A single OSD failure results in degraded operation and reconstruction

Two OSD failures result in data unavailability

Bladesets can be expanded or merged (but not unmerged) for growth

Capacity balancing occurs within a bladeset

A volume is a file hierarchy with a quota

One or more volumes compete for space within a bladeset

No physical boundaries between volumes, except quota limits (see the sketch after this list)

A volume is the unit of DirectFlow metadata work

Each DirectorBlade manages one or more volumes

NFS/CIFS gateway workload is orthogonal to DirectFlow metadata

All DirectorBlades provide uniform/symmetric NFS/CIFS access
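A minimal Python sketch of how volumes share a bladeset's capacity, with quotas as the only per-volume boundary; the names and capacities are made up for the example:

```python
class Bladeset:
    """A pool of raw capacity shared by all of its volumes."""
    def __init__(self, capacity_tb):
        self.capacity_tb = capacity_tb
        self.volumes = {}                     # name -> {"quota": x, "used": y}

    def add_volume(self, name, quota_tb):
        self.volumes[name] = {"quota": quota_tb, "used": 0.0}

    def used_tb(self):
        return sum(v["used"] for v in self.volumes.values())

    def write(self, name, size_tb):
        vol = self.volumes[name]
        if vol["used"] + size_tb > vol["quota"]:
            raise RuntimeError(f"{name}: quota exceeded")     # the only per-volume boundary
        if self.used_tb() + size_tb > self.capacity_tb:
            raise RuntimeError("bladeset out of space")        # volumes compete for the rest
        vol["used"] += size_tb

bs = Bladeset(capacity_tb=40)
bs.add_volume("/home", quota_tb=10)
bs.add_volume("/scratch", quota_tb=35)        # quotas may oversubscribe the bladeset
bs.write("/scratch", 30)
bs.write("/home", 8)
print(f"{bs.used_tb()} TB used of {bs.capacity_tb} TB")
```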

Page 19:

What Problems Does It Solve?

It’s all about removing the bottlenecks in traditional storage

No RAID engine bottleneck

oClient driven RAID scales as the number of clients increases

oMultiple Volumes or DirectorBlades for Scalable Reconstruction

No Network Uplink bottleneck

o10GigE port or 4-Port Gig-E Link Aggregation Group per Shelf

Flexible, per File layouts (SDK Required)

oRAID1/5 for large streaming I/O

oRAID10 for N-to-1 Writes or Random I/O

oCustomizable Stripe width and depth (see the striping sketch after this list)

Control the number of spindles

Parity Overhead

Global Namespace

Single, web-browser-based management interface for 100's of TBs
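As a sketch of what stripe width and depth mean for data placement, the following Python function maps a file offset to a StorageBlade for a generic round-robin layout; it illustrates the idea and is not the exact Panasas map format:

```python
def stripe_location(offset, stripe_unit, stripe_width):
    """Map a file byte offset to (blade index, object offset) for a
    simple round-robin RAID-0-style layout."""
    unit_number = offset // stripe_unit           # which stripe unit the byte falls in
    blade = unit_number % stripe_width            # round-robin across the blades
    stripe_row = unit_number // stripe_width      # how many full rows precede it
    object_offset = stripe_row * stripe_unit + (offset % stripe_unit)
    return blade, object_offset

# 64 KiB stripe units spread across 8 StorageBlades.
for off in (0, 64 * 1024, 8 * 64 * 1024, 1_000_000):
    blade, obj_off = stripe_location(off, stripe_unit=64 * 1024, stripe_width=8)
    print(f"file offset {off:>9} -> blade {blade}, object offset {obj_off}")
```

Wider stripes engage more spindles per file; deeper stripe units keep each request on one blade longer, which is the trade-off the slide's "control the number of spindles" bullet refers to.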

Page 20:

Customizing For Your Environment

Pick your protocol

DirectFLOW, NFS, CIFS, any combination at one time

oMore Director Blades for NFS / CIFS performance

oMore Storage Blades for DirectFLOW performance

Interactive vs. Batch Processing

ActiveStor 5000 w/ larger cache sizes on Storage Blades for Interactive work

Fault tolerance

Configurable spares for multiple sequential Storage Blade failures

Configurable bladeset sizes for simultaneous Blade failure risk mitigation

Redundant network links

Storage Capacity Options

Smaller Capacity Blades

oMore spindles, less data to reconstruct, more shelves (see the reconstruction arithmetic after this list)

Larger Capacity Blades

oFewer shelves, reduced double-disk failure risk
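A back-of-the-envelope Python calculation of the reconstruction trade-off; the capacities and per-blade throughput are illustrative assumptions, not measured Panasas numbers:

```python
def reconstruction_hours(used_tb, participating_blades, per_blade_mb_s):
    """Rough time to rebuild one failed blade's data when the work is
    spread across the surviving blades of the bladeset."""
    total_mb = used_tb * 1_000_000                     # TB -> MB (decimal)
    aggregate_mb_s = participating_blades * per_blade_mb_s
    return total_mb / aggregate_mb_s / 3600

# Smaller blades mean less data per failure; more blades mean more rebuild workers.
print(f"{reconstruction_hours(2, 20, 50):.1f} h")      # 2 TB used, 20 helpers at 50 MB/s each
print(f"{reconstruction_hours(4, 10, 50):.1f} h")      # 4 TB used, 10 helpers at 50 MB/s each
```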

Page 21:

Logically, How Does Data Flow?

[Figure: Linux clients with the DirectFLOW filesystem client and NFS/CIFS clients on the IP network; DirectorBlades (NFS/CIFS gateway, metadata manager) and StorageBlades]

Page 22:

Logically, How Does Data Flow?

An Example Six Shelf System

[Figure: six shelves of 10 StorageBlades each, with DirectorBlades DB1-DB6 (NFS/CIFS gateway, metadata manager) serving DirectFLOW and NFS/CIFS clients over the IP network]

Page 23:

Logically, How Does Data Flow?

An Example Six Shelf System, with Three Bladesets

[Figure: the same six shelves of 10 StorageBlades grouped into Bladesets 1-3, with DirectorBlades DB1-DB6 serving DirectFLOW and NFS/CIFS clients over the IP network]

Page 24:

Logically, How Does Data Flow?

An Example Six Shelf System, with Three Bladesets and Eight Volumes

[Figure: the six shelves grouped into Bladesets 1-3 holding volumes Vol1-Vol8, with each DirectorBlade DB1-DB6 managing one or more of the volumes]

Page 25:

How Do I Manage 100’s of TB?

All from a single http:// or command line interface

PanActive Manager: single GUI for entire namespace management

Simple out-of-box experience

Seamlessly adopt new blades

Capacity & load balancing

Volumes and quotas

Snapshots

1-touch reporting capabilities for capacity trends, asset ID, and performance

Email and/or pager notification of errors and warnings

Scriptable CLI for all features
