file systems and panasas® parallel storage clusterrich/crc_summer_scholars_2016/panasas...cpu quad...

File Systems and Panasas® Parallel Storage Cluster Acknowledgement: Some of the material presented is under copyright by Panasas Inc.


Post on 09-Sep-2020


What Is It?

What Is the Panasas ActiveScale Storage Cluster?

• A complete hardware and software storage solution

• Implements:

• An asynchronous, parallel, object-based, POSIX-compliant filesystem

• A global namespace

• Strict client cache coherency


Physically, How Is It Organized?

• A shelf is 4U high and contains slots for 11 blades

• 0-3 DirectorBlades per shelf, with the remaining slots for StorageBlades
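The slot arithmetic above can be sketched in a few lines. This is only an illustration of the two figures quoted on the slide (11 slots, 0-3 DirectorBlades); the function and constant names are made up for the example.

```python
SLOTS_PER_SHELF = 11   # from the slide: 11 blade slots per 4U shelf
MAX_DIRECTORS = 3      # from the slide: 0-3 DirectorBlades per shelf


def storage_blades(director_blades: int) -> int:
    """Return how many slots are left for StorageBlades in one shelf."""
    if not 0 <= director_blades <= MAX_DIRECTORS:
        raise ValueError("a shelf holds between 0 and 3 DirectorBlades")
    return SLOTS_PER_SHELF - director_blades


if __name__ == "__main__":
    for directors in range(MAX_DIRECTORS + 1):
        print(directors, "DirectorBlades ->", storage_blades(directors), "StorageBlades")
```

With 2 DirectorBlades the function returns 9, which matches the shelf pictured on this slide.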

[Figure: example shelf with 2 DirectorBlades and 9 StorageBlades]

Terminology

Metadata – The information that describes the data contained in files

• Size, create time, modify time, location on disk, permissions

Block Based Filesystem – A filesystem in which the client accesses files based on their physical location on disk.

File Based Filesystem – A filesystem where a client requests a file by name.

Object Based Filesystem – In this case, the filename is abstracted into an identifier. We will discuss this later.

RAID – Multiple disks arranged into one logical disk tuned for redundancy or speed.

JBOD – (Just a Bunch Of Disks) Multiple disks accessed directly rather than in an array.


HDD Architecture


• Surface = group of tracks

• Track = group of sectors

• Sector = group of bytes

• Cylinder: several tracks on corresponding surfaces

• spinning platter of special material

• mechanical arm with read/write head must be close to the platter to read/write data

• data is stored magnetically

• disks are random access meaning data can be read/written anywhere on the disk
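The surface/track/sector hierarchy above implies a simple capacity calculation. The sketch below assumes an idealized disk in which every track holds the same number of sectors (real drives use zoned recording, so this is a simplification), and the geometry numbers are illustrative, not taken from the slides.

```python
def disk_capacity_bytes(cylinders: int, surfaces: int,
                        sectors_per_track: int,
                        bytes_per_sector: int = 512) -> int:
    """Capacity of an idealized disk with uniform tracks.

    A cylinder is one track on every surface, so the disk has
    cylinders * surfaces tracks in total.
    """
    tracks = cylinders * surfaces
    return tracks * sectors_per_track * bytes_per_sector


if __name__ == "__main__":
    # Illustrative geometry: 1024 cylinders, 16 surfaces, 63 sectors per track.
    print(disk_capacity_bytes(1024, 16, 63))  # 528482304 bytes
```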

[Figure: HDD internals, with labels for the platter, upper surface, lower surface, cylinder, track, sector, arm, and actuator]

SSD Architecture


PROS

No moving parts - called “solid” state

Reads and writes to a flash memory

Faster startup: no spinning

Extremely low read latency

Deterministic: performance does not depend on the location of the data

CONS

More expensive than hard disks (~$3/GB vs. ~$0.15/GB)

Slower write speed

Limited number of write/erase cycles

High capacity SSDs may have significantly higher power requirements

SSD can get slower as it ages


Direct Attached Storage (local filesystem)

Private storage for a host operating system

IDE connected internal hard drive

Serial ATA or SCSI attached drives

USB drives

Examples: ext4, XFS, ReiserFS, NTFS, UFS, FAT32

This discussion is mostly about distributed file systems

Problems of scale require lots of storage computers working together


Network Attached Storage

File Server exports storage at the file level

NFS/CIFS are widely deployed

NFS is the only official file system standard

Scalability limited by server hardware

Moderate number of clients (tens to hundreds)

Moderate amount of storage (few TB)

A nice model until it runs out of steam

“Islands of storage”

Bandwidth to a file limited by its server

Examples: NetApp (ONTAP 7.x), Sun, HP, SnapServer, EMC Celerra, StorEdge NAS, IBM TotalStorage NAS, whitebox Linux

[Figure: a single NAS Head]

Clustered NAS

More scalable than single-headed NAS

Multiple NAS heads share back-end storage

“In-band” NAS head still limits performance and drives up cost

Two primary architectures

Forward requests to “owner Head”

Export NAS from shared file system

NFS does not provide a good mechanism for dynamic load balancing

Clients permanently mount a particular Head

Examples: GPFS, Isilon OneFS, IBRIX, Polyserve, NetApp-GX, BlueArc, Exanet ExaStore, ONStor, Pillar Data, IBM/Transarc AFS, IBM DFS

[Figure: multiple NAS Heads]

Storage Area Network

Common management and provisioning for host storage

Block Devices (JBOD or RAID) accessible via iSCSI or FC network

Wire-speed/RAID-speed performance potential

Proprietary solutions for shared file systems

Scalability limited by block management on metadata server (e.g., 32 nodes)

NAS access provided by “file head” that re-exports the SAN file system

Asymmetric (pictured) or Symmetric implementations

[Figure: asymmetric SAN file system, showing Clients and Storage attached to the SAN, plus Metadata Server(s)]

Object Based Storage Clusters

Block and file interfaces replaced with an object abstraction.

Block management pushed all the way out to the disks.

Allows for parallel and direct access to disks

Requires a non-standards-based client

Examples: Lustre, Panasas

[Figure: OSD Clients, Object (OSD) Storage, and a Metadata Server, with data flowing directly between clients and storage]
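The "parallel and direct access" idea can be sketched as a toy model: one small request to a metadata server returns where a file's objects live, and the client then fetches those objects from the storage devices in parallel. Everything here (the in-memory `OSDS` and `LAYOUT` tables, the function names, the file path) is hypothetical and only stands in for real network services; it is not the Panasas or Lustre protocol.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory object storage devices: osd id -> {object id -> bytes}
OSDS = {
    0: {"obj-a": b"para"},
    1: {"obj-b": b"llel"},
    2: {"obj-c": b" I/O"},
}

# Hypothetical metadata server table: file name -> ordered (osd id, object id) list
LAYOUT = {"/data/file": [(0, "obj-a"), (1, "obj-b"), (2, "obj-c")]}


def lookup_layout(path):
    """Control path: a single metadata request returns the object locations."""
    return LAYOUT[path]


def read_object(location):
    """Data path: fetch one object straight from the device that holds it."""
    osd_id, object_id = location
    return OSDS[osd_id][object_id]


def read_file(path):
    """Read every object of a file in parallel and reassemble them in order."""
    layout = lookup_layout(path)
    with ThreadPoolExecutor(max_workers=len(layout)) as pool:
        # Executor.map returns results in the order of the input locations.
        return b"".join(pool.map(read_object, layout))


if __name__ == "__main__":
    print(read_file("/data/file"))  # b'parallel I/O'
```

The metadata server is consulted once and stays out of the data path, which is the property the slide contrasts with in-band NAS heads.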


Hadoop Architecture

[Figure: Hadoop architecture diagram, including datanodes]

RAID

Redundant Array of Independent Drives

Many physical disks bound together with hardware or software.

Multiple layouts to accommodate performance and fault tolerance requirements.

Used to create larger filesystems out of standard drive technology.

RAID 0: Striping

RAID 1: Mirroring

RAID 10: Striping + Mirroring

RAID 5: Striping + Parity (min. 3 disks)

RAID 6: Striping + Double Parity (min. 4 disks)
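To make the "Striping + Parity" idea concrete, here is a minimal sketch of single-parity (RAID 5 style) protection using byte-wise XOR. It shows one stripe only and ignores parity rotation across disks; the block contents and function name are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # One stripe: three data blocks plus one parity block (4 disks in total).
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)

    # Simulate losing the disk that held data[1]; XOR of the survivors
    # and the parity block yields the missing block.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])  # True
```

Because XOR is its own inverse, any single missing block in the stripe can be recomputed from the others, which is why one parity block tolerates exactly one failure.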


Panasas RAID+


Volumes stay online in DEGRADED mode even with more failures than RAID tolerates

Directory structure remains

Objects that can be reconstructed will be reconstructed “on demand” when accessed

Access to unrecoverable files will retry (hang) or return an I/O error (this behavior can be changed)


Comparing Technology

How does an object-based, parallel filesystem compare to traditional storage solutions?

vs. Direct Attached Storage

o Separate control and data paths. Metadata and data workloads are distributed.

o Multiple access points for redundancy and scalability

o No need to balance expensive server resources between applications and storage access

vs. Network Attached Storage

o Scalability and ease of management in very large installations

vs. Storage Area Networks

o Clients access storage directly, no intermediary gateway

o All communication is IP based, choose your infrastructure

Low cost, high bandwidth Gigabit or 10-Gigabit Ethernet

Higher cost, low latency InfiniBand


Panasas Object-Based Storage Cluster

Consists of two primary components

Object Storage Devices (OSD): StorageBlades

MetaData Manager: DirectorBlades

Directors implement file system semantics

Access control, cache consistency, user identity, etc.

Directors have rights to perform these object operations

Create, delete, create group, delete group

Get attributes and set attributes

Clone group, copy-on-write support for snapshots

Clients perform direct I/O with these object operations

Read, write

Get attributes, set (some) attributes
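The split between director and client operations listed above can be modeled as two permission sets. This is a sketch that simply mirrors the two lists on the slide; the role names, operation strings, and `may_perform` helper are assumptions made for illustration and do not describe the actual Panasas capability mechanism.

```python
# Operations copied from the slide's two lists (names normalized for code).
DIRECTOR_OPS = frozenset({
    "create", "delete", "create_group", "delete_group",
    "get_attributes", "set_attributes", "clone_group",
})
CLIENT_OPS = frozenset({
    "read", "write", "get_attributes",
    "set_attributes",  # the slide notes clients may set only *some* attributes
})

ALLOWED = {"director": DIRECTOR_OPS, "client": CLIENT_OPS}


def may_perform(role: str, operation: str) -> bool:
    """Return True if the slide lists this operation for the given role."""
    return operation in ALLOWED.get(role, frozenset())


if __name__ == "__main__":
    print(may_perform("client", "read"))           # True
    print(may_perform("client", "create"))         # False
    print(may_perform("director", "clone_group"))  # True
```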


Panasas StorageBlade (OSD)

Balanced storage device

CPU: Intel Xeon 1.73 GHz; RAM: 8 GB; GE NIC; 2 spindles (2 x 6 TB SATA)

Dedicated SSD drive: 240 GB for metadata

Commodity parts drive low cost

Performance scales with capacity

Single Seamless Namespace!
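"Performance scales with capacity" follows from each blade bringing its own drives and its own network port. A rough sketch of the arithmetic, using the per-blade figures quoted above (2 x 6 TB drives, one Gigabit Ethernet NIC); the results are raw, nominal numbers before any RAID or filesystem overhead, and the function names are made up for the example.

```python
DRIVES_PER_BLADE = 2   # from the slide: 2 spindles per StorageBlade
TB_PER_DRIVE = 6       # from the slide: 6 TB SATA drives
GBIT_PER_BLADE = 1     # from the slide: one GE NIC per StorageBlade


def raw_capacity_tb(storage_blades: int) -> int:
    """Raw capacity in TB, before RAID overhead."""
    return storage_blades * DRIVES_PER_BLADE * TB_PER_DRIVE


def nominal_network_gbit(storage_blades: int) -> int:
    """Sum of the blades' NIC line rates in Gbit/s (an upper bound only)."""
    return storage_blades * GBIT_PER_BLADE


if __name__ == "__main__":
    for blades in (9, 10, 20):
        print(blades, "blades:", raw_capacity_tb(blades), "TB raw,",
              nominal_network_gbit(blades), "Gbit/s nominal")
```

Adding blades increases both columns together, which is the point of the "balanced storage device" design.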


DirectFLOW Client

DirectFLOW client is a loadable kernel file system module

Implements standard Vnode interface

Uses native Panasas network protocols (RPC and iSCSI)

Caches data, directories, attributes, capabilities

Responds to callbacks for cache consistency

Does RAID I/O directly to StorageBlades with iSCSI/OSD
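Callback-based cache consistency, as mentioned above, can be illustrated with a toy model: a manager remembers which clients cache a file and calls them back when someone else writes it. The classes and method names below are hypothetical and greatly simplified; they are not the DirectFLOW protocol, only a sketch of the general technique.

```python
class FileManager:
    """Hypothetical stand-in for the service that tracks cached copies."""

    def __init__(self):
        self._cachers = {}  # path -> set of clients holding a cached copy

    def register(self, path, client):
        self._cachers.setdefault(path, set()).add(client)

    def notify_write(self, path, writer):
        # Call back every other client so it drops its stale copy.
        for client in self._cachers.get(path, set()) - {writer}:
            client.on_callback(path)
        self._cachers[path] = {writer}


class Client:
    def __init__(self, manager, storage):
        self.manager = manager
        self.storage = storage  # shared dict standing in for the storage blades
        self.cache = {}

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.storage[path]
            self.manager.register(path, self)
        return self.cache[path]

    def write(self, path, data):
        self.storage[path] = data
        self.manager.notify_write(path, self)
        self.cache[path] = data

    def on_callback(self, path):
        self.cache.pop(path, None)


if __name__ == "__main__":
    storage = {"/f": b"v1"}
    manager = FileManager()
    a, b = Client(manager, storage), Client(manager, storage)
    a.read("/f")           # client a now caches v1
    b.write("/f", b"v2")   # manager calls a back; a drops its copy
    print(a.read("/f"))    # b'v2'
```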


DirectorBlades

Metadata manager

Realm Control – admit blades, start/stop services, failover

File Manager – access control, cache consistency, file system semantics

Storage Manager – file virtualization (maps), recovery, reconstruction

CPU: Quad Core Intel Xeon 2.13 GHz; RAM: 48 GB; 1 x HDD, 500 GB

Management console

Web-based GUI or Command Line Interface (CLI)

Status, charts, reporting

Storage management

Gateway function (NFS/CIFS) collocated on DirectorBlade

Fast processor and large main memory

Multiple DirectorBlades allow service replication for fault tolerance

Environment

AC Power

Each shelf has dual power supplies and battery

Automatic graceful shutdown if you lose AC power

Masks brownouts and short (5-sec) power glitches

Thermal

800 Watts in 4U!

Power supplies and batteries have fans that cool the shelf

Blades, power supplies, batteries, network cards all monitor temperature

Warnings generated near temperature limit

Unilateral blade shutdown if a blade gets very hot

Graceful shutdown of a whole shelf if multiple blades are hot

[Figure: front view of a shelf]

Bladesets and Volumes

Bladeset is a storage (OSD) failure domain

Single OSD failure results in degraded operation and reconstruction

Two OSD failures result in data unavailability

Bladesets can be expanded or merged (but not unmerged) for growth

Capacity balancing occurs within a bladeset

Volume is a file hierarchy with a quota

One or more volumes compete for space within a bladeset

No physical boundaries between volumes, except quota limits

Volume is the unit of DirectFLOW metadata work

Each director blade manages one or more volumes

NFS/CIFS gateway workload is orthogonal to DirectFLOW metadata

All director blades provide uniform/symmetric NFS/CIFS access
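The failure-domain and quota rules on this slide can be written down as two small functions. This is a simplified sketch of the rules as stated here (one failure means degraded, two mean data unavailability; a write must fit both the volume quota and the bladeset's free space); the function names, state strings, and units are assumptions for illustration.

```python
def bladeset_state(failed_osds: int) -> str:
    """Map the number of failed OSDs in a bladeset to the state on the slide."""
    if failed_osds < 0:
        raise ValueError("failed_osds cannot be negative")
    if failed_osds == 0:
        return "online"
    if failed_osds == 1:
        return "degraded, reconstructing"
    return "data unavailable"


def write_fits(volume_used: int, volume_quota: int,
               bladeset_free: int, nbytes: int) -> bool:
    """Volumes share bladeset space; only the quota separates them."""
    within_quota = volume_used + nbytes <= volume_quota
    within_bladeset = nbytes <= bladeset_free
    return within_quota and within_bladeset


if __name__ == "__main__":
    for failures in range(3):
        print(failures, "failed OSDs ->", bladeset_state(failures))
    print(write_fits(volume_used=90, volume_quota=100, bladeset_free=500, nbytes=20))
```

The last call prints False: the bladeset has room, but the volume would exceed its quota.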


What Problems Does It Solve?

It’s all about removing the bottlenecks in traditional storage

No RAID engine bottleneck

o Client driven RAID scales as the number of clients increases

o Multiple Volumes or DirectorBlades for Scalable Reconstruction

No Network Uplink bottleneck

o 10GigE port or 4-Port Gig-E Link Aggregation Group per Shelf

Flexible, per File layouts (SDK Required)

o RAID1/5/6 for large streaming I/O

o RAID10 for N-to-1 Writes or Random I/O

o Customizable Stripe width and depth

Control the number of spindles

Parity Overhead

Global Namespace

Single, web browser based management interface for hundreds of TBs
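The trade-off behind "customizable stripe width" and "parity overhead" in the per-file layout list is simple arithmetic: the wider the stripe, the smaller the share of it spent on parity. A minimal sketch, assuming stripe width counts data and parity units together; the function name and sample widths are illustrative only.

```python
def parity_overhead(stripe_width: int, parity_units: int) -> float:
    """Fraction of each stripe that holds parity rather than data.

    stripe_width counts all units in the stripe (data + parity).
    """
    if not 0 <= parity_units < stripe_width:
        raise ValueError("need 0 <= parity_units < stripe_width")
    return parity_units / stripe_width


if __name__ == "__main__":
    # Single parity (RAID 5 style) at a few stripe widths.
    for width in (3, 5, 9):
        print(f"width {width}: {parity_overhead(width, 1):.1%} parity")
    # Double parity (RAID 6 style) on a 10-wide stripe.
    print(f"width 10, double parity: {parity_overhead(10, 2):.1%} parity")
```

Wider stripes lower the overhead but touch more spindles per stripe, which is the other knob the slide mentions.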


Customizing For Your Environment

Pick your protocol

DirectFLOW, NFS, CIFS, any combination at one time

o More Director Blades for NFS / CIFS performance

o More Storage Blades for DirectFLOW performance

Interactive vs. Batch Processing

ActiveStor 5000 with larger cache sizes on Storage Blades for interactive work

Fault tolerance

Configurable spares for multiple sequential Storage Blade failures

Configurable bladeset sizes for simultaneous Blade failure risk mitigation

Redundant network links

Storage Capacity Options

Smaller Capacity Blades

o More spindles, less data to reconstruct, more shelves

Larger Capacity Blades

o Fewer shelves, reduced double-disk failure risk


Logically, How Does Data Flow?

[Figure: Linux clients with the DirectFLOW filesystem client and NFS/CIFS clients connect over IP networks to the Director Blades (NFS/CIFS Gateway, Metadata Manager) and the Storage Blades]

Logically, How Does Data Flow?

[Figure: An Example Six Shelf System. Linux DirectFLOW clients and NFS/CIFS clients connect over IP networks to six Director Blades (DB1 to DB6, providing the NFS/CIFS Gateway and Metadata Manager) and to six shelves of 10 Storage Blades (SBs) each]

Logically, How Does Data Flow?

[Figure: An Example Six Shelf System, with Three Bladesets. Linux DirectFLOW clients and NFS/CIFS clients connect over IP networks to six Director Blades (DB1 to DB6) and to six shelves of 10 Storage Blades (SBs) each, grouped into Bladeset 1, Bladeset 2, and Bladeset 3]

Logically, How Does Data Flow?

[Figure: An Example Six Shelf System, with Three Bladesets and Eight Volumes. Six shelves of 10 Storage Blades (SBs) each are grouped into Bladesets 1 to 3, which hold volumes Vol1 to Vol8; the volumes are also shown assigned to the Director Blades DB1 to DB6]

How Do I Manage Hundreds of TB?

All from a single http:// or command line interface

PanActive Manager: Single GUI for entire namespace management

Simple out-of-box experience

Seamlessly adopt new blades

Capacity & load balancing

Volumes and quotas

Snapshots

1-touch reporting capabilities for capacity trends, asset ID, and performance

Email and/or pager notification of errors and warnings

Scriptable CLI for all features
