testing 'continuously available' file servers · 2012 storage developer conference. ©...

Testing 'Continuously Available' File Servers An end to end service viewpoint Tsan Zheng Aniket Malatpure Microsoft Corporation


Page 1: Testing 'Continuously Available' File Servers · 2012 Storage Developer Conference. © Microsoft Corporation. All Rights Reserved. Testing 'Continuously Available' File Servers An

2012 Storage Developer Conference. © Microsoft Corporation. All Rights Reserved.

Testing 'Continuously Available' File Servers
An end-to-end service viewpoint

Tsan Zheng, Aniket Malatpure

Microsoft Corporation


Agenda

• ‘Continuously Available’ file server overview
• Problem space & testing goals
• Test methodology
• Test model
• Test infrastructure
• Q & A


‘Continuously Available’ File Server

Continuous Availability
• Transparent failover of application data storage
• Application sees contained IO delay

Value propositions
• Servicing without downtime
• Reliable low-cost file storage

Deployment Scenarios
• Server application storage platform
• File server consolidation
• Virtual Desktop Infrastructure

Deployment Variations
• Multiple customer segments (enterprises, hosters)
• Multiple networking configurations (Ethernet, InfiniBand, etc.)
• Multiple storage options (JBOD, RAID, SAS, SATA, FC, etc.)

[Figure: Hyper-V, SQL, IIS, etc. connect to \\fs1\share, which is served by File Server Node A and File Server Node B over a shared disk.]

Continuous Availability: Scenarios

[Figure: two deployment diagrams. (1) Nodes labeled A through H; Resource Group A has a leader and a clone, each with file share(s) and a distributed network name; IP address A and IP address B each belong to a VIP RG; shared storage with SAS drives holds one storage pool per resource group (A and B), each built on StorageSpaces (SS); a DPM server for backup, Hyper-V clusters, SQL Server, and information workers use the file servers. (2) App Server Nodes 1..N, running apps and a clustered app, connect through two switches to File Server Nodes 1..N; every node has NIC1 and NIC2; the file servers use shared storage.]

Continuous Availability: Variations

[Figure: App Server Nodes 1..N, each running apps, connect through two switches to File Server Nodes 1..N; every node has NIC1 and NIC2; the file servers use shared storage.]

Application workloads
• Hyper-V
• SQL Server
• Information Worker
• IIS
• …

Networking configurations
• DCB
• NIC Teaming
• RDMA
• IPSec

File Server & Storage Configurations
File servers
• Clustered
• Scale-out
File systems
• NTFS
• ReFS
RAID solutions
• StorageSpaces
• PCI RAID
• RBOD

Testing end-to-end scenarios with ‘Continuously Available’ file servers

Personae
• Client applications
• System administrators

Experiences
• Ability to smoothly migrate workload
• Increased network bandwidth
• Fast and efficient file access
• Scalable application file access
• Reliable crash recovery
• Resiliency to storage corruptions
• Resiliency to storage failures

SLA
• [C] Zero application client error
• [C] Application client response time
• [C] Increased network bandwidth utilization
• [S] Scalability w.r.t. number of nodes
• [S] Time to full crash recovery
• [S] Ease of use: configure, manage, diagnose

Problem space & testing goals

Problem space
• Complex module inter-ops
• Application specifics
• Sensitive timing conditions
• Hardware variations
• Software and hardware updates from various sources

Testing goals
• Test with real-world configurations
• Test with real-world operations
• Assess service availability for a long period of time

Test methodology

Service-Centered Approach
• Black-box validation
  o Various personae interacting with the service
  o SLA between persona and service
    • Client applications (quantitative)
    • Admin (qualitative)
• White-box validation
  o Service health monitoring and validation

[Figure: a Client and an Administrator interacting with the Service.]

Evaluating Experiences and Services
Concept: validate experiences relevant to different personae in the context of services.

[Figure: matrix of experiences (Mobility, Memory error recovery, Resiliency against network component failures) against services (VM Hosting, Database transactions, Web Hosting), with an overall result/coverage axis.]

1. Defining Services
Modeling “typical” customer deployments
• First step: customer engagement and survey
• Second step: service modeling
Covering various configurations
• Different ways to implement a service depending on business needs
• Modeled by different “profiles”: different hardware and software configurations

2. Defining Experiences
• Experience: ability provided to a specific persona by the product to perform a task
• Use case: precise set of steps relevant to an experience and performed by a persona
• SLA: in the context of experience, persona, and use cases
Covering all phases of the I.T. lifecycle
• Setup and deployment
• Operating: managing, monitoring, troubleshooting
• De-commissioning

3. Measuring Success
Meeting persona expectations
• Client SLA (quantitative)
• Admin SLA (qualitative)
Staying healthy
• Monitoring and measuring the health of the system’s components
  o SCOM alerts
  o Performance counters
Lifecycle acceleration and SLA projection

Continuously Available File Servers: Experiences, use cases and models

Modeling service
• Service
  o Persona
  o SLA
  o Experience
    • Use case

Testing object model
• Test profile
  o Test topology
    • Roles
    • Configurations
  o Test scenarios
    • Action group
      o Action
      o Fault
      o SLA test
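
The testing object model named on this slide can be pictured as a handful of nested record types. The following is only a minimal sketch in Python, not the schema the authors used: the class names mirror the slide's boxes, while the field names, the `steps_of_kind` helper, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class StepKind(Enum):
    ACTION = "action"      # a normal operation, e.g. a planned fail-over
    FAULT = "fault"        # an injected failure, e.g. power loss
    SLA_TEST = "sla_test"  # a check against an SLA metric


@dataclass
class Step:
    kind: StepKind
    name: str


@dataclass
class ActionGroup:
    name: str
    steps: List[Step] = field(default_factory=list)


@dataclass
class TestTopology:
    roles: List[str]
    configurations: List[str]


@dataclass
class TestProfile:
    name: str
    topology: TestTopology
    scenarios: List[ActionGroup] = field(default_factory=list)

    def steps_of_kind(self, kind: StepKind) -> List[str]:
        """Collect the names of all steps of one kind across every scenario."""
        return [s.name for g in self.scenarios for s in g.steps if s.kind is kind]


# Hypothetical profile, used only to exercise the types above.
profile = TestProfile(
    name="example-profile",
    topology=TestTopology(
        roles=["file server node", "application node"],
        configurations=["NTFS", "SMB"],
    ),
    scenarios=[
        ActionGroup(
            name="failover-under-load",
            steps=[
                Step(StepKind.ACTION, "planned fail-over"),
                Step(StepKind.FAULT, "power loss"),
                Step(StepKind.SLA_TEST, "fail-over completion time"),
            ],
        )
    ],
)

print(profile.steps_of_kind(StepKind.FAULT))  # ['power loss']
```

Keeping actions, faults, and SLA tests as one `Step` type with a kind tag is a design choice of this sketch; separate classes per kind would work equally well.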


Testing end-to-end scenarios with ‘Continuously Available’ file servers (cont.)

Actions
Application-based actions
• Hyper-V: live migration, storage migration, snapshot/restore, start/stop/pause/resume
• SQL Server: DB backup/restore, DBCC, BCP, create/delete
• Information worker: DFSR, DFSN, Quota, classification
Common actions
• Networking: NIC teaming/un-teaming, NIC swap
• Storage: array rebuilding, disk swap, Dedup, BitLocker, chkdsk
• Clustering: planned fail-over, patching

Faults
Networking
• NIC failure (disable/enable adapters)
• Packet loss
• Packet delay
Storage
• Metadata corruption
• User-data corruption
Clustering
• Power loss
• Low memory

SLA test
Application
• No data access failure
• Application-specific performance goals
Networking
• Multi adapter/channel
• Throughput
• Utilization
Clustering
• Clustered file server uptime
• Fail-over completion time
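
An SLA test of the kind listed above compares what was observed during a run against stated targets. The sketch below shows one way that check might look for two of the listed metrics, "no data access failure" and fail-over completion time. The type names, the `evaluate_sla` function, the 30-second limit, and the sample observations are all hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailoverObservation:
    failover_seconds: float  # time until the share was served again
    client_errors: int       # data access failures seen by the application


@dataclass
class SlaTargets:
    max_failover_seconds: float  # hypothetical threshold, not from the slides
    max_client_errors: int = 0   # "no data access failure"


def evaluate_sla(observations: List[FailoverObservation], targets: SlaTargets) -> List[str]:
    """Return a list of human-readable SLA violations (empty means pass)."""
    violations = []
    for i, obs in enumerate(observations):
        if obs.client_errors > targets.max_client_errors:
            violations.append(f"run {i}: {obs.client_errors} client error(s)")
        if obs.failover_seconds > targets.max_failover_seconds:
            violations.append(
                f"run {i}: fail-over took {obs.failover_seconds:.1f}s "
                f"(limit {targets.max_failover_seconds:.1f}s)"
            )
    return violations


# Made-up observations for three fail-over runs.
targets = SlaTargets(max_failover_seconds=30.0)
runs = [
    FailoverObservation(failover_seconds=12.5, client_errors=0),
    FailoverObservation(failover_seconds=41.0, client_errors=0),
    FailoverObservation(failover_seconds=9.0, client_errors=2),
]
violations = evaluate_sla(runs, targets)
for v in violations:
    print(v)
```

Returning the violations as a list, rather than raising on the first one, lets a report show every missed target for a run.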


Test infrastructure overview

[Figure: test infrastructure diagram. Components: test harness, Setup Tool, Scheduler, SQL database, Configuration XML, Scenario XML, Test Machine, Scheduler Client, SCOM Client, test processes, SCOM Server, SCOM* database, Reporting Dashboard. Labeled flows: Monitoring, Scheduling, Smart Action Scheduling.]

Setup | Scheduling | Monitoring | Reporting

*SCOM: Microsoft System Center Operations Manager


Test infrastructure: setup

Goals
• A lightweight, self-contained tool
• An extensible object model to enable authoring and scheduling complex test scenarios

Setup object model for CA file server

Common external shared storage setup by nodes; FS & Apps Server Cluster & RG setup
1. Cluster Nodes: 1/FS, 2, 4, 8
2. Resource Group Type: Singleton (Non CSV), Scale out (CSV), iSCSI Target, Virtual Machine
3. Share Type: SMB, NFS
4. # Shares Per Volume: Single 1:1, Multiple N:1
5. File System: NTFS, ReFS
6. # Vol Per Disk: Single 1:1, Multiple N:1
7. Resilient Spaces: No RAID, RAID 0, RAID 1, RAID 5
8. # Spaces Per Pool: Single 1:1, Multiple N:1
9. Disk/LUN & Pools: Disk & LUN (MBR/GPT), Storage Pools (1+)
10. Disk & Bus Type: JBOD/SAS (MBR/GPT), RBOD/FC (MBR/GPT), (1+) iSCSI Targets (MBR/GPT)

Network setup by nodes
3. DCB*: CA, N Other Traffic
4. QoS*: By Port, By IP
5. VLAN*: By Port, By IP
6. Subnet: Mask Value…
7. IP address: Static, Dynamic
8. Teaming Type
9. NIC config*: Physical, Virtual
Other labels in the diagram: FCoE, HBA, iSCSI, FC, SAS

Legend: Cluster setting, Node setting, Network setting
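
Because the setup object model is a set of independent dimensions, the space of candidate configurations is their Cartesian product. The sketch below enumerates that product for four of the dimensions. The values are copied from the slide (using only the 2, 4, and 8 node counts), but the dictionary layout, the key names, and the `enumerate_configs` helper are illustrative assumptions, not the format of the authors' Configuration XML.

```python
from itertools import product

# Dimension values copied from the setup object model; the dictionary layout
# and the key names are illustrative assumptions.
dimensions = {
    "cluster_nodes": [2, 4, 8],
    "share_type": ["SMB", "NFS"],
    "file_system": ["NTFS", "ReFS"],
    "resilient_space": ["No RAID", "RAID 0", "RAID 1", "RAID 5"],
}


def enumerate_configs(dims):
    """Yield one dict per combination of dimension values."""
    keys = list(dims)
    for values in product(*(dims[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(enumerate_configs(dimensions))
print(len(configs))  # 3 * 2 * 2 * 4 = 48
print(configs[0])
```

Even four dimensions yield 48 combinations, which suggests why a test effort would select representative profiles rather than cover the full matrix.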


Test infrastructure: Scheduling and execution

ActionGroup & action scheduling

Goals
• Fixed and random scheduling policies
• Enable different workflows (test, troubleshooting, verification)

Scheduling ActionGroups
• Scheduling policy
  o Fixed: repeatable, pre-defined sequence
  o Random: based on a certain distribution of ActionGroup types
• ActionGroup selection
  o Applicability: based on test environment state
• Scheduling policy
  o Repeat/re-run for verification (“Fixed”)
    • Complete re-run
    • Partial re-run

ActionGroup definitions
• Action details
• Error/failure conditions
• Verification type & definitions

Scheduler & scheduler client
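
The two scheduling policies and the applicability filter can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' scheduler: the function names, the weights, the `env` state dictionary, and the applicability rule are all hypothetical. The random policy is seeded so that a random run can still be replayed for verification.

```python
import random
from typing import Callable, Dict, List, Sequence


def fixed_schedule(sequence: Sequence[str], rounds: int) -> List[str]:
    """Fixed policy: replay the same pre-defined sequence, so runs are repeatable."""
    return list(sequence) * rounds


def random_schedule(
    weights: Dict[str, float],
    applicable: Callable[[str], bool],
    count: int,
    seed: int,
) -> List[str]:
    """Random policy: draw ActionGroups by weight, skipping inapplicable ones."""
    rng = random.Random(seed)  # seeding keeps a random run reproducible
    candidates = [g for g in weights if applicable(g)]
    if not candidates:
        return []
    return rng.choices(candidates, weights=[weights[g] for g in candidates], k=count)


# Hypothetical environment state: only one node is up, so a planned
# fail-over is not applicable right now.
env = {"nodes_up": 1}


def is_applicable(group: str) -> bool:
    if group == "planned fail-over":
        return env["nodes_up"] >= 2
    return True


weights = {"planned fail-over": 0.5, "NIC swap": 0.3, "disk swap": 0.2}

print(fixed_schedule(["NIC swap", "planned fail-over"], rounds=2))
picked = random_schedule(weights, is_applicable, count=5, seed=7)
print(picked)
```

Filtering candidates before drawing, rather than rejecting draws afterwards, keeps the relative weights of the remaining ActionGroups intact.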


Test infrastructure: Error model and diagnostics

Goals
• Track issue life cycle
• Evaluate impact on product quality

Error model
• Sev0: break on error/failure
  o Test/action failure
  o Failure to meet SLA
• Sev1: log error and continue
  o Issue details
  o Diagnostics forensics

Diagnostics & issue tracking
• Forensics
  o Logs
  o Traces
  o System state
• Issue tracking
  o Associate with bug tracking
  o Track private and scenario impact
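
The two-level error model (Sev0 breaks, Sev1 logs and continues) maps naturally onto exception handling in a step runner. The sketch below is one possible reading of that model. The `ScenarioError` class, the `run_steps` function, and the sample steps are assumptions for illustration and are not taken from the authors' harness.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Severity(Enum):
    SEV0 = 0  # break on error/failure
    SEV1 = 1  # log error and continue


class ScenarioError(Exception):
    def __init__(self, severity: Severity, message: str):
        super().__init__(message)
        self.severity = severity


def run_steps(steps: List[Tuple[str, Callable[[], None]]]) -> Tuple[List[str], List[str]]:
    """Run steps in order; return (completed step names, logged issues)."""
    completed, issues = [], []
    for name, step in steps:
        try:
            step()
        except ScenarioError as err:
            issues.append(f"{err.severity.name} in {name}: {err}")
            if err.severity is Severity.SEV0:
                break  # Sev0 stops the scenario so the state can be inspected
            continue   # Sev1 is recorded and the scenario carries on
        completed.append(name)
    return completed, issues


# Hypothetical steps used only to exercise both severities.
def ok() -> None:
    pass


def counter_missing() -> None:
    raise ScenarioError(Severity.SEV1, "performance counter missing")


def sla_missed() -> None:
    raise ScenarioError(Severity.SEV0, "failure to meet SLA")


completed, issues = run_steps([
    ("start workload", ok),
    ("collect counters", counter_missing),
    ("planned fail-over", sla_missed),
    ("verify access", ok),
])
print(completed)  # ['start workload']
print(issues)
```

In this run the Sev1 issue is logged and execution continues, while the Sev0 issue stops the scenario before "verify access" runs.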


Test infrastructure: Monitoring

What to monitor?
• System health: SCOM infrastructure
• Test progress and status: scenario testing infrastructure

Goals
• Provide data necessary to assess product quality
• Enable error handling semantics specified by the tests

Test infrastructure: Reporting

[Figure: real-time dashboard; scenario roll-up report.]

Goals
• Map test results to user SLAs
• Reflect trends in product quality
• Track progress and coverage of testing

What to report
• Admin SLA
• Service SLA
• Test coverage
• Trending

An in-depth peek at reporting dashboard

[Figure: reporting dashboard with labeled areas: test scenarios, SLA metric overview, scenario schedule results, user SLA metric details, admin SLA metric details, test scenarios result history.]

Key takeaways

• Approach testing by modeling the service first
• Map test results to user-visible metrics (SLA)
  o Persona focused
  o Internal verifications
• Test with agility
  o Common needs addressed by infrastructure
  o Test focuses on building re-usable test content
  o Invest in important application workloads

Q & A


Appendix


Test profile: An example

[Figure: a test profile made up of a test scenario and a test topology.]

Test scenario: (groups of) use cases
• Groups of test actions
• Scheduling policies of defined test actions per group
• SLA metrics expected
• Error handling policies

Test topology (product specific)
• File server nodes
• Application nodes
• Networking
• Clustering
• Storage
• Domain topology

A sample data protection test profile for the Hyper-V over SMB scenario

Test scenario (ActionGroup of Actions)
• Start VMs (remote VHD on file server cluster)
• Take a backup of the VMs
• Unexpected reboot of the hosting file server node
• Verify VM client access remains intact

Test topology
• 2-node file server cluster
• 2-node Hyper-V cluster (100 VMs)
• DPM server
• Dual 10G network NICs
• Mirror StorageSpaces with 50 SAS drives
• Same domain with 2 DCs

Test positioning

[Figure: 2×2 chart positioning functional tests, scenario tests, unit tests, and random stress tests by operations (limited vs. extended) and verification (limited vs. rich).]

Position testing right