The Evolving Stable Product: A Contradiction of Terms?

Uploaded by: doanmien

Posted on 01-May-2018



2

Agenda

• Process/Product Maturity Model
• Xyratex field reliability experience
• Overview of integration test methodology
• Effect of targeted testing

Xyratex Confidential

3

Over 30 years of experience in Storage Process Equipment design

IBM Experience
• IBM Disk Drive R&D in the UK
• IBM Disk Drive Production & Process design in the UK
• IBM Storage System R&D & Production in the UK

Xyratex Experience
• Xyratex Disk Drive Production Equipment design
• Xyratex OEM Storage Systems

Timeline milestones: 1972, 1984, 1992, 1994, 1997, 2006

Over 22 years of continuous Storage System Process Design


4

Process/Product Maturity Model


Lifecycle stages (chart of volume over time): Introduction, Growth, Maturity, Decline

Change types: Product, Process

Release F/W V5

Pilot Python A (73G, 144G, 300G)

Pilot Cheetah 7 (73G, 144G, 300G)

Release firmware V7 & CPLD

Release F/W V10

Release F/W V11

Release F/W V12

Release F/W V13 & CPLD

Introduce Python A (73G, 144G, 300G)

Introduce Cheetah 7 (73G, 144G, 300G)

Release Immersion Tin raw card

Release F/W V14.30

Pilot F/W for Python A

Pilot F/W for Cheetah 7

Release F/W for Python A

Release F/W for Cheetah 7

Release F/W 16.01

Pilot Jake ATA Filer

Pilot Maxtor Calypso (250G)

Release F/W Package 21

Release F/W Package 22

Release F/W V14.30

Release F/W Package 26

Pilot Maxtor Sabre (250/320G)

Pilot Gen 1 Dongle

Pilot Hitachi K2 (500G)

Pilot Gen 2 Dongle

Introduce RoHS ATA Card

Pilot Maxtor Sabre "Flashless" (250/320G)

Introduce IBM-specific ATA Filer

Release F/W Package 27

Pilot Maxtor Grizzly (500G)

Release F/W Package 30 & CPLD

Release Maxtor Sabre G2 F/W

Pilot Seagate Tonka 1.5 (250G)

RoHS (R5) BIP

Pilot Seagate Tonka 2 (500G)

Release F/W Package 32

Reality check: process and product continually evolve, and not all changes are validated or even known about.

What is the risk mitigation plan?


5

Drive "X" DPPM

Chart: monthly DPPM by failure category (axis 0 to 7000) and Total DPPM (axis 0 to 9000), April 2007 to April 2008.

| Category | Apr-07 | May-07 | Jun-07 | Jul-07 | Aug-07 | Sep-07 | Oct-07 | Nov-07 | Dec-07 | Jan-08 | Feb-08 | Mar-08 | Apr-08 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2X | 0 | 0 | 0 | 0 | 0 | 0 | 1595 | 0 | 0 | 0 | 0 | 0 | 0 |
| DNR | 0 | 241 | 0 | 0 | 152 | 0 | 598 | 0 | 0 | 163 | 2465 | 264 | 261 |
| Glist | 0 | 0 | 178 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 131 |
| Hardware | 0 | 0 | 178 | 0 | 0 | 83 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 711 | 0 | 0 | 498 | 997 | 1228 | 5821 | 4412 | 4621 | 527 | 261 |
| Process Induced | 0 | 0 | 1245 | 0 | 0 | 0 | 1196 | 0 | 0 | 0 | 0 | 0 | 0 |
| Recoverable | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| SMART | 0 | 241 | 356 | 0 | 0 | 0 | 0 | 614 | 0 | 0 | 0 | 0 | 392 |
| Unrecoverable | 1413 | 2410 | 1245 | 1279 | 610 | 912 | 598 | 1842 | 0 | 490 | 1232 | 396 | 784 |
| Total DPPM | 1413 | 2892 | 2668 | 1279 | 762 | 1493 | 3788 | 3683 | 5821 | 5065 | 8318 | 1187 | 1828 |

Mature product, yet new failure modes are still being introduced.
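DPPM (defective parts per million) normalises failures to a million units shipped. The minimal sketch below transcribes the Drive "X" table and checks how the Total DPPM row is built. Summing every category reproduces the reported total in most months, but in Jun-2007 and Oct-2007 the full sum overshoots by exactly the Process Induced value (1245 and 1196 DPPM); excluding that row matches every month to within ±1. So the reported total appears to leave process-induced failures out. That reading is an inference from the arithmetic, not something the slide states, and the helper names are illustrative.

```python
# Failure-category DPPM for Drive "X", transcribed from the slide's table.
MONTHS = ["Apr-07", "May-07", "Jun-07", "Jul-07", "Aug-07", "Sep-07", "Oct-07",
          "Nov-07", "Dec-07", "Jan-08", "Feb-08", "Mar-08", "Apr-08"]
CATEGORY_DPPM = {
    "2X":              [0, 0, 0, 0, 0, 0, 1595, 0, 0, 0, 0, 0, 0],
    "DNR":             [0, 241, 0, 0, 152, 0, 598, 0, 0, 163, 2465, 264, 261],
    "Glist":           [0, 0, 178, 0, 0, 0, 0, 0, 0, 0, 0, 0, 131],
    "Hardware":        [0, 0, 178, 0, 0, 83, 0, 0, 0, 0, 0, 0, 0],
    "Other":           [0, 0, 711, 0, 0, 498, 997, 1228, 5821, 4412, 4621, 527, 261],
    "Process Induced": [0, 0, 1245, 0, 0, 0, 1196, 0, 0, 0, 0, 0, 0],
    "Recoverable":     [0] * 13,
    "SMART":           [0, 241, 356, 0, 0, 0, 0, 614, 0, 0, 0, 0, 392],
    "Unrecoverable":   [1413, 2410, 1245, 1279, 610, 912, 598, 1842, 0, 490, 1232, 396, 784],
}
REPORTED_TOTAL = [1413, 2892, 2668, 1279, 762, 1493, 3788, 3683, 5821, 5065, 8318, 1187, 1828]


def dppm(failures, units):
    """Defective parts per million: failures per million units shipped."""
    return failures * 1_000_000 / units


def monthly_totals(table, exclude=()):
    """Sum the category DPPM for each month, skipping any category named in `exclude`."""
    return [sum(values[i] for name, values in table.items() if name not in exclude)
            for i in range(len(MONTHS))]


def matches_reported(totals, tolerance=1):
    """True if every month agrees with the reported total to within rounding."""
    return all(abs(a - b) <= tolerance for a, b in zip(totals, REPORTED_TOTAL))


if __name__ == "__main__":
    print(dppm(3, 1000))  # 3000.0 (3 failures in 1,000 drives)
    print(matches_reported(monthly_totals(CATEGORY_DPPM)))  # False
    print(matches_reported(monthly_totals(CATEGORY_DPPM, exclude=("Process Induced",))))  # True
```

The same check also makes the February 2008 peak (8318 DPPM) easy to attribute: it is dominated by the Other and DNR categories.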

6

Xyratex Hard Disk Drive Reliability: Failure Rate Comparison

Industry AFR experience

Annual Failure Rate (AFR) by drive class, AFR (%) by drive age:

| Series | 3 Months | 6 Months | 1 Year | 2 Years | 3 Years | 4 Years | 5 Years |
|---|---|---|---|---|---|---|---|
| XYR ATA FR | 0.92% | 0.98% | 1.04% | 1.27% | 2.19% | – | – |
| XYR Enterprise FR | 0.46% | 0.51% | 0.73% | 1.32% | 0.68% | 1.04% | 1.10% |
| Google paper base (~AFR) | 2.80% | 1.80% | 1.75% | 8% | 8.70% | 6% | 7.40% |

Reference lines on the chart: Enterprise target, ATA target.
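Drive datasheets usually quote MTBF, while this slide reports AFR. Under the common simplifying assumption of a constant failure rate and continuous operation (8,760 powered-on hours per year), the two convert as AFR = 1 − exp(−8760 / MTBF). The sketch below applies that conversion to the 1-year AFR figures from the chart data; the function names are illustrative, and because the measured AFR clearly changes with drive age, the exponential model is only a rough yardstick.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 h, assuming 24x7 power-on


def afr_from_mtbf(mtbf_hours):
    """Annualised failure rate for a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def mtbf_from_afr(afr):
    """Inverse of afr_from_mtbf under the same constant-rate assumption."""
    return -HOURS_PER_YEAR / math.log(1.0 - afr)


if __name__ == "__main__":
    # 1-year AFR values taken from the chart data.
    for label, afr in [("XYR Enterprise", 0.0073), ("XYR ATA", 0.0104), ("Google base", 0.0175)]:
        print(f"{label}: AFR {afr:.2%} ~ MTBF {mtbf_from_afr(afr):,.0f} h")
    print(f"1,000,000 h MTBF ~ AFR {afr_from_mtbf(1_000_000):.2%}")
```

On this model the Enterprise 1-year figure of 0.73% corresponds to an MTBF of roughly 1.2 million hours.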

7

Example of the effect of CERT on FC HDD Field Reliability

• Drives subjected to our integration process with CERT have a lower failure rate in the field.

Chart: Cumulative failure rate over time. Cumulative failure rate (%, 0 to 5%) vs. time (years, 0 to 5) for Drive X and Drive Y, each with and without CERT; every series shows field data points and a fitted curve (legend: Weibull-2P MLE SRM MED FM).
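The legend entry "Weibull-2P MLE" indicates a two-parameter Weibull distribution fitted by maximum likelihood. Field data are right-censored: most drives have not failed when the data are pulled, so a fit has to use survivors as well as failures. The sketch below is a minimal pure-Python version of such a fit under those assumptions; it is not the tool that produced these charts, it omits the confidence bounds and ranking options named in the legend, and `fit_weibull_mle`, `weibull_cdf` and the synthetic population are hypothetical.

```python
import math
import random


def weibull_cdf(t, beta, eta):
    """Cumulative failure fraction F(t) = 1 - exp(-(t/eta)^beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def fit_weibull_mle(times, failed, lo=0.01, hi=20.0, iters=60):
    """Two-parameter Weibull maximum-likelihood fit with right-censoring.

    times:  age (> 0) of each unit at failure, or at end of observation if it survived
    failed: True where the unit failed, False where it is still running (suspended)
    Returns (beta, eta): shape and scale, eta in the same units as `times`.
    Assumes the true shape lies inside [lo, hi].
    """
    r = sum(failed)
    if r == 0:
        raise ValueError("at least one failure is required")
    scale = max(times)  # rescale so t**beta cannot overflow; beta is unaffected
    x = [t / scale for t in times]
    logs = [math.log(v) for v in x]
    mean_fail_log = sum(l for l, f in zip(logs, failed) if f) / r

    def score(beta):
        # Profile-likelihood equation for beta: increasing in beta, zero at the MLE.
        w = [v ** beta for v in x]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / beta - mean_fail_log

    for _ in range(iters):  # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = scale * (sum(v ** beta for v in x) / r) ** (1.0 / beta)
    return beta, eta


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical population: true beta = 1.5, eta = 10 years, observed for 8 years.
    true_life = [rng.weibullvariate(10.0, 1.5) for _ in range(5000)]
    times = [min(t, 8.0) for t in true_life]
    failed = [t <= 8.0 for t in true_life]
    beta, eta = fit_weibull_mle(times, failed)
    print(f"beta={beta:.2f} eta={eta:.2f} F(3 years)={weibull_cdf(3.0, beta, eta):.2%}")
```

Fitting the CERT and no-CERT populations separately and comparing F(t) at the same age gives the kind of comparison plotted on these slides.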

8

Example of the effect of CERT on ATA HDD Field Reliability

Chart: Cumulative failure rate over time. Cumulative failure rate (%, 0 to 1.5%) vs. time (months, 0 to 12) for Drive Z with and without CERT; each series shows field data points and a fitted curve (legend: Weibull-2P MLE SRM MED FM).

9

“Conventional” Process Flow, backed by deep storage knowledge

• 4 Stage Test Process
• Configure product prior to test
• Testing at subsystem level
• Apply environmental stresses
• CERT is key
• Targeted Workloads
• Automation & Data Collection

Process flow: Final Assembly → BATs → CERT → Functional Test → Hi-Pot & Safety Gnd → Config → ORT → Pack/Ship

Test stage labels: Basic Power-On Check; Production configuration and specification validation; Reliability Test; Safety Check

10

Targeted Testing Design Approach

• Test function and design approach based on FMEA, failure analysis and experience
• Identify faults associated with:
  • Design
  • Process
  • Quality
  • Interoperability

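Stress type vs. component (H = high, M = medium, L = low). The first five component columns (Drive Servo to Card Elec.) are Hard Disk Drive components; the remaining columns are Other Storage Subsystem Components.

| Stress Type | Drive Servo | Rd/Wr Hd. | Disk | Motor / Bearing | Card Elec. | PSU | Fans | I/O Card | Cabling / Connectors | Plastics / Mech | Memory | Card Elec. | Processor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IOPS | H | H | L | L | H | L | L | H | L | L | H | M | H |
| Thru Put MB/s | H | H | H | L | H | L | L | H | L | L | H | M | H |
| Power Cycle | M | M | L | H | H | H | M | H | L | L | H | H | H |
| Thermal Variation | H | H | M | H | H | H | H | H | H | M | H | H | H |
| Voltage Variation | L | L | L | M | H | M | L | H | L | L | M | H | L |
| Vibration | H | M | L | L | L | M | L | L | M | M | L | L | L |
| Redundancy Variation | L | L | L | L | L | H | H | H | L | L | L | M | L |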
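The stress-type matrix can be read as a coverage problem: pick a small set of stresses so that every component is exercised by at least one stress rated H for it. The sketch below applies a greedy set-cover heuristic to the ratings as transcribed from the slide. The heuristic, the function names and the "(HDD)" label used to tell the two Card Elec. columns apart are illustrative assumptions, not the method the slide describes, and greedy selection is not guaranteed to find the smallest set in general.

```python
# Ratings transcribed from the slide, one character per component column in the
# same order as COMPONENTS (H = high, M = medium, L = low).
COMPONENTS = ["Drive Servo", "Rd/Wr Hd.", "Disk", "Motor / Bearing", "Card Elec. (HDD)",
              "PSU", "Fans", "I/O Card", "Cabling / Connectors", "Plastics / Mech",
              "Memory", "Card Elec.", "Processor"]
RATINGS = {
    "IOPS":                 "HHLLHLLHLLHMH",
    "Thru Put MB/s":        "HHHLHLLHLLHMH",
    "Power Cycle":          "MMLHHHMHLLHHH",
    "Thermal Variation":    "HHMHHHHHHMHHH",
    "Voltage Variation":    "LLLMHMLHLLMHL",
    "Vibration":            "HMLLLMLLMMLLL",
    "Redundancy Variation": "LLLLLHHHLLLML",
}


def covered(stress, level="H"):
    """Indices of the components that a stress type is rated `level` for."""
    return {i for i, c in enumerate(RATINGS[stress]) if c == level}


def greedy_stress_plan(level="H"):
    """Greedy set cover: repeatedly add the stress covering the most still-uncovered
    components at `level`. Returns (plan, components that no stress covers)."""
    reachable = set().union(*(covered(s, level) for s in RATINGS))
    uncovered, plan = set(reachable), []
    while uncovered:
        best = max(RATINGS, key=lambda s: len(covered(s, level) & uncovered))
        plan.append(best)
        uncovered -= covered(best, level)
    gaps = [name for i, name in enumerate(COMPONENTS) if i not in reachable]
    return plan, gaps


if __name__ == "__main__":
    plan, gaps = greedy_stress_plan()
    print(plan)  # ['Thermal Variation', 'Thru Put MB/s']
    print(gaps)  # ['Plastics / Mech']
```

On these ratings the heuristic picks thermal variation first (rated H for 11 of the 13 components), adds throughput to reach the disk, and reports that no stress type is rated H for plastics/mechanical parts.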

CERT Profile / Temperature Profile

Chart: temperature (°C, line, 0 to 45) across test stages 1 to 18 over three cycles (Cycle 1, Cycle 2, Cycle 3). Tests run within the profile:

• Data Compare SIOP
• Banded Sequential Write/Read Compare Test
• Certify Test Unit: Verify Disk
• Data Erasure Test
• Incremental Butterfly Write Test
• Simulated Workload SIOPs: File Server
• Simulated Workload SIOPs: OLTP
• System Ready Test
• Zero Disk
• Write NetApp Metadata
• Metadata Verification
• Certify Test Unit: Write Disk
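Several CERT steps are write/read/compare passes (Data Compare, Banded Sequential Write/Read Compare, Incremental Butterfly Write). The slide does not define the access patterns, so the minimal sketch below assumes the common meaning of a butterfly pattern: alternating between the lowest and highest remaining addresses so the heads sweep the full stroke while converging on the middle. `RamDisk`, `pattern`, the 512-byte block size and the injected defect are hypothetical stand-ins, and the "incremental" aspect of the slide's test is not modelled.

```python
BLOCK_SIZE = 512  # bytes per logical block (assumed)


def butterfly_order(n_blocks):
    """Alternate between the lowest and highest untested LBA, converging on the middle."""
    order, lo, hi = [], 0, n_blocks - 1
    while lo <= hi:
        order.append(lo)
        if lo != hi:
            order.append(hi)
        lo, hi = lo + 1, hi - 1
    return order


def pattern(lba, cycle):
    """Block payload that encodes its own LBA and cycle, so misplaced or stale data is detectable."""
    stamp = lba.to_bytes(8, "little") + cycle.to_bytes(8, "little")
    return stamp * (BLOCK_SIZE // len(stamp))


class RamDisk:
    """In-memory stand-in for a drive; a real test would issue SCSI/ATA commands."""

    def __init__(self, n_blocks, bad_lba=None):
        self.buf = bytearray(n_blocks * BLOCK_SIZE)
        self.bad_lba = bad_lba  # simulated defect: data written here is corrupted

    def write(self, lba, data):
        if lba == self.bad_lba:
            data = bytes(b ^ 0xFF for b in data)
        self.buf[lba * BLOCK_SIZE:(lba + 1) * BLOCK_SIZE] = data

    def read(self, lba):
        return bytes(self.buf[lba * BLOCK_SIZE:(lba + 1) * BLOCK_SIZE])


def butterfly_write_compare(disk, n_blocks, cycle=1):
    """Write every block in butterfly order, read back in the same order, return mismatched LBAs."""
    order = butterfly_order(n_blocks)
    for lba in order:
        disk.write(lba, pattern(lba, cycle))
    return [lba for lba in order if disk.read(lba) != pattern(lba, cycle)]


if __name__ == "__main__":
    print(butterfly_order(5))  # [0, 4, 1, 3, 2]
    print(butterfly_write_compare(RamDisk(64), 64))  # []
    print(butterfly_write_compare(RamDisk(64, bad_lba=17), 64))  # [17]
```

Stamping each block with its own LBA and cycle number means a misdirected or dropped write shows up as a compare failure, not just a corrupted byte.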


11

Summary

• Cannot become complacent with the testing regime
  Evolution of a product only stops when its end of life is announced and the last unit is shipped.
• Infant mortality failures can be screened out
  A combination of system stress testing, customer-centric test suites and environmental stress can precipitate early-life failure modes.
• Quoted MTBF rates can be achieved
  Our data confirms a decrease in early-life failures for drives subjected to the Combined Environmental Reliability Test (CERT).
• Nearline drives are currently not as reliable as Enterprise drives
  Nearline & Enterprise drives… the same difference?
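The infant-mortality point can be made quantitative with the Weibull model used in the field-reliability charts. When the shape parameter β is below 1 the hazard rate falls with age, so a unit that survives a screen is less likely to fail in its first year in the field; with β = 1 screening makes no difference, and with β > 1 it only consumes useful life. The sketch below assumes hypothetical parameters (β = 0.5, η = 10⁷ hours, a 48-hour screen) and treats screen hours as equivalent to field hours; a real stress screen such as CERT accelerates ageing, which would need an acceleration factor that these slides do not give.

```python
import math


def reliability(t, beta, eta):
    """Weibull survival function R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def hazard(t, beta, eta):
    """Instantaneous failure rate h(t) = (beta/eta) * (t/eta)^(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def field_failure_prob(mission, beta, eta, screen=0.0):
    """P(failure within `mission` hours in the field | survived `screen` hours of screening)."""
    return 1.0 - reliability(screen + mission, beta, eta) / reliability(screen, beta, eta)


if __name__ == "__main__":
    # Hypothetical early-life population: beta < 1 means a falling hazard (infant mortality).
    beta, eta = 0.5, 1.0e7  # eta in hours
    for screen in (0.0, 48.0):
        p = field_failure_prob(8760.0, beta, eta, screen)
        print(f"screen {screen:>4.0f} h -> first-year field failure probability {p:.2%}")
```

The conditional form R(screen + mission) / R(screen) is what makes the benefit depend on β: screening only helps while the hazard is still falling.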

12

Thank you.

Questions?

End
