White Paper
Best Practices for Evaluating Storage Using Workload Testing & Validation
Version 1.0 | November 2019
When evaluating new or changed storage platforms to determine the optimal storage technology, product, or configuration for your organization, three factors are critical:
• The workload must accurately reflect the environment; it must be realistic
• The environment must be properly configured for each platform tested
• The tests must follow a consistent methodology
This paper offers a proven methodology, focused primarily on block-based systems, for framing these tests to provide a fair and accurate representation of the new or changing technologies, and how these technologies should be tested and validated using workload profiles representative of the environment into which they will be installed.
TABLE OF CONTENTS
Introduction
Objectives
The Need to Test and Validate
Characteristics of a Production Workload
Defining the Test Project and Building Your Own Benchmark
Document Subjective Factors
Define Testing Objectives
Define Evaluation Criteria for Proof-of-Concept Testing, aka Bake-offs
Identify Test Environment / Protocol
Define Growth Assumptions
Develop Testing Schedule
Candidate Selection
Notify Suppliers
Testing
Optimal Test Environment
Storage Testing Workflow
Importing and Exporting Workloads
Prepare and Configure Storage Target – Precondition
Execute Tests – Current Production Workload
Execute Tests with TDE for models that exceed WorkloadWisdom-E functionality
Review Results
Common Testing Pitfalls
#1 Skipping Conditioning
#2 Using Bad Workload Models
#3 Forcing Unnatural Array Configurations
#4 Not Agreeing on Terminology
Conclusions
INTRODUCTION
Virtana is the leading provider of storage performance testing and validation solutions, serving IT organizations as well as leading storage technology vendors. Storage performance testing and validation matters to IT storage teams because it's critical for improving their job performance – enabling them to more intelligently purchase, configure, and deploy networked storage. For many years, other IT disciplines like networking and application management have enjoyed relatively modern pre-production testing and performance validation tools. The storage team just had freeware tools with no professional support. So, they were mostly condemned to explaining why I/O is slow… on the production floor.
Estimating capacity needs is pretty easy; rarely have we heard of IT organizations unexpectedly running out of storage. But it's a good bet that everyone reading this document has personally experienced the effects of a "performance outage" or "brownout". It's partially because it's really difficult to project performance needs unless you have the right performance tools and the right methodology. So, overprovisioning happens not just for capacity, but just as often, to mitigate performance risks.
Doing storage performance testing well matters because it helps to answer these common questions:
• Can I improve application performance by implementing new storage technologies or products, or changing configurations? If so, by how much?
• Can I afford the performance improvement? Will new techniques like compression/deduplication/snapshots reduce the effective $/GB without substantially impacting performance?
• How do I select the best technology, product, or configuration to match my application workloads?
• Which of my workloads will gain the most by moving to new architectures or products?
• Where are the performance limits of potential new configurations?
• How will storage behave when it reaches its performance limits?
• What are the performance limits?
The primary economic benefits of storage performance testing come from three areas: (1) improved application performance, (2) storage cost optimization, and (3) risk mitigation, which allows IT architects to innovate faster.
So why aren't more people doing this today? Simple: it has been hard and too time-consuming using DIY or shareware tools, and frankly, the results simply do not reflect reality. These tools cannot accurately simulate real-world production workloads. That's why Virtana sells LoadDynamiX Enterprise and, specifically, why we wrote this document: to expose storage engineers and architects to a Best Practices methodology to accompany the use of LoadDynamiX Enterprise.
OBJECTIVES
What are the specific objectives of this document? To go beyond the technology and to:
• Provide guidance for customers to ensure a fair evaluation of new technologies and vendor products
• Ensure architectural differences in new technologies, vendor platforms, and vendor-specific configuration best practices are accurately represented in customer testing
• Ensure customers' needs are clearly communicated to vendors, and vendors provide a configuration to meet those specific needs for the testing process
• Allow technologies to be accurately evaluated based on testing results and evaluation criteria
• Provide customers the opportunity to include the initial array setup and configuration process in their evaluation criteria
• Discuss common pitfalls associated with testing and why these should be avoided
THE NEED TO TEST AND VALIDATE
There are several benefits and use-cases for test and validation:
• Moving to new platforms, like from Fibre Channel to iSCSI, or to Software Defined Storage, can involve leaps of faith, and risk. Test and validation is by far the best way to mitigate this risk.
• If storage vendors could custom-build products and configurations based on individual customer application profiles, there might not be such a need to test. But they do not, and so storage engineers and architects must go beyond the simple performance profiles that appear in spec sheets, and the guidelines provided by "industry standard" benchmarks. Customers must have data that's relevant to their environment.
• Storage workloads are different across different applications and platforms. No one would expect a streaming video application to stress a storage array in the same way as an email server.
• Applications have wildly different requirements for storage performance. Latency for some applications must be measured in microseconds; for others, multiple seconds are acceptable.
• No one storage array, technology, or configuration can universally provide the best price/performance for all application types. Different storage systems perform differently for different workloads: there is no one storage that fits all. In fact, we know that both storage vendors and component manufacturers optimize their arrays for particular workload types. But their sales teams generally are not aware of this level of detail, and often sell a "one size fits all" solution.
• It is important to know how a storage system performs in your environment, including expected growth.
• It can literally save millions of dollars in new storage procurements to understand the configuration best suited for your application. We have examples where customers halved the number of scale-out nodes that were originally proposed by their vendors.
• Industry standard benchmarks are known to be "gamed" by vendors who rightly want to present their products in the best possible light. Often, configurations are tested and reported on which bear no resemblance to real customer configurations.
• Vendors are under pressure from their marketing departments, so they routinely publish specs that are not real-world in nature. A huge IOPS number may be offered, but it may have been achieved by using block sizes that are not representative of most applications. Or they may claim "random access" numbers based on a small dataset or data read from cache. And there never is one block size for the composite workloads running on modern arrays. It's a distribution of sizes for different LUNs, reads and writes, and it changes over time. Running a test showing only an average block size isn't helpful to the customer.
CHARACTERISTICS OF A PRODUCTION WORKLOAD
Production workloads have variations of I/O sizes, rates, sequentiality, and working sets over time, on both small (milliseconds) and large (hours) scales, across hundreds of LUNs, and affect tens of terabytes of data during a 24-hour work cycle. In the following graphs, you can see the workload I/O rates and size variations over time. You can see variations in throughput from 200 to over 1400 MB/s, and in IOPS from 15K to 50K. A more typical graph you might see from simple performance measurement tools would average these numbers, producing a flatter, and much less accurate, portrayal of performance.
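The gap between averaged and instantaneous behavior is easy to quantify. Below is a minimal Python sketch (the sample values are hypothetical, chosen to echo the 15K–50K IOPS range described above) showing how much an average-only view understates the load the array must actually absorb:

```python
# Hypothetical IOPS samples over a work cycle; a real series would come
# from your monitoring tool at per-second or finer granularity.
samples = [15_000, 22_000, 48_000, 50_000, 31_000, 18_000, 44_000, 25_000]

mean_iops = sum(samples) / len(samples)
peak_iops = max(samples)

print(f"average: {mean_iops:,.0f} IOPS")  # what a simple averaging tool reports
print(f"peak:    {peak_iops:,.0f} IOPS")  # what the array must actually absorb
print(f"sizing to the average under-drives the peak by "
      f"{(peak_iops - mean_iops) / peak_iops:.0%}")
```

A test driven at the mean would never exercise the array at its true peaks, which is exactly where latency problems surface.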
Past simple system-level metrics, it's extremely useful to measure SAN components, such as front-end ports and LUNs. In the graph below, you can see a real-world example of how production workloads can use LUNs unevenly. In this example, a very small number of LUNs are handling nearly all the real-world I/O. It is important that these differences are maintained in the test. In one example, 8 workloads were created for different levels of LUN activity and types of access, versus 1 workload that treated all LUNs equally. With the workload where the ratios of LUN activity were maintained, the array could run the workload with reasonable latency all day long. The single workload that treated all LUNs the same resulted in 130 ms latencies.
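One way to preserve this skew when building workload models is to bucket LUNs by measured activity and drive each tier as its own workload. The sketch below uses hypothetical per-LUN IOPS figures and illustrative tier thresholds; real figures would come from your monitoring data:

```python
# Hypothetical per-LUN IOPS measurements (names and values are illustrative).
lun_iops = {"lun_a": 18_000, "lun_b": 9_500, "lun_c": 400, "lun_d": 250,
            "lun_e": 90, "lun_f": 60, "lun_g": 30, "lun_h": 20}
total = sum(lun_iops.values())

# Bucket LUNs into activity tiers; each tier becomes its own workload,
# rather than spreading I/O evenly across all LUNs.
tiers = {"hot": [], "warm": [], "cold": []}
for name, iops in lun_iops.items():
    share = iops / total
    if share > 0.20:
        tiers["hot"].append(name)   # a handful of LUNs carry most of the I/O
    elif share > 0.01:
        tiers["warm"].append(name)
    else:
        tiers["cold"].append(name)

hot_share = sum(lun_iops[n] for n in tiers["hot"]) / total
print(f"hot LUNs {tiers['hot']} carry {hot_share:.0%} of all I/O")
```

In this illustrative data, two LUNs carry roughly 97% of the I/O; a test that spreads load evenly across all eight would bear little resemblance to production.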
DEFINING THE TEST PROJECT AND BUILDING YOUR OWN BENCHMARK
Document Subjective Factors
To fairly and accurately perform the evaluation, you should document any potential factors non-essential to the evaluation. Consider the influence (if any) these factors may have on your results. They are valid considerations, but should be factored out when doing a purely performance test. Examples of subjective factors include:
• Existing professional relationships
• Length of relationship with incumbent vendors
• Perceived risks and complexities of changing technologies or migration
• Unsubstantiated third-party reviews
• Unsubstantiated "rules of thumb"
• Internal concerns over change and what this might mean to existing processes
• Benchmarks that profile other applications
• Current knowledge about managing and maintaining particular vendors or products
• Maturity and feature sets of the array and vendor management and monitoring tools
Define Testing Objectives
Common examples:
• Determine the best performing platform for your existing workload at 1x
• Determine the best performing platform for the same workload when scaled (1.25x, 1.5x, 1.75x, 2x, etc.). It's a good idea to scale in small, regular increments. For instance, if you think your workload might grow 4x over the life of the array, consider testing at 2x and 3x as well.
• Determine platforms and configurations that satisfy application performance requirements
• Identify the point at which the workload exceeds the performance requirements of a platform
At this point, it's good to ask yourself if your testing objectives are influenced by subjective factors, and factor them out.
Define Evaluation Criteria for Proof-of-Concept Testing, aka Bake-offs
As with objectives, ask yourself if your evaluation criteria are influenced by subjective factors.
Business
• Long-term viability of the vendor
• Existing relationships (quid pro quo)
• Industry reputation
Financial
• Costs (acquisition, maintenance, TCO)
• Price / Performance
Operational
• Reliability
• Support and supportability (consider both the local service and remote support teams, and their ability to coordinate amongst your local and world-wide teams)
Performance
• Identify the highest acceptable average latency
• KPIs: latency, throughput, IOPS
Scalability
• Define the highest IOPS and throughput at acceptable latency for each platform
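As a concrete way to express that scalability criterion, the sketch below (with hypothetical ramp-test numbers and an assumed 3 ms latency ceiling) finds the highest load level a platform sustains while staying within the acceptable average latency:

```python
# Hypothetical (IOPS, average latency in ms) points from a load ramp test.
ramp = [(10_000, 0.8), (20_000, 1.1), (30_000, 1.6),
        (40_000, 2.4), (50_000, 4.9), (60_000, 12.0)]

LATENCY_CEILING_MS = 3.0  # the "highest acceptable average latency" chosen above

# Highest IOPS level that still meets the latency criterion.
sustainable_iops = max(iops for iops, lat in ramp if lat <= LATENCY_CEILING_MS)
print(f"max sustainable load: {sustainable_iops:,} IOPS at <= {LATENCY_CEILING_MS} ms")
```

Recording this figure per platform gives you a single, comparable scalability number for the bake-off.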
Identify Test Environment / Protocol
Be specific, and commit these details in writing. Get specific recommendations from all interested parties, including your DB and apps teams, server and networking teams, your vendors, and of course, your testing vendor. Ask for examples of previous test environments similar to yours. Then specify:
• What tests will be run?
• How will test results be collected? Here's where the LoadDynamiX Enterprise repository will be invaluable.
• How will results be evaluated, and by whom?
• What are the success criteria for each test? Again, get input from other teams.
• What results are you expecting, based on what the vendor has stated/published?
Define Growth Assumptions
What is the anticipated growth of the workload over the expected life of the array?
• Can you correlate your growth in workload to business growth? Barring other input, it's a good place to start. Ask for revenue or number-of-client-user projections.
• Are your workload growth estimates reasonable, given past growth patterns and business growth?
• Are there any ongoing initiatives that will impact these forecasts (i.e. cloud migrations or M&A activity)?
Consider the possibility the array remains in service longer than expected.
• If the array remains in service longer than expected, what is a reasonable timeline?
• What (if any) growth would be expected during this extended time period?
• What are your expectations for performance and stability?
• How does this compare to the vendor's expected EOL date?
• At what point will you stop adding to the array? (servers or VMs, storage capacity, workload, etc.)
Is consolidation required at a future date?
• Are the storage array ports sized correctly to accommodate consolidation?
• How will LUN consolidations change the characteristics of the workloads?
Develop Testing Schedule
As careful as you are, the schedule can change for a number of reasons. Try to build in flexibility; there's nothing more frustrating than finding some question you did not anticipate and not having the time to fully explore your options.
• What is the deadline for testing completion? Is it a soft deadline, or will you lose access to the test target?
• What is the sequence of testing that will occur in the timeline? Are all the resources lined up?
• Will the vendor or component supplier be required to perform the setup activities? Reconfigurations?
• How long is each test? Include set-up time, time to build the models, time to execute, and time to create custom reports, if needed. Ask your testing vendor for estimates if you are new at this. Virtana has years of relevant experience to draw from.
• What time buffer is available if tests run longer than expected?
Candidate Selection
• What were the selection criteria for defining the candidates to include in the test?
• Why were these criteria important?
• What products will be included in the test?
Notify Suppliers
Although storage vendors or component suppliers will likely not be involved in the actual testing, vendors will size and configure their recommended optimal configuration, and will be responsible for the setup and configuration. And you WANT them to do the setup and configuration.
Also, consider that the selected platform may remain after testing and become the production system, so implementation of the array should follow all best practices.
When engaging your supplier, be sure to clearly review these points. It will enable them to do a better job of proposing the right array and configuring it properly. So, review:
• Testing objectives
• Duration of the entire testing process
• Evaluation criteria
• Description of the testing environment
• Equipment they will be expected to provide
• Services they are expected to provide
• Expected schedule of testing for their components
TESTING
Optimal Test Environment
To some, when a new box arrives on the loading dock, it's an event not unlike a birthday or Christmas morning. Without delay, and frequently without a proper inventory control process, cartons full of shiny new technology are unboxed, racked or not, and connected to whatever server happens to be available. Storage devices are configured and manuals consulted only in the unfortunate event that the user interfaces aren't properly intuitive. What happens next may be performance testing, but it may also be an exercise in building storage performance anecdotes.
Why do we say this? Absent a defined test process and controlled testing environment, you will be able to declare only that "I did X and Y was the result". This is a subjective statement, and it may contain valuable, experiential data. It is not, however, an objective analysis. The accuracy and interpretation of such results are not definitive of the subject under test. That is not to say that this sort of experimentation has no value; it should demonstrate out-of-the-box functionality (or lack thereof) and may provide valuable insight into ease-of-use. What it does not do is provide evidence of a rigorous testing methodology, including controlled inputs and outputs, repeatability, and error analysis.
Testing can, and does, imply a means by which the quality and function of a subject is evaluated. In the strictest sense, the "kick the tires" test process described above fits that definition. However, truly objective testing requires a controlled test process and a controlled test environment, or testbed, if you prefer.
A testbed provides a means of isolating the system under test from external influences and protecting non-test resources from any fallout from the testing process. These are important factors for the generation of reliable test data and for maintaining business continuity. A testbed must be defined and documented for repeatability and to simplify the analysis of results.
Storage performance testing is disruptive and should never be performed in compute environments that support any kind of production or revenue-generating computing. Nor should performance testing coincide with other testing, such as user validation or ease-of-use testing. Meaningful storage performance data can only be generated in a controlled testbed, and the most rigorous testing will adhere to well-recognized standards and practices. These requirements typically call for a dedicated test environment to be set aside for the purpose of validating storage performance.
A well-designed testbed will incorporate conditions similar to those expected in the environments considered for eventual deployment of the devices under test. Conditions designed to maximize some metric in a way that is not reproducible outside the test lab create results that may look good on a marketing brochure, but which hold little value to you, the customer.
A comprehensive test environment will include the following elements and attributes:
• A space for the testing to occur, separate from conflicting activities
• A mechanism or engine to drive workloads (LoadDynamiX appliance)
• Infrastructure to support applicable connection protocols
• A mechanism to record and analyze test data (LoadDynamiX Enterprise)
• A library or toolbox of workloads to generate desired I/O patterns (LoadDynamiX Enterprise)
• The number of storage ports involved in testing should be comparable to the target production environment
• Switches and network gear to emulate production
• Multipath access to the LUNs (Active/Active)
• The number of LUNs should be of the same order as in production. For instance, if the production environment is expected to have 2,000 LUNs, don't test with 2 LUNs; shoot for ~200.
• Replication or other internal processes (deduplication, compression, garbage collection, etc.) enabled in production should be enabled during the test runs
And, optionally:
• A source for the modeling data (VirtualWisdom or existing production storage array)
• A controlled power source
• Power measuring devices
• A climate-controlled environment
The test environment should be representative of the way the proposed storage will be used in production, and should be centralized and easy to access and use. Here's a simple example of a test topology.
Importing and Exporting Workloads
One of the first issues to understand is the workload sample you're starting with. Often, testers start out thinking they should grab a week's worth of data and run that. For most use cases, that is a waste of time. It is best to pick a few key time periods (maximum IOPS, maximum throughput, and maximum latency) and grab 1 to 4 hours of this data to run against the array. This will provide the most important results in the least amount of time.
The days of spending dozens of hours parsing log files, in the often-futile attempt to create realistic workload models, are over. The Workload Data Importer module of Enterprise is designed to import and analyze workload data from storage infrastructures, automatically, to create extremely realistic workload models. The workload data can be exported from VirtualWisdom, storage vendor monitoring tools, or server utilities such as iostat, as Comma Separated Values (CSV) text files. The data is analyzed to detect workload I/O profiles to understand the workload characteristics in terms of IOPS, throughput, read/write ratio, etc. The WDI GUI is here (right):
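If you want to sanity-check an export before importing it, the headline characteristics WDI derives are straightforward to compute yourself. Here is a minimal sketch over a hypothetical iostat-style CSV layout; the column names and the 10-second sample interval are illustrative assumptions, not the documented WDI input format:

```python
import csv
import io

# A tiny stand-in for an exported CSV; real files come from VirtualWisdom,
# vendor monitoring tools, or iostat, and their columns may differ.
raw = """timestamp,read_ops,write_ops,read_kb,write_kb
09:00:00,12000,4000,96000,64000
09:00:10,15000,5000,120000,80000
09:00:20,9000,3000,72000,48000
"""

INTERVAL_S = 10  # assumed sample interval of this hypothetical export
reads = writes = kb = rows = 0
for row in csv.DictReader(io.StringIO(raw)):
    reads += int(row["read_ops"])
    writes += int(row["write_ops"])
    kb += int(row["read_kb"]) + int(row["write_kb"])
    rows += 1

avg_iops = (reads + writes) / (rows * INTERVAL_S)
avg_mb_s = kb / 1024 / (rows * INTERVAL_S)
rw_ratio = reads / writes
print(f"avg IOPS: {avg_iops:.0f}, throughput: {avg_mb_s:.1f} MB/s, "
      f"R/W ratio: {rw_ratio:.1f}:1")
```

A quick pass like this makes it easy to spot a truncated export or a mislabeled column before you build a workload model from it.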
Storage Testing Workflow
Here’saworkflowillustration.We’lldiscusseachstepinorder,below.
The ideal data source, to create the most hyper-realistic model, is VirtualWisdom. It can obtain up to 30 different metrics.
Prepare and Configure Storage Target – Precondition
Storage platforms under test should be configured according to the vendor's best practices.
SAN storage arrays should be "preconditioned" prior to the testing:
• Data must be written to an array prior to reading it
• AFA solid-state storage should be broken in to bring it to the state it is in during production
• Utilize the purpose-built pre-conditioning workload in WorkloadWisdom-E 5.3 and later.
• Depending on the configuration and the vendor, the storage array might be left for a period of time to process the data offline. The vendor will offer advice.
Execute Tests – Current Production Workload
Perform a baseline test of workload across all systems using the current production workload at 100%. Precision should be to the highest possible extent to produce useful results. This includes:
• Read and write rates and their variations in time and across the LUNs
• Request sizes and their variations in time and across the LUNs
• The amount of capacity (logical address space) read from and written to during the test
Workload scaling tests the storage performance at higher loads using the characteristics of the current production workload.
• Consider iteratively performing the workload tests, gradually increasing scale to reflect typical growth.
• Example: If planning to test up to 3x growth in workload, perform tests using increments of load scale of 0.5x
• The "sweet spot" of a storage array varies, and you may find the array that performs well at 1x or 1.5x may degrade at 2x, while other arrays are just warming up. This granularity will help identify this, and define the optimal workload ranges for a product.
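The sweet-spot effect is easy to see once per-scale latency is tabulated. A sketch with hypothetical measurements for two arrays and an assumed 5 ms acceptability ceiling:

```python
# Hypothetical average latency (ms) measured at each workload scale factor.
latency_ms = {
    "array_1": {1.0: 1.2, 1.5: 1.6, 2.0: 5.8, 2.5: 14.0, 3.0: 40.0},
    "array_2": {1.0: 1.8, 1.5: 2.0, 2.0: 2.3, 2.5: 3.1, 3.0: 4.6},
}
CEILING_MS = 5.0  # assumed acceptable-latency ceiling

# For each array, find the highest scale factor that stays under the ceiling.
sweet_spot = {}
for name, curve in latency_ms.items():
    acceptable = [scale for scale, lat in sorted(curve.items()) if lat <= CEILING_MS]
    sweet_spot[name] = max(acceptable)
    print(f"{name}: acceptable up to {sweet_spot[name]}x scale")
```

With these illustrative numbers, array_1 falls over past 1.5x while array_2 is still comfortable at 3x: exactly the kind of difference coarse-grained scaling would miss.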
The test management environment should be centralized and easy to use. The screenshots below demonstrate how simply a complex, composite workload may be illustrated. The graph on the left lists multiple workloads, and the graph on the right shows the relationships between clients and targets, in a single test environment.
Execute Tests with TDE for models that exceed WorkloadWisdom-E functionality
Here’saTDEProjectcorrespondingtotheWorkloadWisdom-Ecasepresentedabove,justasanexample.
Review Results
Generate easily readable reports, typically involving latency, throughput, and IOPS. Here, in a multi-step test, you may determine if some products simply shouldn't move on to the next round of testing because of their shortcomings.
Below, average response time and IOPS for a baseline and scaled-up test in a 2-vendor bake-off. In the first graph below, it's easy for the reader to see the effect on latency of increasing the workload (by increasing IOPS). In this case, vendor 2 scales more effectively. At a 3x workload, vendor 1's combined response time is 4 ms, while vendor 2's is approximately 2.3 ms (exact figures are available).
COMMON TESTING PITFALLS
#1 Skipping Conditioning
The preconditioning of the storage is a mandatory prerequisite of the testing, for the following reasons:
• Storage arrays typically "know" whether a certain address has been previously written to or not. In case the test workload reads from an address that has no data, the array returns zeros without consuming internal resources. The response time is unrealistically low, and the test results are skewed towards low response times.
• Fresh arrays have empty metadata tables. Access to them consumes resources and might affect response times. Having them empty or containing few entries (by writing zeros, for example) will lead to unrealistically low response times in the test.
• AFAs, to be tested as in working conditions, need to have their solid-state storage initialized and written to so it performs as it does in your real-life production environment.
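The core of the first bullet is simply that every address a read test may touch must first be written. A toy Python sketch of that idea follows; the file name and sizes are stand-ins, since real preconditioning targets the array's LUNs at full capacity, ideally with the purpose-built workload mentioned above:

```python
import os

PATH = "precondition.dat"       # stand-in for a LUN / device under test
CAPACITY = 4 * 1024 * 1024      # 4 MiB stand-in for the device capacity
BLOCK = 128 * 1024              # 128 KiB write size

# Write random (incompressible, dedupe-resistant) data across the ENTIRE
# logical address space, so later reads never hit an unwritten address
# and receive an unrealistically fast all-zero response.
with open(PATH, "wb") as f:
    for _ in range(CAPACITY // BLOCK):
        f.write(os.urandom(BLOCK))

assert os.path.getsize(PATH) == CAPACITY  # full address space covered
os.remove(PATH)  # clean up the demo file
```

Random data is used deliberately: zero-fill would let dedupe- or compression-capable arrays short-circuit the work, reproducing the very skew preconditioning is meant to remove.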
#2 Using Bad Workload Models
Misconception
• Production workload can be simulated using scripts and/or tools like Iometer or Vdbench in a test lab. We often talk about an early user of LoadDynamiX, GoDaddy. GoDaddy tried to use Iometer and got artificially high performance results on the target array. Sadly, when deployed into production, actual user experience was poor. This was mostly due to Iometer's inability to recognize the huge impact metadata I/O had on the workload. The same tests run using LoadDynamiX turned out to be extremely realistic and gave GoDaddy results that led to good deployment decisions, and happy GoDaddy customers.
Reality
• There is no substitute for actual production workload
• Building a test environment is resource intensive
• True production conditions may not be accurately captured in the test environment
• Incorrect assumptions and deviations in testing steps in the simulation can lead to the wrong decision.
High-level steps of building the test environment (the hard way)
• Identify physical infrastructure to be used (server, SAN, storage)
• Consider make, model, age, OS version/patch levels, firmware levels, driver levels, etc.
• Build/update the test environment to reflect production
• Ensure components are at supported levels for the testing platform
• Configure SAN connectivity
• Configure matching number of ports, port speeds, cable types, patch panels, etc.
• Create scripts to simulate workload
• Distribute scripts to systems that will run the scripts and drive the workload
Setup (the hard way)
• Operating System Updates
• HBA Drivers & Firmware
• Load Balancing Configuration
• IP Addresses, DNS Records
• Testing Scripts
• Zoning & Masking
Execute (the hard way)
• Execute scripts on servers/VMs
• Monitor performance
• Collect & carefully aggregate test results from the various servers
Why this approach fails:
• Workload testing is partially an art and partially a science. With the DIY approach, you have only your own experience to depend on, and perhaps the kindness of forum participants. When, not if, questions come up, it's Best Practice to have a team with decades of experience to rely on.
• Conventional tools and scripts attempt to approximate and simulate workloads; and for all but the smallest, simplest workloads, they fail.
• Setup effort increases exponentially the more accurately you try to simulate workloads.
• Testing is not centralized; results need to be collected and correlated to minimize mistakes and make your efforts repeatable. A test that's not repeatable is, by definition, not a good test.
• Due to the complexity of setup, execution, collection, and analysis, customers generally perform fewer testing iterations, resulting in fewer useful data points.
• Tests are based on assumed characteristics (aka guesses), which are then scaled; the effect of multiple inaccurate assumptions is compounded and multiplied.
• You won't know if your assumptions are truly correct until you are in production and driving workload.
Why spend significantly more effort, time, and money to create an approximation, when you can use your production workload?
#3 Forcing Unnatural Array Configurations
"We are going to configure all arrays exactly the same, test the same workload, and compare the results" is not a Best Practice, though it seems fair. It's not. Imposing a like-for-like configuration across different products:
• Often deviates from the best practices for one or more products in the test group
• Would not match the solution/configuration you purchase or put into production
• May reflect a technically UNSUPPORTED configuration
• Produces sub-optimal performance metrics
The target testing environment should reflect the configuration you would put in production, even if the configuration is not a direct comparison to the other configurations in the test. By sharing your objectives with your vendor, you will get their years of experience in system configuration.
#4 Not Agreeing on Terminology
There are some common and not-so-common terms that get thrown around; without a common definition, they can cause barriers between vendors, managers, engineers, and other involved parties. Here are some terms we should all be familiar with:
AllFlashArray(AFA) An all-flash array is a solid-state storage disk system that contains multiple flash memory drives instead of spinning hard disk drives.
Array metadata table A table with properties that store metadata such as variable names, row names, descriptions, and variable units.
Data compression Encoding information using fewer bits than the original representation.
Deduplication A specialized data compression technique for eliminating duplicate copies of repeating data
Garbage Collection Solid state storage garbage collection (GC) is the process by which a solid-state drive (SSD) improves write performance. Garbage collection, like TRIM, pro-actively eliminates the need for whole block erasures prior to every write operation. When a file is deleted from a computer, most operating systems (OSs) delete the table of contents entry, but do not delete the actual data blocks from the storage media. Hard disk drives will simply overwrite the unneeded data blocks. Flash SSDs, however must erase the unneeded data blocks before new data can be written. Working in the background, garbage collection systematically identifies which memory cells contain unneeded data and clears the blocks of unneeded data during off-peak times to maintain optimal write speeds during normal operations.
Multi-pathing Multipath I/O is a fault-tolerance and performance-enhancement technique that defines more than one physical path between the CPU in a computer system and its mass-storage devices through the buses, controllers, switches, and bridge devices connecting them.
Pre-conditioning Preconditioning is the writing of data (random pattern) to the entire flash device to “break it in”. Data is written to the flash memory in units called pages (made up of multiple cells). However, the memory can only be erased in larger units called blocks (made up of multiple pages). If the data in some of the pages of the block are no longer needed (also called stale pages), only the pages with good data in that block are read and re-written into another previously erased empty block. Then the free pages left by not moving the stale data are available for new data.
Sequentiality Degree to which data I/O stays in sequence
Working Sets Defines the amount of memory that a process requires in a given time interval
©10/2019 Virtana. All rights reserved. WorkloadWisdom and VirtualWisdom are trademarks or registered trademarks in the United States and/or in other countries. All other trademarks and trade names are the property of their respective holders.
Virtana, 2331 Zanker Road, San Jose, CA 95131. Phone: +1.408.579.4000
CONCLUSIONS
By following the outlined Best Practices, you can realize the following benefits:
• Performance assurance: Ensure storage solutions will meet performance SLAs under specific workloads, and confidently choose the optimal solution for those workloads.
• Reduced storage costs: Reduce over-provisioning by choosing the lowest cost systems for specific workloads; quantify the benefit and effects of every system.
• Increased uptime: Identify problems in the development lab prior to production deployment; validate all infrastructure changes against workload requirements and troubleshoot more effectively by re-creating failure-inducing workload conditions in the test and development lab.
• Accelerated new application deployments: Accelerate time to market by validating new applications on new systems, making deployment decisions faster and more confidently.
CONTRIBUTORS:
Michael Bello
John Cryan
Carmine DellAquila
Craig Foster
Yang Gao
Peter Murray
Scott Newkirk
Elizaveta Tavastcherna
Jim Bahn