Post on 01-Jan-2016
Neil Johnson
Senior Consultant
Microsoft Services, UK
Jetstress Notes From the Field
DMI314
What is Jetstress For?
Determine storage performance limit
Validation of storage design
Hardware burn-in
Build-out tests
When Should I Use Jetstress?
ENVISION  PLAN  BUILD  STABILISE  DEPLOY
Use Jetstress during solution design to understand precisely how storage will behave
Use Jetstress during build out to check for build issues and hardware defects
How Does it Work?
Jetstress uses ESE.DLL to generate an Exchange workload
[Diagram: architecture layers, top to bottom]
  Jetstress Application: Auto tuning, Thread Dispatcher, Background Log Checksummer, Offline Log & Database Checksummer, Reporting and Verification
  Extensible Storage Engine (ESE): Transactional I/O, Background Database Maintenance
  Windows Operating System: Windows I/O Manager, Device Drivers, Windows Performance Counters (Performance Data)
  Hardware: Storage Subsystem
Test Types
Test a disk subsystem throughput (Recommended)
  Easy to configure
  Database configuration manually set
  Workload controlled by thread count
Test an Exchange mailbox profile
  Uses Profile for configuration
  Database configuration manually set
  Workload controlled by thread count
Test Modes
Performance Test “Strict” mode (<= 6 hour test)
  Average Database Read Latency: 20ms
  Average Log File Write Latency: 10ms
  Max Database Read Latency: 100ms (6 x Spikes)
  Max Log File Write Latency: 100ms (6 x Spikes)
Stress Test “Lenient” mode (> 6 hour test)
  Same average Read/Write Latency thresholds
  Max Database Read Latency: 200ms (6 x Spikes)
  Max Log File Write Latency: 200ms (6 x Spikes)
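The two modes above can be expressed as a small threshold check. This is only a sketch: the threshold values come from the slide, but the function names, the way the measured values are passed in, and the choice to treat a value exactly at the threshold as a pass are assumptions, and the "(6 x Spikes)" note is not modelled. Jetstress performs its own evaluation; this merely illustrates the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyLimits:
    avg_db_read_ms: float
    avg_log_write_ms: float
    max_db_read_ms: float
    max_log_write_ms: float


# Values taken from the slide; names are illustrative only.
STRICT = LatencyLimits(20.0, 10.0, 100.0, 100.0)   # tests of 6 hours or less
LENIENT = LatencyLimits(20.0, 10.0, 200.0, 200.0)  # tests longer than 6 hours


def limits_for(duration_hours):
    """Pick the mode from the test duration, as described on the slide."""
    return STRICT if duration_hours <= 6 else LENIENT


def check_latency(duration_hours, avg_db_read_ms, avg_log_write_ms,
                  max_db_read_ms, max_log_write_ms):
    """Return the names of the thresholds that were exceeded (empty list = pass)."""
    limits = limits_for(duration_hours)
    measured = {
        "avg_db_read_ms": avg_db_read_ms,
        "avg_log_write_ms": avg_log_write_ms,
        "max_db_read_ms": max_db_read_ms,
        "max_log_write_ms": max_log_write_ms,
    }
    # Assumption: a value equal to the threshold still passes.
    return [name for name, value in measured.items()
            if value > getattr(limits, name)]


if __name__ == "__main__":
    # A 150ms read spike fails a 2-hour test but passes a 24-hour test.
    print(check_latency(2, 15.0, 5.0, 150.0, 50.0))
    print(check_latency(24, 15.0, 5.0, 150.0, 50.0))
```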
Failure Mode Testing
What is it?
  Proving that the storage platform will perform adequately, even if a common failure scenario is experienced
What type of failures?
  Single spindle failure (RAID)
  Multi-path failures
  Dual controller
What should I expect?
  The test should still pass*
Test Process
1. Installation
  Copy ESE files
  Install Jetstress
  Configure XML
2. Initialisation
  Create databases
3. Testing
  Set thread count
  Run 2hr test
  Run 24hr test
  Run degraded
  Evaluate results
4. Cleanup
  Remove Jetstress
  Remove data
  Reboot
Installation / Configuration
Use latest version
Copy ESE files (ESE.DLL, ESEPERF.HXX, ESEPERF.XML, ESEPERF.INI) into installation directory.
Jetstress treats everything as an Active database
  The test must account for every Active, Standby or Lagged database
Use “Test disk subsystem throughput”
  Easier to configure
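A quick pre-flight check for the copied ESE files might look like the sketch below. The four file names come from the slide; the helper names are hypothetical, the case-insensitive comparison is an assumption, and the installation directory is whatever path you pass in.

```python
from pathlib import Path

# File names as listed on the slide.
REQUIRED_ESE_FILES = ("ESE.DLL", "ESEPERF.HXX", "ESEPERF.XML", "ESEPERF.INI")


def missing_from(file_names):
    """Return the required ESE files absent from an iterable of file names."""
    present = {name.upper() for name in file_names}  # assumption: ignore case
    return [name for name in REQUIRED_ESE_FILES if name not in present]


def missing_ese_files(install_dir):
    """Return the required ESE files not found directly in install_dir."""
    names = [p.name for p in Path(install_dir).iterdir() if p.is_file()]
    return missing_from(names)
```

Calling `missing_ese_files` with the Jetstress installation folder returns an empty list when all four files are in place.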
Initialisation
Initialisation takes roughly 24hrs per 10TB of data on SAN*
  Try to arrange your testing schedule to kick this off over a weekend
Jetstress generates one database and then copies the rest in parallel
  With JBOD, the more disk spindles you have, the faster the copy process will be
  Copy throughput can be very high on DAS (950MB/sec; ~70TB in 24hrs)

DATA (TB)       1      2      5     10     50    100
TIME (Hours)  2.4    4.8   12.0   24.1  120.3  240.6
TIME (Days)   0.1    0.2    0.5    1.0    5.0   10.0
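The table above is a straight linear scaling, so it can be reproduced for any data size. In this sketch the rate of 2.406 hours per TB is derived from the table itself (240.6 hours for 100TB); the function names are illustrative, and real initialisation times will depend on the storage platform.

```python
# Rate derived from the slide's table: 240.6 hours / 100 TB on SAN.
HOURS_PER_TB = 2.406


def initialisation_hours(data_tb):
    """Estimated SAN initialisation time in hours for data_tb terabytes."""
    return data_tb * HOURS_PER_TB


def initialisation_days(data_tb):
    """The same estimate expressed in days."""
    return initialisation_hours(data_tb) / 24


if __name__ == "__main__":
    for tb in (1, 2, 5, 10, 50, 100):
        print(f"{tb:>4} TB  {initialisation_hours(tb):6.1f} h  "
              f"{initialisation_days(tb):5.1f} d")
```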
Testing
Set thread count
  Start low and work up until the test fails
  Use short test duration (0.5 = 30 minutes) to set thread count
  Jetstress generates roughly 30 random IOPS/thread
Perform 2hr test
Perform 24hr test
Perform degraded mode 2hr test (if appropriate)
  RAID array rebuilding
  Degraded IO paths
  Degraded storage controllers
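The thread-count procedure above can be sketched as follows. The figure of 30 IOPS per thread is the rough value from the slide; `run_short_test` is a hypothetical callback standing in for "run a 30-minute Jetstress test at this thread count and report pass or fail", and the default upper limit is an arbitrary assumption.

```python
import math

IOPS_PER_THREAD = 30  # rough figure from the slide


def threads_for_target_iops(target_iops):
    """Rough thread count expected to generate target_iops."""
    return max(1, math.ceil(target_iops / IOPS_PER_THREAD))


def highest_passing_thread_count(run_short_test, start=1, limit=64):
    """Start low and step up until a short test fails.

    run_short_test(threads) is a hypothetical callable returning True on pass.
    Returns the last passing thread count, or None if the first test fails.
    """
    best = None
    for threads in range(start, limit + 1):
        if not run_short_test(threads):
            break
        best = threads
    return best
```

The estimate from `threads_for_target_iops` is only a guide to where the failure point is likely to be; the search itself should still begin below it.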
Report Walkthrough
The following data is from a real customer test
Thanks Boris
Walkthrough of a test report
2HR  24HR  DEGRADED
Success Criteria
Meet or Exceed IOPS
Meet or Exceed Latency Recommendations
Complete Test Run Without Error or Corruption
Clean-up
Copy test data somewhere “safe”
Uninstall Jetstress
Remove test databases
  There are some scripts hidden in the field guide that can help
    Create-JetstressDataFolders.ps1
    Remove-JetstressDataFolders.ps1
  Both require Jetstress.XML file for parsing
Remove Jetstress installation folder
Reboot
Changes in Jetstress 2013
The Event log is captured and logged.
Errors are logged against the volume
A single IO error anywhere will fail the test.
  Detects -1018, -1019, -1021, -1022, -1119, hung IO, DbtimeTooNew, DbtimeTooOld.
Threads, which generate IO, are now controlled at a global level.
  This means Auto-Tuning should work again*
Cannot use Jetstress 2013 with Exchange 2010
Basically things are the same as Jetstress 2010 with some bugs fixed and better error handling.
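When reading through captured logs by hand, a lookup of the numeric codes listed above can help. This is a sketch only: the symbolic names are my recollection of the ESE error names and are not taken from the slides, so verify them against the ESE error reference, and the function simply scans arbitrary text rather than any specific Jetstress log format.

```python
import re

# Numeric codes from the slide; symbolic names are an assumption (see lead-in).
FATAL_ESE_ERRORS = {
    -1018: "JET_errReadVerifyFailure",
    -1019: "JET_errPageNotInitialized",
    -1021: "JET_errDiskReadVerificationFailure",
    -1022: "JET_errDiskIO",
    -1119: "JET_errReadLostFlushVerifyFailure",
}

# Match the codes only when not embedded in a longer number.
_CODE_PATTERN = re.compile(r"(?<!\d)-(1018|1019|1021|1022|1119)(?!\d)")


def find_fatal_errors(text):
    """Return the listed error codes found in text, in order of first appearance."""
    codes = [-int(match.group(1)) for match in _CODE_PATTERN.finditer(text)]
    return list(dict.fromkeys(codes))  # de-duplicate, keep order
```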
Jetstress – Known Issues
CU1 DLLs and more than 38 DBs

Server stack trace:
   at Microsoft.Exchange.Jetstress.Performance.PerfLog.AddCounterWildcard(String wildcardPath)
   at System.Collections.Generic.List`1.ForEach(Action`1 action)
   at Microsoft.Exchange.Jetstress.Performance.PerfLog..ctor(String fileName, Boolean binaryLog, Boolean includeJetDatabase, Int32 millisecInterval, TextWriter outWriter, TextWriter errorWriter)
   at Microsoft.Exchange.Jetstress.Core.StressEngine.ExecuteTest()
   at System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr md, Object[] args, Object server, Object[]& outArgs)
   at System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(IMessage msg, IMessageSink replySink)
Exception rethrown at [0]:
   at System.Runtime.Remoting.Proxies.RealProxy.EndInvokeHelper(Message reqMsg, Boolean bProxyCase)
   at System.Runtime.Remoting.Proxies.RemotingProxy.Invoke(Object NotUsed, MessageData& msgData)
   at System.Threading.ThreadStart.EndInvoke(IAsyncResult result)
   at Microsoft.Exchange.Jetstress.Core.StressEngine.EndExecuteTest()
   at Microsoft.Exchange.Jetstress.MainConsole.Main(String[] args)

Use the SP1 ESE.DLL to work around this issue.
Jetstress – Known Issues
Faulty Logical Disk Performance Counters…

22.10.2013 17:18:00 -- Microsoft Exchange Jetstress 2013 Core Engine (version: 15.00.0775.000) detected.
22.10.2013 17:18:00 -- Windows Server 2012 Standard (6.2.9200.0) detected.
22.10.2013 17:18:00 -- Microsoft Exchange Server Database Storage Engine (version: 15.00.0712.008) was detected.
22.10.2013 17:18:00 -- Microsoft Exchange Server Database Storage Engine Performance Library (version: 15.00.0712.008) was detected.
22.10.2013 17:18:58 -- Jetstress testing begins ...
22.10.2013 17:18:58 -- Preparing for testing ...
22.10.2013 17:18:59 -- Attaching databases ...
22.10.2013 17:18:59 -- Preparations for testing are complete.
22.10.2013 17:18:59 -- Jetstress testing failed. Error: Jetstress found the following faulty logical disk performance counters: C:\ExchangeDatabases\DAG01-MDB-1. Ensure that all logical disk performance counters are working correctly with System Monitor.

Error: Instance 'C:\ExchangeDatabases\DAG01-MDB-1' does not exist in the specified Category.
Related Sessions
How to uncover the secrets of Disk Latency
  Session: MNG.302
  Date: Wednesday
  Time: 4:45 PM - 6:00 PM
  Room: MR 19ab
Q&A
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.