DISK USAGE AND TREND ANALYSIS OF MULTIPLE SYMMETRIX ARRAYS CONFIGURED WITH FAST/VP
Mumshad Mannambeth
Storage Operations Specialist, EMC Global Services
Nasia Ullas
Service Delivery Consultant, Hewlett Packard Global Soft Pvt Ltd
2014 EMC Proven Professional Knowledge Sharing 2
Table of Contents
1. Introduction
2. Virtual provisioning and FASTVP introduction
2.1 Virtual Provisioning
2.1.1 What makes reporting difficult in Virtual Provisioning?
2.2 FAST VP (Fully Automated Storage Tiering for Virtual Pools)
2.2.1 FASTVP
2.2.2 Elements of FASTVP
2.2.3 What makes capacity utilization reporting difficult with FASTVP?
3. Architecture
3.1 Script to collect information from Symmetrix
4. Report Generation Procedure
4.1 Collect thin pool details
4.2 Differentiate storage groups based on application type
4.3 Collect Total Written Capacity details per storage group
4.3.1 Deciphering the output of the SYMCLI command
4.4 Calculate Total Written Capacity for each thin pool per application type
4.5 Calculate wastage
4.6 Analysis Graphs
4.6.1 Usage of Thin Pool Per Application
4.6.2 Distribution of Application across Pools
4.6.3 Disk usage per Application across Symmetrix Arrays
4.7 In depth Analysis per Host
4.7.1 List all existing storage groups in the Symmetrix
4.7.2 Allocated space is calculated for each of these storage groups separately
4.7.3 In depth Report
4.8 Trend Reporting
4.8.1 Sample Trend Reports and Analysis
4.8.2 Pie Chart
5. Automating Report Generation using SQLite and GNUPlot
5.1 Import Data into SQLite Database
5.2 Generating Graphs Using Gnuplot
6. Analyzing the Report to answer Customer Queries
7. Conclusion
8. Appendix
8.1 Pool usage per application for VMAX-1
8.2 Pool Usage per Application on VMAX-2
8.3 Pool Usage per Storage Group for In-Depth Analysis in VMAX-1
8.4 Pool Usage per Storage Group for In-Depth Analysis in VMAX-2
8.5 Pivot Table and Chart analysis for Storage Group distribution
8.6 Pivot Table and Pivot Chart analysis for Storage Group pool utilization in VMAX-2
8.7 Analysis Graphs
8.7.1 Usage of Thin Pool per Application for VMAX-1
8.7.2 Distribution of Application across Pools for VMAX-1
8.7.3 Usage of Thin Pool per Application for VMAX-2
8.7.4 Distribution of Application across Pools for VMAX-2
8.7.5 Usage of Thin Pool per Application across Arrays
8.7.6 Distribution of Applications across Pools for multiple VMAX Arrays
8.8 Trend Report for VMAX-1
8.9 Trend Report for VMAX-2
8.10 Pie Chart for VMAX-1
8.11 Pie Chart for VMAX-2
8.12 Script for Trend Report for applications
8.12.1 PowerShell script to generate Capacity Report
8.12.2 Script for in-depth analysis per storage group
References
Disclaimer: The views, processes, or methodologies published in this article are those of the
authors. They do not necessarily reflect EMC Corporation’s views, processes, or
methodologies.
Introduction
The advent of multiple layers of virtualization has made it increasingly difficult to educate
customers on how the different disk technologies in an environment are being utilized by various
applications. In the case of multiple applications residing on the same Symmetrix® array or a
single application spread across multiple arrays, the customer, who may not be a storage
expert, would want to know how much of each application is residing on their different disk types
such as Enterprise Flash Disks (EFD), Fibre Channel (FC), Serial ATA (SATA), etc. However,
virtualization technologies such as Meta LUN, Virtual Provisioning (VP), and Fully Automated
Storage Tiering (FAST) make it difficult for a storage administrator to satisfactorily answer the
customer or create a direct disk utilization report.
Questions raised by customers regarding space utilization include:
• How much of my EFD is being used by the SAP application?
• If the usage is more than expected, which host in the SAP group is responsible for it?
• Which application is using most of my Fibre Channel storage?
• If my applications are only using 2TB of SATA disk, why is the total usage 4TB? Where is the extra space being used?
• Why is my ECM application using EFD space when they are not billed for it?
• What is the trend of capacity utilization of the EXCHANGE application across multiple arrays?
The authors of this article have developed a procedure to collate, analyze, and generate a
report which can answer the above questions and more. This report is simple enough to be
understood by an end user or customer who doesn’t know the complex architecture of FAST
VP/Thin Provisioning. Moreover, factoring in the ‘FAST usage trend report’ with the existing
capacity planning method gives us a more accurate capacity management technique.
This report is vital as no tool is currently available to analyze and report FAST VP movement of
data at a sub LUN tier level, or to create a trend report per application based on FAST
movement. Nor do existing tools help in reporting when an application is spread across multiple
physical arrays. We share a set of scripts written in MS DOS batch and PowerShell that can be
easily deployed to generate these reports which are then fed into an Excel spreadsheet and a
free open source SQLite database for long term analysis. The database enables the user to run
complex queries to extract information in the required manner and plot graphs using the freely
available Gnuplot tool.
Introduction to Virtual Provisioning and FAST VP
This section provides a basic introduction to some of the concepts of Virtual Provisioning and
FAST VP.
Virtual Provisioning
Virtual Provisioning, commonly known as thin provisioning, allocates storage to an application
on demand. The host sees the entire space allocated to it, but only the space actually
written to is consumed on the back-end disks.
Figure 1 shows an example of virtual provisioning.
Figure 1: Virtual Provisioning[1]
Three components which comprise thin/virtual provisioning are:
1. Thin device: This is the device allocated to the host. The host sees and treats it as any
normal Symmetrix logical volume. However, no actual data is saved on the device.
2. Data device: Data written to a thin device is saved on to a group of Symmetrix hyper
volumes called data devices. These devices are not visible to the host and are non-
addressable private devices.
3. Thin Pool: Data devices are grouped together to form a storage pool called Thin Pool. A
thin device must be “bound” to a thin pool before it can be used. Binding makes the thin
device ready and any data written to a bound thin device will be re-directed to the data
devices which are part of the thin pool[2].
Figure 2: Virtual Provisioning Components[2]
Advantages of Thin Provisioning
• Improved capacity utilization: Only the space actually used by the application is consumed at the storage end, reducing waste.
• Storage is allocated from a common pool (the thin pool) that is shared across multiple servers. If a server does not use the space allocated to it, that space remains available to other servers, so all unused storage in the pool can be shared.
• Storage is striped across the whole thin pool, which increases performance.
However, these advantages make it difficult to report capacity utilization in environments where
Virtual Provisioning is deployed.
What makes reporting difficult in Virtual Provisioning?
• While the host “sees” the whole space allocated to it as available, from the storage end only the actual space to which data is written is utilized. Consequently, for a Storage Administrator, the traditional method of checking the size of the LUN will not give an idea of space utilization.
• A single thin LUN is spread across all data devices in the thin pool. Since a thin pool could have hundreds of devices, it becomes difficult to pinpoint which disk or how much of a physical disk type is being utilized by a particular thin device.
Fully Automated Storage Tiering for Virtual Pools (FAST VP)
Automated tiering is the need of the hour. Here, the activity of the devices is monitored at the
back end: highly active devices are moved non-disruptively to a higher tier of storage (i.e. EFD
or FC disks), while inactive devices are moved to a lower-cost storage tier (i.e. SATA disks),
as shown in Figure 3.
Figure 3: FAST Advantages[1]
Here we can see that a small number of active devices are candidates for Flash drives, while a
large number of inactive devices are candidates to be moved to SATA drives. Moving the active
data from FC to EFD will improve performance drastically, while moving inactive data to SATA
will free up space in FC for re-use. Also, SATA disks are 80% cheaper than FC disks, thus
immensely helping lower storage costs.
FAST VP
FAST VP adds finer granularities of performance measurement and data movement. As
performance data is gathered at the sub-LUN level and extents are moved based on activity, the
data from a single Thin Device under FAST VP control can be spread across multiple tiers.
FAST VP can put less-frequently accessed data on more cost-effective drives and only the
busiest extents on higher performing drives[1].
This enables the most active 20% of data to be placed on EFD for greater performance while
the 80% of less active data can be placed on Fibre Channel and SATA drives at a lower cost.
At any time, the hot regions of a thin device managed by FAST VP may be mapped to an
EFD tier, and the warm parts may be mapped to an FC tier. Meanwhile, the cold parts may
be mapped to a SATA tier. By more effectively exploiting drive technology specializations,
FAST VP delivers better performance and greater cost efficiency than FAST.
Elements of FAST VP
Figure 4: Elements of FAST VP[1]
• FAST VP Tier: This can contain from one to four thin pools.
• FAST VP Policy: This can contain up to three VP Tiers. This policy determines how much of each tier can be utilized by the devices associated with the policy.
• Storage Groups: Storage groups are associated with a particular FAST VP policy. The thin devices will then be moved to different tiers based on their activity levels and policy settings.
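The relationship between these three elements can be sketched as a small data model. This is an illustration only, not part of the authors' scripts: the class names, the example tier and policy names, and the choice to express the per-tier limit as a percentage are all assumptions made for the sketch. Only the limits of one to four pools per tier and up to three tiers per policy come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_POOLS_PER_TIER = 4    # a FAST VP tier holds one to four thin pools
MAX_TIERS_PER_POLICY = 3  # a FAST VP policy holds up to three tiers


@dataclass
class Tier:
    name: str
    pools: List[str]

    def __post_init__(self):
        if not 1 <= len(self.pools) <= MAX_POOLS_PER_TIER:
            raise ValueError(
                f"tier {self.name!r} needs 1 to {MAX_POOLS_PER_TIER} thin pools")


@dataclass
class Policy:
    name: str
    # Assumption: the per-tier limit is modelled as the percentage of the
    # associated storage groups' capacity that may reside on that tier.
    tier_limits: Dict[str, int]
    storage_groups: List[str] = field(default_factory=list)

    def __post_init__(self):
        if not 1 <= len(self.tier_limits) <= MAX_TIERS_PER_POLICY:
            raise ValueError(
                f"policy {self.name!r} needs 1 to {MAX_TIERS_PER_POLICY} tiers")
        for tier, percent in self.tier_limits.items():
            if not 0 < percent <= 100:
                raise ValueError(f"limit for {tier!r} must be between 1 and 100")


# Hypothetical configuration; pool names follow the examples used in this article.
efd = Tier("EFD_TIER", ["EFD_R5"])
fc = Tier("FC_TIER", ["FC_R5"])
sata = Tier("SATA_TIER", ["SATA", "SATA_1TB"])
policy = Policy(
    "SAP_POLICY",
    {"EFD_TIER": 10, "FC_TIER": 40, "SATA_TIER": 100},
    storage_groups=["SAP_PROD_SG"],
)
print(policy)
```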
What makes capacity utilization reporting difficult with FAST VP?
• Sub-LUN movement of thin LUNs: A thin LUN, which is already spread across hundreds of data devices in its thin pool, can reside on multiple pools, because FAST VP moves extents across tiers rather than whole LUNs.
• Although a storage administrator can know which pool the thin LUN was originally bound to, that gives no indication of the space the thin LUN actually occupies in the various thin pools after a FAST VP movement.
• The combination of multiple layers of virtualization, involving thin LUNs and FAST VP, thus makes it tough for the storage administrator to create a disk capacity utilization report that can be understood by an end customer who may not be a storage expert.
Architecture
Figure 5 shows the basic block diagram of this architecture.
PowerShell/MS DOS Batch scripts interact with the array to collect information regarding
capacity utilization. This information is then fed into an Excel spreadsheet to generate readable
graphs and charts.
Figure 5: Architecture - Block Diagram
Script to collect information from Symmetrix
The MS DOS Batch and PowerShell scripts are used to extract capacity-related information
from the array. The scripts execute symcli commands to collect information regarding thin pools,
thin pool allocated capacity, storage groups, capacity consumed per storage group, and
application-wise capacity utilization. Detailed steps are given in the flow chart in Figure 6.
Figure 6: Flowchart
1. Collect thin pool details
2. Differentiate storage groups based on application type
3. Collect usage details per storage group
4. Sum actual allocated capacity per tier for each application
5. Calculate wastage
6. Identify devices not presented to host but consuming pool space
7. Calculate pool usage per host for in-depth analysis
8. Export data to Excel/SQLite database
Further details regarding these steps, along with examples, are given in the next section.
Report Generation Procedure
This section will discuss each of the steps outlined in the flow chart for preparing a capacity
utilization report in detail. We will later discuss automating these.
Collect thin pool details
First, we collect details of the thin pools present on the VMAX®, using the command below. This
gives us the names of the available thin pools and their total and used capacity, as seen in Figure 7.
Command:
symcfg -sid <> list -thin -pool
Figure 7: Screenshot - Thin pool capacity
Differentiate storage groups based on application type
List all storage groups, differentiating them based on application. This differentiation can be
done by nomenclature, i.e. a naming convention is introduced, or is already practiced, in which
each storage group is named after the application that uses it. If no such
nomenclature exists, the input can be taken from text files, with the storage groups
listed in a separate text file for each application. Figure 8 shows an output
where applications are named by environment.
Command:
symaccess -sid <sid> list -type stor | select-string "sap" | select-string "_sg"
Figure 8: Screenshot - Storage Groups by Application
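The name-based differentiation can also be sketched outside PowerShell. The following minimal example mirrors the two chained select-string filters (a case-insensitive substring match on the application keyword and on "_sg"). The function name, the keyword table, and most of the sample storage group names are hypothetical; only EXCHANGE_CLUSTER_SG and EXCHANGE_ENDT_SG appear elsewhere in this article.

```python
def classify_storage_groups(names, app_keywords, marker="_sg"):
    """Group storage group names by application keyword (case-insensitive)."""
    groups = {app: [] for app in app_keywords}
    groups["UNCLASSIFIED"] = []
    for name in names:
        lowered = name.lower()
        if marker not in lowered:
            continue  # not a storage group under this naming convention
        for app, keyword in app_keywords.items():
            if keyword in lowered:
                groups[app].append(name)
                break
        else:
            # Follows the convention but matches no known application.
            groups["UNCLASSIFIED"].append(name)
    return groups


# Hypothetical names as they might be listed for one array.
names = [
    "SAP_PROD_SG",
    "SAP_DEV_SG",
    "EXCHANGE_CLUSTER_SG",
    "EXCHANGE_ENDT_SG",
    "BACKUP_SG",
    "SAP_PROD_MV",
]
keywords = {"SAP": "sap", "EXCHANGE": "exchange"}

by_app = classify_storage_groups(names, keywords)
for app, members in by_app.items():
    print(app, members)
```

Keeping an UNCLASSIFIED bucket makes it visible when a storage group follows the "_sg" convention but belongs to no known application, which is the case where the text-file fallback would be needed.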
Collect Total Written Capacity details per Storage Group
In order to know the actual amount of data written per pool for each device, we use the SYMCLI
command symcfg list -tdev -detail. This gives us the space (in MB) that a device uses
in each of the thin pools. Thus, in cases where FAST VP has moved extents of a device to
multiple pools, we can get the actual data written to each pool per device using this command.
Deciphering the output of the SYMCLI command
For example, device 078F in Figure 9 shows Total MBs is 65531. This is the total space seen by
the host, and the device is bound to the SATA thin pool. However, if we move over to column 7,
which mentions Total Written MBs, we see that only 39601 MBs is actually written to this device.
This difference in Total Written MBs and Total MBs is due to virtual/thin provisioning. Looking at
column 6, we can see that out of the 39601 MBs written to device 078F not all of it is in SATA
thin pool. FAST VP has moved 471 MB of it to SATA_1TB pool and 38390 MB to FC_R5 pool,
while 800 MB stays in SATA pool.
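Once the per-device, per-pool figures have been read from the command output, summarizing them is a simple grouping exercise. The sketch below assumes the output has already been parsed into (device, pool, MB) records; the parsing itself is not shown, because it depends on the exact SYMCLI output layout. Device 078F uses the per-pool figures quoted above, while device 0790 and its values are hypothetical.

```python
from collections import defaultdict

# (device, pool, pool-allocated MB), assumed to be parsed from the command output.
records = [
    ("078F", "SATA", 800),
    ("078F", "SATA_1TB", 471),
    ("078F", "FC_R5", 38390),
    ("0790", "SATA", 12000),   # hypothetical device
    ("0790", "EFD_R5", 2048),  # hypothetical device
]


def usage_by_device(rows):
    """Return {device: {pool: MB}} so each device's spread across pools is visible."""
    usage = defaultdict(dict)
    for device, pool, mb in rows:
        usage[device][pool] = usage[device].get(pool, 0) + mb
    return dict(usage)


def usage_by_pool(rows):
    """Return {pool: MB} summed over all devices."""
    totals = defaultdict(int)
    for _device, pool, mb in rows:
        totals[pool] += mb
    return dict(totals)


per_device = usage_by_device(records)
per_pool = usage_by_pool(records)

device_total = sum(per_device["078F"].values())
for pool, mb in sorted(per_device["078F"].items()):
    print(f"078F {pool:<9} {mb:>6} MB {100 * mb / device_total:5.1f}%")
```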
This command is vital for understanding Virtual Provisioning and FAST VP, because it shows
the data distribution across thin pools for each thin device. The command can be run per
storage group as well as per device. We collect the Pool Allocated MBs for each storage group
to show the data actually written on each device and the different thin pools on which it is written.
Command:
symcfg -sid <> list -tdev -detail -mb -sg <>
Figure 9 shows the distribution of data per device across each thin pool. This command is run
per storage group.
Figure 9: Pool Allocated MBs per Storage Group
Calculate Total Written Capacity for each thin pool per application type
From the output shown in Figure 9, we can calculate the Total Written Capacity corresponding
to each storage group. The command is run on storage groups which have been differentiated
per application. Thus, if we have a list of storage groups identified as
belonging to the SAP application, we can generate the report shown in Figure 10 by running the
command symcfg -sid <> list -tdev -detail -mb -sg <> on each of those storage groups
and collating the data in an Excel spreadsheet.
This report shows the data distribution across thin pools per application. For example, the SAP
application utilizes 29861 GB in the SATA pool, while it uses no space in the EFD thin pool.
Also, at a glance, it can be seen that the EFD thin pool is most utilized by the
EXCHANGE application.
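The collation step amounts to summing the per-pool figures of every storage group that belongs to an application. A minimal sketch follows, assuming the per-storage-group totals are already available in MB and that the report is wanted in GB (1 GB = 1024 MB). The function name, storage group names, and all numbers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
MB_PER_GB = 1024


def written_per_application(app_groups, sg_usage_mb):
    """Sum pool-allocated MB over each application's storage groups; report in GB."""
    report = {}
    for app, groups in app_groups.items():
        pools = {}
        for sg in groups:
            for pool, mb in sg_usage_mb.get(sg, {}).items():
                pools[pool] = pools.get(pool, 0) + mb
        report[app] = {pool: mb / MB_PER_GB for pool, mb in pools.items()}
    return report


# Hypothetical inputs: application membership and per-storage-group totals in MB.
app_groups = {
    "SAP": ["SAP_PROD_SG", "SAP_DEV_SG"],
    "EXCHANGE": ["EXCHANGE_CLUSTER_SG"],
}
sg_usage_mb = {
    "SAP_PROD_SG": {"SATA": 2048, "FC_R5": 4096},
    "SAP_DEV_SG": {"SATA": 1024},
    "EXCHANGE_CLUSTER_SG": {"SATA": 8192, "EFD_R5": 512},
}

report = written_per_application(app_groups, sg_usage_mb)
pools = sorted({pool for usage in report.values() for pool in usage})

# Print an application-by-pool table; a pool an application never touches shows 0.
print("Application".ljust(12) + "".join(pool.rjust(10) for pool in pools))
for app, usage in report.items():
    print(app.ljust(12) + "".join(f"{usage.get(pool, 0):10.1f}" for pool in pools))
```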
Figure 10: Total Written Capacity Per Application
Calculate waste
There may be a set of devices which are bound to a pool and consume space in it,
but are not visible to any host. These could be devices which need to be reclaimed, or
replication devices, such as TimeFinder® Clone targets or BCVs, which are not
presented to any host. Looking exclusively at application data would miss
these devices, even though they contribute to space utilization.
This report identifies this set of devices, enabling unused devices to be reclaimed and
heightening awareness of how much space is being used by replication devices.
To segregate these devices, all devices which are bound to the thin pool are compared with the
devices in host-visible storage groups. This provides a list of devices that do not belong to
any storage group; Figure 11 shows a sample.
Figure 11: Sample list of Devices
Duplicate entries are removed from this list, and the total space used by
these devices is calculated using symcfg list -tdev -detail -dev. Here, the command is run per
device, rather than per storage group as earlier, to calculate the Pool Allocated MBs
for all the devices not visible to a host. The result is appended to the report shown in Figure 10.
Command:
symcfg -sid <> list -tdev -detail -mb -dev <>
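The comparison and de-duplication described in this section can be sketched as a set difference followed by a per-pool sum. The example assumes that the list of devices bound to thin pools, the set of devices in host-visible storage groups, and the per-device pool allocations have already been collected; all device IDs and values are hypothetical.

```python
def devices_outside_storage_groups(bound_devices, sg_devices):
    """Return bound devices that are in no storage group, without duplicates,
    in the order they were first seen."""
    seen = set()
    result = []
    for device in bound_devices:
        if device not in sg_devices and device not in seen:
            seen.add(device)
            result.append(device)
    return result


def pool_totals(devices, allocated_mb):
    """Sum pool-allocated MB per pool over the given devices."""
    totals = {}
    for device in devices:
        for pool, mb in allocated_mb.get(device, {}).items():
            totals[pool] = totals.get(pool, 0) + mb
    return totals


# Hypothetical inputs.
bound = ["0A10", "0A11", "0A12", "0A11", "0A13"]  # 0A11 is listed twice
in_storage_groups = {"0A10", "0A13"}
allocated = {
    "0A11": {"SATA": 500, "FC_R5": 100},
    "0A12": {"SATA": 250},
}

orphans = devices_outside_storage_groups(bound, in_storage_groups)
waste = pool_totals(orphans, allocated)
print("Devices not in any storage group:", orphans)
print("Pool space they consume (MB):", waste)
```

De-duplicating before summing matters here: counting a device twice would overstate the space that reclamation could recover.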
Figure 12: Space Wastage Calculation
This value can be used to plan reclamations, and also analyze the space usage by replication
technologies such as TimeFinder and VP Snap.
Analysis Graphs
Multiple graphs are created from the data collated in Figure 10 and Figure 12.
Graphs are vital for customers who want a quick overall idea of space utilization
without having to go through numerical analysis.
Use of Thin Pool per Application
This graph helps us and the end user understand the thin pool/disk utilization for each
application. In Figure 13, we can easily decipher that EXCHANGE servers utilize most of the
space across SATA and EFD pools. It can also be seen that FC disks are utilized more by the
SAP application.
Figure 13: Usage of Thin Pool per Application
Distribution of Application across Pools
This graph is a variant of Figure 13 in which we analyze how each application is distributed
across the thin pools. Figure 14 shows that EXCHANGE resides mostly on the SATA pool, while the
SAP application uses SATA and Fibre Channel disks equally. Figure 15 shows the report
for multiple VMAX arrays.
Figure 14: Distribution of Application across Pools
Figure 15: Capacity Utilization Sample Report for two VMAX Arrays
Disk usage per Application across Symmetrix Arrays
There are scenarios where a single application has its data spanned across multiple arrays.
From the graphs in Figure 15, customers get an overview of the data distribution across arrays,
and the disk (SATA/FC/EFD) utilization by an application across arrays.
Figure 16 helps us compare the consumption of thin pools by the Exchange application across
multiple VMAX arrays. While the consumption of SATA pool is high and Fibre Channel pool is
low in VMAX-1, it is the opposite in VMAX-2.
Figure 16: Sample Graph for Two Symmetrix Arrays
In-depth Analysis per Host
Suppose we need to know which particular host is utilizing the most space, or a substantial
increase or decrease in space utilization has been seen over the past few weeks and we need to
pinpoint the specific application which caused it. For this, a thin pool utilization report per
storage group/host can be generated. The steps are given below.
List all existing storage groups in the Symmetrix
We list all the existing storage groups on a Symmetrix. Unlike before, the storage groups are not
differentiated by application type.
Command:
symaccess -sid <s/n> list -type stor
Figure 17: List of Storage Groups
Allocated space is calculated for each of these storage groups separately
The pool allocated capacity for each of the storage groups is collated and a sum of the allocated
capacity per thin pool is calculated for each storage group. This command will give us the per
device pool allocated MBs.
Command:
symcfg -sid <s/n> list -tdev -detail -sg <name_of_sg> -mb
Figure 18: Pool Allocated MBs Output
In-depth Report
The in-depth report per host can help pinpoint any particular host which is utilizing more space
than usual, or which has experienced unusual growth in the past weeks.
Figure 19: Storage Group Report
A pivot chart is plotted in Figure 20 to help analyze the data in Figure 19. With this, the server
that is utilizing most of the space in pools can be easily determined.
Figure 20: Pivot Chart per Host
Selecting a pool in the pivot chart
From the pivot chart in Figure 20, we can select a particular thin pool to determine its usage by
each host. Figure 21 shows the data after selecting a particular thin pool.
Figure 21: Pool Utilization per Host
Here, we have selected the Thin Pool EFD. It can be seen that EXCHANGE_CLUSTER_SG
utilizes EFD much more than the average of other hosts.
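The same selection can be approximated without a pivot chart. The sketch below ranks storage groups by their usage of one pool and flags any group whose usage exceeds a multiple of the average of the other groups. The threshold factor of 2, the function name, and all figures are assumptions made for illustration; the article itself reads this comparison off the chart rather than computing it.

```python
def pool_view(usage_mb, pool, factor=2.0):
    """Rank storage groups by usage of `pool` and flag those using more than
    `factor` times the average of all other groups."""
    rows = [(sg, pools.get(pool, 0)) for sg, pools in usage_mb.items()]
    rows.sort(key=lambda row: row[1], reverse=True)
    flagged = []
    for sg, mb in rows:
        others = [other_mb for other_sg, other_mb in rows if other_sg != sg]
        if others and mb > factor * (sum(others) / len(others)):
            flagged.append(sg)
    return rows, flagged


# Hypothetical per-storage-group usage in MB.
usage_mb = {
    "EXCHANGE_CLUSTER_SG": {"EFD": 9000, "SATA": 30000},
    "EXCHANGE_ENDT_SG": {"EFD": 1000, "FC": 20000, "SATA": 4000},
    "SAP_PROD_SG": {"EFD": 500, "FC": 15000},
    "ORACLE_DB_SG": {"SATA": 2000},
}

ranking, heavy_users = pool_view(usage_mb, "EFD")
for sg, mb in ranking:
    marker = "  <-- well above the others" if sg in heavy_users else ""
    print(f"{sg:<22}{mb:>8} MB{marker}")

# The opposite slice: which pool holds most of one storage group's data.
endt = usage_mb["EXCHANGE_ENDT_SG"]
largest_pool = max(endt, key=endt.get)
print("EXCHANGE_ENDT_SG is mostly on", largest_pool)
```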
Selecting a storage group in the pivot chart
A particular storage group can also be selected to see how its data is spread across all the
pools.
Figure 22: Thin Pool Utilization per Host
Here, we select the storage group EXCHANGE_ENDT_SG and find that this host has most of
its storage on the FC pool.
Trend Reporting
Creating a trend report which can predict future utilization of disks is a key feature of this report.
Creating trends which predict future growth becomes especially difficult in environments which
have FAST VP implemented, as the movement of data is based on algorithms not known to the
user. Also, extent-level movement makes predicting usage of data of any disk type or tier more
difficult. However, this report accurately calculates the total written data per disk type per
application. This data can be used to create reliable trend reports on environments which use
FAST VP.
Sample Trend Reports and Analysis
Figure 23 depicts the usage trend of FC/EFD/SATA disks by the SAP application. Around
16th November, a decrease in SATA usage is seen, accompanied by a spike in FC
usage. This is attributed to a manual migration of data to FC at the customer's request. When
data moves between tiers, a spike in the use of one disk type is accompanied by a decrease in
the use of another. A similar trend is observed in Figure 24.
Figure 23: SAP Trend Reporting
Figure 24 depicts the usage trend of the Oracle application across disk types. As time goes on,
it needs more FC than SATA capacity, so if the FC pool is currently filling up, new FC disks
should be purchased.
Figure 24: Oracle Trend Reporting
Figure 25 shows the Exchange trend report, which is more or less linear. The most heavily
utilized disk type is SATA. Utilization of all three disk types is expected to increase over time.
Figure 25: Exchange Trend Report
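Where a trend is roughly linear, as described for Exchange, a straight-line fit gives a simple projection of future usage. The sketch below uses an ordinary least-squares fit over equally spaced reports. This is an assumption added for illustration, not the authors' method, and it is only reasonable when no manual migration or large FAST VP movement occurred in the period. The sample values are hypothetical.

```python
def fit_line(values):
    """Least-squares fit of values against their index 0, 1, 2, ...
    Returns (slope, intercept)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two reports to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project(values, periods_ahead):
    """Projected value `periods_ahead` reports after the last one."""
    slope, intercept = fit_line(values)
    return intercept + slope * (len(values) - 1 + periods_ahead)


# Hypothetical weekly SATA usage in GB for one application.
sata_gb = [100, 110, 120, 130]

slope, intercept = fit_line(sata_gb)
print(f"growth per report: {slope:.1f} GB")
print(f"projected usage three reports ahead: {project(sata_gb, 3):.1f} GB")
```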
Pie Chart
A pie chart provides an overview of disk utilization across applications. Figure 26 shows how FC
disks are used by each application. Oracle seems to use almost no FC disks.
Figure 26: Fibre Channel Drive Usage
Figure 27 shows that Flash disks are almost exclusively used by the Exchange application.
Figure 28 shows that Oracle application presently uses very little space on FC/EFD or SATA
disks.
Figure 27: Flash Disk Usage
Figure 28: SATA Disk Usage
If an in-depth analysis is required to find out why the EXCHANGE utilization of SATA has
increased in a particular month, we can refer back to Figure 21 and Figure 22.
Automating Report Generation using SQLite and GNUPlot
The methods described in the previous sections are sufficient to analyze the capacity of
multiple Symmetrix arrays and answer customer queries. However, they depend on Microsoft
Excel for data consolidation and graph generation, and some of the steps involve manual
collation of data. This makes the solution applicable only to administrators working in a
Windows environment. To broaden the support of this solution across operating system
boundaries, we propose a new methodology that uses a SQLite database to store information and
Gnuplot to generate graphs. Some of the advantages of using a SQLite database to store
capacity information include:
• Public Domain – It’s free! The source code for SQLite is in the public domain. No claim of copyright is made on any part of the core source code.
• Zero-Configuration – SQLite requires no installation or configuration to work.
• Serverless – With SQLite, the process that wants to access the database reads and writes directly from the database files on disk. There is no intermediary server process.
• Single Database File – A SQLite database is a single ordinary disk file that can be located anywhere in the directory hierarchy.
• Stable Cross-Platform Database File – The SQLite file format is cross-platform[4].
Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX,
VMS, and many other platforms. The source code is copyrighted but freely distributed (i.e. you
don't have to pay for it)[5]. Gnuplot supports many different types of 2D and 3D plots. We will
now see how we can use SQLite and Gnuplot to further simplify report generation.
Import Data into SQLite Database
Reports generated from the PowerShell scripts (see the Appendix) are imported into a SQLite
database. If not already available, the precompiled binaries of SQLite may be downloaded from
http://www.sqlite.org/download.html.
Once downloaded, create a database named VMAX_Capacity_DB.db3 as shown in Figure 29.
Figure 29: SQLite - Create Database
Create a table to hold array capacity-related information per pool as shown in Figure 30.
Figure 30: SQLite - Create Table
The output of the PowerShell script is a csv file named capacity_report.csv containing capacity
utilization per pool:
Figure 31: SQLite - Input File - capacity_report.csv
Set separator to “,” and import the contents of the csv file to the newly created table as shown in
Figure 32.
Figure 32: SQLite - Import data to table
This process can be followed whenever a new report is generated on a weekly, biweekly, or
monthly basis. Run the Select command to view the contents of the database. Figure 33 shows
a set of sample data captured over a couple of days.
Figure 33: SQLite - View Details
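The create, import, and view steps above can also be scripted. The following is a minimal sketch using Python's built-in sqlite3 module rather than the SQLite command-line shell shown in the figures. The table name VMAX_pool_app_capacity and the columns TS, Pool, and SAP are taken from the trend query in this section; the remaining columns, the CSV header, and the sample values are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample of capacity_report.csv. The header is an assumption based on
# the columns selected by the PowerShell script; the values are made up.
SAMPLE_CSV = """Time,Array_ID,pool_name,sap,Oracle,Exchange
01/06/2014,4332,SATA,29861,294,36289
01/06/2014,4332,FC_R5,30369,457,19070
"""

def create_table(conn):
    # Assumed schema: TS, Pool, and SAP match the names used by the trend query;
    # the other columns are illustrative.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS VMAX_pool_app_capacity (
               TS TEXT, Array_ID TEXT, Pool TEXT,
               SAP REAL, Oracle REAL, Exchange REAL)"""
    )

def import_report(conn, csv_file):
    """Append every row of one capacity report to the table; return the row count."""
    reader = csv.DictReader(csv_file)
    rows = [
        (r["Time"], r["Array_ID"], r["pool_name"],
         float(r["sap"] or 0), float(r["Oracle"] or 0), float(r["Exchange"] or 0))
        for r in reader
    ]
    conn.executemany(
        "INSERT INTO VMAX_pool_app_capacity VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # pass "VMAX_Capacity_DB.db3" for a file on disk
create_table(conn)
imported = import_report(conn, io.StringIO(SAMPLE_CSV))
for row in conn.execute("SELECT TS, Pool, SAP FROM VMAX_pool_app_capacity"):
    print(row)
```

Calling import_report again with the next weekly or monthly report appends its rows, so the table accumulates the history needed for trend analysis.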
Queries can now be written to get the required information in any format needed, and the
results can be exported to CSV format for use as Gnuplot input. For example, the query below
generates a capacity utilization trend report for one application, in this case SAP. It
returns one row per timestamp (TS), with the SAP capacity in each pool as a separate column.
More examples of generating graphs using Gnuplot appear later in this section.
select p.TS,
       TSATA.SAP    as "SATA",
       TSATA1TB.SAP as "SATA1TB",
       TFC_R5.SAP   as "FC",
       TEFD_R5.SAP  as "EFD"
from (select distinct TS from VMAX_pool_app_capacity) p
left outer join VMAX_pool_app_capacity TSATA
       on p.TS = TSATA.TS and TSATA.Pool = 'SATA'
left outer join VMAX_pool_app_capacity TSATA1TB
       on p.TS = TSATA1TB.TS and TSATA1TB.Pool = 'SATA_1TB'
left outer join VMAX_pool_app_capacity TFC_R5
       on p.TS = TFC_R5.TS and TFC_R5.Pool = 'FC_R5'
left outer join VMAX_pool_app_capacity TEFD_R5
       on p.TS = TEFD_R5.TS and TEFD_R5.Pool = 'EFD_R5';
Figure 34: SQLite - Query Trending for SAP
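The query and the CSV export can be combined in one script. The sketch below, again using Python's sqlite3 module as an assumption rather than the shell shown in the figures, runs the same self-join against a small in-memory table and writes the result as CSV. The sample rows are hypothetical, the short table aliases are introduced for brevity, and the ORDER BY clause is an addition so that the rows come out in a predictable order.

```python
import csv
import io
import sqlite3

TREND_QUERY = """
SELECT p.TS   AS "TS",
       s.SAP  AS "SATA",
       s1.SAP AS "SATA1TB",
       f.SAP  AS "FC",
       e.SAP  AS "EFD"
FROM (SELECT DISTINCT TS FROM VMAX_pool_app_capacity) p
LEFT OUTER JOIN VMAX_pool_app_capacity s  ON p.TS = s.TS  AND s.Pool  = 'SATA'
LEFT OUTER JOIN VMAX_pool_app_capacity s1 ON p.TS = s1.TS AND s1.Pool = 'SATA_1TB'
LEFT OUTER JOIN VMAX_pool_app_capacity f  ON p.TS = f.TS  AND f.Pool  = 'FC_R5'
LEFT OUTER JOIN VMAX_pool_app_capacity e  ON p.TS = e.TS  AND e.Pool  = 'EFD_R5'
ORDER BY p.TS
"""

def export_trend(conn, out_file):
    """Run the trend query and write a header line plus one CSV line per timestamp."""
    cur = conn.execute(TREND_QUERY)
    writer = csv.writer(out_file)
    writer.writerow([col[0] for col in cur.description])
    rows = cur.fetchall()
    writer.writerows(rows)  # missing pools (NULL) become empty fields
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VMAX_pool_app_capacity (TS TEXT, Pool TEXT, SAP REAL)")
conn.executemany(
    "INSERT INTO VMAX_pool_app_capacity VALUES (?, ?, ?)",
    [  # hypothetical sample values
        ("2014-01-06", "SATA", 100.0), ("2014-01-06", "SATA_1TB", 50.0),
        ("2014-01-06", "FC_R5", 200.0), ("2014-01-06", "EFD_R5", 0.0),
        ("2014-01-13", "SATA", 110.0), ("2014-01-13", "FC_R5", 210.0),
    ],
)
buffer = io.StringIO()
trend_rows = export_trend(conn, buffer)
print(buffer.getvalue())
```

The left outer joins keep a timestamp in the output even when one of the pools has no row for it. The sample uses ISO-style dates because they sort chronologically as text; timestamps stored as MM/dd/yyyy would need converting before ORDER BY gives a chronological result across years.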
Generating Graphs Using Gnuplot
The advantage of using Gnuplot is that it is free and supports non-interactive uses such as
scripting. Scripts can be written to generate graphs and reused later. They may also be
configured to run automatically, with the resulting graphs sent to the administrator's email
account. We will now discuss how to generate a capacity reporting graph using Gnuplot.
The capacity report file generated by the PowerShell script (described earlier in this
article) may be used to generate a graph using Gnuplot. The input file capacity_report.csv
(Figure 31) contains the capacity utilization of separate pools for each application.
Figure 35: Gnuplot - Command File
Generate graphs by loading the command file in Gnuplot, for example with Gnuplot's load command.
A sample histogram generated using Gnuplot is shown in Figure 36.
Figure 36: Gnuplot - Graph
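Because the command file of Figure 35 is only available as an image, the following is a hedged sketch of what such a file could contain, not a reproduction of it. The Python function below builds a Gnuplot command file for a clustered histogram. The file names, the plot title, and the column layout (pool name in column 3, one application per column from column 4 onward) are assumptions; the Gnuplot commands themselves are standard.

```python
import subprocess

def build_histogram_script(csv_path, png_path, app_columns):
    """Return Gnuplot commands for a clustered histogram of pools vs. applications.

    app_columns maps a legend title to the 1-based CSV column holding its values.
    Column 3 is assumed to hold the pool name.
    """
    lines = [
        'set datafile separator ","',
        "set terminal png size 1024,600",
        f'set output "{png_path}"',
        'set title "Capacity utilization per pool"',
        'set ylabel "Allocated capacity"',
        "set style data histograms",
        "set style histogram clustered gap 1",
        "set style fill solid 0.8 border -1",
        "set xtics rotate by -45",
    ]
    plots = []
    for i, (title, col) in enumerate(app_columns.items()):
        # An empty file name tells Gnuplot to reuse the previous data file.
        source = f'"{csv_path}"' if i == 0 else '""'
        # "every ::1" starts at the second line, skipping the CSV header.
        plots.append(f'{source} every ::1 using {col}:xtic(3) title "{title}"')
    lines.append("plot " + ", \\\n     ".join(plots))
    return "\n".join(lines) + "\n"

def run_gnuplot(script_path):
    """Render the graph non-interactively; requires gnuplot on the PATH."""
    subprocess.run(["gnuplot", script_path], check=True)

script = build_histogram_script(
    "capacity_report.csv", "capacity_report.png",
    {"SAP": 4, "Oracle": 5, "Exchange": 6},
)
print(script)
```

Saving the returned text to a file and passing that file to run_gnuplot produces the PNG without opening an interactive Gnuplot session, which is what makes the step easy to schedule.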
A query can be written in SQLite to retrieve the required information, and the results may be
used to generate a graph using Gnuplot. Custom scripts can be developed for both steps and
scheduled to run automatically at the end of each month, eliminating the manual effort
required to generate reports.
Analyzing the Report to answer Customer Queries
Customer queries that were previously difficult for Storage Administrators to answer, because
of the multiple layers of virtualization, can now be answered by the customer simply by
looking at the report. Below, we answer a few sample questions using the report. For the
entire report, refer to the Appendix.
How much of my EFD disk is currently used by the ECM Application in VMAX-1?
Answer:
From the data in the Appendix, we see that around 3.6 TB of EFD space is utilized by the
ECM application.
Figure 37: Pool Utilization Report - ECM
Usage by Exchange hosts is higher than expected in VMAX-2. Which host or cluster is
responsible for it?
Answer:
From the Appendix, we can easily determine that EXCHANGE_CLUSTER_SG has the
maximum usage among the listed Exchange hosts.
Figure 38: Graph - Thin Pool Utilization per Storage Group
Which application in VMAX-1 utilizes Fibre Channel the most?
Answer:
From the Appendix, we find that most Fibre Channel storage is being used by the
Exchange application.
Figure 39: Graph – Distribution of Application across Pools
In VMAX-1, only 10 TB of my ECM data resides on FC RAID 5 disks. Why is total
usage 14 TB?
Answer:
From the Appendix, we see that around 4 TB of bound devices are not visible to any host and
can therefore be reclaimed to reduce waste.
Figure 40: Report - Space Waste
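The arithmetic behind this answer can be checked directly from the VMAX-1 figures in the Appendix. The short Python sketch below uses the two ECM_FC450_R5 values listed there (in GB); the variable names are ours, and the question's "10 TB" and "14 TB" are rounded figures.

```python
# Values in GB, taken from the VMAX-1 table in the Appendix (pool ECM_FC450_R5).
ecm_fc450_r5_gb = 10966.06738      # capacity allocated to the ECM application
wastage_fc450_r5_gb = 3639.311523  # bound devices not visible to any host

# Total pool usage is the application's share plus the unmapped (wasted) share.
total_gb = ecm_fc450_r5_gb + wastage_fc450_r5_gb
waste_share = wastage_fc450_r5_gb / total_gb

print(f"Total ECM_FC450_R5 usage: {total_gb:.0f} GB")
print(f"Reclaimable share: {waste_share:.0%}")
```

The gap between what the application uses and what the pool reports is exactly the capacity of the bound devices that no host can see.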
I want to know if the SAP application in VMAX-2 is using EFD disk, as SAP is not
billed for it and is not supposed to use it.
Answer:
From the data in the Appendix, we can see that SAP does not use EFD at all. This confirms
that the SAP applications across multiple VMAX arrays in the customer environment are
adhering to the Service Level Agreements.
Figure 41: Report - SAP Capacity Utilization
What is the trend of capacity utilization of Exchange in VMAX-1?
Answer:
From the Appendix, we can easily check the trend graphs of Exchange and conclude
that demand for SATA and FC drives is on the rise.
Figure 42: Graph - Trend Analysis
Conclusion
In order to analyze a complex environment with multiple levels of virtualization, we have devised
a reporting mechanism which can be used to generate disk capacity utilization reports in a short
time. This was a request made by my current management, as it was difficult to generate a
report which could be deciphered by people unfamiliar with the underlying storage technologies.
After implementing this reporting mechanism, management gained a better and more detailed
understanding of disk usage. They were able to understand the trends, cross-verify billing
information, and predict data usage by looking into the report without having to contact the
Storage Administrator for details.
The reporting procedure was further simplified by using a SQLite database to collate data and
Gnuplot to plot graphs. Since these tools can be scripted, the overall procedure eliminated
the manual effort needed to generate reports. Scripts are now scheduled to run at regular
intervals and the reports are emailed to management directly. Since the tools are free,
this proved to be a cost-effective solution. The procedure can be extended to other
EMC products such as VNX® and CLARiiON® arrays as well as multi-vendor arrays such as
Hitachi, HP, and IBM products. With all data populated into a SQLite database, a multitude
of analysis and planning possibilities opens up.
Appendix
Disk use and Trend Report Sample for two VMAX arrays
Pool usage per application for VMAX-1
(Wastage = bound devices not visible to hosts)
Application   Pool           Allocated size in GB
SAP           SATA_2TB_R6    1854.208008
SAP           SATA_1TB_R6    2255.454102
SAP           FC450_R1       7886.637695
SAP           FC450_R5       11620.64453
SAP           FC300_R5       259.34375
Oracle        SATA_2TB_R6    1844.348633
Oracle        SATA_1TB_R6    1256.197266
Oracle        FC450_R1       1412.530273
Oracle        FC450_R5       526.9101563
Oracle        FC300_R5       53.27636719
Exchange      SATA_2TB_R6    16292.75293
Exchange      SATA_1TB_R6    30252.39551
Exchange      FC450_R1       12737.03613
Exchange      FC450_R5       34153.15137
Exchange      FC300_R5       820.8818359
ECM           ECM_SATA_2TB   19329.14453
ECM           ECM_BKPSATA2   9829.328125
ECM           ECM_FC450_R1   7687.802734
ECM           ECM_FC450_R5   10966.06738
ECM           ECM_EFD_R5     3610.457031
Wastage       ECM_SATA_2TB   0
Wastage       ECM_BKPSATA2   0
Wastage       ECM_FC450_R1   0
Wastage       ECM_FC450_R5   3639.311523
Wastage       ECM_EFD_R5     0.000976563
Pool Usage per Application on VMAX-2
(Wastage = bound devices not visible to hosts)
Application   Pool           Allocated size in GB
SAP           SATA           29861.19238
SAP           SATA_1TB       9394.287109
SAP           FC_R5          30369.18652
SAP           EFD_R5         0
Oracle        SATA           294.3095703
Oracle        SATA_1TB       16.87792969
Oracle        FC_R5          457.8935547
Oracle        EFD_R5         12.04394531
Exchange      SATA           36289.49023
Exchange      SATA_1TB       20464.80762
Exchange      FC_R5          19070.74121
Exchange      EFD_R5         2607.21582
Wastage       SATA           1029.486328
Wastage       SATA_1TB       2636.6875
Wastage       FC_R5          248.8339844
Wastage       EFD_R5         0
Pool Usage per Storage Group for In-Depth Analysis in VMAX-1
SG Pool Total MBs Bound Pool FAST Allocated MBs
Exchange_BL460_SG SATA_1TB_R6 0 0
Exchange_BL460_SG SATA_2TB_R6 0 0
Exchange_BL460_SG FC450_R1 0 0
Exchange_BL460_SG FC450_R5 1032480 170027
Exchange_BL460_SG FC300_R5 0 0
Exchange_DMZ_MPV6600_6601_SG SATA_1TB_R6 0 88627
Exchange_DMZ_MPV6600_6601_SG SATA_2TB_R6 0 28506
Exchange_DMZ_MPV6600_6601_SG FC450_R1 0 5748
Exchange_DMZ_MPV6600_6601_SG FC450_R5 1032450 116739
Exchange_DMZ_MPV6600_6601_SG FC300_R5 0 4106
Exchange_ENDT_SG SATA_1TB_R6 0 62399
Exchange_ENDT_SG SATA_2TB_R6 0 11751
Exchange_ENDT_SG FC450_R1 0 442
Exchange_ENDT_SG FC450_R5 1048613 22762
Exchange_ENDT_SG FC300_R5 0 4
Exchange_ENDUR_CYD_SG SATA_1TB_R6 0 15989
Exchange_ENDUR_CYD_SG SATA_2TB_R6 0 17261
Exchange_ENDUR_CYD_SG FC450_R1 0 74
Exchange_ENDUR_CYD_SG FC450_R5 17051559 10463067
Exchange_ENDUR_CYD_SG FC300_R5 0 178527
Exchange_MediaServers_SG SATA_1TB_R6 0 94634
Exchange_MediaServers_SG SATA_2TB_R6 0 12149
Exchange_MediaServers_SG FC450_R1 0 11199
Exchange_MediaServers_SG FC450_R5 4129800 126764
Exchange_MediaServers_SG FC300_R5 0 3027
Exchange_MPV6577_SG SATA_1TB_R6 0 301
Exchange_MPV6577_SG SATA_2TB_R6 0 1
Exchange_MPV6577_SG FC450_R1 0 9
Exchange_MPV6577_SG FC450_R5 10240 398
Exchange_MPV6577_SG FC300_R5 0 11
Pool Usage per Storage Group for In-Depth Analysis in VMAX-2
SG                 Pool       Total MBs Bound   Pool FAST Allocated GBs
HOUTS696_SG        SATA       0                 30.88085938
HOUTS696_SG        SATA_1TB   0                 0
HOUTS696_SG        FC_R5      96795             0
HOUTS696_SG        EFD_R5     0                 0
MPS2646_MNET_SG    SATA       225851            137.8837891
MPS2646_MNET_SG    SATA_1TB   516225            1.881835938
MPS2646_MNET_SG    FC_R5      0                 398.9902344
MPS2646_MNET_SG    EFD_R5     0                 0.220703125
MPS3417_SG         SATA       0                 10.15722656
MPS3417_SG         SATA_1TB   0                 8.2109375
MPS3417_SG         FC_R5      1032450           44.22167969
MPS3417_SG         EFD_R5     0                 0.864257813
MTS2671_2672_SG    SATA       0                 114.2763672
MTS2671_2672_SG    SATA_1TB   0                 5.8046875
MTS2671_2672_SG    FC_R5      177458            4.361328125
MTS2671_2672_SG    EFD_R5     0                 0.806640625
SAP_App01_DR_SG    SATA       131062            0.069335938
SAP_App01_DR_SG    SATA_1TB   0                 0.06640625
SAP_App01_DR_SG    FC_R5      0                 71.1796875
SAP_App01_DR_SG    EFD_R5     0                 0
SAP_App02_DR_SG    SATA       131062            0.202148438
SAP_App02_DR_SG    SATA_1TB   0                 0.14453125
SAP_App02_DR_SG    FC_R5      0                 54.97558594
SAP_App02_DR_SG    EFD_R5     0                 0
SAP_App03_DR_SG    SATA       131062            0.061523438
SAP_App03_DR_SG    SATA_1TB   0                 0.033203125
SAP_App03_DR_SG    FC_R5      0                 48.05078125
SAP_App03_DR_SG    EFD_R5     0                 0
Pivot Table and Chart analysis for Storage Group distribution
Storage Group EFD_R5 FC_R5 SATA SATA_1TB
HOUTS696_SG 0 0 30.88085938 0
MPS2646_MNET_SG 0.220703125 398.9902344 137.8837891 1.881835938
MPS3417_SG 0.864257813 44.22167969 10.15722656 8.2109375
MTS2671_2672_SG 0.806640625 4.361328125 114.2763672 5.8046875
SAP_App01_DR_SG 0 71.1796875 0.069335938 0.06640625
SAP_App02_DR_SG 0 54.97558594 0.202148438 0.14453125
SAP_App03_DR_SG 0 48.05078125 0.061523438 0.033203125
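The pivot step, which turns one row per storage group and pool into one row per storage group with a column per pool, can be sketched in a few lines of Python. This is an illustration of the transformation only, not the spreadsheet pivot table used for the report; the rows are a subset of the VMAX-2 data above.

```python
from collections import defaultdict

# (storage group, pool, FAST-allocated GB): a few rows copied from the VMAX-2 data.
rows = [
    ("MPS3417_SG", "SATA", 10.15722656),
    ("MPS3417_SG", "SATA_1TB", 8.2109375),
    ("MPS3417_SG", "FC_R5", 44.22167969),
    ("MPS3417_SG", "EFD_R5", 0.864257813),
    ("SAP_App01_DR_SG", "SATA", 0.069335938),
    ("SAP_App01_DR_SG", "SATA_1TB", 0.06640625),
    ("SAP_App01_DR_SG", "FC_R5", 71.1796875),
    ("SAP_App01_DR_SG", "EFD_R5", 0.0),
]

def pivot(rows):
    """Turn long (group, pool, value) rows into {group: {pool: total}}."""
    table = defaultdict(lambda: defaultdict(float))
    for group, pool, value in rows:
        table[group][pool] += value
    return table

pools = sorted({pool for _, pool, _ in rows})  # column order of the pivot table
table = pivot(rows)
print("Storage Group", *pools)
for group in sorted(table):
    print(group, *(table[group][pool] for pool in pools))
```

Sorting the pool names alphabetically happens to give the same column order as the pivot table above (EFD_R5, FC_R5, SATA, SATA_1TB).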
Pivot Table and Pivot Chart analysis for Storage Group pool utilization in VMAX-2
Storage Group EFD_R5 FC_R5 SATA SATA_1TB
Exchange_CLUSTER_NEW_SG 55.22949219 3666.290039 1309.970703 1302.970703
EXCHANGE_CLUSTER_SG 2333.088867 10498.24414 21322.26172 16547.46289
EXCHANGE_ENDT_SG 0 95.04492188 0 0
EXCHANGE_ENDUR_SG 157.2480469 1199.698242 11204.93164 1830.357422
EXCHANGE_UCS_ORACLE_SG 62.85644531 3606.210938 2453.328125 787.2441406
Analysis Graphs
Usage of Thin Pool per Application for VMAX-1
Distribution of Application across Pools for VMAX-1
Pool SAP Oracle Exchange (allocated size in TB)
SATA_2TB_R6 1.81075 1.801122 15.91089
SATA_1TB_R6 2.2025919 1.226755 29.54335
FC450_R1 7.7017946 1.379424 12.43851
FC450_R5 11.348286 0.514561 33.35269
FC300_R5 0.2532654 0.052028 0.801642
VMAX 1
Usage of Thin Pool per Application for VMAX-2
Distribution of Application across Pools for VMAX-2
Pool SAP Oracle Exchange (allocated size in TB)
SATA 29.161321 0.287412 35.43896
SATA_1TB 9.1741085 0.016482 19.98516
FC_R5 29.657409 0.447162 18.62377
EFD_R5 0 0.011762 2.546109
VMAX 2
Usage of Thin Pool per Application across Arrays
Distribution of Applications across Pools for multiple VMAX Arrays
Pool SAP Oracle Exchange
SATA 42.348771 9742.576 100.8784
FC 48.960754 33251.03 65.21661
EFD 0 4355.987 2.546109
Multiple VMAX Arrays
Trend Report for VMAX-1
Trend Report for VMAX-2
Pie Chart for VMAX-1
Pie Chart for VMAX-2
Script for Trend Report for applications
PowerShell script to generate Capacity Report
############################
# VARIABLE DECLARATION
############################
$dic_app_pool = @{}
$dev_mode = "ON"
$symid = "4332"
#Define file names
$symaccess_list_stor = "symaccess_list_stor.TXT"
$all_tdev_detail = "all_tdev.xml"
$symdev_list_file = "devlist.txt"
$symaccess_list_stor_dev = "symaccess_list_stor_dev.xml"
$outcsvfile = "../SQLite Stuff/capacity_report.csv"
$all_pools = @()
############################
# FUNCTION ADD-TO-DICTIONARY
############################
$capacity_detail = @{
pool_name = ""
app_name = ""
capacity = ""
}
function add_to_dictionary([hashtable]$dictionary,$key,$value){
if($dictionary.containsKey($key)){
$temp_object = $dictionary.Get_Item($key)
$temp_object.capacity = ([int]$temp_object.capacity+[int]$value)
$dictionary.Set_Item($key,$temp_object)
}
else{
$temp_object = (New-Object PSObject -Property $capacity_detail)
$temp_object.app_name = ($key -split "_")[0]
$temp_object.pool_name = ($key -split "_",2)[1]
$temp_object.capacity = $value
$dictionary.add($key,$temp_object)
}
}
###############################
# FUNCTION PROCESS APPLICATION
###############################
function process_application($app_name="sap",$storage_group_list=@("BASHFUL_SG","jega_SG","SG_RAYDENT")){
    # Collect the unique device names that belong to any of the listed storage groups
    $lun_array = $stor_dev.SymCLI_ML.Symmetrix.Device |
        %{$devname = $_.dev_name; $_.storage_group} |
        ?{ $storage_group_list -match $_.Group_Info.group_name} |
        %{$devname} | sort -unique
    # Add each matching device's per-pool allocation (MB) to the application's total
    $tdev_xml.SymCLI_ML.Symmetrix.ThinDevs | % {$_.Device} |
        ?{$lun_array -match $_.dev_name} | %{$_.pool} | %{
            add_to_dictionary $dic_app_pool ($app_name+"_"+$_.pool_name) $_.alloc_tracks_mb
        }
}
###############################
# MAIN PROGRAM
###############################
$start_date = date
if ($dev_mode -eq "OFF"){
symcfg list > $symcfg_list_file
Remove-Item $symaccess_list_stor -ErrorAction silentlyContinue
Remove-Item $symaccess_list_view_detail -ErrorAction silentlyContinue
Remove-Item $all_tdev_detail -ErrorAction silentlyContinue
symdev -sid $symid list | out-file $symdev_list_file -encoding ASCII
$last_dev = ((get-content $symdev_list_file|select -last 2|select -first 1) -split " ")[0]
"Gathering List of Storage Groups"
symaccess -sid $symid list -type stor > $symaccess_list_stor
"Gathering Storage Group of device"
symaccess -sid $symid list -type stor -dev 0000:$last_dev -output xml > $symaccess_list_stor_dev
"Gathering Thindev Capacity Utilization Detail"
symcfg -sid $symid list -tdev -detail -mb -output xml > $all_tdev_detail
}
$symcli_out_date = date
#$sap_sg_u = get-content $symaccess_list_stor | select-string "sap" | select-string "_" | select-string -notmatch "fast"
#$esx_sg_u = get-content $symaccess_list_stor | select-string "esx" | select-string "_" | select-string -notmatch "sap"
#$win_sg_u = get-content $symaccess_list_stor | select-string "_" | select-string -notmatch "sap","esx","mpv","fast"
"Reading Storage Group Details to XML Variable.."
[xml]$stor_dev = Get-Content $symaccess_list_stor_dev
"Reading TDEV Details to XML Variable.."
[xml]$tdev_xml = Get-Content $all_tdev_detail
$Global_time = date -format "MM/dd/yyyy"
$Values = $dic_app_pool.GetEnumerator() | %{$_.Value}
$pool_details = @{ Time = ""; Array_ID = ""; pool_name = ""}
$Values | %{$_.app_name} | sort -u | %{$pool_details.add($_,"")}
$Values | %{$_.pool_name} | sort -u | %{
$pool = $_
$temp_pool = (New-Object PSObject -Property $pool_details)
$all_pools += $temp_pool
$temp_pool.pool_name = $pool
$temp_pool.Time = $Global_time
$temp_pool.Array_ID = $symid
$Values | ?{$_.pool_name -eq $pool} | %{
$app = $_.app_name
$temp_pool.$app = $_.capacity
}
}
$all_pools | select time,array_id,pool_name,sap,Oracle,Exchange | Export-Csv $outcsvfile -NoTypeInformation
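For readers less familiar with PowerShell, the core of the script above is an aggregation followed by a pivot. The Python sketch below restates that logic under stated assumptions: the input is a list of hypothetical (application, pool, allocated MB) tuples standing in for the parsed SYMCLI XML, and the function names are ours. It is an illustration of the technique, not a port of the script.

```python
from collections import defaultdict

def aggregate(allocations):
    """Sum allocated MB per (application, pool), as add_to_dictionary does."""
    totals = defaultdict(int)
    for app, pool, alloc_mb in allocations:
        totals[(app, pool)] += int(alloc_mb)
    return totals

def to_report_rows(totals, timestamp, array_id):
    """Pivot the totals into one row per pool with one column per application."""
    apps = sorted({app for app, _ in totals})
    pools = sorted({pool for _, pool in totals})
    rows = []
    for pool in pools:
        row = {"Time": timestamp, "Array_ID": array_id, "pool_name": pool}
        for app in apps:
            # Leave the cell empty when the application has nothing in this pool.
            row[app] = totals.get((app, pool), "")
        rows.append(row)
    return rows

# Hypothetical per-device allocations: (application, pool, allocated MB).
allocations = [
    ("sap", "FC_R5", 1200),
    ("sap", "FC_R5", 800),
    ("sap", "SATA", 500),
    ("Exchange", "SATA", 300),
]
totals = aggregate(allocations)
report_rows = to_report_rows(totals, "01/06/2014", "4332")
for row in report_rows:
    print(row)
```

Keying the totals by an (application, pool) tuple is a deliberate difference from the script, which joins the two names with an underscore and splits them again later; the tuple keeps working if an application name itself contains an underscore.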
Script for in-depth analysis per storage group
@echo off & setlocal enableextensions enabledelayedexpansion
del thin_pool_fast.txt
symcfg -sid 332 list -thin -pool |find /i "EI">thin_pool.txt
for /f %%A in (thin_pool.txt) do (
del %%A.txt
del %%A_fast.txt
set tp=%%A
echo !tp!_fast >>thin_pool_fast.txt
)
symaccess -sid 332 list -type stor|find /i "_"|find /i /v "fast">view.txt
echo SG,Pool,Total MBs Bound,Pool FAST Allocated MBs>//mpt2989/C$/Storage_Report/sg_vmax1.csv
for /F %%S in (view.txt) do (
set server=%%S
symcfg -sid 332 list -tdev -detail -sg !server! -mb>total_all.txt
powershell.exe ./rep.ps1 total
for /f "tokens=2,4" %%A in (total_ps.txt) do (
set bound_pool=%%A
set total_mb=%%B
echo !total_mb!>>!bound_pool!.txt
)
REM SUM OF TOTAL ALLOCATED
for /f %%L in (thin_pool.txt) do (
set t=%%L
set /a su=0
for /f %%M in (!t!.txt) do (
set nu=%%M
set /a su+=nu
)
set sum_of_bound_!t!=!su!
echo !su! is Originally allocated MBs for !server! from pool !t!
REM echo ,!t!,!su!>>//mpt2989/C$/Storage_Report/sg_vmax1.csv
set /a su=0
)
REM FAST MOVEMENT ALLOCATION
for /f "tokens=2,6" %%A in (total_ps.txt) do (
set bound_pool_fast=%%A
set FAST_mb=%%B
echo !FAST_mb!>>!bound_pool_fast!_fast.txt
)
REM SUM OF FAST ALLOCATED
for /f %%L in (thin_pool_fast.txt) do (
set t=%%L
set /a su=0
for /f %%M in (!t!.txt) do (
set nu=%%M
set /a su+=nu
)
echo !su! is FAST pool allocated MBs for !server! in pool !t!
set sum_of_fast_!t!=!su!
)
echo !server!,SATA,!sum_of_bound_SATA!,!sum_of_fast_SATA_fast!>>//mpt2989/C$/Storage_Report/sg_vmax1.csv
echo !server!,SATA_1TB,!sum_of_bound_SATA_1TB!,!sum_of_fast_SATA_1TB_fast!>>//mpt2989/C$/Storage_Report/sg_vmax1.csv
echo !server!,FC_R5,!sum_of_bound_FC_R5!,!sum_of_fast_FC_R5_fast!>>//mpt2989/C$/Storage_Report/sg_vmax1.csv
echo !server!,EFD_R5,!sum_of_bound_EFD_R5!,!sum_of_fast_EFD_R5_fast!>>//mpt2989/C$/Storage_Report/sg_vmax1.csv
echo ,,,,>>//mpt2989/C$/Storage_Report/sg_vmax1.csv
set /a su=0
for /f %%A in (thin_pool.txt) do (
del %%A.txt
del %%A_fast.txt
)
)
References
[1] Fully Automated Storage Tiering for Disk Groups (FAST) and for Virtual Pools (FASTVP) –
Student Guide
[2] Virtual Provisioning Concepts and Planning, Module 06, Symmetrix Configuration
Management Course
[3] Symmetrix Integration with Microsoft Exchange, Symmetrix Expert Course
[4] Distinctive Features of SQLite, SQLite, http://www.sqlite.org/different.html
[5] GNUPlot HomePage, http://www.gnuplot.info/
EMC believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION
MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO
THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an
applicable software license.