Predictive analytics in IT – Assets Outage
Vaidyanathan Sivasubramanian
09/June/2016
Predictive Analytics – IT Assets Outage 09/Jun/2016 Vaidya 1
Table of Contents
1. Background
2. Problem Statement
3. Solution Architecture
4. Implementation
4.a Extract data from Service-Now
4.b Create an Azure storage account
4.c Provision a HDI cluster
4.d Load data into WASB
4.e Perform ETL tasks
4.f Download Hive output
4.g Visualize in Tableau
4.h Delete the HDI cluster
5. Inferences
6. Conclusion
1. Background
Enterprise IT in any company is overwhelmed with asset maintenance. The biggest challenge is recognizing the risks
and consequences of losing critical assets. Shrinking IT budgets mean more pressure to keep run-and-maintain
operations going smoothly. Big Data analytics can help enterprise IT predict outages for any class of
assets. This white paper discusses a Hadoop-based Big Data PoC solution that analyzes historical data and performs predictive analytics.
2. Problem Statement
The IT organization in any enterprise is tasked with keeping assets highly reliable and available (typically a five 9’s
requirement). IT teams classify critical assets and treat any outage on them as most important, assigning the highest
priority to resolving the issue. This approach is reactive in nature; we want to move toward predictive and, if
possible, prescriptive solutions to asset outages, based on historical data from production databases.
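To make the five 9’s requirement concrete, the short Python sketch below converts an availability target into a downtime budget. It is plain arithmetic rather than part of the PoC; the function name and the choice of a 365-day year are assumptions made for illustration. At 99.999% availability the budget works out to roughly 5.26 minutes per year.

```python
# Downtime budget implied by an availability target (plain arithmetic, no PoC data).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year (assumed period)


def downtime_budget_minutes(availability, period_minutes=MINUTES_PER_YEAR):
    """Return the maximum downtime, in minutes, allowed over the period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_minutes


if __name__ == "__main__":
    targets = [("three 9's", 0.999), ("four 9's", 0.9999), ("five 9's", 0.99999)]
    for label, target in targets:
        print(f"{label}: {downtime_budget_minutes(target):.2f} minutes per year")
```

With so small a budget, a single prolonged outage on a critical asset can consume the yearly allowance, which is why spotting outage patterns ahead of time matters.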
Most IT organizations have some operational software, like Service-NOW, to track incidents. Manually going
through the incident tickets to figure out asset outage patterns is impractical. This is where an
automated solution helps: it crawls through all the relevant tickets and identifies the events that
triggered critical outages.
3. Solution Architecture
Big Data with Hadoop provides a repeatable, fault-tolerant and reasonably fast solution to the problem.
Here we explore a solution built on HDInsight in Azure.
Most enterprises use Service-NOW for IT operational tracking and monitoring. The data from Service-NOW,
specifically the Incidents and associated Assets data, is extracted in CSV format. In parallel, we create an
Azure storage account and provision an HDInsight cluster.
As the next step, the data from Service-NOW is loaded into the Azure Blob container. Hive tables are then
created to query the input data, so that all assets associated with P1 and P2 incidents (mapping
to “Critical” and “High” events) over a period of time are analyzed and output for further action.
The figure below illustrates the same:
4. Implementation
As mentioned previously, the solution was implemented using an HDInsight cluster in Azure. The implementation steps are described below.
4.a Extract data from Service-Now
From Service-NOW, extract data from the “Incidents” table and the “Assets” table. For this PoC to be meaningful,
only incidents classified as “Critical” and “High” are chosen. As a refresher, the Priority of an incident depends on the “Impact” and “Urgency” of an outage event.
Impact measures the effect of an incident on business processes, such as:
- number of affected users
- potential financial losses
- number of affected services
- breaches of rules and laws
- enterprise reputation
Urgency is the time it takes an incident to have a significant impact on the business.
Priority “P1 – Critical” means the event makes a critical functionality completely
unavailable to the entire organization and calls for an immediate and sustained effort, using all available resources, until
it is resolved. “P2 – High” refers to an event causing severe functional degradation that affects major portions of the
organization, where a significant amount of effort and resources is needed to solve the issue.
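The way Priority follows from Impact and Urgency can be sketched as a small lookup table. The Python example below is illustrative only: the 3x3 matrix, the numeric scales and the labels below P2 are assumptions modeled on a common incident-management layout, and the actual mapping configured in a given Service-NOW instance may differ.

```python
# Hypothetical 3x3 priority lookup: (impact, urgency) -> priority label.
# Assumed scale for both inputs: 1 = High, 2 = Medium, 3 = Low.
PRIORITY_MATRIX = {
    (1, 1): "P1 - Critical",
    (1, 2): "P2 - High",
    (2, 1): "P2 - High",
    (1, 3): "P3 - Moderate",
    (2, 2): "P3 - Moderate",
    (3, 1): "P3 - Moderate",
    (2, 3): "P4 - Low",
    (3, 2): "P4 - Low",
    (3, 3): "P5 - Planning",
}


def classify_priority(impact, urgency):
    """Look up the priority label for an (impact, urgency) pair."""
    try:
        return PRIORITY_MATRIX[(impact, urgency)]
    except KeyError:
        raise ValueError(f"unknown impact/urgency pair: {(impact, urgency)}") from None


def in_poc_scope(priority):
    """Only P1 and P2 incidents are analyzed in this PoC."""
    return priority.startswith(("P1", "P2"))


if __name__ == "__main__":
    for pair in [(1, 1), (2, 1), (3, 3)]:
        label = classify_priority(*pair)
        print(pair, label, "in scope" if in_poc_scope(label) else "out of scope")
```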
For this PoC the files were downloaded as Excel files.
4.b Create an Azure storage account
HDInsight uses Azure Blob Storage for storing data. To provision an HDI cluster, an Azure account needs to be
created. A specific Blob container from the account is used as the default file system, just like HDFS. An
Azure free trial account can be created by following the steps here:
https://azure.microsoft.com/en-us/free/
After creation, verify that the account has been set up properly by logging on to
https://portal.azure.com. A sample screenshot is below:
4.c Provision a HDI cluster
An HDInsight cluster can be created online. However, for repeatability, I have written an Azure Windows PowerShell script that provisions the cluster on demand:
#==============================================================================
# Name: Provision_HDInsight_Cluster_Windows
# Date: 26.May.2016
# Author: Vaidyanathan Sivasubramanian
#==============================================================================
# Get the start time to log how long the script took to execute
$Start_Time = Get-Date
# Parameters
$nameToken = "****" #<---- Name for Azure and HDI
$httpUserName = "****" #<---- HDI Credentials
$httpPassword = "********" #<---- HDI Credentials
$resourceGroupName = $nameToken + "ResourceGroup" #<---- HDI Resource Group Name
$hdinsightClusterName = $nameToken #<---- HDI Cluster Name
$defaultStorageAccountName = $nameToken #<---- Azure Storage Account Name
$defaultBlobContainerName = $nameToken #<---- WASB Container Name
$location = "East US 2" #<---- Azure location
$clusterSizeInNodes = 1 #<---- Azure Cluster Size
# Treat all errors as terminating
$ErrorActionPreference = "Stop"
Write-Host " "
Write-Host "***************************************************************"
Write-Host "Connecting to the Azure subscription..."
Write-Host "***************************************************************"
try{Get-AzureRmContext}
catch{Login-AzureRmAccount}
Write-Host " "
Write-Host "***************************************************************"
Write-Host "Creating the resource group..."
Write-Host "***************************************************************"
New-AzureRmResourceGroup -Name $resourceGroupName -Location $location
Write-Host " "
Write-Host "***************************************************************"
Write-Host "Preparing the default Storage Account and Container..."
Write-Host "***************************************************************"
New-AzureRmStorageAccount `
-ResourceGroupName $resourceGroupName `
-Name $defaultStorageAccountName `
-Type Standard_GRS `
-Location $location
$defaultStorageAccountKey = (Get-AzureRmStorageAccountKey `
-ResourceGroupName $resourceGroupName `
-Name $defaultStorageAccountName)[0].Value
$defaultStorageContext = New-AzureStorageContext `
-StorageAccountName $defaultStorageAccountName `
-StorageAccountKey $defaultStorageAccountKey
New-AzureStorageContainer `
-Name $hdinsightClusterName.ToLower() -Context $defaultStorageContext
Write-Host " "
Write-Host "***************************************************************"
Write-Host "Creating the HDI cluster..."
Write-Host "***************************************************************"
$httpPW = ConvertTo-SecureString -String $httpPassword -AsPlainText -Force
$httpCredential = New-Object System.Management.Automation.PSCredential(
$httpUserName,$httpPW)
New-AzureRmHDInsightCluster `
-ResourceGroupName $resourceGroupName `
-ClusterName $hdinsightClusterName `
-Location $location `
-ClusterSizeInNodes $clusterSizeInNodes `
-ClusterType Hadoop `
-OSType Windows `
-Version "3.2" `
-HttpCredential $httpCredential `
-DefaultStorageAccountName "$defaultStorageAccountName.blob.core.windows.net" `
-DefaultStorageAccountKey $defaultStorageAccountKey `
-DefaultStorageContainer $hdinsightClusterName
# Verify the cluster
Write-Host " "
Write-Host "***************************************************************"
Write-Host -NoNewLine "Cluster:" $hdinsightClusterName "info"
Get-AzureRmHDInsightCluster -ClusterName $hdinsightClusterName
Write-Host "***************************************************************"
# Get the end time and duration to display how long it took to execute the script
$End_Time = Get-Date
$Duration = New-TimeSpan -Start $Start_Time -End $End_Time
Write-Host " "
Write-Host "***************************************************************"
Write-Host -NoNewLine "It took" $Duration.TotalMinutes "minutes to complete"
Write-Host ""
Write-Host "***************************************************************"
exit
The output after running the script should look like this:
PS C:\Users\***********\Documents\hdi> Provision_HDInsight_Cluster.ps1
***************************************************************
Connecting to the Azure subscription...
***************************************************************
Environment : AzureCloud
Account : ************************************
TenantId : ************************************
SubscriptionId : ************************************
CurrentStorageAccount :
***************************************************************
Creating the resource group...
***************************************************************
ResourceGroupName : ************************************
Location : eastus2
Resources : {}
ResourcesTable :
ProvisioningState : Succeeded
Tags : {}
TagsTable :
ResourceId : ************************************
***************************************************************
Preparing the default Storage Account and Container...
***************************************************************
ResourceGroupName : ************************************
StorageAccountName : ************************************
Id : ************************************
Location : eastus2
Sku : Microsoft.Azure.Management.Storage.Models.Sku
Kind : Storage
Encryption :
AccessTier :
CreationTime : 6/1/2016 10:27:05 AM
CustomDomain :
LastGeoFailoverTime :
PrimaryEndpoints : Microsoft.Azure.Management.Storage.Models.Endpoints
PrimaryLocation : eastus2
ProvisioningState : Succeeded
SecondaryEndpoints :
SecondaryLocation : centralus
StatusOfPrimary : Available
StatusOfSecondary : Available
Tags : {}
Context : Microsoft.WindowsAzure.Commands.Common.Storage.AzureStorageContext
CloudBlobContainer : Microsoft.WindowsAzure.Storage.Blob.CloudBlobContainer
Permission : Microsoft.WindowsAzure.Storage.Blob.BlobContainerPermissions
PublicAccess : Off
LastModified : 6/1/2016 10:27:41 AM +00:00
ContinuationToken :
Context : Microsoft.WindowsAzure.Commands.Common.Storage.AzureStorageContext
Name : *******
***************************************************************
Creating the HDI cluster...
***************************************************************
Name : ********
Id : ********
Location : East US 2
ClusterVersion : 3.2.7.941
OperatingSystemType : Windows
ClusterTier : Standard
ClusterState : Running
ClusterType : Hadoop
CoresUsed : 12
HttpEndpoint : *********.azurehdinsight.net
Error :
DefaultStorageAccount :
DefaultStorageContainer :
ResourceGroup : ****************
AdditionalStorageAccounts :
***************************************************************
Cluster: ******** info
Name : ******
Id : ******
Location : East US 2
ClusterVersion : 3.2.7.941
OperatingSystemType : Windows
ClusterTier : Standard
ClusterState : Running
ClusterType : Hadoop
CoresUsed : 12
HttpEndpoint : *********
Error :
DefaultStorageAccount : *********
DefaultStorageContainer : *********
ResourceGroup : *********
AdditionalStorageAccounts : {}
***************************************************************
***************************************************************
It took 24.8811853566667 minutes to complete
***************************************************************
4.d Load data into WASB
The next step is to load the data files into the Blob container. Again for repeatability, below is an Azure Windows PowerShell script I have written:
$AccountName = "*********"
$resourceGroupName = $AccountName + "ResourceGroup"
$storageAccountName = $AccountName
$containerName = $AccountName
# Get the storage account key
Login-AzureRmAccount
$storageAccountKey = (Get-AzureRmStorageAccountKey `
-ResourceGroupName $resourceGroupName `
-Name $storageAccountName)[0].Value
# Create the storage context object
$destContext = New-AzureStorageContext `
-StorageAccountName $storageAccountName `
-StorageAccountKey $storageAccountKey
# Copy the files from local workstation to the Blob container
$fileName = "C:\Users\***********\Documents\HDI\OA\SNOW data\Consolidated_Incidents_Shortlisted_With_Mock_Config_Item.txt"
$blobName = "OA/Incidents/Incidents.txt"
Set-AzureStorageBlobContent -File $fileName -Container $containerName `
-Blob $blobName -Context $destContext
$fileName = "C:\Users\***********\Documents\HDI\OA\SNOW data\Assets_Shortlisted.txt"
$blobName = "OA/Assets/Assets.txt"
Set-AzureStorageBlobContent -File $fileName -Container $containerName `
-Blob $blobName -Context $destContext
The output of the load script is shown below:
PS C:\Users\***********\Documents\HDI\oa> Load_data_into_WASB.ps1
Environment : AzureCloud
Account : **************
TenantId : **************
SubscriptionId : **************
CurrentStorageAccount :
ICloudBlob : Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob
BlobType : BlockBlob
Length : 502828
ContentType : application/octet-stream
LastModified : 6/13/2016 10:50:26 AM +00:00
SnapshotTime :
ContinuationToken :
Context : Microsoft.WindowsAzure.Commands.Common.Storage.AzureStorageContext
Name : OA/Incidents/Incidents.txt
ICloudBlob : Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob
BlobType : BlockBlob
Length : 564545
ContentType : application/octet-stream
LastModified : 6/13/2016 10:50:27 AM +00:00
SnapshotTime :
ContinuationToken :
Context : Microsoft.WindowsAzure.Commands.Common.Storage.AzureStorageContext
Name : OA/Assets/Assets.txt
Verify that the blobs were loaded correctly, either in the Azure portal or in the HDI cluster portal.
4.e Perform ETL tasks
Once the blobs are verified to be loaded properly, create Hive jobs that define external tables for eventual
querying. For this PoC, I have created two external tables to hold the Incidents and Assets data. A third
external table, OA, holds the data obtained by joining these two tables for P1 and P2 events:
INSERT INTO TABLE OA
SELECT A.Configuration_Item, A.Model_Category, A.Display_Name, A.Vendor, A.Manufacturer,
B.Priority, B.Opened_Date_YM, Count(B.Configuration_Item)
FROM Assets as A, Incidents as B
WHERE A.Configuration_Item = B.Configuration_Item AND
( ( B.Priority RLIKE "Critical" ) OR ( B.Priority RLIKE "High" ) )
GROUP BY A.Configuration_Item, A.Model_Category, A.Display_Name, A.Vendor, A.Manufacturer,
B.Priority, B.Opened_Date_YM;
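As a rough illustration of what the query above computes, the following Python sketch reproduces the same inner join, priority filter and grouped count on a handful of in-memory rows. The column names come from the query; the sample values, the function name build_oa and the assumption that Configuration_Item is unique within Assets are all hypothetical. Note that RLIKE is a regular-expression match, so a Priority value such as "1 - Critical" matches the pattern "Critical" anywhere in the string.

```python
import re
from collections import Counter

# Hypothetical sample rows; column names follow the Hive query, values are invented.
assets = [
    {"Configuration_Item": "CI001", "Model_Category": "Server",
     "Display_Name": "app-server-01", "Vendor": "VendorA", "Manufacturer": "MakerA"},
    {"Configuration_Item": "CI002", "Model_Category": "Computer",
     "Display_Name": "laptop-17", "Vendor": "VendorB", "Manufacturer": "MakerB"},
]
incidents = [
    {"Configuration_Item": "CI001", "Priority": "1 - Critical", "Opened_Date_YM": "2016-01"},
    {"Configuration_Item": "CI001", "Priority": "1 - Critical", "Opened_Date_YM": "2016-01"},
    {"Configuration_Item": "CI001", "Priority": "2 - High", "Opened_Date_YM": "2016-02"},
    {"Configuration_Item": "CI002", "Priority": "3 - Moderate", "Opened_Date_YM": "2016-01"},
    {"Configuration_Item": "CI999", "Priority": "2 - High", "Opened_Date_YM": "2016-01"},
]

# Same effect as: B.Priority RLIKE "Critical" OR B.Priority RLIKE "High"
PRIORITY_PATTERN = re.compile(r"Critical|High")


def build_oa(asset_rows, incident_rows):
    """Join incidents to assets, keep Critical/High rows, count per group."""
    by_ci = {a["Configuration_Item"]: a for a in asset_rows}  # assumes unique CI per asset
    counts = Counter()
    for inc in incident_rows:
        asset = by_ci.get(inc["Configuration_Item"])
        if asset is None:
            continue  # inner join: incidents without a matching asset are dropped
        if not PRIORITY_PATTERN.search(inc["Priority"]):
            continue  # WHERE clause: only Critical and High incidents
        key = (asset["Configuration_Item"], asset["Model_Category"],
               asset["Display_Name"], asset["Vendor"], asset["Manufacturer"],
               inc["Priority"], inc["Opened_Date_YM"])
        counts[key] += 1  # GROUP BY key with a count of matching incidents
    return [key + (n,) for key, n in sorted(counts.items())]


if __name__ == "__main__":
    for row in build_oa(assets, incidents):
        print(row)
```

Each output row has the same shape as a record in the OA table: the asset attributes, the priority, the year-month bucket and the incident count for that combination.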
4.f Download Hive output
From the Hive external table OA, download the records into a text file. This can be accomplished by writing a
simple Select Hive query to dump all the records from the OA table.
4.g Visualize in Tableau
The analyzed records from the OA table, downloaded into the text file above, can be loaded into any visualization
tool such as Tableau. Below are sample screenshots from Tableau.
4.h Delete the HDI cluster
HDInsight cluster billing is pro-rated per minute, whether the cluster is being used or not. It is therefore a best practice to delete the
cluster after using it and recreate it when needed. A cluster can be deleted online from the Azure portal.
For repeatability, below is an Azure Windows PowerShell script:
#===============================================================================================
# Name: Delete_HDInsight_Cluster_And_Storage_Account
# Date: 26.May.2016
# Author: Vaidyanathan Sivasubramanian
#===============================================================================================
# Get the start time to log how long the script took to execute
$Start_Time = Get-Date
# Parameters
$nameToken = "**********" #<---- Name for Azure and HDI
$httpUserName = "*****" #<---- HDI Credentials
$httpPassword = "******" #<---- HDI Credentials
$hdinsightClusterName = $nameToken #<---- HDI Cluster Name
$defaultStorageAccountName = $nameToken #<---- Azure Storage Account Name
$defaultResourceGroup = $nameToken + "ResourceGroup" #<---- Azure Resource Group
# Treat all errors as terminating
$ErrorActionPreference = "Stop"
Write-Host " "
Write-Host "***************************************************************"
Write-Host "Connecting to the Azure subscription..."
Write-Host "***************************************************************"
try{Get-AzureRmContext}
catch{Login-AzureRmAccount}
Write-Host " "
Write-Host "***************************************************************"
Write-Host "Deleting the HDI cluster and Azure Storage Account..."
Write-Host "***************************************************************"
Remove-AzureRmHDInsightCluster -ClusterName $hdinsightClusterName
Remove-AzureRmStorageAccount -Name $defaultStorageAccountName `
-ResourceGroupName $defaultResourceGroup
Remove-AzureRmResourceGroup -Name $defaultResourceGroup -Force
# Get the end time and duration to display how long it took to execute the script
$End_Time = Get-Date
$Duration = New-TimeSpan -Start $Start_Time -End $End_Time
Write-Host " "
Write-Host "***************************************************************"
Write-Host -NoNewLine
Write-Host "It took" $Duration.TotalMinutes "minutes to complete"
Write-Host "***************************************************************"
exit
Sample output from the above script is shown below:
PS C:\Users\***********\Documents\hdi> Delete_HDInsight_Cluster_Storage_Account.ps1
***************************************************************
Connecting to the Azure subscription...
***************************************************************
Environment : AzureCloud
Account : *********
TenantId : *********
SubscriptionId : ********
CurrentStorageAccount :
***************************************************************
Deleting the HDI cluster and Azure Storage Account...
***************************************************************
ErrorInfo :
State : Succeeded
RequestId : f3dfc11a-d95e-4839-88cb-5f51c9543412
StatusCode : OK
***************************************************************
It took 7.71634777666667 minutes to complete
***************************************************************
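The per-minute billing noted at the start of this step can be turned into a quick estimate of what an idle cluster costs. The Python sketch below is only arithmetic: the hourly rate is an invented placeholder, not an actual Azure price, and the function name is hypothetical.

```python
def prorated_cost(hourly_rate, minutes):
    """Cost of keeping a cluster provisioned for the given minutes, billed per minute."""
    if hourly_rate < 0 or minutes < 0:
        raise ValueError("hourly_rate and minutes must be non-negative")
    return hourly_rate * minutes / 60.0


if __name__ == "__main__":
    HYPOTHETICAL_HOURLY_RATE = 3.0   # invented figure for illustration only
    weekend_minutes = 2 * 24 * 60    # cluster left running, unused, over a weekend
    cost = prorated_cost(HYPOTHETICAL_HOURLY_RATE, weekend_minutes)
    print(f"Idle weekend at the assumed rate: {cost:.2f}")
```

Comparing such an estimate with the provisioning and deletion times shown in the sample outputs gives a feel for when deleting and recreating the cluster is worth the wait.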
5. Inferences
For this PoC, a combination of real production data coupled with mock values was collated for six months. From
the Hive output, the following can be inferred:
- In the preceding six months, Jan 2016 and Feb 2016 had the highest number of “Critical” and “High” incidents.
- 80% of the incidents related to the asset categories “Computer” (Laptop / Desktop) and “Server”.
- The majority of the incidents occurred on assets manufactured by ******, ******, ***** and ******.
- Six of the assets contributed the highest number of outages.
Further inferences can be drawn, and prescriptive actions taken, based on the total hours of lost
productivity and the associated cost.
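Inferences of this kind can be derived from the OA output with a few aggregations. The Python sketch below shows one possible way to compute the busiest months, the share of incidents per asset category and the top contributing assets. The rows are invented sample values, not the PoC data, and the reduced four-column layout and helper names are assumptions made for brevity.

```python
from collections import Counter

# Hypothetical OA rows, reduced to four columns for brevity:
# (Configuration_Item, Model_Category, Opened_Date_YM, incident_count)
oa_rows = [
    ("CI001", "Server", "2016-01", 4),
    ("CI002", "Computer", "2016-01", 3),
    ("CI002", "Computer", "2016-02", 2),
    ("CI003", "Printer", "2016-03", 1),
]


def totals_by(rows, column_index):
    """Sum the incident counts (last column) grouped by the chosen column."""
    totals = Counter()
    for row in rows:
        totals[row[column_index]] += row[-1]
    return totals


def share_by_category(rows):
    """Fraction of all incidents that falls in each asset category."""
    totals = totals_by(rows, 1)
    grand_total = sum(totals.values())
    return {category: n / grand_total for category, n in totals.items()}


def top_assets(rows, n):
    """The n assets with the most incidents, highest first."""
    return totals_by(rows, 0).most_common(n)


if __name__ == "__main__":
    print("Busiest months:", totals_by(oa_rows, 2).most_common(2))
    print("Share by category:", share_by_category(oa_rows))
    print("Top assets:", top_assets(oa_rows, 2))
```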
6. Conclusion
In this PoC, we analyzed Service-NOW incident data for the last two quarters. Assets that had the highest
percentage of outages and were tagged with “Critical” and “High” priority were shortlisted. Based on the inferences,
relevant corrective and preventive actions can be taken to bring down overall outages and to control the
productivity loss and costs associated with them.