
Predictive analytics in IT – Assets Outage

Vaidyanathan Sivasubramanian

09/June/2016


Predictive Analytics – IT Assets Outage 09/Jun/2016 Vaidya 1

Table of Contents

1. Background
2. Problem Statement
3. Solution Architecture
4. Implementation
   4.a Extract data from Service-Now
   4.b Create an Azure storage account
   4.c Provision an HDI cluster
   4.d Load data into WASB
   4.e Perform ETL tasks
   4.f Download Hive output
   4.g Visualize in Tableau
   4.h Delete the HDI cluster
5. Inferences
6. Conclusion


1. Background

Enterprise IT in any company is overwhelmed with asset maintenance. The biggest challenge is recognizing the risks and consequences of losing critical assets. Shrinking IT budgets mean more pressure to keep run-and-maintain operations going smoothly. Big Data analytics can help enterprise IT predict outages for any class of asset. This white paper discusses a Hadoop-based Big Data proof of concept (PoC) that analyzes historical data to perform predictive analytics.

2. Problem Statement

The IT organization in any enterprise is tasked with ensuring high reliability and availability of assets (typically a “five 9’s”, or 99.999%, requirement). IT teams classify critical assets and treat any outage on them as most important, assigning the highest priority to resolving the issue. This approach is reactive, however; we want to move toward predictive and, if possible, prescriptive solutions to asset outages based on historical data from production databases.
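To put the “five 9’s” target in perspective, the yearly downtime budget is simply (1 - availability) multiplied by the number of minutes in a year. A small Python sketch of that arithmetic, assuming a 365-day year (leap years ignored):

```python
# Yearly downtime budget implied by an availability target.
# Assumes a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability: float) -> float:
    """Return the minutes of downtime per year allowed by `availability` (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("three 9's", 0.999), ("four 9's", 0.9999), ("five 9's", 0.99999)]:
    print(f"{label}: {allowed_downtime_minutes(target):.2f} minutes/year")
```

Five 9’s leaves only about 5.26 minutes of downtime per year, compared with about 52.56 minutes for four 9’s, which is why waiting for an outage before acting leaves very little margin.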

Most IT organizations use operational software, such as Service-NOW, to track incidents. Manually going through incident tickets to figure out asset outage patterns is impractical. This is where an automated solution helps: it can crawl through all the relevant tickets and identify the events that triggered critical outages.

3. Solution Architecture

Big Data with Hadoop provides a repeatable, fault-tolerant solution to the problem with reasonably fast response times. Here we explore a solution built with HDInsight on Azure.

Many enterprises use Service-NOW for IT operational tracking and monitoring. The relevant data, specifically the Incidents and associated Assets data, is extracted from Service-NOW as CSV files. In parallel, we create an Azure storage account and provision an HDInsight cluster.

Next, the Service-NOW data is loaded into the Azure Blob container. Hive tables are then created over the input data so that all assets associated with P1 and P2 incidents (mapping to “Critical” and “High” events) over a given period can be queried, analyzed, and output for further action.

The figure below illustrates the architecture:


4. Implementation

As mentioned previously, the solution was implemented using an HDInsight cluster in Azure. The implementation steps are described below.

4.a Extract data from Service-Now

From Service-NOW, extract data from the “Incidents” and “Assets” tables. For this PoC to be meaningful, only incidents classified as “Critical” and “High” are chosen. As a refresher, the priority of an incident depends on the “Impact” and “Urgency” of the outage event.

Impact measures the effect of an incident on business processes, for example:

- number of affected users
- potential financial losses
- number of affected services
- violation of rules and laws
- enterprise reputation

Urgency is the time it takes for an incident to have a significant impact on the business.

Priority is then classified as follows:

- “P1 – Critical”: the event makes critical functionality completely unavailable to the entire organization and requires an immediate, sustained effort using all available resources until resolved.
- “P2 – High”: the event causes severe functional degradation affecting major portions of the organization, and significant effort and resources are devoted to resolving the issue.
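The impact/urgency relationship is usually implemented as a lookup matrix. The Python sketch below assumes a 3x3 matrix with values 1 (High), 2 (Medium) and 3 (Low) and priority labels such as "1 - Critical", modeled on a typical out-of-the-box configuration; the actual mapping is configurable per Service-NOW instance, and the incident numbers are hypothetical. It shows how the P1 and P2 incidents used in this PoC would be shortlisted:

```python
# Hypothetical impact/urgency -> priority lookup (1 = High, 2 = Medium, 3 = Low).
# Real Service-NOW instances may configure this mapping differently.
PRIORITY_MATRIX = {
    (1, 1): "1 - Critical",
    (1, 2): "2 - High",
    (2, 1): "2 - High",
    (1, 3): "3 - Moderate",
    (2, 2): "3 - Moderate",
    (3, 1): "3 - Moderate",
    (2, 3): "4 - Low",
    (3, 2): "4 - Low",
    (3, 3): "5 - Planning",
}


def priority(impact: int, urgency: int) -> str:
    """Look up the priority label for an (impact, urgency) pair."""
    return PRIORITY_MATRIX[(impact, urgency)]


def is_in_scope(label: str) -> bool:
    """True for the P1/P2 ("Critical"/"High") incidents analyzed in this PoC."""
    return label.endswith("Critical") or label.endswith("High")


# Hypothetical incidents: (number, impact, urgency)
incidents = [("INC001", 1, 1), ("INC002", 2, 1), ("INC003", 2, 2), ("INC004", 3, 3)]
shortlisted = [num for num, imp, urg in incidents if is_in_scope(priority(imp, urg))]
print(shortlisted)  # ['INC001', 'INC002']
```

Keeping the matrix as data rather than nested if-statements makes it easy to mirror whatever mapping a particular instance actually uses.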

For this PoC the files were downloaded as Excel files.

4.b Create an Azure storage account

HDInsight uses Azure Blob Storage to store data: a specific Blob container in a storage account serves as the cluster’s default file system, much like HDFS. Provisioning an HDI cluster therefore requires an Azure account; in this PoC, the storage account and container themselves are created by the provisioning script in section 4.c. An Azure free trial account can be created by following the steps here:

https://azure.microsoft.com/en-us/free/

After creation, you can verify that the account was set up properly by logging on to https://portal.azure.com. A sample screenshot is below:


4.c Provision an HDI cluster

An HDInsight cluster can be created online from the Azure portal. However, for repeatability, I have written an Azure PowerShell script that provisions the cluster on demand:

#==============================================================================
# Name: Provision_HDInsight_Cluster_Windows
# Date: 26.May.2016
# Author: Vaidyanathan Sivasubramanian
#==============================================================================

# Get the start time to log how long the script took to execute
$Start_Time = Get-Date

# Parameters
$nameToken = "****"                                 #<---- Name for Azure and HDI
$httpUserName = "****"                              #<---- HDI Credentials
$httpPassword = "********"                          #<---- HDI Credentials
$resourceGroupName = $nameToken + "ResourceGroup"   #<---- HDI Resource Group Name
$hdinsightClusterName = $nameToken                  #<---- HDI Cluster Name
$defaultStorageAccountName = $nameToken             #<---- Azure Storage Account Name (3-24 lowercase letters/digits)
$defaultBlobContainerName = $nameToken.ToLower()    #<---- WASB Container Name (must be lowercase)
$location = "East US 2"                             #<---- Azure location
$clusterSizeInNodes = 1                             #<---- Azure Cluster Size

# Treat all errors as terminating
$ErrorActionPreference = "Stop"

Write-Host " "
Write-Host "***************************************************************"
Write-Host "Connecting to the Azure subscription..."
Write-Host "***************************************************************"
# Reuse the current session if one exists; otherwise prompt for a login
try { Get-AzureRmContext } catch { Login-AzureRmAccount }

Write-Host " "
Write-Host "***************************************************************"
Write-Host "Creating the resource group..."
Write-Host "***************************************************************"
New-AzureRmResourceGroup -Name $resourceGroupName -Location $location

Write-Host " "
Write-Host "***************************************************************"
Write-Host "Preparing the default Storage Account and Container..."
Write-Host "***************************************************************"
New-AzureRmStorageAccount `
    -ResourceGroupName $resourceGroupName `
    -Name $defaultStorageAccountName `
    -Type Standard_GRS `
    -Location $location

$defaultStorageAccountKey = (Get-AzureRmStorageAccountKey `
    -ResourceGroupName $resourceGroupName `
    -Name $defaultStorageAccountName)[0].Value

$defaultStorageContext = New-AzureStorageContext `
    -StorageAccountName $defaultStorageAccountName `
    -StorageAccountKey $defaultStorageAccountKey

# Create the container that will act as the cluster's default file system
New-AzureStorageContainer `
    -Name $defaultBlobContainerName -Context $defaultStorageContext

Write-Host " "
Write-Host "***************************************************************"
Write-Host "Creating the HDI cluster..."
Write-Host "***************************************************************"
$httpPW = ConvertTo-SecureString -String $httpPassword -AsPlainText -Force
$httpCredential = New-Object System.Management.Automation.PSCredential($httpUserName, $httpPW)

# Use the same (lowercase) container name that was created above
New-AzureRmHDInsightCluster `
    -ResourceGroupName $resourceGroupName `
    -ClusterName $hdinsightClusterName `
    -Location $location `
    -ClusterSizeInNodes $clusterSizeInNodes `
    -ClusterType Hadoop `
    -OSType Windows `
    -Version "3.2" `
    -HttpCredential $httpCredential `
    -DefaultStorageAccountName "$defaultStorageAccountName.blob.core.windows.net" `
    -DefaultStorageAccountKey $defaultStorageAccountKey `
    -DefaultStorageContainer $defaultBlobContainerName

# Verify the cluster
Write-Host " "
Write-Host "***************************************************************"
Write-Host -NoNewLine "Cluster:" $hdinsightClusterName "info"
Get-AzureRmHDInsightCluster -ClusterName $hdinsightClusterName
Write-Host "***************************************************************"

# Get the end time and duration to display how long it took to execute the script
$End_Time = Get-Date
$Duration = New-TimeSpan -Start $Start_Time -End $End_Time
Write-Host " "
Write-Host "***************************************************************"
Write-Host -NoNewLine "It took" $Duration.TotalMinutes "minutes to complete"
Write-Host ""
Write-Host "***************************************************************"
exit

The output after running the script should look like this:

PS C:\Users\***********\Documents\hdi> Provision_HDInsight_Cluster.ps1

***************************************************************

Connecting to the Azure subscription...

***************************************************************

Environment : AzureCloud

Account : *********

TenantId : ************************************

SubscriptionId : ************************************

CurrentStorageAccount :

***************************************************************

Creating the resource group...

***************************************************************

ResourceGroupName : ************************************

Location : eastus2

Resources : {}

ResourcesTable :

ProvisioningState : Succeeded

Tags : {}

TagsTable :

ResourceId : ************************************

***************************************************************

Preparing the default Storage Account and Container...

***************************************************************

ResourceGroupName : ************************************

StorageAccountName : ************************************

Id : ************************************

Location : eastus2

Sku : Microsoft.Azure.Management.Storage.Models.Sku


Kind : Storage

Encryption :

AccessTier :

CreationTime : 6/1/2016 10:27:05 AM

CustomDomain :

LastGeoFailoverTime :

PrimaryEndpoints : Microsoft.Azure.Management.Storage.Models.Endpoints

PrimaryLocation : eastus2

ProvisioningState : Succeeded

SecondaryEndpoints :

SecondaryLocation : centralus

StatusOfPrimary : Available

StatusOfSecondary : Available

Tags : {}

Context : Microsoft.WindowsAzure.Commands.Common.Storage.AzureStorageContext

CloudBlobContainer : Microsoft.WindowsAzure.Storage.Blob.CloudBlobContainer

Permission : Microsoft.WindowsAzure.Storage.Blob.BlobContainerPermissions

PublicAccess : Off

LastModified : 6/1/2016 10:27:41 AM +00:00

ContinuationToken :

Context : Microsoft.WindowsAzure.Commands.Common.Storage.AzureStorageContext

Name : *******

***************************************************************

Creating the HDI cluster...

***************************************************************

Name : ********

Id : ********

Location : East US 2

ClusterVersion : 3.2.7.941

OperatingSystemType : Windows

ClusterTier : Standard

ClusterState : Running

ClusterType : Hadoop

CoresUsed : 12

HttpEndpoint : *********.azurehdinsight.net

Error :

DefaultStorageAccount :

DefaultStorageContainer :

ResourceGroup : ****************

AdditionalStorageAccounts :

***************************************************************

Cluster: ******** info

Name : ******

Id : ******

Location : East US 2

ClusterVersion : 3.2.7.941

OperatingSystemType : Windows

ClusterTier : Standard

ClusterState : Running

ClusterType : Hadoop


CoresUsed : 12

HttpEndpoint : *********

Error :

DefaultStorageAccount : *********

DefaultStorageContainer : *********

ResourceGroup : *********

AdditionalStorageAccounts : {}

***************************************************************

***************************************************************

It took 24.8811853566667 minutes to complete

***************************************************************
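The provisioning script reuses $nameToken as the storage account name, the cluster name and (lowercased) the container name, so the token has to satisfy the Azure naming rules for both storage objects. A minimal pre-flight check, written in Python for illustration; the regular expressions encode the storage naming rules as I understand them from the Azure Storage documentation (treat them as an assumption and verify against current docs), and global uniqueness of the account name is not checked:

```python
import re
from typing import List

# Assumed naming rules: storage account names are 3-24 lowercase letters or
# digits; container names are 3-63 lowercase letters, digits or single hyphens,
# starting and ending with a letter or digit.
STORAGE_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
CONTAINER_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def check_name_token(token: str) -> List[str]:
    """List the problems with reusing `token` as both account and container name."""
    problems = []
    if not STORAGE_ACCOUNT_RE.fullmatch(token):
        problems.append("invalid storage account name")
    if not CONTAINER_RE.fullmatch(token):
        problems.append("invalid blob container name")
    return problems


# Hypothetical candidate tokens
for candidate in ("oapoc2016", "OAPoc", "oa-poc"):
    print(candidate, check_name_token(candidate) or "OK")
```

Running a check like this before the script starts avoids discovering a bad token only after the resource group has already been created.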

4.d Load data into WASB

The next step is to load the data files into the Blob container. Again, for repeatability, below is an Azure PowerShell script I have written:

$AccountName = "*********"
$resourceGroupName = $AccountName + "ResourceGroup"
$storageAccountName = $AccountName
# Container names are lowercase (the provisioning script creates it with ToLower())
$containerName = $AccountName.ToLower()

# Get the storage account key
Login-AzureRmAccount
$storageAccountKey = (Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccountName)[0].Value

# Create the storage context object
$destContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey

# Copy the files from local workstation to the Blob container
$fileName = "C:\Users\***********\Documents\HDI\OA\SNOW data\Consolidated_Incidents_Shortlisted_With_Mock_Config_Item.txt"
$blobName = "OA/Incidents/Incidents.txt"
Set-AzureStorageBlobContent -File $fileName -Container $containerName -Blob $blobName -Context $destContext

$fileName = "C:\Users\***********\Documents\HDI\OA\SNOW data\Assets_Shortlisted.txt"
$blobName = "OA/Assets/Assets.txt"
Set-AzureStorageBlobContent -File $fileName -Container $containerName -Blob $blobName -Context $destContext

The output of the load script is shown below:

PS C:\Users\***********\Documents\HDI\oa> Load_data_into_WASB.ps1

Environment : AzureCloud


Account : **************

TenantId : **************

SubscriptionId : **************

CurrentStorageAccount :

ICloudBlob : Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob

BlobType : BlockBlob

Length : 502828

ContentType : application/octet-stream

LastModified : 6/13/2016 10:50:26 AM +00:00

SnapshotTime :

ContinuationToken :

Context : Microsoft.WindowsAzure.Commands.Common.Storage.AzureStorageContext

Name : OA/Incidents/Incidents.txt

ICloudBlob : Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob

BlobType : BlockBlob

Length : 564545

ContentType : application/octet-stream

LastModified : 6/13/2016 10:50:27 AM +00:00

SnapshotTime :

ContinuationToken :

Context : Microsoft.WindowsAzure.Commands.Common.Storage.AzureStorageContext

Name : OA/Assets/Assets.txt

The blobs can be verified as correctly loaded in either the Azure portal or the HDI cluster portal.
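Inside the cluster, Hive and other Hadoop tools address these blobs with URIs of the form wasb[s]://<container>@<account>.blob.core.windows.net/<path> (wasbs uses TLS), and that is what an external table’s LOCATION points at. A small Python helper that builds such URIs for the two prefixes used by the load script; the account and container name is a hypothetical placeholder for the masked token:

```python
def wasb_uri(container: str, account: str, path: str = "", secure: bool = True) -> str:
    """Build a wasb:// (or TLS wasbs://) URI for a path inside a Blob container."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Hypothetical names: the PoC uses the same (masked) token for account and container.
ACCOUNT = CONTAINER = "oapoc"
for blob_dir in ("OA/Incidents/", "OA/Assets/"):
    print(wasb_uri(CONTAINER, ACCOUNT, blob_dir))
```

Blob storage has no real directories; OA/Incidents/ is a name prefix that Hadoop presents as a folder. Uploading each input file under its own prefix lets each external table point at one folder and read every file under it.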

4.e Perform ETL tasks

Once the blobs are verified as loaded properly, create Hive jobs that define external tables for querying. For this PoC, I created two external tables to hold the Incidents and Assets data. A third external table, OA, holds the data obtained by joining these two tables for P1 and P2 events:

INSERT INTO TABLE OA
SELECT A.Configuration_Item, A.Model_Category, A.Display_Name, A.Vendor, A.Manufacturer,
       B.Priority, B.Opened_Date_YM, COUNT(B.Configuration_Item)
FROM Assets A
JOIN Incidents B ON A.Configuration_Item = B.Configuration_Item
WHERE B.Priority RLIKE 'Critical|High'
GROUP BY A.Configuration_Item, A.Model_Category, A.Display_Name, A.Vendor, A.Manufacturer,
         B.Priority, B.Opened_Date_YM;
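The query’s logic can be mirrored in a few lines of Python to make its semantics concrete (an illustration, not part of the PoC): an inner join on Configuration_Item, a regular-expression filter equivalent to the RLIKE test, and a count per group. The sample rows and the "1 - Critical" style priority labels are hypothetical, and Configuration_Item is assumed to be unique in the Assets data (otherwise the join would multiply the counts):

```python
import re
from collections import Counter

# Hypothetical sample data; field names follow the Hive query above.
# Assets keyed by Configuration_Item: (Model_Category, Display_Name, Vendor, Manufacturer)
assets = {
    "SRV-01": ("Server", "Mail server", "VendorA", "MakerA"),
    "LAP-07": ("Computer", "Sales laptop", "VendorB", "MakerB"),
}
incidents = [  # (Configuration_Item, Priority, Opened_Date_YM)
    ("SRV-01", "1 - Critical", "2016-01"),
    ("SRV-01", "1 - Critical", "2016-01"),
    ("SRV-01", "3 - Moderate", "2016-01"),
    ("LAP-07", "2 - High", "2016-02"),
    ("PRN-99", "1 - Critical", "2016-02"),  # no matching asset: dropped by the inner join
]


def outage_counts(assets, incidents):
    """Inner-join incidents to assets, keep Critical/High, count per group."""
    in_scope = re.compile("Critical|High")  # re.search matches anywhere, like RLIKE
    counts = Counter()
    for ci, priority, opened_ym in incidents:
        if ci not in assets or not in_scope.search(priority):
            continue
        counts[(ci, *assets[ci], priority, opened_ym)] += 1
    return counts


for row, n in sorted(outage_counts(assets, incidents).items()):
    print(*row, n, sep="\t")
```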

4.f Download Hive output

From the Hive external table OA, download the records into a text file. This can be done with a simple Hive SELECT query that dumps all the records from the OA table.
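Tableau can read a delimited text file directly, but a header row makes the fields easier to work with. A minimal Python sketch that adds one, assuming the dump is tab-separated with no header (as the Hive command-line client prints results by default) and using Incident_Count as a hypothetical name for the COUNT column:

```python
import csv
import io

# Column order follows the SELECT list that populates OA; "Incident_Count" is a
# hypothetical name for the COUNT(...) column.
OA_COLUMNS = ["Configuration_Item", "Model_Category", "Display_Name", "Vendor",
              "Manufacturer", "Priority", "Opened_Date_YM", "Incident_Count"]


def hive_tsv_to_csv(tsv_text: str) -> str:
    """Convert header-less, tab-separated Hive output into CSV with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(OA_COLUMNS)
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != len(OA_COLUMNS):
            raise ValueError(f"expected {len(OA_COLUMNS)} fields, got {len(fields)}: {line!r}")
        writer.writerow(fields)
    return out.getvalue()


# Hypothetical sample row
sample = "SRV-01\tServer\tMail server\tVendorA\tMakerA\t1 - Critical\t2016-01\t2\n"
print(hive_tsv_to_csv(sample))
```

In practice the input would come from reading the downloaded file (for example with open(path, encoding="utf-8").read()), and the result would be written to a .csv file for Tableau.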

4.g Visualize in Tableau

The analyzed records from the OA table, downloaded into a text file as described above, can be loaded into any visualization tool, such as Tableau. Below are sample screenshots from Tableau:


4.h Delete the HDI cluster

HDInsight cluster billing is pro-rated per minute, whether or not the cluster is in use. It is therefore best practice to delete the cluster after use and recreate it when needed. A cluster can be deleted online from the Azure portal.
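A quick per-minute cost sketch shows why deleting idle clusters matters. The hourly rate and node count below are hypothetical placeholders (actual HDInsight pricing depends on region, VM size, and the number of head and worker nodes billed), and rounding partial minutes up is an assumption:

```python
import math


def prorated_cost(rate_per_node_hour: float, nodes: int, minutes: float) -> float:
    """Cost of keeping `nodes` nodes running for `minutes`, billed per started minute."""
    billable_minutes = math.ceil(minutes)  # assumption: partial minutes round up
    return rate_per_node_hour * nodes * billable_minutes / 60


# Hypothetical figures: 0.50 per node-hour, 2 billed nodes.
used_then_deleted = prorated_cost(0.50, 2, 90)      # 1.5 hours of actual work
left_running = prorated_cost(0.50, 2, 7 * 24 * 60)  # forgotten for a week
print(f"{used_then_deleted:.2f} vs {left_running:.2f}")
```

The trade-off is the time needed to recreate the cluster, which was roughly 25 minutes for the provisioning run shown in section 4.c.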

For repeatability, below is an Azure PowerShell script that deletes the cluster, its storage account, and the resource group:

#===============================================================================================
# Name: Delete_HDInsight_Cluster_And_Storage_Account
# Date: 26.May.2016
# Author: Vaidyanathan Sivasubramanian
#===============================================================================================

# Get the start time to log how long the script took to execute
$Start_Time = Get-Date

# Parameters
$nameToken = "**********"                               #<---- Name for Azure and HDI
$hdinsightClusterName = $nameToken                      #<---- HDI Cluster Name
$defaultStorageAccountName = $nameToken                 #<---- Azure Storage Account Name
$defaultResourceGroup = $nameToken + "ResourceGroup"    #<---- Azure Resource Group

# Treat all errors as terminating
$ErrorActionPreference = "Stop"

Write-Host " "
Write-Host "***************************************************************"
Write-Host "Connecting to the Azure subscription..."
Write-Host "***************************************************************"
# Reuse the current session if one exists; otherwise prompt for a login
try { Get-AzureRmContext } catch { Login-AzureRmAccount }

Write-Host " "
Write-Host "***************************************************************"
Write-Host "Deleting the HDI cluster and Azure Storage Account..."
Write-Host "***************************************************************"
Remove-AzureRmHDInsightCluster -ClusterName $hdinsightClusterName
Remove-AzureRmStorageAccount -Name $defaultStorageAccountName -ResourceGroupName $defaultResourceGroup
Remove-AzureRmResourceGroup -Name $defaultResourceGroup -Force

# Get the end time and duration to display how long it took to execute the script
$End_Time = Get-Date
$Duration = New-TimeSpan -Start $Start_Time -End $End_Time
Write-Host " "
Write-Host "***************************************************************"
Write-Host "It took" $Duration.TotalMinutes "minutes to complete"
Write-Host "***************************************************************"
exit


Sample output from the script above:

PS C:\Users\***********\Documents\hdi> Delete_HDInsight_Cluster_Storage_Account.ps1

***************************************************************

Connecting to the Azure subscription...

***************************************************************

Environment : AzureCloud

Account : *********

TenantId : *********

SubscriptionId : ********

CurrentStorageAccount :

***************************************************************

Deleting the HDI cluster and Azure Storage Account...

***************************************************************

ErrorInfo :

State : Succeeded

RequestId : f3dfc11a-d95e-4839-88cb-5f51c9543412

StatusCode : OK

***************************************************************

It took 7.71634777666667 minutes to complete

***************************************************************

5. Inferences

For this PoC, real production data coupled with mock values was collated over six months. From the Hive output, the following can be inferred:

- In the preceding six months, Jan 2016 and Feb 2016 had the highest number of “Critical” and “High” incidents.
- 80% of the incidents related to the asset categories “Computer” (Laptop / Desktop) and “Server”.
- The majority of the incidents occurred on assets manufactured by ******, ******, ***** and ******.
- Six assets contributed the highest number of outages.
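Each of these inferences is a simple aggregation over the OA output: incidents per month, each category’s share of the total, and the assets with the most incidents. The Python sketch below shows the idea; the rows are made up for illustration and do not reproduce the PoC’s data or its 80% figure:

```python
from collections import Counter

# Hypothetical OA-style rows: (Configuration_Item, Model_Category, Opened_Date_YM, incident count)
oa_rows = [
    ("LAP-07", "Computer", "2016-01", 4),
    ("SRV-01", "Server", "2016-01", 3),
    ("SRV-01", "Server", "2016-02", 2),
    ("PRN-03", "Printer", "2016-03", 1),
]


def summarize(rows, top_n=2):
    """Return the busiest month, each category's share of incidents, and the top assets."""
    by_month, by_category, by_asset = Counter(), Counter(), Counter()
    for asset, category, month, count in rows:
        by_month[month] += count
        by_category[category] += count
        by_asset[asset] += count
    total = sum(by_category.values())
    share = {category: count / total for category, count in by_category.items()}
    top_assets = [asset for asset, _ in by_asset.most_common(top_n)]
    return by_month.most_common(1)[0][0], share, top_assets


busiest_month, category_share, top_assets = summarize(oa_rows)
print(busiest_month, category_share, top_assets)
```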

Further inferences can be drawn, and prescriptive actions taken, based on the total hours of impacted productivity and the associated cost.

6. Conclusion

In this PoC, we analyzed Service-NOW incident data for the last two quarters. Assets with the highest percentage of outages tagged with “Critical” and “High” priority were shortlisted. Based on these inferences, relevant corrective and preventive actions can be taken to bring down overall outages and to control the associated productivity loss and costs.