
Page 1: Using ELK Explore Defect Data

Use ELK Explore Defect Data

Xu Yabin

Singapore

Page 2: Using ELK Explore Defect Data

Content

Customer requirements and defect KPI definition

ELK solution

ELK compared to traditional analytics methods

Page 3: Using ELK Explore Defect Data

Customer Requirements

• Online web applications that need to be deployed frequently

• Serious defects and quality issues:

• Not enough testing before applications are deployed

• Defects are out of control after applications are deployed

• Serious defects are often found only after the application is deployed

• Serious defects are not fixed on time

• Implement continuous integration and a defect management system

• Know what the result is and how to do continuous assessment of DevOps activities

Page 4: Using ELK Explore Defect Data

Defect KPI Definition

• Based on the customer’s requirements, the defect KPIs are defined as below:

• Defect number and distribution

• Number of defects before and after application deployment

• Number of serious defects before and after application deployment

• Time to fix serious defects

Page 5: Using ELK Explore Defect Data

Data analytics tool requirements

• What data analytics tool do we need?

• Easily import defect data from the current defect system

• Easily configure and calculate to get the KPI data

• Explore defect data without any data model preparation

• Easily dig into the detailed information

• Easy to maintain

• We chose ELK (Elasticsearch, Logstash, Kibana)

Page 6: Using ELK Explore Defect Data

Content

Customer requirements and defect KPI definition

ELK solution

ELK compared to traditional analytics methods

Page 7: Using ELK Explore Defect Data

ELK Solution

Defect Management System (original defect data) → Logstash (data collector) → Elasticsearch (distributed data storage and search engine) → Kibana (data analytics and results)

• Most of the work is done through configuration, not coding

Page 8: Using ELK Explore Defect Data

Original defect data

• Original defect data comes from the customer’s defect management system, in XML format (a hypothetical example record follows)
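A hypothetical example of what one exported defect record might look like; the element names are assumptions used only to make the later sketches concrete, not the customer’s actual schema:

<!-- hypothetical defect record, not actual customer data -->
<defect id="12345">
  <product>drivers</product>
  <severity>critical</severity>
  <open_time>2015-03-02T10:15:00Z</open_time>
  <close_time>2015-03-05T18:40:00Z</close_time>
  <status>closed</status>
</defect>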

Page 9: Using ELK Explore Defect Data

ELK Data collector: Logstash

• Collect defect data using Logstash

• Compared to a traditional data collector, which needs a lot of code, Logstash needs no code, only several lines of configuration

• Defect data is put into Elasticsearch through the Logstash pipeline (a minimal configuration sketch follows)
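A minimal Logstash pipeline sketch for this kind of import, assuming the hypothetical file path, field names, and index name shown below (real multi-line XML exports usually also need a multiline codec, and Logstash 1.5 used host instead of hosts in the elasticsearch output):

input {
  file {
    # Hypothetical path to the XML exported from the defect management system
    path => "/data/defects/*.xml"
    start_position => "beginning"
  }
}

filter {
  xml {
    # Parse each raw XML record into fields under parsed_xml.*
    source => "message"
    target => "parsed_xml"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "defects"
  }
}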

Page 10: Using ELK Explore Defect Data

ELK User interface configuration: Kibana

• Once data is imported into Elasticsearch, the UI configuration can be done in Kibana

• UI configuration focuses on what will be displayed

• Configuration is done in a very natural way

• No business data model is needed before doing the configuration

Page 11: Using ELK Explore Defect Data

ELK: User interface

• Easily add query conditions and filters to dig into the data

Page 12: Using ELK Explore Defect Data

ELK: Filter and dig into the data: defect distribution by time

• The defect data view shows all defect data

• Most defects were created in 2015; use the mouse to drag over that area of the chart

• The defect data is then filtered by the time range you selected (the underlying date histogram is sketched below)
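Under the hood, this time view corresponds to a date histogram aggregation. A minimal sketch, assuming a parsed_xml.create_time date field (newer Elasticsearch versions use calendar_interval instead of interval):

GET _search
{
  "size": 0,
  "aggs": {
    "defects_over_time": {
      "date_histogram": { "field": "parsed_xml.create_time", "interval": "month" }
    }
  }
}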

Page 13: Using ELK Explore Defect Data

ELK: Filter and dig into the data: defect distribution by product

• The defect data view shows all defect data

• The green part is one product; double-click it

• The defect data is then filtered by the green product, and the view changes to show only that product’s defects

Page 14: Using ELK Explore Defect Data

ELK: Multidimensional analysis: defect distribution by product

• Defects

• Defects of different products; a different color stands for each product

Page 15: Using ELK Explore Defect Data

ELK: Defect KPIs displayed

• Severity

• Defect before or after release

• Defect close time (a sketch of how these KPIs could be computed follows)
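One way these KPIs could be computed directly in Elasticsearch is sketched below; the field names parsed_xml.severity and defect_fixed_days are assumptions rather than taken from the deck (in the deck these views are built in Kibana):

GET _search
{
  "size": 0,
  "aggs": {
    "by_severity": {
      "terms": { "field": "parsed_xml.severity" }
    },
    "average_fix_days": {
      "avg": { "field": "defect_fixed_days" }
    }
  }
}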

Page 16: Using ELK Explore Defect Data

Content

Customer requirements and defect KPI definition

ELK solution

ELK compared to traditional analytics methods

Page 17: Using ELK Explore Defect Data

ELK: Advantages

• Analyze data without coding

• Fast delivery and low cost

• High flexibility to analyze data

• Easy to deploy and maintain

• Learn the business data before a data model is created

• Explore and dig into the data step by step, based on your understanding of the business

• Big data method

• Performance

• High Availability

• Extendable

• Collect and import data easily

Page 18: Using ELK Explore Defect Data

ELK: Why analyze data without coding

• Data analysis and display

• Traditional method

• The bottleneck is the relational database

• Aggregated analysis can’t be done by the database alone

• We need to write SQL statements such as GROUP BY and COUNT

• Even simple code makes the analytics difficult, because the data, the data processing, and the UI are coupled with the code

• ELK solution

• Powerful aggregated analysis and search capability

• UI is not coupled with the data

• Query conditions and filters can be easily added to the current query

Page 19: Using ELK Explore Defect Data

ELK: Query from configuration, not coding

• Simple and powerful aggregated analysis, like SQL GROUP BY

• Business concepts can be learned from data aggregation

• Below is an Elasticsearch aggregation query

GET _search
{
  "aggs": {
    "product": {
      "terms": { "field": "parsed_xml.product" }
    }
  }
}

• The search result can be used as a filter in another query (a full request combining both is sketched below), for example:

"query_string": { "query": "parsed_xml.product:\"drivers\" AND (*)" }

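A sketch of how the filter and the aggregation might be combined in one request; the field and product values come from the deck, but the overall request is illustrative, not the author’s exact query:

GET _search
{
  "query": {
    "query_string": { "query": "parsed_xml.product:\"drivers\" AND (*)" }
  },
  "aggs": {
    "product": {
      "terms": { "field": "parsed_xml.product" }
    }
  }
}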

Page 20: Using ELK Explore Defect Data

Traditional data query issues

• Too much data returned from a SELECT statement

• The main reason is that people don’t know how much data will be returned before running the SELECT

• The data is not filtered

• Too much data in one single table

• If a table is split, the query code needs to be modified to merge the query results

• Too much impact on existing programs

• Not easy to extend when the data volume increases

Page 21: Using ELK Explore Defect Data

How ELK deals with the data query issues

• Big data method and concept

• When the amount of data cannot be processed or handled by a single point of resources (machines, CPU, etc.), the data and the processing power can be split horizontally, without substantially affecting the existing architecture

• ELK solution:

• Too much data returned from a SELECT statement

• Count before querying (see the sketch after this list)

• Filter before querying, using the aggregation result

• Too much data in a single table

• A table can be split without changing the query statement

• Time sequences are supported, so it is easy to split time series data

• Easy to extend through distributed data storage
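As a rough illustration of "count before query" (a sketch, not from the original deck), Elasticsearch can return just the number of matching documents first, so the size of the result set is known before the full search is run:

GET _count
{
  "query": {
    "query_string": { "query": "parsed_xml.product:\"drivers\"" }
  }
}

If the count is acceptable, the same query body can then be sent to _search to fetch the actual documents.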

Page 22: Using ELK Explore Defect Data

ELK data storage: Elasticsearch distributed data storage

• From: https://www.elastic.co/guide/en/elasticsearch/guide/current/replica-shards.html

• Elasticsearch allows you to start small and scale horizontally as you grow. Simply add more nodes, and let the cluster automatically take advantage of the extra hardware.

• Elasticsearch clusters are resilient — they will detect new or failed nodes, and reorganize and rebalance data automatically, to ensure that your data is safe and accessible.

Page 23: Using ELK Explore Defect Data

Traditional data collector issues

• The database is strictly defined by data types (schema)

• The same data may have different data types in different systems

• The data schema relationship (data mapping) between different systems must be defined correctly before the data import

• Otherwise, the data import will fail

Page 24: Using ELK Explore Defect Data

How ELK deals with the data collector issues

• ELK solution:

• Schema-less data import

• No need to consider data types before importing the data

• If the default data type is not right, it can be changed (a mapping sketch follows)
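A minimal sketch of changing a default data type with an explicit mapping; the index name defects and the field names are assumptions, and older Elasticsearch versions also require a type name inside mappings:

PUT defects
{
  "mappings": {
    "properties": {
      "parsed_xml": {
        "properties": {
          "severity":  { "type": "keyword" },
          "open_time": { "type": "date" }
        }
      }
    }
  }
}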

Page 25: Using ELK Explore Defect Data

ELK data import: Logstash pipeline

• With the existing plug-ins, much less programming, or no programming at all, is needed

• Filtering, processing, and enrichment of the data can be easily added to an existing collection pipeline

• Input and output are flexible and extendable

• Existing pipeline: Input (defect data file) → Filter1 (normalize XML format) → Filter2 (get and parse the defect data) → Filter3 (change the time format of the input data) → Output (Elasticsearch)

• To get the defect fix time, add Filter4, which adds a defect fixed time field calculated as defect close time minus defect open time, to the same pipeline: Input → Filter1 → Filter2 → Filter3 → Filter4 → Output (a sketch of Filter4 follows)
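A rough sketch of what Filter4 might look like, using Logstash’s ruby filter with the event API of the Logstash 1.5 era referenced in this deck (newer versions use event.get / event.set); the field names are assumptions:

filter {
  ruby {
    init => "require 'time'"
    # Assumed field names from the parsed defect XML; adjust to the real schema.
    # Adds a defect_fixed_days field: close time minus open time, in days.
    code => "
      open_t  = event['[parsed_xml][open_time]']
      close_t = event['[parsed_xml][close_time]']
      if open_t && close_t
        event['defect_fixed_days'] = (Time.parse(close_t) - Time.parse(open_t)) / 86400.0
      end
    "
  }
}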

Page 26: Using ELK Explore Defect Data

ELK data import: Logstash architecture

• From: https://www.elastic.co/guide/en/logstash/1.5/deploying-and-scaling.html