POLITECNICO DI MILANO
School of Industrial and Information Engineering
MSc. in Management Engineering
“A Framework for Data Quality Risk Assessment and Improvement of Business Processes in Information
Systems”
Supervising Professor:
Eng. Cinzia Cappiello
Master's Degree Thesis by:
Angie Paola Quintero Atara - 817618
Iván Ricardo Jiménez Lopera - 813363
Academic Year 2014/2015
Contents
1.1. The concept of Data Quality
1.2. Scope and objective of our work contribution
1.3. Structure and presentation of the current document
2.1. BPM Basic Concepts
2.2. Data Quality in Business Processes
2.3. Data Quality Dimensions
2.4. Data Quality Breaches
2.5. The Cost of allowing poor Data Quality and its implications
2.6. Direct and Hidden Costs caused by low Data Quality
2.7. Data Quality Costs in English's TIQM
3.1. Failure Modes and Effect Analysis (FMEA)
3.2. FMEA development
3.3. FMEA Procedure
3.3.1. Identification of the product or process
3.3.2. Creating a Chart/Map/Diagram
3.3.3. FMEA Worksheet
3.3.4. Severity
3.3.5. Causes of Failure Mode
3.3.6. Occurrence
3.3.7. Detection
3.3.8. Risk Priority Numbers
4.1. Errors Classification, Data Quality Breaches and Failures
4.2. Definition of a Cost Based FMEA
4.2.1. Introduction of Cost Quality Factors
4.2.2. The Proposed System of Cost Evaluation
4.3. Advantage of re-designing the FMEA Ranking Criteria
4.4. Mapping and translating the possible Variables
4.5. Severity Ranking Criteria
4.6. Occurrence Ranking Criteria
4.7. Detection Ranking Criteria
4.8. Implementation of Improvement Actions
4.9. Recalculation of RPN parameter after Improvement Actions
4.10. Guidelines to carry out a Data Quality Risk assessment using FMEA
5.1. Creating a Business Process Prototype Model
5.2. Definition of Improvement Actions
5.3. Allocation of improvement actions in the Business Process
5.4. Cost Based FMEA and Data Quality Analysis in a Business Process
5.5. Decision criteria about which improvement actions might be exercised
1. Introduction
Nowadays, many companies worldwide are giving much more importance to business process modeling (e.g., with BPMN) as a way to represent the reality in which they work on a daily basis, with the aim of identifying the primary (core) and secondary processes, the key performance indicators and the opportunities for improvement, among others. In this context, since data is used in almost all business processes performed within a company and serves as a basis for decision-making, a critical topic emerges: Data Quality.
In the era of Big Data, a company's competitiveness will rely more and more on its ability to offer customized products or services based on an increasingly fine segmentation of its customer database, and it is here that Data Quality Risk Assessment plays an important role. The consequences of poor Data Quality can be devastating for a company while, on the other hand, excellent Data Quality Management can set the pace of a successful corporate growth path.
1.1. The concept of Data Quality
In order to come up with a clear definition of the term Quality, it is first important to look closely at the context in which it is applied. The term is frequently used in the manufacturing industry when trying to achieve objectives through the management of the production process; in fact, the manufacturing industry uses statistical techniques aimed at optimizing the quality of products (statistical process control).
Even though the concept of Quality is easier to understand for physical objects, this is not the case for data. For data, the concept of Quality is more related to intangible properties such as completeness and consistency. In the end, data is the output of a production process, and the way in which this process is performed can have a significant influence on the Quality of the Data.
When trying to understand the concept of Data Quality, some authors divide the term into subcategories and dimensions. Ballou and Pazer (1995), for instance, divide Data Quality into four dimensions: accuracy, timeliness, completeness and consistency. They argue that accuracy is the easiest dimension to measure, by simply comparing actual values against correct values, and that timeliness can be measured in a similar way. Completeness can be assessed against a predefined data completeness level, as long as the focus is on whether the data is complete or not. Consistency is somewhat more complex to evaluate, as different schemas are needed to make an effective comparison.
All in all, looking at the definitions of Data Quality in the literature, the term comprises many dimensions, so the Quality of Data can be defined as a multidimensional concept. Each Quality Dimension aims to capture the behavior of a system from a particular point of view.
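To make these measurements concrete, the following minimal sketch shows how the accuracy and completeness dimensions could be computed over a set of records, in the comparison-based spirit of Ballou and Pazer; the field names, the records and the exact formulas are hypothetical choices of ours, not part of the cited work.

# A minimal sketch of measuring two data quality dimensions (Python).
# Field names and records are hypothetical.

def accuracy(actual: list, correct: list) -> float:
    """Fraction of values matching the known-correct reference values."""
    matches = sum(1 for a, c in zip(actual, correct) if a == c)
    return matches / len(actual) if actual else 1.0

def completeness(records: list, required_fields: list) -> float:
    """Fraction of required fields populated across all records."""
    total = len(records) * len(required_fields)
    filled = sum(1 for r in records
                 for f in required_fields if r.get(f) not in (None, ""))
    return filled / total if total else 1.0

customers = [
    {"name": "Rossi", "email": "rossi@example.com"},
    {"name": "Bianchi", "email": None},  # missing value lowers completeness
]
print(accuracy(["Milano", "Roma"], ["Milano", "Torino"]))  # 0.5
print(completeness(customers, ["name", "email"]))          # 0.75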
1.2. Scope and objective of our work contribution
Within the scope of the present document, a model for the evaluation of risks associated with Data Quality in Business Processes is proposed. It takes as its foundation a well-known methodology used mainly in the quality systems of manufacturing companies, but also applied in other areas of engineering as a useful instrument to recognize and detect failures: FMEA (Failure Mode and Effects Analysis). The proposed model also includes a risk assessment procedure, formulating guidelines to carry out the analysis and integrate it into the methodology, while also covering the different types of costs associated with risk evaluation in the context of Data Quality.
Starting with the definition of the parameters and variables to be introduced in the model, such as error types with their associated failures, data breaches and the data quality dimensions affected, the proposed methodology is, on one hand, a qualitative model, considering the qualification and ranking of subjective factors such as severity criteria, occurrence and the corresponding detection mechanisms; on the other hand, it is also a quantitative model, since it includes a numerical algorithm that combines the values of the mentioned variables to calculate the RPN number as a measurement of data quality. Furthermore, the model captures the influence of preventive and detective controls, analyzing their impact in terms of risk evaluation and improvements to Data Quality.
1.3. Structure and presentation of the current document
The present work is organized as a structured document, starting with the definition of the main objectives and the scope of the research, followed by the development of the contribution itself.
Chapter 2 defines the baseline topics that consolidate the theoretical background and serve as input to formulate the model criteria and the variables to be analyzed: the foundations of Data Quality as part of Business Process Management, Data Quality dimensions, breaches, risk assessment and cost theory, analyzing the implications of low data quality.
Chapter 3 presents the original FMEA (Failure Mode and Effects Analysis) methodology, summarizing the general procedure, its application to specific processes, how to identify possible failure modes, the evaluation of the model criteria (severity, occurrence and detection), the importance of defining and calculating a Risk Priority Number based on the critical variables, and how this numerical analysis defines an overall evaluation of quality.
Chapter 4 presents a cost-based FMEA methodology to evaluate data quality in business processes. This model is our own contribution: it adapts the original FMEA methodology, focused on manufacturing production processes, and redefines it in the context of Information Systems and of Data Quality for business processes. The adaptation includes the mapping of the variables under analysis, the redefinition of the ranking criteria parameters (severity, occurrence and detection), and the improvement of the original model by including a cost analysis methodology to strategically evaluate the implications of poor Data Quality in business processes, the introduction of possible improvement actions and, finally, the definition of general guidelines to carry out a Data Quality risk assessment using the improved cost-based model.
Chapter 5 exemplifies the model, showing an application of the methodology to evaluate data quality in the context of a typical business process transaction. A simulation in Excel is presented to execute the analysis with the relevant variables and definitions and, finally, the calculation of the data quality criteria with improvement actions. The Excel simulation serves as a base template and guideline for implementing the model and for future evaluations of Data Quality in generic business processes. Finally, conclusions and overall recommendations are described in chapter 6.
2. Theoretical Background
2.1. BPM Basic Concepts
BPM stands for Business Process Management, a term attached to all the current business excellence models used by companies to simplify their processes in a way that lets them make improvements either internally or externally. Elzinga et al. (1995) argue that many companies are focused on finding ways in which their productivity, product quality and operations can be improved, and that a new area that might shed light on these improvements is Business Process Management (BPM).
According to Zairi (1997), BPM must be governed by certain rules, among which it is important to mention the following:
The principal activities have to be properly identified and documented.
BPM creates horizontal linkages focusing on customers.
BPM must ensure discipline, consistency and repeatability of quality performance.
BPM relies on KPIs to evaluate the performance of the processes, to set goals and to deliver output levels intended to meet corporate objectives.
The continuous approach to optimization through problem solving must be considered a base on which BPM can rely.
Through continuous improvement and best practices, BPM must ensure an enhancement of competitiveness.
One of the first big companies to apply Business Process Management was Hewlett-Packard, and some authors described this application as a "Plan, Do, Check, Act (PDCA) cycle", since the company's approach consisted of defining metrics for its processes, tracking those metrics (including management reports) and taking corrective action where needed. From this, BPM can be considered a customer-focused approach to the systematic management, measurement and improvement of all company processes through cross-functional teamwork and employee empowerment.
2.2. Data Quality in Business Processes
Without doubt, one of the most important concerns in enterprises nowadays is the Quality of the Data that directly affects their business processes. Enterprises that can manage their business processes efficiently and effectively, along with the Quality of their Data, are more successful in doing business, as this allows them, for instance, to increase revenues and to keep more reliable data in their customer databases (enhancement of CRM and/or ERP systems). This is one of the reasons why enterprises allocate increasing annual investments to data warehousing intended to improve their CRM or ERP systems.
In the era of Big Data, information has become an important asset to increase the capital value of any firm by means of Data Quality Management. An excellent Data Governance policy can bring important advantages in terms of business processes, and hence for the companies:
Improvement in the quality of products or services and enhancement of decision-making procedures.
General cost reductions.
Improved ability to change the company's strategies in fast-paced environments (increased competitiveness).
Improvement of Business Intelligence tools.
Increase in the customer service level (i.e. customer satisfaction).
Increase in the positioning of the company in the market (i.e. brand positioning).
Data Quality Management within an enterprise is a very expensive task, but preventing an error can cost ten times less than the error itself. The costs of poor quality data can amount to between 10% and 25% of an organization's total revenues or total budget1. The losses, in terms of money, due to poor Data Quality are very complex to estimate, as data impacts both tangible and intangible factors of business processes. Nevertheless, without at least an estimation of these costs due to poor Data Quality, companies are unable to act, or reluctant to take actions, towards solving their Data Quality problems.
1 Kovacic, Andreja. Business renovation: business rules (still) the missing link. Business Process Management, pages 158-170, 2004.
Some studies have estimated that, without any intervention from the companies, these costs due to poor Data Quality amount to approximately 20% of a company's revenues2.
2.3. Data Quality Dimensions
When performing a Data Quality assessment, it is important to keep in mind the client who will be impacted by the data (i.e. the data consumers who use the data). Data consumers now have more choice over which data they use. So, in order to approach a Data Quality problem within an organization, it is necessary to take a broader view that goes beyond the stored data (the intrinsic view) and includes data in production and utilization processes (Strong et al., 1997).
Data Quality can be defined as "fitness for use", which means that the term Data Quality is subjective. In other words, Data Quality must be seen and evaluated from the perspective of the user, in order to find out to what extent the data serves the user's purpose. Based on this usefulness and usability of data, Strong et al. (1997) provide a classification of high Data Quality in four categories, covering intrinsic, accessibility, contextual and representational aspects, as shown below:
DQ Category | DQ Dimensions
Intrinsic DQ | Accuracy, Objectivity, Believability, Reputation
Accessibility DQ | Accessibility, Access Security
Contextual DQ | Relevancy, Value-Added, Timeliness, Completeness, Amount of Data
Representational DQ | Interpretability, Ease of understanding, Concise representation, Consistent representation
Table 2.1– Classification of Data Quality Dimensions3
Wand and Wang (1996) provide another classification for Data Quality, founded on an intrinsic view and therefore defining four intrinsic dimensions: completeness, unambiguousness, meaningfulness and correctness. These dimensions were questioned by Haug et al. (2009), who discussed the representational data quality dimension because, in their view, it can be seen as a form of accessibility data quality rather than a category of its own; they therefore proposed three dimensions: intrinsic, accessibility and usefulness.
Since there is no single definition for each Quality Dimension, but many, and there are various ways (indicators) to measure them, for the purposes of this work the Quality Dimensions proposed by Strong et al. (1997) have been used when developing the Data Risk Assessment model, because these dimensions have the dynamic component that is very important when evaluating Data Quality scenarios within business processes.
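For later use in the model, the classification of table 2.1 can be encoded directly. The sketch below is an illustrative Python mapping of the Strong et al. (1997) categories to their dimensions; the identifier names are our own choice.

# Illustrative encoding of the Strong et al. (1997) classification
# from table 2.1; useful when mapping errors to affected dimensions.
DQ_DIMENSIONS = {
    "intrinsic": ["accuracy", "objectivity", "believability", "reputation"],
    "accessibility": ["accessibility", "access_security"],
    "contextual": ["relevancy", "value_added", "timeliness",
                   "completeness", "amount_of_data"],
    "representational": ["interpretability", "ease_of_understanding",
                         "concise_representation", "consistent_representation"],
}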
2.4. Data Quality Breaches
The model proposed in chapter 4 takes into account the 10 quality breaches described by Strong, Lee and Wang4; it is therefore important to briefly describe every Data Quality problem, as these information quality problems can have a major impact on one or more Data Quality Dimensions (as described above in table 2.1) at a time.
1) Multiple sources of the same information produce different values: this quality breach affects the dimensions of consistency and believability. The problem is simple: different sources of the same information can create confusion for data consumers, and the data may thus present inconsistencies. The solution in this case is to establish common and consistent definitions by reviewing the information production process (a detection sketch for this breach follows the list).
2) Information is produced using subjective judgments, leading to bias: the dimensions affected by this information quality problem are objectivity and believability. The problem is that subjective values enter the data construction process, thereby leaving misleading information for data consumers. The solution consists of continuously improving the activities that involve subjective evaluations.
3) Systemic errors in information production lead to lost information: the dimensions affected by this problem are correctness, completeness and relevancy. Systemic errors are repetitive errors that can affect the entire system, thus influencing the whole information production process. The typical solution consists in applying statistical process control, like that used for manufacturing processes (e.g. acceptance sampling).
4 Diane M. Strong, Yang W. Lee, and Richard Y. Wang. 1997. 10 Potholes in the Road to Information
Quality. Computer 30, 8 (August 1997), 38-46. DOI=http://dx.doi.org/10.1109/2.607057
4) Large volumes of stored information make it difficult to access information in a reasonable time: when large volumes of data are managed, the dimensions affected are concise representation, timeliness, value-added and accessibility. Large amounts of information are difficult to manage, and some information that must be retrieved may experience significant delays. The proposed solution is to filter and analyze the information in order to make regular backups according to the needs.
5) Distributed heterogeneous systems lead to inconsistent definitions, formats and values: this information quality problem affects the dimensions of consistent representation, timeliness and value-added. The problem appears when data users try to obtain or consolidate information from many sources, which can cause data inconsistencies and delays in retrieving the information. In this case, the solution is data warehouses, which can help by pulling information from legacy systems or different sources, executing routines and resolving inconsistencies at the same time.
6) Nonnumeric information is difficult to index: the quality dimensions affected by this problem are concise representation, value-added and accessibility. Representing nonnumeric information concisely and making it easy to access is the main problem described here. The solution is to evaluate the benefits of electronic storage compared with the costs of inputting and storing the information, in order to determine the feasibility of doing so.
7) Automated content analysis across information collections is not yet available: this problem affects the analysis requirements and the consistent representation, relevance and value-added quality dimensions. This data quality issue consists of having easy access to information from various sources, in a way that lets the data user manipulate the data to construct reports, analyses, trends, etc. The solution is awareness of the new analysis routines for computing trends across different databases that will come with the development of electronic storage.
8) As information consumers' tasks and the organizational environment change, the information that is relevant and useful changes: this data quality issue affects the dimensions of relevance, value-added and completeness. This quality breach concerns the dynamic nature of information: since data consumers change constantly, mismatches arise between the information provided by the systems and the information required by the data user. In this case, the solution is to anticipate changes in the processes and information systems according to consumer needs.
9) Easy access to information may conflict with requirements for security, privacy and confidentiality: the dimensions affected by this information quality problem are security, accessibility and value-added. This Data Quality breach lies in the problem that some information is important for certain users (restricted information) but, at the same time, security barriers prevent them from seeing it. The proposed solution in this case is to develop security settings for the information as it is entered into the system for the first time.
10) Lack of sufficient computing resources limits access: the dimensions affected by this quality breach are accessibility and value-added. As the words "lack of" indicate, this problem regards the scarce computing resources available to access data, making transactions more difficult to execute, with the consequence of losing value through delays. The solution in this case is not to acquire more computers, but to develop upgrade policies in order to make the use of the equipment more efficient.
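As a small illustration of how such breaches can be detected automatically, consider breach 1 (multiple sources producing different values for the same entity). The following Python sketch, with hypothetical source names, keys and fields, flags the entities whose values disagree across sources:

# A minimal consistency check for breach 1; sources and fields are
# hypothetical examples, not part of the cited classification.
from collections import defaultdict

def find_inconsistencies(sources, field):
    """Return entity keys whose value for `field` differs across sources."""
    values = defaultdict(set)
    for source_name, records in sources.items():
        for key, record in records.items():
            values[key].add(record.get(field))
    return {k: v for k, v in values.items() if len(v) > 1}

crm = {"C001": {"address": "Via Roma 1"}}
erp = {"C001": {"address": "Via Roma 10"}}  # conflicting value
print(find_inconsistencies({"crm": crm, "erp": erp}, "address"))
# e.g. {'C001': {'Via Roma 1', 'Via Roma 10'}}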
2.5. The Cost of allowing poor Data Quality and its implications
In today's business environment, in the Big Data era, information is the most valuable asset of companies; it is used in almost all activities in the business context and is considered the basis for decisions at the operational and strategic levels. Having high quality data is therefore a relevant factor in a company's success.
Information Technology has advanced in recent years to the point where organizations gather and store huge amounts of data. Nonetheless, as data volumes increase, so does the complexity of managing information and of applying the appropriate techniques to store the more complex information collected from the organization's different technological resources, and this certainly increases the risk of poor data quality.
Another data-related issue usually mentioned is that companies often manage data at a local level, for example at the level of different internal areas or locations of the organization; this implies the creation of 'information silos' in which data are redundantly stored, managed and processed.5
5 Lee et al., 2006; Smith, 2008; Vayghan et al., 2007
In this way, data silos imply that many companies face a multitude of inconsistencies in data definitions, data formats and data values, which makes it almost impossible to easily process, classify and interpret the data in order to extract and use the relevant information.
Organizations typically overestimate the quality of their data and underestimate the cost of errors, so it is very important to analyze the impact of poor data quality, since it can have significantly negative monetary impacts on the efficiency of an organization6. The implications of poor quality data carry negative effects for business users: reduced customer satisfaction, increased running costs, inefficient decision-making processes, lower performance and lowered employee job satisfaction7.
Poor data quality also increases operational costs, since time and other resources are consumed by error detection and correction tasks. Information is created and used in all daily operations and data is a critical input to almost all decisions; poor data quality also makes it difficult to build trust in the company's data, which may imply a lack of user acceptance of any initiatives based on such data8.
From a solution perspective, leading IT companies today offer data warehousing solutions that perform business analytics and data integration of large volumes of structured data for complex warehouse environments. For example, the Hadoop software9 is open source software for managing large data sets across multiple clusters and repositories; it can scale from a single-server to a multi-server configuration and enables applications to work with large volumes of data stored and combined across the servers of massive clusters. Together with the new generation of software architectures and technologies such as Big Data, it is possible today to manage the complexity of data integration and accessibility through Business Intelligence and Data Mining tools for monitoring strategic business processes, performance management, KPI metrics and data analytics reporting functionalities, with dashboards and scorecards delivered, customized and visualized according to the person's role in the organization: management-executive level, IT department, business operations, finance or a customer view. In this way, the relevant information can easily be accessed, processed and interpreted to facilitate decision making based on the visualization of the relevant data.
6 Ballou et al., 2004; Wang & Strong, 1996
7 Kahn et al., 2003; Leo et al., 2002; Redman, 1998
8 Levitin & Redman, 1998; Ryu et al., 2006
9 Hadoop Software Corporate brand
2.6. Direct and Hidden Costs caused by low Data Quality
The costs of poor data quality are significant in many companies, yet only very few studies demonstrate how to identify, categorize and measure such costs10. In practice, low quality data can bring monetary damage to an organization in a variety of ways. Some authors11 have offered different categorizations of costs in relation to information quality assessment and classify typical data quality problems from the data and the user perspectives, as shown in the following table.
Context-independent problems
Data perspective: Spelling error; missing data; duplicate data; incorrect value; inconsistent data format; outdated data; incomplete data format; syntax violation; unique value violation; violation of integrity constraints; text formatting.
User perspective: The information is inaccessible; the information is insecure; the information is hardly retrievable; the information is difficult to aggregate; errors in the information transformation.
Context-dependent problems
Data perspective: Violation of domain constraints; violation of the organization's business rules; violation of company and government regulations; violation of constraints provided by the database administrator.
User perspective: The information is not based on facts; the information is of doubtful credibility; the information presents an impartial view; the information is irrelevant to the work; the information has inconsistent meanings; the information is incomplete; the information is compactly represented; the information is hard to manipulate; the information is hard to understand.
Table 2.2– Classification of Data Quality problems identified in general literature12
On the issue of data quality management, the authors mention that it lies at the intersection of the fields of quality management, information management and knowledge management. Finally, on the issue of contextual data quality, they provide an overview of which publications relate to different data application contexts, including: databases, information management systems, accounting, data warehouses, decision-making, enterprise resource planning, customer relationship management, finance and e-business systems, among others.13
10 Eppler & Helfert, 2004; Kim & Choi, 2003
11 Ge & Helfert, 2007
12 Ge & Helfert, 2007
13 Haug, A., Zachariassen, F., & Van Liempd, D. (2011). The costs of poor data quality. Journal of Industrial Engineering and Management, 4(2), 168-193.
The authors14 review and categorize the potential costs associated with low quality data. They propose a classification framework and a cost progression analysis to support the development of quantifiable measures of data quality costs for researchers, and they identify 23 examples of costs resulting from poor quality data, amongst which: higher maintenance costs, excess labor costs, assessment costs, data re-input costs, loss of revenue, costs of losing current customers, higher retrieval costs, higher data administration costs, process failure costs, information scrap and rework costs, and costs due to increased time of delivery. Additionally, they identify 10 cost examples of assuring data quality: 1) information quality assessment or inspection costs, 2) information quality process improvement and defect prevention costs, 3) preventing low quality data, 4) detecting low quality data, 5) repairing low quality data, 6) costs of improving data formats, 7) investment costs of improving data infrastructures, 8) investment costs of improving data processes, 9) training costs of improving data quality know-how and, lastly, 10) management and administrative costs associated with ensuring data quality. Finally, the authors state that data quality costs consist of two major types: improvement costs and costs due to low data quality. Based on this, they provide a simple classification of data quality costs, as shown in the following table.
Data Quality Costs
Costs caused by low Data Quality
Direct Costs: Verification costs; Re-entry costs; Compensation costs
Indirect Costs: Costs based on lower reputation; Costs based on wrong decisions or actions; Stuck investment costs
Costs of improving or assuring Data Quality
Prevention Costs: Training costs; Monitoring costs; Standard development and deployment costs
Detection Costs: Analysis costs; Reporting costs
Repair Costs: Repair planning costs; Repair implementation costs
Table 2.3 - A Data Quality cost taxonomy15
14 Eppler & Helfert, 2004
15 Eppler & Helfert, 2004
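To make the taxonomy of table 2.3 operational, the following Python sketch models the two major cost types with their subcategories and sums them into a total data quality cost; all figures are hypothetical and the structure is one possible encoding of the table.

# Hypothetical figures arranged per the taxonomy of table 2.3.
costs = {
    "caused_by_low_dq": {
        "direct":   {"verification": 12_000, "re_entry": 8_000,
                     "compensation": 5_000},
        "indirect": {"reputation": 20_000, "wrong_decisions": 15_000,
                     "stuck_investment": 4_000},
    },
    "improving_or_assuring_dq": {
        "prevention": {"training": 6_000, "monitoring": 3_000,
                       "standards": 2_000},
        "detection":  {"analysis": 2_500, "reporting": 1_500},
        "repair":     {"planning": 1_000, "implementation": 3_000},
    },
}

total = sum(v for group in costs.values()
              for category in group.values()
              for v in category.values())
print(f"Total data quality cost: {total}")  # 83000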
Furthermore, a classification of the costs inflicted by poor quality data has been proposed that relates to how visible the costs are to the organization. In the context of manufacturing processes, a categorization into Direct and Hidden costs is proposed. Direct costs can be defined as costs that are immediately present and visible to the C-level of an organization; an example would be faulty delivery addresses for registered customers, resulting in wrong deliveries. Conversely, hidden costs refer to the costs that the company is incurring but of which the C-level of the organization is not aware, for example the expenses and costs of faulty decision making caused by not knowing the profitability of products and its implications.
The following table displays the different errors involving direct costs in the context of a manufacturing process; it describes the possible errors in the process (manufacturing errors, wrong deliveries, payment errors and problems in delivery times) and maps each error to the Data Quality Dimensions affected.
Cost Type: Direct Costs
Error Type: Manufacturing errors
Causes: Inaccurate data in order processing; insufficient data for Materials Requirements Planning (MRP).
DQ Dimensions affected: Accuracy, Completeness, Consistency, Believability.
Error Type: Wrong deliveries
Causes: Mixed data and categorization of customers' information; lack of updating of data repositories holding expired registers of previous customers in the system.
DQ Dimensions affected: Accuracy, Believability, Consistency.
Error Type: Payment errors
Causes: Inaccurate records of stored price data in the system; inconsistencies between price information and inventory in the system.
DQ Dimensions affected: Accuracy, Consistency, Value-Added.
Error Type: Problems in delivery times
Causes: Low data quality and inaccurate information as input for the company's logistic and operational processes.
DQ Dimensions affected: Accuracy, Consistency, Believability, Concise Representation, Value-Added.
Table 2.4 - Error Types involving Direct Costs in a Manufacturing Process context
The following table displays the different errors involving hidden costs in the context of a manufacturing process; it describes the possible errors in the process (long lead times, data being registered multiple times, focus on wrong customer segments, poor production planning and poor price policies) and maps each error to the Data Quality Dimensions affected.
Our contribution in the upcoming chapters is to formulate a cost-based Data Quality assessment model, adapting the cost theory from the manufacturing context to that of Data Quality in Information Systems, as will be seen in chapter 4 of the present document.
Cost Type: Hidden Costs
Error Type: Long lead times
Causes: Lack of data timeliness on a daily operational basis.
DQ Dimensions affected: Accuracy, Consistency, Concise Representation.
Error Type: Data being registered multiple times
Causes: Lack of procedures for data de-duplication and information validation checks.
DQ Dimensions affected: Accuracy, Consistency, Interpretability.
Error Type: Focus on wrong customer segments
Causes: Low data quality and inaccurate processing of the information contained in strategic market segmentation surveys.
DQ Dimensions affected: Accuracy, Believability, Interpretability.
Error Type: Poor production planning
Causes: Wrong analysis of data tendencies and of the techniques for predicting production planning.
DQ Dimensions affected: Accuracy, Objectivity, Consistency, Concise Representation, Value-Added.
Error Type: Poor price policies
Causes: Inaccurate interpretation of current market price information and wrong data tendency analysis.
DQ Dimensions affected: Accuracy, Consistency, Concise Representation.
Table 2.5 - Error Types involving Hidden Costs in a Manufacturing Process context
2.7. Data Quality Costs in English's TIQM
The TIQM (Total Information Quality Management) methodology [English 1999] was designed to support data warehouse projects, and it focuses on the management activities responsible for the integration of operational data sources. [English 1999] also proposes a further classification to evaluate the cost of poor data quality, broadly categorizing TIQM costs into process costs, such as the costs associated with re-executing a whole process due to data errors, and opportunity costs, due to lost and missed revenues.
In TIQM, data quality costs correspond to the costs incurred by business processes and data management processes due to poor data quality. Costs for information quality assessment or inspection measure data quality dimensions to verify that processes are performing properly. Finally, process improvement and defect prevention costs involve activities to improve the quality of data, with the objective of eliminating, or reducing, the costs of poor data quality. Costs due to poor data quality are analyzed in depth in the TIQM approach, and are subdivided into three categories:
Process failure costs: incurred when poor quality data causes a process not to perform properly. For example, inaccurate mailing addresses cause correspondence to be wrongly delivered. This involves recovery costs of lost information and customer impact, unrecoverable costs, and liability and exposure costs.
Information scrap and rework: when data is of poor quality, several types of defect management activities are required, such as data verification and rewrite costs, data correction costs, workaround costs, and redundant data handling and support costs.
Loss and missed opportunity costs: correspond to the revenues and profits lost because of poor data quality. For example, due to low accuracy of customer e-mail addresses, a percentage of customers already acquired cannot be reached by periodic advertising campaigns, resulting in lower revenues, roughly proportional to the decrease in the accuracy of the addresses (see the worked example at the end of this section). This also involves recovery costs of lost information and customer impact, unrecoverable costs, and liability and exposure costs.
In the upcoming chapter 4, the English TIQM cost categories will also be considered as a basis to formulate a cost-based Data Quality assessment model, integrating all the cost theory reviewed in this chapter.
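As a worked example of the loss and missed opportunity category, the following sketch estimates the revenue a campaign loses when e-mail address accuracy drops, under the rough proportionality stated above; the figures are hypothetical.

# Lost revenue roughly proportional to the accuracy decrease.
expected_campaign_revenue = 500_000  # revenue with fully accurate addresses
email_accuracy = 0.92                # 8% of addresses are wrong

lost_opportunity = expected_campaign_revenue * (1 - email_accuracy)
print(f"Estimated lost revenue per campaign: {lost_opportunity:,.0f}")
# Estimated lost revenue per campaign: 40,000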
3. The FMEA Methodology
The purpose of this chapter is to introduce the main theory of the FMEA methodology, since it is important to understand the underlying concepts that will be used in the risk assessment model. Some important definitions and concepts related to the original methodology, which will serve as foundations for the model, are also introduced in this chapter.
3.1. Failure Modes and Effect Analysis (FMEA)
Failure Modes and Effect Analysis (FMEA) is a methodology for analyzing potential reliability problems early in the development cycle, typically in manufacturing processes, where it is easier to take actions to overcome these issues, overall enhancing reliability through design. FMEA is used to identify potential failure modes, determine their impact on the operation of the product, and identify actions to mitigate those impacts and, therefore, the failures.
An important step in the methodology is the anticipation of what might go wrong with the product. Since anticipating every failure mode is an impossible task, the development team, or whoever is responsible for applying the methodology, should make the failure mode list as extensive as possible: the longer the list, the better the chances of identifying possible failures.
The continuous and early use of the FMEA methodology at design time allows predicting possible failures and producing more reliable, safe and high-quality products. Moreover, the methodology also allows capturing valuable information regarding further improvements to the product or service.16 In other words, the proper use of the methodology can anticipate and prevent problems, thereby reducing costs, shortening product lead times, and achieving highly reliable products and processes.
NASA defines FMEA as a forward logic (bottom-up) tabular methodology that explores the ways or modes in which each system element can fail, and evaluates the consequences of each failure. According to NASA, FMEA is also a useful tool that facilitates studies aimed at implementing effective risk mitigation and countermeasures.
The different approaches to, and definitions of, this tool have something in common: there is always an examination of potential failures, followed by an evaluation of the identified failures17.
16 Somnath Deb, Sudipto Ghoshal, Amit Mathur, Roshan Shrestha and Krishna R. Pattipati. Multisignal Modeling for Diagnosis, FMECA, and Reliability. IEEE, pp. 3-17.
3.2. FMEA development
The US Army first used the methodology in 1949, with the introduction of Military Procedure MIL-P-1629, Procedures for Performing a Failure Mode, Effects and Criticality Analysis; afterwards, NASA used it for the Apollo missions in the 1960s, with the goal of mitigating risks given small sample sizes. In the late 1970s, the methodology was introduced in the automotive industry with the aim of preventing liability costs (Ford Motor Company).
Nowadays, even though FMEA was initially developed by the military, the use of the technique has spread across different areas such as the manufacturing industry, product design, the performance of services, quality assurance procedures, etc. Moreover, the tool is often required to comply with safety and quality requirements such as Process Safety Management (PSM), Six Sigma, FDA Good Manufacturing Practices (GMPs), ISO 9001, etc.
3.3. FMEA Procedure
Even though there are several different approaches to perform a Failure Modes and Effect Analysis,
one possible way is described as follows:
3.3.1. Identification of the product or process
Prior to applying the methodology, it is important to perform certain preparatory steps. The starting point is the description of the product or process, because an overall view of the product or process is essential for the proper application of the methodology. This understanding makes it easier for the people performing the assessment to identify the product or process uses that fall within the scope of the methodology; in other words, this characterization phase helps to delimit the products or processes under study by considering both their intentional and unintentional uses.
17 J.B. Bowles. Failure modes and effect analysis. Materials Park: ASM International, 2002, pages 50-59.
3.3.2. Creating a Chart/Map/Diagram
The next step is characterized by describing the full picture of the process or product by sketching a diagram, map or chart; the most frequently used tool is a business process diagram or chart. The diagram shows the major components or process steps as blocks connected by lines that indicate how the components or steps are integrated. It can be very useful for showing the logical relationships of the components, establishing a structure around which the FMEA can be performed.
3.3.3. FMEA Worksheet
Once the diagram has been completed, the next step is to construct a framework, which is basically a worksheet listing the products or processes to be evaluated under the methodology, according to the diagram produced in the previous step (see the example of a worksheet header in table 3.1 below). Afterwards, the failure modes must be identified. A failure mode can be defined as the way in which a component, subsystem, process, system, etc. could potentially fail; a failure mode in one process can also be the cause of a failure mode in another process. Subsequently, for each failure it is necessary to identify whether or not the failure is likely to occur (probability of occurrence).
Comparing with, or looking at, similar processes or products and failures that have been previously documented can be a good starting point. Then the effects of those failure modes need to be described, and the evaluator must determine what the ultimate effect will be for every failure mode. The failure effect can be defined as the consequence of a failure mode on the function of a product/process, from the customer's perspective; those effects must be described in terms of what the customer can see or experience.
The worksheet columns are: Sl.No. | Business / Service Function | Potential Failure Mode(s) | Potential Technical Effect(s) of Failure | Potential Business Consequence(s) of Failure | Sev | Potential Cause(s) / Mechanism(s) of Failure | Prob | Current Controls (Preventive / Detective) | Det | RPN.
Example row:
Business / Service Function: Protecting IT Assets (to block unauthorized requests)
Potential Failure Mode(s): Rules not appropriately configured
Potential Technical Effect(s) of Failure: IP Spoofing
Potential Business Consequence(s) of Failure: Diversion of sensitive data traffic, fraud
Sev: 8
Potential Cause(s) / Mechanism(s) of Failure: Procedures not followed
Prob: 2
Current Controls: Preventive: none listed / Detective: Procedures available
Det: 4
RPN: 64
Table 3.1 - Example of a header for product or process in FMEA methodology
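One possible way to represent a worksheet row programmatically is sketched below in Python; the field names follow the columns of table 3.1 and the sample values reproduce its example row. This is an illustration of ours, not part of the original methodology.

# A worksheet row as a data structure; rpn is derived per section 3.3.8.
from dataclasses import dataclass

@dataclass
class FmeaRow:
    function: str
    failure_mode: str
    technical_effect: str
    business_consequence: str
    severity: int        # 1-10, table 3.2
    cause: str
    occurrence: int      # 1-10, table 3.3
    preventive_controls: str
    detective_controls: str
    detection: int       # 1-10, table 3.4

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

row = FmeaRow("Protecting IT Assets (to block unauthorized requests)",
              "Rules not appropriately configured", "IP Spoofing",
              "Diversion of sensitive data traffic, fraud",
              severity=8, cause="Procedures not followed", occurrence=2,
              preventive_controls="", detective_controls="Procedures available",
              detection=4)
print(row.rpn)  # 8 x 2 x 4 = 64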
3.3.4. Severity
In the context of the FMEA methodology, severity is defined as an assessment of the seriousness of the effect, and it is linked directly to the potential failure mode under study. To measure severity, a ranking is used that represents how difficult it will be for the subsequent operations to be completed within their specifications of performance, cost and time.
There are several rankings, but a suggested set of criteria commonly used in industry today is shown in table 3.2. A common industry standard suggests a scale from 1 to 10, in which 1 represents no effect while 10 indicates a very severe failure affecting system operation and safety without warning. The aim of the ranking is to help the analyst determine whether a failure would be a minor trouble or a catastrophic occurrence for the customer, whether internal or external. This ranking is also the first critical step in the prioritization of failures, so that the real big issues are addressed first.
Effect | Severity of Effect | Ranking
Catastrophic | Resource not available / Problem unknown | 10
Extreme | Resource not available / Problem unknown | 9
Very High | Resource not available / Problem known and can be controlled | 8
High | Resource available / Major violations of policies | 7
Moderate | Resource available / Major violations of process | 6
Low | Resource available / Major violations of procedures | 5
Very Low | Resource available / Minor violations of policies | 4
Minor | Resource available / Minor violations of process | 3
Very Minor | Resource available / Minor violations of procedures | 2
None | No effect | 1
Table 3.2 - Severity Ranking
3.3.5. Causes of Failure Mode
For each failure mode, it is necessary to identify the causes. These causes are defined as design weaknesses that may result in a failure, and they should be listed in technical terms rather than as symptoms. In the case of a product, for example, improper operating conditions, too much solvent, excessive voltage or improper alignment can be potential causes.
3.3.6. Occurrence
Once the severity assessment is finished, occurrence is the next stage. This term is defined as the assessment of the probability that the specific cause of the failure mode will occur; in other words, the occurrence is the likelihood of occurrence of each cause of failure.
A numerical weight should be assigned to each cause, indicating how likely that cause is (the probability of occurrence for each cause of failure). This is why the failure history is frequently helpful in improving the accuracy of the probability estimate.
A common industry standard uses a scale in which 1 represents a failure that will not occur (an unlikely event) and 10 indicates an imminent probability of occurrence. Sometimes a Cpk indicator is also associated with the occurrence scale, as shown in table 3.3; this number comes from quality systems and indicates the process capability.
Probability of Failure | Failure Prob. | Cpk | Ranking
Very High: Failure is almost inevitable | >1 in 2 | <0.33 | 10
Very High: Failure is almost inevitable | 1 in 3 | 0.33 | 9
High: Repeated failures | 1 in 8 | 0.51 | 8
High: Repeated failures | 1 in 20 | 0.67 | 7
Moderate: Occasional failures | 1 in 80 | 0.83 | 6
Moderate: Occasional failures | 1 in 400 | 1.00 | 5
Moderate: Occasional failures | 1 in 2,000 | 1.17 | 4
Low: Relatively few failures | 1 in 15,000 | 1.33 | 3
Low: Relatively few failures | 1 in 150,000 | 1.50 | 2
Remote: Failure is unlikely | <1 in 1,500,000 | >1.67 | 1
Table 3.3 - Probability of failure Ranking
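A small sketch of how an observed failure rate could be translated into the occurrence ranking of table 3.3 is shown below. The cut-points follow the table; rounding a rate up to the next tabulated ranking is our own implementation choice.

# Map an observed failure rate to the 1-10 occurrence ranking.
import bisect

# Upper bound of the failure probability for rankings 1..9
# (anything above 1 in 3 gets ranking 10).
THRESHOLDS = [1/1_500_000, 1/150_000, 1/15_000, 1/2_000,
              1/400, 1/80, 1/20, 1/8, 1/3]

def occurrence_ranking(failure_probability):
    """Return the occurrence ranking for an observed failure rate."""
    return bisect.bisect_left(THRESHOLDS, failure_probability) + 1

print(occurrence_ranking(1/100))  # 6 (rounded up to the 1-in-80 row)
print(occurrence_ranking(1/2))    # 10 (very high: almost inevitable)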
3.3.7. Detection
There are two types of detection. On one hand, there is the need to identify the current controls (design or process), which are mechanisms that prevent the cause of the failure mode from occurring, or that detect the failure before it reaches the final user (the customer). At this stage, the person in charge of applying the methodology should identify testing, analysis, monitoring and other techniques that can be used on the same or similar products/processes to detect failures.
Each of these controls should be evaluated to determine how well it is expected to identify or detect failure modes. After a new product or process has been in use, previously undetected or unidentified failure modes may appear; the FMEA should then be updated and new plans made to address those failures, in order to eliminate them from the product/process.
On the other hand, the evaluator has to assess the probability that the proposed process controls will detect a potential cause of failure or a process weakness. Improving the product and/or process design is the best strategy for reducing the detection ranking: merely improving the means of detection still calls for improved designs, with the subsequent improvement of the basic design. Higher rankings should put the control method into question. The ranking and suggested criteria are described in table 3.4.
Detection | Likelihood of Detection | Ranking
Absolute Uncertainty | Control cannot prevent / detect potential cause/mechanism and subsequent failure mode | 10
Very Remote | Very remote chance the control will prevent / detect potential cause/mechanism and subsequent failure mode | 9
Remote | Remote chance the control will prevent / detect potential cause/mechanism and subsequent failure mode | 8
Very Low | Very low chance the control will prevent / detect potential cause/mechanism and subsequent failure mode | 7
Low | Low chance the control will prevent / detect potential cause/mechanism and subsequent failure mode | 6
Moderate | Moderate chance the control will prevent / detect potential cause/mechanism and subsequent failure mode | 5
Moderately High | Moderately high chance the control will prevent / detect potential cause/mechanism and subsequent failure mode | 4
High | High chance the control will prevent / detect potential cause/mechanism and subsequent failure mode | 3
Very High | Very high chance the control will prevent / detect potential cause/mechanism and subsequent failure mode | 2
Almost Certain | Control will prevent / detect potential cause/mechanism and subsequent failure mode | 1
Table 3.4 - Detection Ranking
3.3.8. Risk Priority Numbers
The Risk Priority Number is the mathematical result of the product of three ratings: severity, occurrence and detection.
RPN = (Severity) x (Occurrence) x (Detection)
The RPN is used to prioritize the items that require additional quality planning or action. The resulting RPN values are built from rating scales that normally range from 1 to 5 or from 1 to 10, and the criteria used for each rating scale depend on the particular circumstances of the product or process being analyzed. All failures are rated against the same set of rating scales, so the RPN can be used to compare and rank failures within an analysis. Nonetheless, since ratings are assigned relative to a particular analysis, it is not generally appropriate to compare RPN results across different analyses.
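The following minimal sketch computes the RPN for a set of rated failures and sorts them so that the highest-risk items are addressed first; the failure modes and ratings are illustrative.

# Compute and rank RPNs; all entries are hypothetical.
failures = [
    {"mode": "Rules not appropriately configured", "sev": 8, "occ": 2, "det": 4},
    {"mode": "Missing input validation",           "sev": 6, "occ": 5, "det": 3},
    {"mode": "Stale customer addresses",           "sev": 4, "occ": 7, "det": 6},
]

for f in failures:
    f["rpn"] = f["sev"] * f["occ"] * f["det"]

# Address the highest-risk items first.
for f in sorted(failures, key=lambda f: f["rpn"], reverse=True):
    print(f"{f['rpn']:4d}  {f['mode']}")
# 168  Stale customer addresses
#  90  Missing input validation
#  64  Rules not appropriately configured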
4. A Cost-Based FMEA Model for Data Quality in Business Processes
As an integration of the current research with the theory analyzed in the previous chapters, this chapter presents the final output and contribution of this document: the formulation of a cost-based FMEA Data Quality model, carrying the fundamentals of FMEA theory from its original context of evaluating quality in manufacturing processes into the context of Data Quality analysis in information business processes.
4.1. Errors Classification, Data Quality Breaches and Failures
Applying the theoretical concepts of Data Quality breaches and information failure categories, the next step is to define the possible error types along the whole transaction path of the business process, identifying the parts of the process where they might occur, an accurate description of the error classification, its implications and, overall, how the system could be affected. For the analysis of this research, an existing error classification was used and readapted to the context of the business process in question, in accordance with the description of the data quality potholes and with the literature18, which generally classifies information errors into the following types:
Ambiguous Information: information that can be interpreted in different, incorrect ways.
Incorrect Information: information is provided, but it is incorrect.
Misread, Misinterpreted Information: reading errors, or errors in understanding otherwise consistent and correct information.
Omitted Information: information essential for the correct execution of a process or operation is not available or has never been prepared.
Inadequate Warning: a warning is sent and readily available, but the method of warning is not adequate to attract the operator's attention.
18 C. Martin Hinckley. Make No Mistake. Oregon: Productivity Press, 2001.
Each type of error is then categorized within the ten types of Data Quality breaches and, according to this classification, the procedure continues with the definition of the associated consequences of an error, known as failures. It is important at this point to map the relationship between errors and the possible failures, in order to describe, to a certain degree, the impact at the organizational level, the implications for the business process activities and, overall, the vulnerabilities caused by a given error type. A failure may fall into one of the following categories (an illustrative encoding is sketched after the list):
Incomplete: the activity does not fully perform its function (e.g. when trying to detect an important disease, if one test is missed this could represent an incomplete failure).
Invalid: the correct service does not last for the right period of time (e.g. the resources necessary to perform an activity are not available for long enough, for example one hour, so there is an invalid failure).
Inconsistent: the activity cannot be performed consistently (e.g. in a hospital some activities are performed consistently by machines involving cutting-edge technology, while some activities requiring human involvement can present inconsistencies).
Timeliness: the activity is not enacted on time (e.g. in a hospital some activities can take longer than expected, so that the patient cannot receive the appropriate treatment on time).
Inaccurate: the activity is not enacted for the right purpose (e.g. when performing a blood test, the outcome can be wrong because the sample is contaminated, and as a consequence the result may deviate from the real values).
In the process of identifying and classifying the different error types, it was found that some of them overlapped or fell within the same general category, while others did not depend directly on data quality evaluation and are therefore outside the scope of this research. Accordingly, we propose the classification in the following table, based mainly on general error classifications related to data in information systems and contextualized to the data quality evaluation of business process transactions. With this in mind, the definition and classification cover the following errors: Data Entry Processing and Inaccurate Information in the System, referring to erroneous data arising from data entry errors, and considering that incorrect data processing can lead to incorrect information attributes; Misalignment with External Sources, covering any connection with external applications that serve as data input to the business process activities; Inconsistencies in External Sources, referring mainly to inconsistent data with variations in the codification of information values, misspellings, or wrong formatting; Missing Confirmation/Validation Notifications to the User, i.e. missing data resulting from incomplete collection of information or missing records or attributes; and Data Duplication, i.e. issues of encoding information with the same value. The table presents the relevant error types matched with the corresponding failures, definitions and analysis.
ERROR TYPES – FAILURES

1. Data Entry Processing and Inaccurate Information in the System
   Error Type: The system does not correctly process the information inserted or selected by the user in the web interface, causing a halt/fault in the process. This leads to wrong or inaccurate information presented by the GUI.
   DQ Breach: Automated content analysis across information collections is not yet available. Distributed heterogeneous systems lead to inconsistent formats and values.
   Failures: Inaccuracies in the business process activity, low business metrics on service delivery objectives and customers' dissatisfaction due to wrong data provided by the system. Loss of sales for the company due to inaccurate data presented to the user. Constraints on reputation and credibility.
   Failure Category: Inaccuracy / Invalidity / Inconsistency

2. Misalignment with External Sources
   Error Type: The system can get stuck in a loop when linking and validating information from external web sources, such as information provided by banking/online payment portals.
   DQ Breach: Multiple sources of the same information produce different values. Automated content analysis across information collections is not yet available.
   Failures: Constraints on reputation and credibility. Incomplete outputs in the business process activities. Unexpected halt/ending of the business process. Loss of sales due to incomplete information and unavailability of external sources in the user's transaction.
   Failure Category: Incompleteness / Timeliness

3. Inconsistencies in External Sources
   Error Type: Inconsistent information from third-party sources affects the overall user transaction and creates security breaches.
   DQ Breach: Access to information may conflict with requirements for security, privacy, and confidentiality.
   Failures: Vulnerabilities with PCI Data Security Standards. Loss of sales due to inconsistency with external sources in the user's transaction.
   Failure Category: Inconsistency

4. Missing Confirmation/Validation Notifications to the User
   Error Type: The system generates errors that prevent the user from receiving important notifications about the transaction procedure.
   DQ Breach: Systemic errors in information production lead to lost information. Distributed heterogeneous systems lead to inconsistent formats and values.
   Failures: Customers' dissatisfaction due to wrong output in the business process, missing data and incoherent service delivery. Customers affected monetarily to correct incoherent outputs of the process. Constraints on reputation and credibility.
   Failure Category: Incompleteness

5. Data Duplication Error
   Error Type: The system accepts duplicated data from the same user when attempting to overwrite an already completed transaction.
   DQ Breach: Large volumes of stored information are difficult to access.
   Failures: Inconsistency in the processing of the user's information. The same business process activities produce repeated outputs. Inaccurate data presented to the user when overwriting duplicated information in the system. Storage capacity misuse due to the amount of duplicated data.
   Failure Category: Inconsistency

Table 4.1 – Error Types, Data Quality Breaches and Failures Classification
4.2. Definition of a Cost Based FMEA
In order to evaluate the risks associated with Data Quality errors during the design of a business process, and to determine the most important preventive/corrective actions to take, it is necessary to measure the Risk Priority Number (RPN), obtained by multiplying the severity of each failure mode (error type) by its probability of occurrence and by its probability of detection.
It is also relevant to perform a cost analysis, selecting and integrating the costs associated with Data Quality to create a cost-based FMEA model, and to emphasize the implications of introducing these new variables into the adaptation of the original approach.
Generally, the cost variables are introduced as an adaptation of the cost theory analyzed in previous chapters, also using as a reference the research paper "A Cost-Based FMEA Decision Tool for Product Quality Design and Management"19. In the following sections the model is developed to analyze and integrate the different types of costs and to formulate a cost analysis to integrate into the model.
4.2.1. Introduction of Cost Quality Factors
The original FMEA model is a quality/reliability methodology for the product design stage of manufacturing processes; within the scope of the current document, however, the objective is to adapt the model to the design phase of business processes. Furthermore, the methodology is redesigned by improving the original model with quality cost factors for the FMEA evaluation. In this context, the new cost-based FMEA, oriented towards Data Quality analysis in business processes, is used to identify, prioritize and reduce the occurrence of possible failure modes (error types) in the activities of a process before its output reaches the final user. The model can also be used as a prevention tool: to reference areas of weakness for process re-engineering, to create preventive plans, to reduce the occurrence of failure modes in the execution of the process, and to estimate the risk of specific causes with regard to the possible failure modes.
Another aim of the present document is to integrate the traditional FMEA criteria parameters with different quality cost factors in order to evaluate the impact of poor data quality in a business process. In this particular case, and based on recent research on the subject matter, "the cost of poor quality, which will be used to determine the effects of quality failure and evaluate the severity, is bounded only to internal failure costs"19; therefore this type of cost is considered in formulating the cost-based FMEA. The following sections present an analysis of how the different types of failures are integrated, how they fit into the category of Internal Failure Costs and, finally, how a cost system is submitted into the model.

19 Wang, Michael H. "A cost-based FMEA decision tool for product quality design and management." Intelligence and Security Informatics (ISI), 2011 IEEE International Conference on. IEEE, 2011.
4.2.2. The Proposed System of Cost Evaluation
In general, and according to Juran20, "the cost of quality is the sum of all the incurred costs by a company in preventing poor quality". For the purpose of integrating and redefining a cost-based FMEA model, this document improves the traditional scheme by adapting Feigenbaum's statement slightly, considering the cost of quality as the sum of all the costs associated with analyzing Data Quality, in this case incorporating solely the following three categories: Internal Failure Costs, Prevention/Control Costs and Detection Costs.
a. Internal Failure Cost: the cost associated with not meeting customer requirements in properly executing the business process activities. It includes all the costs resulting from a poor-quality product or service found before the process output is delivered to the user; it is used to identify the severity value and is redefined as:

Internal Failure Cost = Rework Cost + Reprocessing Cost + Loss of Information Cost + Overtime Cost + Opportunity Cost
b. Prevention/Control Cost: the cost resulting from activities undertaken to verify, check and evaluate in order to prevent poor quality and to ensure that failures do not occur during the execution of the business process activities.

c. Detection Cost: the cost associated with implementing detective control techniques, in terms of Data Quality, to improve the capability to detect a failure mode in the business process operation.
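A minimal sketch of this decomposition is given below; the class and field names are illustrative assumptions, and the mapping comments anticipate the criteria association described next:

from dataclasses import dataclass

@dataclass
class InternalFailureCost:
    # The five components of the redefined Internal Failure Cost formula.
    rework: float
    reprocessing: float
    loss_of_information: float
    overtime: float
    opportunity: float

    def total(self) -> float:
        return (self.rework + self.reprocessing + self.loss_of_information
                + self.overtime + self.opportunity)

@dataclass
class DataQualityCost:
    internal_failure: InternalFailureCost  # evaluated under the Severity criteria
    prevention_control: float              # evaluated under the Occurrence criteria
    detection: float                       # evaluated under the Detection criteria

    def total(self) -> float:
        # Cost of quality = sum of all costs associated with analyzing Data Quality.
        return (self.internal_failure.total()
                + self.prevention_control + self.detection)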
20 Juran, J., & Godfrey, A. B. (1999). Quality handbook. Republished McGraw-Hill.
To express this approach clearly, all the different types of failures involved in Data Quality are mapped one-to-one to the category of Internal Failure Costs and incorporated into the evaluation of the Severity criteria parameter of the FMEA as the internal failure costs incurred when applying prevention or optimization techniques to reduce severity. Additionally, the Prevention/Control Cost is included in the evaluation of the Occurrence criteria parameter as the cost incurred to put mechanisms in place and mitigate the likelihood that potential failures occur. Finally, the Detection Cost is evaluated under the Detection criteria as the cost resulting from implementing procedures and carrying out activities that improve the probability of detecting a failure. The relationship between these costs and the FMEA ranking criteria is displayed in the following table.
FMEA Ranking Parameter Associated Cost
Severity Criteria Internal Failure Cost
Occurrence Criteria Prevention/Control Cost
Detection Criteria Detection Cost
Table 4.2 – Associated Costs with FMEA parameters
The proposed system thus suggests a subjective, qualitative cost evaluation whose objective is an overall assessment of all the costs involved in Data Quality: their classification, their impact, and how they affect the improvement of each ranking criterion (severity, occurrence, detection) of the FMEA model used to calculate the RPN. The cost-based FMEA serves as guidance for the final user in interpreting and assessing the cost implications of implementing improvement actions: increasing data quality improves the ranking value of each parameter, and the RPN can be recalculated to a better value if the user decides to apply the improvement techniques to the system.
In this way, the user evaluates a trade-off scenario balancing the increase in data quality against its cost implications; it is up to the user to weigh both variables, depending on the importance and criticality of improving Data Quality and on the monetary investment required to carry out the procedures and improvement techniques.
Within this scope, once the effects of each failure mode have been determined to evaluate its severity, the model integrates the evaluation of the cost of each effect and the associated internal failure costs; subsequently, the prevention/control costs are evaluated based on the occurrence criteria ranking, and the detection costs depending on the specific mechanisms that could be implemented to detect a failure in the design phase of the business process.
The proposed system categorizes only these three types of costs on a scale from Low to Very High, depending on the ranking criteria of the parameter under evaluation. The costs can also be interpreted graphically by adapting the literature review of Eppler and Helfert, who propose a classification framework and a cost progression analysis to support the development of quantifiable measures of data quality costs: "Cost classifications based on various criteria can be applied to the data quality field in order to make its business impact more visible"21. However, it is important to define the optimal data quality effort in order to maintain and guarantee acceptable and consistent levels of Data Quality.
Taking this into consideration, it is useful to display the variables in a graphical analysis involving two curves that represent, respectively, the costs inflicted by poor-quality data and the costs of maintaining high data quality. The costs of assuring data quality (Prevention/Control Costs + Detection Costs) follow a linear relationship with the data quality level, while the Internal Failure Costs, i.e. the costs inflicted by poor data, are represented by a separate curve. The FMEA Total Failure Cost associated with data quality is the aggregate of the two curves.
With this approach the user gains a general idea of the trade-off that arises when analyzing the possible improvement techniques and the effort of reaching an optimum level of Data Quality versus the implied total costs: on one hand the costs inflicted by poor data quality, and on the other the costs of assuring the optimum level according to the user's needs.
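A minimal numerical sketch of this trade-off follows; the functional forms and coefficients of the two curves are illustrative assumptions, not values calibrated by this work:

import numpy as np

q = np.linspace(0.01, 0.99, 99)   # data quality level
assuring = 100 * q                # Prevention/Control + Detection costs (assumed linear)
failure = 80 * (1 - q) / q        # Internal Failure Costs inflicted by poor data (assumed shape)
total = assuring + failure        # FMEA Total Failure Cost

optimum = q[np.argmin(total)]     # optimum data quality maintenance level
print(f"optimum data quality maintenance level ~ {optimum:.2f}")

Under these assumed curves the total cost is minimized at an intermediate quality level, which is precisely the optimum effort the figure below depicts.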
21 Eppler & Helfert, 2004; Kim & Choi, 2003.
The following figure displays a logical perspective on the estimation of the optimal data quality maintenance effort based on these variables.
Figure 4.1 – Total Costs Incurred By Data Quality on the Company22
For the standardization of the final cost model, and based on the state-of-the-art cost theory presented earlier, it is important to mention that the English TIQM costs are already included and mapped within the category of Internal Failure Costs, as follows.
22 Haug, A., Zachariassen, F., & Van Liempd, D. (2011). The costs of poor data quality. Journal of Industrial Engineering and Management, 4(2), 168-193.
[Figure 4.1 legend: the FMEA Failure Total Costs curve aggregates the Internal Failure Costs inflicted by poor Data Quality and the Costs of assuring Data Quality (Prevention + Detection Costs), plotted as Running Costs against the Data Quality level, with the Optimum Level marked.]
Internal Failure Costs: Rework Cost, Reprocessing Cost, Loss of Information Cost, Overtime Cost, Opportunity Cost.

English TIQM Costs:
Information Scrap and Rework Costs: data verification and rewrite costs; data correction costs; workaround costs; redundant data handling and support costs.
Process Failure Costs: recovery costs of losing information and the impact on customers; unrecoverable costs; liability and exposure costs.
Lost/Missed Opportunity Costs: lost opportunity costs; missed opportunity costs; lost shareholder value costs.

Table 4.3 – English TIQM Costs and Internal Failure Costs mapping
It is also necessary to map the different types of failures involved in data quality onto the category of Internal Failure Costs. This is done with the purpose of generalizing the cost system and visualizing how the different failures integrate within the Internal Failure Cost category of the cost-based FMEA model equation. The following table maps the relationship between failures and internal failure costs; this correlation is performed by categorizing all the different failure types according to the costs they best relate to, based on their implications.
Internal Failure Costs: Rework Cost, Reprocessing Cost, Loss of Information Cost, Overtime Cost, Opportunity Cost.

Failures in Data Quality: Data Production Failures, Wrong Business Transaction Result, Payment Transaction Failures, Loss of Sales due to Inaccurate Data, Low Efficiency in the Process, Faulty Data, Data Duplication Failure, Problems in Data Processing Times.

Table 4.4 – Failures and Internal Failure Costs mapping
4.3. Advantage of re-designing the FMEA Ranking Criteria
The FMEA methodology has been widely adopted for reliability and process improvement across different industry areas. In particular, it has been used as a model to evaluate process quality by identifying, at an early stage of process design, the possible failure modes that could cause deficiencies in the activities involved in the execution of a process.
Within the general scope of the current document, another main objective is to take the advantages of the original model and re-adapt the methodology to improve the design phase of information system business processes, integrating recent research in Data Quality analysis to assess the potential error types (failure modes) that could affect the performance of the business process. By focusing on problem prevention, early identification, prioritization of activities and improvement actions for process re-engineering, user satisfaction regarding Data Quality increases.
Furthermore, the new model adaptation is a cost-based FMEA that includes the costs associated with failure occurrence, detection and prevention mechanisms and the cost of applying improvement actions. These new variables are strategically introduced into the model, and the step-by-step redesign of the methodology is explained in the following sections.
4.4. Mapping and translating the possible Variables
As seen in the previous chapter, with its findings and overall description, FMEA is a framework for analyzing reliability and quality problems in the development phase of manufacturing processes; therefore, the model must be adapted from its original context to the subject of study of this document, namely assessing risk concerning Data Quality in business processes and information systems. To this end, new variables must be identified and introduced, and the existing ones correlated, to adapt and translate the key concepts of the model as follows.
Quality Production Context → Data Quality Context
Process / Sub-process → Activity Identification
Potential Failure Mode → Error Type / Data Quality Breach
Failure Category → Quality Dimension
Potential Business Consequence of Failure → Failures
Recommended Actions → Improvement Actions

Table 4.5 – Variables Mapping to Data Quality context
From now on, the variables used for the new definition of the FMEA model, its adaptation and the evaluation of the different parameters (severity, occurrence, detection) are those of the Data Quality context for the FMEA methodology redefinition and ranking criteria.
As discussed, the main objective of implementing the FMEA methodology is to analyze the possible errors and failures in the design of, in this case, a business process, and to determine the subjective Data Quality evaluation of the information generated and received throughout the whole transaction of the business process, in order to then implement key check-up blocks and improvement actions that raise Data Quality to meet customer needs and expectations.
From the perspective of redesigning the original FMEA methodology and adapting it to the context of information systems, the redefined model provides a critical analysis of the failure modes associated with the different errors in the process. The technique of the new model equally includes the analysis of the occurrence and detection probabilities, together with the severity criteria of the events, to develop the Risk Priority Number (RPN) and rank the corrective actions to consider.
The proposed FMEA is, on one hand, a quantitative model that uses a simple multiplicative algorithm to calculate the value of the RPN, and on the other hand a qualitative model based on the subjective evaluation and ranking of the main parameters: severity, occurrence and detection.
4.5. Severity Ranking Criteria
The severity levels in the redefined model are influenced by the type of error and the associated failures (the consequences of an error), evaluating with a given ranking degree the potential effects of each error type, from now on named failure mode. Table 4.6 serves as a reference to formalize the severity ranking criteria based on the nature of the failure, its impact on the customer's operations and an overall description of the possible effects on the system. The severity criteria are ranked on a scale from 1 to 10, where 1 is the lowest severity or impact on the customer's operations and 10 the highest. The severity ranking description also integrates the evaluation of possible prevention/optimization techniques that could be applied to reduce the severity of the failure according to its nature, while categorizing the associated internal failure costs on a scale from low to very high. The severity criteria are described as follows.
Table 4.6 – Severity Ranking Criteria
4.6. Occurrence Ranking Criteria
The frequency of occurrence of a particular error type is another key variable in calculating the RPN. The occurrence criteria in the redesigned model are defined as the probability that a failure occurs during the execution time interval of the business process activities. The individual error-type occurrence probabilities are defined and categorized in levels on a scale from 1 to 10, where 1 is the lowest probability that an individual error type occurs and 10 the highest. The recommended occurrence ranking criteria for the new FMEA model include a general description of the overall probability of occurrence of each failure mode and an evaluation of accessible prevention or control mechanisms that could reduce the frequency of the event depending on its probability value, while also ranking the associated Prevention/Control Costs of the techniques applied, if any, on a scale from low to very high. The occurrence criteria are described as follows.
Table 4.7 – Occurrence Ranking Criteria
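As a minimal illustration of how an observed failure frequency could be mapped to an occurrence rank, the following sketch uses cut-off frequencies in the style of the classic manufacturing FMEA occurrence scale; these thresholds are an assumption for illustration and need not coincide with the exact values of Table 4.7:

def occurrence_rank(failures: int, opportunities: int) -> int:
    # Map an observed failure rate to the 1-10 occurrence scale; the cut-off
    # frequencies are illustrative, in the style of classic FMEA tables.
    rate = failures / opportunities
    thresholds = [
        (1 / 2, 10), (1 / 3, 9), (1 / 8, 8), (1 / 20, 7), (1 / 80, 6),
        (1 / 400, 5), (1 / 2_000, 4), (1 / 15_000, 3), (1 / 150_000, 2),
    ]
    for min_rate, rank in thresholds:
        if rate >= min_rate:
            return rank
    return 1  # remote: failure is unlikely

print(occurrence_rank(1, 15_000))  # -> 3 under these assumed thresholds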
4.7. Detection Ranking Criteria
This section describes the adaptation of the detection ranking criteria, another key variable in the redefined model. In the original FMEA, the detection ranking concerns the probability that a failure in the manufacturing process of an item can be detected; in the context of Data Quality analysis, the detection ranking criteria are reformulated as an assessment of the probability that the failure mode (error type) will be detected, given the possibility of implementing preventive and detective controls in the system. Unlike the other variables, the probability of detection is ranked in reverse order: the scale starts at 1, indicating a very high probability that a failure mode is detected before reaching the customer, while 10 indicates a low, almost zero probability of detection, meaning the individual failure mode would be experienced by the customer. The ranking criteria also describe how affordable the Detection Costs are for implementing preventive and detective control techniques, in terms of Data Quality, to improve the capability to detect a failure mode in the business process operation. Table 4.8 presents the reformulated detection criteria as follows.
Table 4.8 – Detection Ranking Criteria
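A minimal sketch of the reverse ordering follows; the probability thresholds are illustrative assumptions, not the exact values of Table 4.8:

def detection_rank(p_detect: float) -> int:
    # The detection scale is inverted: a HIGH probability of detecting the
    # failure mode before it reaches the customer yields a LOW rank.
    bands = [0.99, 0.95, 0.90, 0.80, 0.65, 0.50, 0.35, 0.20, 0.05]
    for rank, threshold in enumerate(bands, start=1):
        if p_detect >= threshold:
            return rank
    return 10  # almost no chance the failure mode is detected

assert detection_rank(0.995) == 1   # almost certain detection
assert detection_rank(0.01) == 10   # failure reaches the customer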
4.8. Implementation of Improvement Actions
Since the RPN resulting from the previous assessment of every failure includes the costs associated with the different Data Quality dimensions, the RPN values proposed up to this stage, as well as the ranking classification, would be meaningless without taking the improvement actions into account.

In order to classify the most important failures on which the assessment team must focus when reducing the Data Quality risks of a business process, improvement actions can be taken so that a final, reclassified RPN indicator, including the effect of the improvement actions, yields the final ranking.

These improvement actions were selected with the criterion of performing maintenance activities (routine activities) that are used in a corporate environment and can reduce or mitigate the impact on different factors when the Data Quality dimensions are affected. Another important point when selecting the improvement actions is that they should not affect the business process performance (Cappiello et al. 2013).
The improvement actions have effects on either the occurrence or the detection parameter, since the severity criteria were constructed on the basis of the internal failure costs, which are more closely linked with the default or initial costs of a poor-quality product or service.
Data-related and process-related improvement activities are included as detective or preventive controls according to the FMEA methodology. The preventive improvement actions can be defined as:

Data Enrichment: fixing and/or enhancing the current data by retrieving values from reliable external data sources.
Data Cleaning: comparing the current data with the real or correct value and replacing the current data with the appropriate reliable value.

On the other hand, the improvement actions associated with detective controls are as follows:

Data Monitoring: all the procedures used to verify that the data comply with certain rules (special or specific requirements).
Re-execution: procedures that can automatically detect whether certain requirements hold on the data.
Workaround: a contingent method used temporarily when the planned method is not effective in accomplishing the expected goals or activities.
Furthermore, the following table displays the relationship of the possible error types with the
different types of improvement activities.
ERROR TYPES – IMPROVEMENT ACTIONS

1. Data Entry Processing and Inaccurate Information in the System
   (The system does not correctly process the information inserted or selected by the user in the web interface, causing a halt/fault in the process; this leads to wrong or inaccurate information presented by the GUI.)
   DQ Dimensions Affected: Accuracy, Objectivity, Consistency, Concise Representation
   Improvement Activities: Data Cleaning, Data Monitoring

2. Misalignment with External Sources
   (The system can get stuck in a loop when linking and validating information from external web sources, such as information provided by banking/online payment portals.)
   DQ Dimensions Affected: Value-Added, Accessibility, Interpretability
   Improvement Activities: Re-execution, Data Enrichment, Workaround

3. Inconsistencies in External Sources
   (Inconsistent information from third-party sources affects the overall user transaction and creates security breaches.)
   DQ Dimensions Affected: Access Security, Accessibility
   Improvement Activities: Data Enrichment, Workaround

4. Missing Confirmation/Validation Notifications to the User
   (The system generates errors that prevent the user from receiving important notifications about the transaction procedure.)
   DQ Dimensions Affected: Completeness, Relevancy, Value-Added
   Improvement Activities: Workaround

5. Data Duplication Error
   (The system accepts duplicated data from the same user when attempting to overwrite an already completed transaction.)
   DQ Dimensions Affected: Amount of Data, Ease of Understanding, Consistency, Interpretability, Objectivity
   Improvement Activities: Data Cleaning, Data Monitoring

Table 4.9 – Error Types and Improvement Actions
4.9. Recalculation of RPN parameter after Improvement Actions
After the improvement actions are formulated for each failure mode, there are three possibilities: doing nothing and leaving the RPN indicator as it is; recommending an improvement action as either a detective or a preventive control; or implementing improvement actions for both types of control (detective and preventive).

According to the RPN formula (Severity x Occurrence x Detection), in the recalculation the severity number remains constant, since it is associated with, and has been mapped to, the internal failure cost, while the occurrence probability and the detection number change according to the improvement actions performed (e.g. one or two notches up or down depending on the analysis done).

All in all, after the improvement actions have been performed the RPN is reduced, in line with the reduction of the total Data Quality risk.
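A minimal sketch of this recalculation rule follows; the function and parameter names are illustrative, and the example figures anticipate the worked numbers of the case study in Chapter 5:

from typing import Optional

def recalculate_rpn(sev: int, occ: int, det: int,
                    new_occ: Optional[int] = None,
                    new_det: Optional[int] = None) -> int:
    # Severity stays constant (it is mapped to the internal failure cost);
    # a preventive control lowers the occurrence rank, a detective control
    # lowers the detection rank.
    occ = new_occ if new_occ is not None else occ
    det = new_det if new_det is not None else det
    return sev * occ * det

# e.g. a detective control improving detection from rank 6 to rank 3:
print(recalculate_rpn(10, 3, 6, new_det=3))  # 90, halved from 180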
4.10. Guidelines to carry out a Data Quality Risk assessment using FMEA
In order to evaluate the risks associated with Data Quality errors during the analysis, and to determine the most important corrective actions to take, it is necessary to measure the Risk Priority Number (RPN): the product of the severity of each failure, the probability of occurrence of the failure, and the probability of early detection of the failure (the likelihood of detecting the problem before it harms the system or the subsequent processes). The following is the guideline, with the subsequent steps, for a general procedure to adapt and implement the FMEA methodology in a Data Quality risk assessment context.
FMEA Guidelines to evaluate Risk and Data Quality in a Business Process

1. Identify the businesses, services and key activities of the company to be placed under analysis.
2. Describe the main error type generated in the process or activity within the business process.
3. Look for the possible Data Quality Breaches related to the error type from the previous step.
4. Classify the failure under the various failure categories available.
5. Identify the effects of every failure and, if feasible, its effects on the business/service. Note that each failure can have more than one effect.
6. Refer to the severity chart and choose the relevant number to rank the effect of the failure.
7. Identify and rank the occurrence criteria. Note that each failure mode can have more than one cause.
8. Refer to the probability chart and choose the number most relevant to the frequency of occurrence.
9. List the current controls, analyze them and categorize them as preventive or detective controls as best corresponds, writing each type of control in a separate column.
10. Refer to the detectability chart and choose a relevant number to categorize the effectiveness of the controls.
11. The user can now see the RPN calculated for the failure mode of each Data Quality error.
12. Allocate the possible errors generated (error types) in the BPMN diagram.
13. Link the error types with the improvement actions based on the Quality Dimensions.
14. Allocate the tentative places for quality checks (i.e. places where improvement actions could be performed).
15. The user will see the final RPN recalculated once the improvement actions have been implemented.
16. Decide and compare the feasibility of the improvement actions based on the final ranking from the recalculated RPN.

Table 4.10 – Guidelines to implement the FMEA methodology in a Data Quality context
5. Case Study
This chapter attempts to demonstrate the capabilities of the cost-based FMEA model in a real context by analyzing a regular business process transaction as a case study, starting from its definition and the BPMN diagram with the activities and tasks of the process. The analysis includes the identification of the possible Data Quality breaches, errors and failures in the different stages of the process, the implementation of improvement actions, and a simulation correlating all the variables involved in the execution of the methodology to provide the Data Quality assessment.
5.1. Creating a Business Process Prototype Model
In order to facilitate the comprehension and analysis of Data Quality theory in the context of business processes, an initial practical exercise is the definition and mapping of a simple prototype model (the business process of booking a flight), which serves as a basic model to analyze the correlation and classification of the possible Data Quality breaches, errors and failures, their mapping to the involved costs, and advice on implementing improvement actions.

Two main actors are involved in the basic process of booking a flight: the user and the website, each represented in a separate pool lane. The process starts on the user's side, with a user attempting to book a flight on an airline company's website. The first step is to open the airline's website, browse the booking options in the web portal, initialize the booking and complete the requested input information, such as a round-trip or one-way booking and, depending on the selected option, the flight criteria regarding departure and return dates and the number of passengers travelling (adults or children).
The process continues with the website processing the collected information and displaying a set of options according to the selected criteria. The user can either choose the most suitable of the presented options or decide to terminate the process and cancel the search if none of the solutions matches the criteria. If the user continues and chooses one of the flight options, the website displays a checkout screen confirming the selection, requests all the personal details of the passenger(s), and offers a link to go backwards in the process or to continue with the online payment and checkout information. If the user decides to continue, the website displays a confirmation of all the details inserted by the user before proceeding with the credit card validation and security credentials. If the transaction is not successful, the website takes the user back to the payment details to correct any mistakes in the inserted data or to try a different payment source and proceed again; if, on the other hand, the transaction is approved, the purchase procedure is complete, the flight is booked and the user receives an email with the transaction confirmation and flight details, this being the last step and the end of the process. The Business Process Modeling Notation (BPMN) diagram below represents the process graphically, created with the Bizagi software.

Figure 5.1 – BPMN of Booking a Flight
5.2. Definition of Improvement Actions
In order to define a tentative list of improvement actions, the team or analyst in charge of applying this methodology must consider the initial state of the business process under evaluation and select, by drawing a BPMN diagram, the places where the tentative errors could appear. The following business process diagram displays the possible allocation of the errors that could be generated (red error circles) when booking a flight.

Figure 5.2 – BPMN Error Types

It is important to notice that some errors can appear more than once, in different places, as the flight booking process is executed.
Once this stage is finished, the relationship between the possible errors and the improvement actions must be established, keeping in mind the Data Quality dimensions affected by every error in the context of the Data Quality improvement activities. In other words, the classification of DQ improvement activities depends on the data quality dimension affected by the error type23.

23 Cappiello, Cinzia, et al. "An Approach To Design Business Processes Addressing Data Quality Issues." ECIS, 2013.
5.3. Allocation of improvement actions in the Business Process
Once the improvement actions have been defined, the possible places where they can take place can be drafted in the business process, at the points where the possible errors were located, keeping in mind the effects of the improvement actions on the errors (i.e. whether an improvement action can deal with two or more errors at the same time). These allocations of improvement actions are temporary and will ultimately be revised according to the recalculated RPN (Risk Priority Number).

The following diagram represents the places where the improvement actions, under the name of quality checks (Quality Check activities), can take place.
Figure 5.3 – BPMN Improvement Actions Locations
The most important point in this step is to be aware that correcting all the errors is not feasible: it makes no sense for a business process in terms of time and effort, and it would not be a wise decision. Correcting all the errors would mean assuring a Data Quality level of 100% by making huge investments (such as changing the company's software, for instance), thus sacrificing part of the budget that could be allocated to other important purposes.
5.4. Cost Based FMEA and Data Quality Analysis in a Business Process
After identifying the possible places for the improvement actions, the development of the methodology can start by defining and analyzing the potential Data Quality failures present in the business process (booking a flight), which are then collected in a format like the following:
1. Data Entry Processing and Inaccurate Information in the System
   Error Type: The system does not correctly process the information inserted or selected by the user in the web interface, causing a halt/fault in the process. This leads to wrong or inaccurate information presented by the GUI.
   Data Quality Breach: Automated content analysis across information collections is not yet available. Distributed heterogeneous systems lead to inconsistent formats and values.
   Quality Dimension Affected: Inaccuracy / Inconsistency / Invalidity
   Failures:
   - Data Production Failures: inaccuracies in the business process activity, low business metrics on service delivery objectives and customers' dissatisfaction due to wrong data provided by the system.
   - Loss of Sales due to Inaccurate Data: loss of sales for the company due to inaccurate data presented to the user, including constraints on reputation and credibility.

2. Misalignment with External Sources
   Error Type: The system can get stuck in a loop when linking and validating information from external web sources, such as information provided by banking/online payment portals.
   Data Quality Breach: Multiple sources of the same information produce different values. Automated content analysis across information collections is not yet available.
   Quality Dimension Affected: Incompleteness / Timeliness
   Failures:
   - Wrong Business Transaction Result: incomplete outputs in the business process activities; accessibility constraints when obtaining the output, or an unexpected halt/ending of the business process.
   - Loss of Sales due to Inaccurate Data: incomplete information and unavailability of external sources in the output of the user's transaction.

3. Inconsistencies in External Sources
   Error Type: Inconsistent information from third-party sources affects the overall user transaction and creates security breaches.
   Data Quality Breach: Access to information may conflict with requirements for security, privacy, and confidentiality.
   Quality Dimension Affected: Inconsistency
   Failures:
   - Payment Transaction Failures: due to vulnerabilities with PCI Data Security Standards.
   - Loss of Sales due to Inaccurate Data: inconsistency with external sources in the output of the user's transaction.

4. Missing Confirmation/Validation Notifications to the User
   Error Type: The system generates errors that prevent the user from receiving important notifications about the transaction procedure.
   Data Quality Breach: Systemic errors in information production lead to lost information. Distributed heterogeneous systems lead to inconsistent formats and values.
   Quality Dimension Affected: Incompleteness
   Failures:
   - Faulty Data: management does not realize the consequences for the company's overall profit potential when wrong decisions are made based on incorrect business process processing.
   - Problems in Data Processing Times: customers affected monetarily due to long waiting times for incoherent outputs of the business process.
   - Wrong Business Transaction Result: customers' dissatisfaction due to wrong output in the business process, missing data and incoherent service delivery; it also causes constraints on reputation and credibility.

5. Data Duplication
   Error Type: The system accepts duplicated data from the same user when attempting to overwrite an already completed transaction.
   Data Quality Breach: Large volumes of stored information are difficult to access.
   Quality Dimension Affected: Inconsistency
   Failures:
   - Data Duplication Failure: inaccurate data presented to the user when overwriting duplicated information in the system.
   - Low Process Efficiency: storage capacity misuse due to large amounts of duplicate data; inconsistency in the user's information processing caused by repeated information when re-processing a business process activity.
   - Faulty Data: due to data duplicated in the system by the same user, management is not aware of the cost implications of exceeding the company's storage software infrastructure.

Table 5.1 – Cost based FMEA methodology variables analysis

Table 5.1 shows the application of the methodology once the main activity or business process has been established (i.e. booking a flight). The first column identifies the potential failure mode, while the next column explains the error type in more detail. The following columns report the Data Quality Breach related to the error type and the Data Quality dimensions affected (following the mapping in Table 4.1). The last column explains and identifies the failures.
It is important to point out that every potential failure might produce more than one actual failure at a time.
1. Data Entry Processing and Inaccurate Information in the System
   Severity: Very High (Internal Failure Cost: Very High) — Sev = 10
   Occurrence: Low, relatively few failures, 1 in 15,000 (Prevention/Control Cost: Low) — Prob = 3
   Detection: Moderate probability of detection (Detection Cost: High) — Det = 6
   RPN = 180

2. Misalignment with External Sources
   Severity: Moderate (Internal Failure Cost: Medium) — Sev = 6
   Occurrence: Moderate, failures 1 in 400 (Prevention/Control Cost: High) — Prob = 7
   Detection: High probability of detection (Detection Cost: Medium) — Det = 5
   RPN = 210

3. Inconsistencies in External Sources
   Severity: High (Internal Failure Cost: High) — Sev = 8
   Occurrence: Low, relatively few failures, 1 in 150,000 (Prevention/Control Cost: Low) — Prob = 3
   Detection: High probability of detection (Detection Cost: Medium) — Det = 3
   RPN = 72

4. Missing Confirmation/Validation Notifications to the User
   Severity: Moderate (Internal Failure Cost: Medium) — Sev = 7
   Occurrence: Low, relatively few failures, 1 in 15,000 (Prevention/Control Cost: Low) — Prob = 3
   Detection: High probability of detection (Detection Cost: Medium) — Det = 4
   RPN = 84

5. Data Duplication
   Severity: Moderate (Internal Failure Cost: Medium) — Sev = 6
   Occurrence: Moderate, failures 1 in 400 (Prevention/Control Cost: High) — Prob = 8
   Detection: Moderate probability of detection (Detection Cost: High) — Det = 7
   RPN = 336
Table 5.2 – Cost Based FMEA Criteria
As can be seen, Table 5.2 is the extension of Table 5.1 (in the Excel file): the costs associated with the FMEA are included in each of the criteria, as mapped in Table 4.2. For the severity criteria, which relate to the internal failure cost, the user selects the appropriate number according to Table 4.6. For the occurrence criteria, the user establishes a probability of occurrence of the failure (taking internal metrics such as KPIs as a reference) and assigns a ranking from low to high as in Table 4.7. A similar process is followed for the detection criteria, this time analyzing the probability of detecting the failure, assuming that preventive and detective controls can be implemented in the system. Once these numbers are selected, a preliminary RPN is obtained by multiplying the three criteria numbers.
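The preliminary RPNs of Table 5.2 can be reproduced directly; the following sketch restates the ranks from the table (the dictionary layout is an illustrative choice):

# Preliminary RPNs of Table 5.2: (severity, occurrence, detection) per failure mode.
ranks = {
    "Data Entry Processing / Inaccurate Information": (10, 3, 6),
    "Misalignment with External Sources":             (6, 7, 5),
    "Inconsistencies in External Sources":            (8, 3, 3),
    "Missing confirmation/validation notifications":  (7, 3, 4),
    "Data Duplication":                               (6, 8, 7),
}
for name, (sev, occ, det) in sorted(ranks.items(),
                                    key=lambda kv: -(kv[1][0] * kv[1][1] * kv[1][2])):
    print(f"{name}: RPN = {sev * occ * det}")
# Highest risk first: Data Duplication (336), Misalignment (210),
# Data Entry Processing (180), Missing notifications (84), Inconsistencies (72).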
1. Preventive Controls: — | Detective Controls: Data Monitoring
   Recalculated parameters: high probability of detection — new Det = 3
   Recalculated RPN = 90

2. Preventive Controls: Data Enrichment | Detective Controls: Re-execution, Workaround
   Recalculated parameters: occasional probability of occurrence — new Prob = 4; high probability of detection — new Det = 3
   Recalculated RPN = 72

3. Preventive Controls: Data Enrichment | Detective Controls: Workaround
   Recalculated parameters: unlikely probability of occurrence — new Prob = 1; very high probability of detection — new Det = 1
   Recalculated RPN = 8

4. Preventive Controls: — | Detective Controls: Workaround
   Recalculated parameters: very high probability of detection — new Det = 2
   Recalculated RPN = 42

5. Preventive Controls: Data Cleaning | Detective Controls: Data Monitoring
   Recalculated parameters: occasional probability of occurrence — new Prob = 4; high probability of detection — new Det = 4
   Recalculated RPN = 96

Table 5.3 – Cost Based FMEA improvement actions and RPN recalculation
After the allocation of possible errors has been made (Section 5.2) and the improvement actions have been proposed, taking as a base the Data Quality dimensions affected by each error type as explained in the methodology (Chapter 4), the improvement actions can be classified as preventive or detective controls. Depending on whether an improvement action belongs to a detective control, a preventive control or, in some cases, both, it decreases the occurrence criteria, the detection criteria or both numbers; the RPN is then recalculated and the final ranking of improvement actions is shown.

The recalculated RPN results from multiplying the three ranking criteria after the improvement actions have been applied, that is, after the preventive and detective controls are performed. The RPN decreases because the improvement actions reduce the contributing ranks, which corresponds to a reduction in the total risk. To clarify this concept: for the first potential failure, the initial values were 10 for severity, 3 for occurrence and 6 for detection, giving a first RPN of 180. After Data Monitoring (an improvement action taken as a detective control), the new RPN is halved (i.e. 90), because the detective control reduces the detection criteria from high to medium (i.e. from 6 to 3).
5.5. Decision criteria about which improvement actions might be exercised
Once this new RPN scenario after improvement actions has been applied to the business process theoretically, it is crucial to evaluate which improvement actions are worth executing in a real scenario. Performing all the improvement actions might bring the business process to a Data Quality level of 100%, which is in most cases unrealistic and unnecessary: the cost of implementing these improvement actions is sometimes very high compared to their benefits, so this would not be an optimal decision.
That said, since the increase in Data Quality must be a multi-criteria evaluation, three parameters to consider in the execution of the improvement actions serve the assessment team or evaluator:

1) The user's decision (user's priority): this parameter takes into account the point of view of the user, according to the environment in which the business processes occur (i.e. the most crucial activity, the most expensive activity, etc.).
2) The RPN, the risk priority number of the model (ranking order): the user can establish a threshold according to the ranking obtained after theoretically applying the improvement actions.
3) The total investment associated with every improvement action: this parameter aims to measure the worthiness of each improvement action using the basic accounting principle of Net Present Value (NPV). The assessment team should calculate or estimate the cost of performing each improvement action (negative cash flows) against the benefits of performing it (positive cash flows) for the different activities within the business process. In the end, the principle NPV > 0 determines whether these activities are worth doing.
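A minimal NPV sketch for this decision follows; the discount rate, horizon and cash-flow figures are illustrative assumptions, not values estimated in this work:

def npv(rate: float, cash_flows: list) -> float:
    # cash_flows[0] is the investment at t=0 (negative); later entries are
    # the net benefits of the improvement action per period.
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# e.g. a hypothetical improvement action: 10,000 upfront, 4,000/year for 4 years
value = npv(0.08, [-10_000, 4_000, 4_000, 4_000, 4_000])
print(f"NPV = {value:,.0f} -> {'worth doing' if value > 0 else 'not worth doing'}")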
6. Conclusions
Failure Mode and Effect Analysis is a powerful methodology, coming originally from the manufacturing industry and used mainly for quality issues, but later applied in different engineering fields.

This research explores the possibility of successfully adapting the FMEA methodology to Data Quality matters in business processes, by proposing a Data Quality risk evaluation model that involves the quality dimensions related to information quality issues and the typical failures that can emerge inside a company when data are exchanged.

The cost component included in the model can be used as a helpful tool for decision-making processes within enterprises, and it is also an important component when assessing the quality of data for products or services. Moreover, the cost component includes the internal cost parameter, which is used to measure the worthiness of the improvement actions that can be executed to reduce the negative impact of a failure.
The proposed Data Quality risk evaluation model is supported by the development of a typical business process activity, the process of booking a flight. First, the business process is modelled with the BPMN methodology, along with the possible allocation of failures and improvement actions. To determine which improvement actions are worth taking, a cost-based FMEA model summarizing the methodology is constructed in an Excel file, following the instructions described in Chapter 4. The output of the model is a suggested ranking list of improvement actions that can be executed to mitigate or eliminate the failures in a business process. Suggestions on deciding which improvement actions are worth doing are described, based on three parameters: the point of view of the user, the RPN provided by the model, and the link between the costs of the improvement actions and their benefits through the Net Present Value methodology.
Further developments regarding the optimal point of Data Quality improvements in business processes must be the object of further research and are beyond the scope of this work.
7. References
1. Kovacic, Andreja. Business renovation: business rules (still) the missing link. Business Process Management, pages 158–170, 2004.
2. Ballou, D., Madnick, S., & Wang, R.. (2003). Special Section: Assuring Information Quality.
Journal of Management Information Systems, 20(3), 9–11.
3. Barbara D. Klein and Donald F. Rossin, editors. Fifth Conference on Information Quality (IQ 2000). MIT, 2000.
4. Beverly K. Kahn, Diane M. Strong, and Richard Y. Wang. 2002. Information quality
benchmarks: product and service performance. Commun. ACM 45, 4 (April 2002), 184-192.
5. Carlo Batini, Cinzia Cappiello, Chiara Francalanci, and Andrea Maurino. 2009. Methodologies
for data quality assessment and improvement. ACM Comput. Surv. 41, 3, Article 16 (July
2009).
6. Cappiello, C., Caro, A., Rodriguez, A., & Caballero, I. (2013). An Approach To Design Business Processes Addressing Data Quality Issues. In ECIS (p. 216).
7. Cinzia Cappiello, Chiara Francalanci, and Barbara Pernici. 2004. Data quality assessment from the user's perspective. In Proceedings of the 2004 international workshop on Information quality in information systems (IQIS '04). ACM, New York, NY, USA, 68-73.
8. Diane M. Strong, Yang W. Lee, and Richard Y. Wang. 1997. 10 Potholes in the Road to
Information Quality. Computer 30, 8 (August 1997), 38-46.
9. Diane M. Strong, Yang W. Lee, and Richard Y. Wang. 1997. Data quality in context. Commun.
ACM 40, 5 (May 1997), 103-110.
10. Deb, S.; Ghoshal, S.; Mathur, A.; Shrestha, R.; Pattipati, K.R., "Multisignal modeling for
diagnosis, FMECA, and reliability," in Systems, Man, and Cybernetics, 1998. 1998 IEEE
International Conference on , vol.3, no., pp.3026-3031 vol.3, 11-14 Oct 1998
11. Edward W. Gore Jr, (1999) "Organizational culture, TQM, and business process
reengineering: An empirical comparison", Team Performance Management: An
International Journal, Vol. 5 Iss: 5, pp.164 - 170
12. Elizabeth M Pierce, Information Quality, Volume 1 of Advances in Management Information
Systems Series. M.E.Sharpe, pages 3-32, 2005.
13. Eppler, M., & Helfert, M. (2004, November). A classification and analysis of data quality costs. In International Conference on Information Quality (pp. 311-325).
14. Ge, M., & Helfert, M. (2007). A Review of Information Quality Research - Develop a Research
Agenda. International Conference on Information Quality, November 9-11, 2007,
Cambridge, Massachusetts, USA.
15. Haug, Anders, Frederik Zachariassen, and Dennis Van Liempd. "The costs of poor data quality." Journal of Industrial Engineering and Management 4.2 (2011): 168-193.
16. Hinckley, C. Martin. Make No Mistake!: An Outcome-Based Approach to Mistake-Proofing. Productivity Press, 2001.
17. Kyung Seok Ryu, Joo Seok Park, and Jae Hong Park, "A Data Quality Management Maturity
Model," ETRI Journal, vol. 28, no. 2, Apr. 2006, pp. 191-204.
18. Leo L. Pipino, Yang W. Lee, and Richard Y. Wang. 2002. Data quality assessment. Commun.
ACM 45, 4 (April 2002), 211-218.
19. R.G. Lee, B.G. Dale, (1998) "Business process management: a review and evaluation",
Business Process Management Journal, Vol. 4 Iss: 3, pp.214 – 225.
20. Thomas C. Redman. 1998. The impact of poor data quality on the typical enterprise.
Commun. ACM 41, 2 (February 1998), 79-82.
21. Wang, Michael H. "A cost-based FMEA decision tool for product quality design and management." Intelligence and Security Informatics (ISI), 2011 IEEE International Conference on. IEEE, 2011.