Dealing with Outliers



Dealing with outliers 2012

Produced by Mr. Sunil Kumar Sharma. Published in Spinning Textiles Magazine, Vol. 7, Issue 2, March – April 2013 Edition.

Dealing with outliers is a difficult task in the spinning industry. Many questions arise when we plan to deal with them. This article tries to answer these questions.

Q.1. What is an outlier and how is it defined?

Ans. An outlier can be understood through the following definitions:

Outlier is a scientific term to describe things or phenomena that lie outside normal experience.

In statistics, an outlier is an observation that is numerically distant from the rest of the data.

An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.

An extreme deviation from the mean.

An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.

From the above definitions we can understand that an outlier does not belong to the normal population and differs from the other members or readings of a distribution. In a sense, these definitions leave it up to the analyst (or a consensus process) to decide what will be considered abnormal. Before abnormal observations can be singled out, it is necessary to characterize normal observations.

Q.2. Why should outliers be detected and removed from the process?

Ans. Outliers arise from changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply natural deviations in populations. A sample may have been contaminated with elements from outside the population being examined. Outlier detection is used to detect and, where appropriate, remove anomalous observations from data. It can identify system faults before they escalate with potentially catastrophic consequences. Outliers should be investigated carefully. Often they contain valuable information about the process under investigation or about the data gathering and recording process. Before considering the possible elimination of these points from the data, one should try to understand why they appeared and whether similar values are likely to continue to appear. Of course, outliers are often bad data points.

An outlier is an abnormal reading that differs significantly from most of the population of a normal distribution. This difference creates major variation in the process or in the characteristics of the final product. In the spinning process many factors create variation. These variations are regarded as spinning abnormalities and cause defects in yarn or fabric. One defective yarn package may spoil a thousand metres of fabric. A defect in the spinning preparatory process may disturb the working of the whole spinning mill. Hence it is better to identify and remove such abnormalities at an early stage, before they create problems. Systematic outlier detection, detailed analysis down to the root cause, and finally elimination of the origin of the outlier significantly reduce downstream variation and improve yarn and fabric quality.

Q.3. How are outliers identified or calculated?

Ans. There is no rigid mathematical definition of what constitutes an outlier; determining whether or not an observation is an outlier is ultimately a subjective exercise. There are several methods for detecting outliers; however, the method commonly used in spinning mills is based on the mean and standard deviation, so only this method is explained here. The standard deviation is a measuring stick that describes how data are dispersed around their average. In a normal distribution, which takes the shape of a "bell curve", one standard deviation on either side of the mean encompasses about 68.27 % of all observations (dark blue in Fig. 1). Two standard deviations include about 95.45 % of all observations (dark blue and medium blue). Three standard deviations encompass nearly all values, i.e. 99.73 % of all observations (dark blue, medium blue and light blue). A graphical representation of a normal distribution is shown in Fig. 1.
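The percentages quoted for one, two and three standard deviations can be checked with the error function of the normal distribution. The following is a minimal sketch, not part of the original article; the function name `coverage` is chosen here for illustration only.

```python
import math


def coverage(k):
    """Fraction of a normal population lying within mean ± k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 1.5, 2, 2.5, 3):
    inside = coverage(k)
    # Expected frequency outside the range, expressed as "1 in N" readings.
    one_in = round(1 / (1 - inside))
    print(f"mean ± {k}σ: {inside * 100:.2f}% inside, about 1 in {one_in} outside")
```

Running the loop gives the coverage for each range together with how often a reading is expected to fall outside it, which is the quantity the article uses to decide how many readings are needed.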

Fig. 1: Graphical representation of a normal distribution with different standard deviation levels.

In mathematical terms, where x is an observation from a normally distributed random variable, μ is the mean of the distribution, and σ is its standard deviation:

$P(\mu - \sigma \le x \le \mu + \sigma) \approx 0.6827$

$P(\mu - 2\sigma \le x \le \mu + 2\sigma) \approx 0.9545$

$P(\mu - 3\sigma \le x \le \mu + 3\sigma) \approx 0.9973$

Thus readings outside the two sigma (i.e. 2 SD) or three sigma (i.e. 3 SD) limits may be considered outliers, depending upon the number of occurrences outside the range and the total number of readings observed.

Chart 1: % population and expected frequency of outliers for different standard deviation ranges.

| Range | % Population in range | Expected frequency outside range |
|---|---|---|
| μ ± 1σ | 68.27 | 1 in 3 |
| μ ± 1.5σ | 86.64 | 1 in 7 |
| μ ± 2σ | 95.45 | 1 in 22 |
| μ ± 2.5σ | 98.76 | 1 in 81 |
| μ ± 3σ | 99.73 | 1 in 370 |

It is clear from the above table that the minimum number of observations for identifying outliers should be 22 for the two sigma limit (i.e. 2 SD) and 370 for the three sigma limit (i.e. 3 SD). Hence for a spinning mill two sigma limits are more appropriate and practical for outlier detection than three sigma limits. Two sigma limits give us the opportunity to review and analyse approx. 4.55 % of the total readings for further improvement.

Q.4. In spinning quality control, how can we utilize these principles and implement the 2σ theory for detection of outliers?

Ans. In spinning quality control we generate a lot of data during daily testing of in-process and finished material. Outlier detection through two sigma analysis requires a minimum of 22 readings; hence this analysis is practicable and beneficial for the speed frame and ring frame sections, where we obtain the largest number of test readings on a daily basis through spindle-wise testing. The following critical parameters and test results may be analysed for outlier detection with two sigma limits:

1. Spindle-wise roving hank measurement.
2. Spindle-wise roving U %.
3. Spindle-wise count measurement of ring frame.
4. Spindle-wise U %, imperfection level and hairiness index of ring frame.
5. Spindle-wise single yarn strength and elongation %.
6. Any other report with more than 22 readings may also be analysed for outliers.
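For any of the spindle-wise parameters listed above, the two sigma check could be sketched as below. This is an illustration under stated assumptions, not the article's own procedure: the function name, the spindle labels and the roving hank values are all invented here, and the sample standard deviation (`statistics.stdev`) is assumed, whereas a test instrument may report a differently calculated value.

```python
from statistics import mean, stdev

MIN_READINGS = 22  # minimum count the article gives for two sigma analysis


def two_sigma_outliers(readings, k=2.0):
    """Return ((lower, upper), outliers) for a dict of {spindle_id: reading}."""
    if len(readings) < MIN_READINGS:
        raise ValueError(f"need at least {MIN_READINGS} readings, got {len(readings)}")
    values = list(readings.values())
    centre = mean(values)
    spread = stdev(values)  # sample SD: an assumption, instruments may differ
    lower, upper = centre - k * spread, centre + k * spread
    # Keep only the spindles whose reading falls outside the limits.
    outliers = {sid: v for sid, v in readings.items() if v < lower or v > upper}
    return (lower, upper), outliers


# Invented roving hank readings for 24 spindles (not data from the article).
hank = {f"SF1-{n}": 1.20 for n in range(1, 24)}
hank["SF1-24"] = 1.32  # one deliberately abnormal spindle
limits, flagged = two_sigma_outliers(hank)
print(limits, flagged)
```

Keying the readings by spindle identifier means each flagged value can be traced straight back to the machine position that needs attention.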


Chart – 2 : A reference UT-5 test report for outlier detection through 2 sigma limit.


Methodology: Chart 2 illustrates a reference UT-5 test report in which outliers are identified for U % and hairiness index by applying two sigma limits. In the given report a total of 30 samples were tested from identified ring frame spindles. The frame numbers and spindle numbers are given in the first column of the test report.

1. Detection of outliers for U %: The average U % of the 30 readings is 8.88 and the standard deviation is 0.20. Hence the 2σ limit is 0.20 × 2 = 0.40, i.e. 8.88 ± 0.40 = 8.48 to 9.28. Among the U % readings, test no. 29 falls beyond this limit; it belongs to RF No. 14 RHS, Spdl. No. 612. This is an outlier reading for U % and is highlighted in yellow in the test report.

2. Detection of outliers for hairiness index (H): The average of all readings for H is 5.09 and the standard deviation is 0.16. Hence the 2σ limit is 0.16 × 2 = 0.32, i.e. 5.09 ± 0.32 = 4.77 to 5.41. Test no. 2 falls beyond this range; it belongs to RF No. 13 RHS, Spdl. No. 108. This is an outlier reading for hairiness index and is highlighted in orange in the test report.
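The limit arithmetic above can be reproduced from the average and standard deviation that the instrument already reports. The sketch below uses the averages and standard deviations quoted in the article; the helper names and the individual readings passed to `is_outlier` are hypothetical, since the readings from the test report are not reproduced in the text.

```python
def sigma_limits(average, sd, k=2):
    """Return (lower, upper) limits at k standard deviations, rounded to 2 places."""
    half_width = k * sd
    return round(average - half_width, 2), round(average + half_width, 2)


def is_outlier(reading, average, sd, k=2):
    """True when a reading lies outside the k sigma limits."""
    lower, upper = sigma_limits(average, sd, k)
    return reading < lower or reading > upper


# Averages and standard deviations quoted in the article for the 30 readings.
u_limits = sigma_limits(8.88, 0.20)  # U %
h_limits = sigma_limits(5.09, 0.16)  # hairiness index H
print(u_limits)  # (8.48, 9.28)
print(h_limits)  # (4.77, 5.41)

# Hypothetical U % readings, used only to show the check.
print(is_outlier(9.45, 8.88, 0.20))  # True
print(is_outlier(8.90, 8.88, 0.20))  # False
```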

Fig. 2: Graphical representation of outliers for U % (U % plotted against No. of Readings, with the outlier marked).


Fig. 3: Graphical representation of outliers for Hairiness Index (Hairiness Index plotted against No. of Readings, with the outlier marked).

Conclusion: An outlier is an observation distinct from the mean value, with characteristics different from those of most other members of a normal distribution. Such observations should be detected, removed and analysed in detail down to the root cause of the abnormality, which should then be corrected. A defect in the spinning preparatory process may disturb the working of the whole spinning mill, and a defect in the spinning process may cause huge losses in downstream processes; hence detecting prominent outliers at the spinning stage itself and eliminating their root cause will significantly reduce rejections in subsequent processes. There are several methods for detecting outliers, but for the spinning process detection through two sigma limits is the more practicable and easier method. With at least 22 readings, two sigma limits place about 4.55 % of observations outside the limits, which provides an opportunity to analyse and correct about 4.55 % of observations and so reduce the variation. However, if the number of readings is more than 370 and there is huge variation in the process, then 3σ limits may be applied; these cover 99.73 % of readings, and the approx. 0.27 % of readings outside the limits are considered outliers. The methodology of outlier detection through 2σ or 3σ limits is very simple, as nowadays most reports are generated by PC-based instruments that themselves provide the standard deviation.
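The minimum counts of 22 and 370 readings can be read as the sample sizes at which about one reading is expected to fall outside the respective limits. A minimal sketch of that arithmetic follows, assuming normally distributed readings; the function name `expected_outside` is chosen here for illustration.

```python
import math


def expected_outside(n_readings, k):
    """Expected number of readings beyond mean ± k SD in a normal population."""
    return n_readings * math.erfc(k / math.sqrt(2))


# Columns: number of readings, expected count outside 2 sigma, outside 3 sigma.
for n in (22, 30, 370):
    print(n, round(expected_outside(n, 2), 2), round(expected_outside(n, 3), 2))
```

For a report of 30 readings, the expected count outside two sigma limits is a little over one, while the expected count outside three sigma limits is well below one, which is consistent with the preference for two sigma limits on daily spindle-wise reports.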



Produced by: Mr. Sunil Kumar Sharma, Manager – QAD,

Mobile No.: 09552596742, 09921417107

E-mail: [email protected]

Loknayak Jayprakash Narayan Shetkari Sahakari Soot Girni Ltd.

Kamalnagar, Untawad – Hol, Shahada,

Tal.: Shahada, Dist.: Nandurbar (MS)

Pin: 425409