Looking for something special -- outlier detection in R
TRANSCRIPT
PhD student in Computer Engineering
Fault Tolerant Systems Research Group
Availability of 99.99%
2011: „We need a method to detect erroneous
observations.”
„An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism” (Hawkins 1980)
PISA 2012 results
Children’s math and reading scores by country
Labelled points on the plot: China-Shanghai, Qatar, Peru, Japan, Indonesia, Colombia
Timeline of methods, 1970 to 2010: isodepth, mve, db, lof
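To make the distance-based idea concrete, here is a minimal sketch, assuming "db" on the timeline refers to distance-based outliers in the DB(p, D) sense: a point is flagged when at least a fraction p of the other points lie farther than D from it. The talk works in R; this is plain Python for illustration only, and the function name `db_outliers`, the toy data, and the parameter values are all hypothetical.

```python
import math


def db_outliers(points, p, d):
    """Return indices of points that look like DB(p, d) outliers.

    A point is flagged when at least a fraction p of the *other* points
    lie farther than d from it (counting only the other points is an
    assumption of this sketch).
    """
    n = len(points)
    flagged = []
    for i, a in enumerate(points):
        # Count how many other points are farther away than d.
        far = sum(
            1
            for j, b in enumerate(points)
            if j != i and math.dist(a, b) > d
        )
        if far >= p * (n - 1):
            flagged.append(i)
    return flagged


# Toy data: four points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (5.0, 5.0)]
flagged = db_outliers(points, p=0.9, d=1.0)
print(flagged)
```

This brute-force version compares every pair of points, so it is only suitable for small data sets; both p and d have to be chosen by the analyst.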
Labelled points on the plot: China-Shanghai, Kazakhstan, Montenegro, Peru, Albania, Qatar
Hans Rosling’s TED talk in 2006: still one of the most popular talks (as of Oct. 2015)
Fault Tolerant Systems Research Group
Outliers: high communication workload
Only planned system maintenance that moved lots of data
Timeline of methods, 1970 to 2010: isodepth, mve, db, lof, mcd, bacon, s-h-esd, fast-mcd
Comparison table: algorithms (isodepth, MVE, DB, LOF) versus tools (R, Scikit-learn, RapidMiner, WEKA, ELKI)
salankia
R packages: depth, fields, robustX, DMwR
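As an illustration of what LOF computes, the following is a minimal sketch of the local outlier factor, not the implementation of any of the packages listed above. It is written in plain Python rather than R so that it has no dependencies. The function name `lof_scores` and the toy data are hypothetical, and the sketch simplifies the method by taking exactly k neighbours per point (ignoring distance ties) and assuming there are no duplicate points.

```python
import math


def lof_scores(points, k):
    """Return a simplified local outlier factor for every point.

    Scores near 1 mean the point is about as dense as its neighbours;
    scores well above 1 suggest an outlier.
    """
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]

    neighbours = []  # indices of the k nearest neighbours of each point
    k_dist = []      # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]


# Toy data: four points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3), (5.0, 5.0)]
scores = lof_scores(points, k=2)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

On this toy data the isolated last point receives a score far above 1, while the four clustered points stay close to 1. Unlike a global distance threshold, the score is relative to the local density, which is why the choice of k matters.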
Pictures
Forrest Gump, Judit Polgár, Garry Kasparov
Outlier detection applications in finance, security, medicine, police surveillance
1977 and 1987 pictures
GitHub code: https://github.com/salankia/OutlierDetection-Budapest-BI-2015