Visual Evaluation of Outlier Detection Models

Institut für Informatik
Ludwig-Maximilians-Universität München

ELKI: Environment for DeveLoping KDD-Applications Supported by Index-Structures

Motivation:
In the software system ELKI, we facilitate the use of a wide range of algorithms along with a wide choice of distance measures and different possibilities of visualization. For outlier detection in particular, the distance function used can have a considerable impact on the results of certain methods. In other outlier detection methods, the distance function may be implicitly defined by the algorithm. Different outlier detection models, however, pursue different intuitions of what constitutes an outlier anyway. Hence we provide a visualization to easily compare the outlier scores that different algorithms assign to the same object.

Interpreting Outlier Scores:
Interpreting the meaning of the outlier score provided by an outlier detection method is a non-trivial problem, as the scores differ widely between methods, between data sets, or even between different regions of the same data set. Often it is not easy to decide whether or not a data point actually is an outlier. In ELKI 0.3, the contrast of outlier scores can be visualized in a histogram and in a bubble plot, in which each outlier score is scaled to a suitable bubble radius, so that a large bubble signals a high degree of outlierness. To this end, a range of different (trivial and non-trivial) scaling methods is available.
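The scaling of outlier scores to bubble radii and the score histogram described under Interpreting Outlier Scores can be sketched as follows. This is a minimal Python illustration, not ELKI code. It assumes two example scalings: a trivial linear min-max scaling and a non-trivial one that standardizes the scores and maps them through the Gaussian error function. The poster does not say which non-trivial scaling methods ELKI 0.3 provides, so the Gaussian variant is our own illustrative choice. The names `minmax_scale`, `gaussian_scale`, `bubble_radius`, `histogram` and the `max_radius` and `bins` parameters are hypothetical.

```python
import math
from typing import List, Sequence


def minmax_scale(scores: Sequence[float]) -> List[float]:
    """Trivial scaling: map scores linearly onto [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores equal: there is no contrast to show
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def gaussian_scale(scores: Sequence[float]) -> List[float]:
    """Non-trivial scaling (illustrative assumption): standardize the scores,
    then map them through the Gaussian error function. Below-average scores
    become 0, and only scores far above the mean approach 1."""
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    if std == 0.0:
        return [0.0 for _ in scores]
    return [max(0.0, math.erf((s - mean) / (std * math.sqrt(2)))) for s in scores]


def bubble_radius(scaled: Sequence[float], max_radius: float = 10.0) -> List[float]:
    """Turn scaled scores in [0, 1] into radii: a large bubble = high outlierness."""
    return [max_radius * v for v in scaled]


def histogram(values: Sequence[float], bins: int = 5) -> List[int]:
    """Count scaled values in [0, 1] per equal-width bin (the last bin includes 1.0)."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    scores = [1.0, 1.1, 0.9, 1.05, 4.0]  # made-up scores; the last object stands out
    linear = minmax_scale(scores)
    gauss = gaussian_scale(scores)
    print([round(r, 1) for r in bubble_radius(linear)])
    print([round(r, 1) for r in bubble_radius(gauss)])
    print(histogram(linear))
```

Both scalings give the isolated fifth score the largest bubble. The Gaussian variant suppresses the small differences among the four ordinary scores entirely (radius 0), whereas the linear scaling still gives most of them small nonzero radii, which is one reason why more than one scaling method is useful.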

Flexible Framework:
As a framework, ELKI is flexible in the sense that it can read arbitrary data types (given a suitable parser for your data file or an adapter for your database) and that it supports the use of any distance or similarity measure appropriate for the given data type. Usually, an algorithm needs to be provided with a distance function of some sort. Thus, distance functions connect arbitrary data types to arbitrary algorithms.

Availability:
This work is ongoing. Find the source code and binaries, documentation, and bug reports at:
http://www.dbs.ifi.lmu.de/research/KDD/ELKI/
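The idea from Flexible Framework, that distance functions connect arbitrary data types to arbitrary algorithms, can be illustrated with a minimal Python sketch (not ELKI's Java API). As the algorithm we assume a simple k-nearest-neighbor distance outlier score, where each object is scored by the distance to its k-th nearest neighbor; this standard model is chosen only for illustration. The functions `knn_outlier_scores`, `euclidean`, `manhattan`, and `levenshtein` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def knn_outlier_scores(data: Sequence[T], dist: Callable[[T, T], float],
                       k: int = 1) -> List[float]:
    """Score each object by the distance to its k-th nearest neighbor.

    The algorithm only ever calls `dist`, so it works for any data type
    for which a distance function is supplied. Brute force, O(n^2);
    k must be smaller than len(data).
    """
    scores = []
    for i, x in enumerate(data):
        d = sorted(dist(x, y) for j, y in enumerate(data) if j != i)
        scores.append(d[k - 1])
    return scores


def euclidean(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def manhattan(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    return sum(abs(p - q) for p, q in zip(a, b))


def levenshtein(s: str, t: str) -> float:
    """Edit distance via the classic dynamic program, one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution or match
        prev = cur
    return float(prev[-1])


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    words = ["outlier", "outliers", "outlie", "inlier", "database"]
    # Same algorithm, different data types: only the distance function changes.
    print(knn_outlier_scores(points, euclidean))
    print(knn_outlier_scores(points, manhattan))
    print(knn_outlier_scores(words, levenshtein))
```

The same function scores 2-D points under Euclidean or Manhattan distance and strings under edit distance. The isolated point (5, 5) receives a different score under the two vector distances (about 5.66 versus 8.0), a small instance of the distance function influencing the outcome, as noted in the Motivation.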

Elke Achtert, Hans-Peter Kriegel, Lisa Reichert, Erich Schubert, Remigius Wojdanowski, Arthur Zimek
Database Group