The Role of Visualization in Data Mining

Björn Gustafsson, Jonas Gustafsson and Ragnar Hammarqvist

Why use Information Visualization in Data Mining?

Information visualization, lexicon definition: A method of presenting data or information in non-traditional, interactive graphical forms. By using 2-D or 3-D color graphics and animation, these visualizations can show the structure of information, allow one to navigate through it, and modify it with graphical interactions.

The problem is finding valuable information hidden in raw data (from monitoring systems, credit card transactions and so on).

Visual exploration allows faster data exploration and often gives better results than automatic data mining algorithms, particularly where those algorithms fail.

The idea of visual exploration in data mining is to present the data in a visual form so that the user can gain insight into it and interact with it directly.

The Visual Exploration Paradigm

1. overview first

2. zoom and filter

3. details-on-demand
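The three steps above can be sketched in a few lines of code. This is only an illustration of the workflow, not a visualization tool: the records, field layout and function names are hypothetical, and a summary dictionary stands in for the overview display.

```python
from statistics import mean

# Hypothetical records: (customer id, monthly card spend, number of transactions)
records = [
    ("c01", 120.0, 14),
    ("c02", 95.5, 9),
    ("c03", 2300.0, 3),
    ("c04", 110.0, 12),
    ("c05", 1875.0, 2),
]

def overview(rows):
    """Step 1: summarize the whole data set."""
    spend = [r[1] for r in rows]
    return {"count": len(rows), "min": min(spend), "max": max(spend), "mean": mean(spend)}

def zoom_and_filter(rows, low, high):
    """Step 2: keep only the records whose spend lies in the range of interest."""
    return [r for r in rows if low <= r[1] <= high]

def details_on_demand(rows, record_id):
    """Step 3: return the full record for one selected item, or None if absent."""
    for r in rows:
        if r[0] == record_id:
            return {"id": r[0], "spend": r[1], "transactions": r[2]}
    return None

summary = overview(records)
subset = zoom_and_filter(records, 1000.0, 3000.0)
detail = details_on_demand(subset, "c03")
```

In an interactive tool each step would be driven by the user (for example a range slider for step 2 and a mouse click for step 3) rather than by fixed arguments.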

Visualization Techniques

Geometrically transformed displays

Iconic displays

Dense Pixel Displays

Stacked Displays

Geometrically transformed displays

Example: Parallel Coordinates
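A minimal sketch of the geometry behind parallel coordinates, assuming each variable gets its own vertical axis and is scaled to [0, 1] by its minimum and maximum. The function name and the sample data are illustrative; drawing the polylines is left to whatever plotting tool is at hand.

```python
def parallel_coordinates(rows):
    """Return one polyline per row as a list of (axis_index, normalized_value) points."""
    n_axes = len(rows[0])
    lows = [min(r[i] for r in rows) for i in range(n_axes)]
    highs = [max(r[i] for r in rows) for i in range(n_axes)]
    polylines = []
    for r in rows:
        line = []
        for i, v in enumerate(r):
            span = highs[i] - lows[i]
            # A constant variable has no spread; place it mid-axis.
            y = (v - lows[i]) / span if span else 0.5
            line.append((i, y))
        polylines.append(line)
    return polylines

# Hypothetical three-dimensional records
data = [(1.0, 10.0, 100.0), (2.0, 30.0, 50.0), (3.0, 20.0, 0.0)]
lines = parallel_coordinates(data)
```

Each record becomes a polyline crossing every axis once, so records with similar values produce lines that run close together.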

Iconic displays

Example: Chernoff faces and star icons.
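As an illustration of the star icon, the sketch below computes the outline of one glyph. It assumes the common construction in which each variable is drawn as a ray from the center, with rays evenly spaced around a circle and ray length equal to the variable's value normalized to [0, 1]. The function name is hypothetical.

```python
import math

def star_glyph(values):
    """Return the (x, y) vertices of a star icon; values are assumed to lie in [0, 1]."""
    n = len(values)
    vertices = []
    for i, v in enumerate(values):
        angle = 2 * math.pi * i / n   # evenly spaced rays, starting along the x axis
        vertices.append((v * math.cos(angle), v * math.sin(angle)))
    return vertices

# One record with four normalized variables
glyph = star_glyph([1.0, 0.5, 1.0, 0.5])
```

Connecting the vertices in order gives the star shape; one such glyph is drawn per record, so differences between records show up as differences in shape.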

Dense Pixel Displays

Example: Tree map.

Stacked Displays

Example: Table lens.

Interaction and Distortion

Dynamic Projections

Interactive filtering

Interactive distortion

Interactive Linking and Brushing
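Linking and brushing can be sketched as a selection made in one view that is carried over to another view through shared record indices. The two lists below stand in for two views of the same hypothetical records, and the function names are illustrative.

```python
# Two linked "views" of the same hypothetical records, aligned by record index.
ages = [23, 45, 31, 52, 38]
incomes = [21000, 64000, 39000, 71000, 48000]

def brush(values, low, high):
    """Return the indices of the records whose value lies inside the brushed range."""
    return {i for i, v in enumerate(values) if low <= v <= high}

def linked_highlight(selected, values):
    """Pair every value in another view with a flag saying whether it is highlighted."""
    return [(v, i in selected) for i, v in enumerate(values)]

selected = brush(ages, 30, 46)                 # brush the age view
income_view = linked_highlight(selected, incomes)  # highlight the same records in the income view
```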

Classification

Data type to be visualized

Visualization technique

Interaction and distortion

Example: (multi-dimensional, Iconic Display, Distortion)

Exploratory Data Analysis, EDA

Not hypothesis testing.

Find systematic relations between variables when there are no expectations of what the result might be.

Computational EDA and Graphical EDA techniques.

Computational EDA

Basic statistical exploratory methods

Multivariate exploratory analysis

Neural Networks

Graphical EDA techniques

Brushing

Other techniques

Verification of results of EDA

Only a first stage of analysis

In a second stage the findings need to be confirmed, for example by validating them against a different data set

Visualizing Data Mining models

Extracting information from a database that the user did not already know about.

To be able to do this we need to understand the user’s needs and design the visualization accordingly.

The two major driving forces behind visualizing data mining models are understanding and trust.

Understanding leads to trust.

Trust

There are many ways to assess trust, for example:

The model should not violate expected qualitative principles, given general knowledge of the domain. Example of a violation: finding a correlation between shoe size and IQ.

Domain knowledge is also critical for outlier detection. If you know that valid values lie between 10 and 50, a value outside that range makes no sense and should be flagged.

Measure the model's trustworthiness in some quantified way, such as a measurement of variance.
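The last two points can be sketched together: a domain check that flags values outside the known valid range, followed by a simple quantified measure of spread. The range 10 to 50 comes from the example above; the readings and the function name are hypothetical, and population variance is only one of many possible measures.

```python
from statistics import pvariance

DOMAIN_MIN, DOMAIN_MAX = 10, 50   # valid range, taken from the example in the text

def split_by_domain(values, low=DOMAIN_MIN, high=DOMAIN_MAX):
    """Separate values inside the known domain from values that cannot be valid."""
    valid = [v for v in values if low <= v <= high]
    suspect = [v for v in values if not (low <= v <= high)]
    return valid, suspect

readings = [12, 48, 30, 75, 10, -3, 50]   # hypothetical measurements
valid, suspect = split_by_domain(readings)

# Population variance of the values that passed the domain check
spread = pvariance(valid)
```

Computing the variance only after the domain check keeps impossible values from inflating the measure.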

Understanding

There are three components for understanding a model:

Representation

Interaction

Integration

Comparing methods

Models can be compared using three approaches:

Input/output

Algorithms

Processes

Bad examples of Information Visualization

Why all the graphics?

Sufficient and appropriate

What does it display?

This is what it tried to display (left), but it can still be distorted (right).

Based on the assumption that happiness should be linearly related to GNP.

Current research ITN

VITA research

Goal: Improve existing visual user interface (VUI) methods.

Mission: Discover and create tools and technologies that:

1. Aid human analytical reasoning

2. Provide state-of-the-art visual representations and interaction techniques

3. Effectively communicate analytical understanding to a wide variety of users

Example application: geovisualization application GeoWizard developed with the GAV framework

Current research ITN: Jimmy Johansson

Parallel Coordinates in Information Visualization

Transfer functions: linear, quadratic, square root and logarithmic.
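The four transfer function shapes named above can be sketched as mappings from a normalized input in [0, 1] (for example, line density) to a normalized output in [0, 1] (for example, intensity). The slides do not give the exact formulas used in the research, so the definitions below are assumptions; in particular the scaling constant `k` in the logarithmic variant is a hypothetical parameter.

```python
import math

def linear(x):
    """Output grows in direct proportion to the input."""
    return x

def quadratic(x):
    """Suppresses low inputs and emphasizes high ones."""
    return x * x

def square_root(x):
    """Lifts low inputs so that sparse regions stay visible."""
    return math.sqrt(x)

def logarithmic(x, k=9.0):
    """Lifts low inputs more strongly; k (assumed) controls the curvature."""
    return math.log1p(k * x) / math.log1p(k)

# Compare how each function maps the same normalized inputs
samples = [0.0, 0.25, 0.5, 1.0]
table = {f.__name__: [f(x) for x in samples]
         for f in (linear, quadratic, square_root, logarithmic)}
```

All four functions map 0 to 0 and 1 to 1, so they differ only in how they distribute the values in between.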

Parallel Coordinates in 3D.

References

[1] Daniel Keim, "Information Visualization and Visual Data Mining", IEEE Transactions on Visualization and Computer Graphics, vol. 8, no. 1, January-March 2002.

[2] Michael Friendly, "Gallery of Data Visualization", http://www.math.yorku.ca/SCS/Gallery/, April 2007.

[3] Kurt Thearling, Barry Becker among others, "Visualizing Data Mining Models", http://www.thearling.com/text/dmviz/modelviz.htm, April 2007.

[4] Statsoft, "Exploratory Data Analysis", http://www.statsoft.com/textbook/stdatmin.html#eda, April 2007.

[5] Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2006.
