© 2003, Prentice-Hall 1 Chapter 3: Data Mining and Data Visualization Modern Data Warehousing, Mining, and Visualization: Core Concepts by George M. Marakas BCIS 4660 Spring 2006


Chapter 3: Data Mining and Data Visualization

Modern Data Warehousing, Mining, and Visualization: Core Concepts

by George M. Marakas

BCIS 4660 Spring 2006


3-1: A Picture is Worth a Thousand Words

• Data mining is the set of activities used to find new, hidden, or unexpected patterns in data.

• These techniques are often called knowledge discovery in databases (KDD; written here as "knowledge data discovery"), and include statistical analysis, neural networks, fuzzy logic, intelligent agents, and data visualization.

• The KDD techniques not only discover useful patterns in the data, but also can be used to develop predictive models.


Verification Versus Discovery

• In the past, decision support activities were primarily based on the concept of verification.

• This required a great deal of prior knowledge on the decision-maker’s part in order to verify a suspected or known relationship.

• With the advance of technology, the concept of verification began to give way to discovery, a.k.a. data mining.


Data Mining’s Growth in Popularity

• One reason is that we keep getting more and more data all the time and need tools to understand it.

• We are also aware that the human brain has limits in processing multidimensional data (the "rule of 7": people can hold roughly seven items, plus or minus two, in mind at once).

• A third reason is that machine learning techniques are becoming more affordable and more refined at the same time.


Making Accurate Predictions with Data Mining

• Although the literature contains statements such as “data mining will allow us to predict who will buy a particular product,” predicting individual behavior that precisely runs against human nature: people are not that predictable.

• In situations where data mining is used to predict response to a marketing campaign, only about 5% of the people selected as “likely respondents” actually do respond.

• Even exit polls, which are post-behavior predictions, can be misleading (e.g., the 2004 U.S. presidential election).


Making Accurate Predictions with Data Mining (cont.)

• Although the accuracy of predicting individual behavior is not so good, it is better than it seems, since direct marketing (mailers, email, phone calls) efforts often have “hit rates” of only about 1% without data mining.

• Therefore a 5X increase in successes is quite good!
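
The arithmetic behind the 5X claim can be sketched as follows. The 1% and 5% hit rates come from the slides; the campaign size of 10,000 mailers is a made-up figure for illustration only.

```python
# Hit rates quoted on the slides.
BASELINE_RATE = 0.01   # ~1% response without data mining
TARGETED_RATE = 0.05   # ~5% response among "likely respondents"

# Hypothetical campaign size (an assumption, not from the source).
MAILED = 10_000

baseline_responses = MAILED * BASELINE_RATE
targeted_responses = MAILED * TARGETED_RATE
improvement = TARGETED_RATE / BASELINE_RATE

print(f"Responses without data mining: {baseline_responses:.0f}")
print(f"Responses with data mining:    {targeted_responses:.0f}")
print(f"Improvement factor:            {improvement:.1f}X")
```

Even though 95% of the selected people still do not respond, the same mailing budget yields about five times as many responses.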


3-2: Online Analytical Processing (OLAP)

Codd (originator of the relational database model, which C. J. Date helped popularize) developed a set of 12 rules for the development of multidimensional databases (recall Chap. 9 of Pratt):

1. Multidimensional view

2. Transparent to user

3. Accessible

4. Consistent reporting

5. Client-server architecture

6. Generic dimensionality

7. Dynamic sparse matrix handling

8. Multiuser support

9. Cross-dimensional ops

10. Intuitive manipulation

11. Flexible reporting

12. Unlimited dimensions and aggregation levels


OLAP as Implemented

• Codd introduced the term OLAP in 1993.

• To date, it does not appear that any implementation exists that satisfies all 12 multidimensionality rules.

• Some people argue it might not even be possible to attain all of them.

• More recently, the term OLAP has come to represent the broad category of software technology that enables multidimensional analysis of enterprise data.


Multidimensional OLAP (MOLAP)

[Figure: 3-D chart of Sales plotted by Region and Product]

• Data can be viewed across several dimensions. Here sales are arrayed by region and product.

• A fourth dimension could be added by using several graphs, perhaps at different points in time.

• Most analyses have many more dimensions than this. MOLAP handles data as an n-dimensional hypercube.

• Data slices cut across dimensions (hold one dimension constant).
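
A minimal sketch of the hypercube idea, assuming a tiny three-dimensional cube (region, product, quarter) stored as a Python dictionary. The dimension names, members, and sales figures are all hypothetical, and real MOLAP servers use specialized array storage rather than a dict.

```python
from collections import defaultdict

# Hypothetical sales cube: (region, product, quarter) -> sales amount.
DIMENSIONS = ("region", "product", "quarter")
cube = {
    ("East", "Widgets", "Q1"): 120,
    ("East", "Gadgets", "Q1"): 80,
    ("West", "Widgets", "Q1"): 95,
    ("West", "Gadgets", "Q1"): 60,
    ("East", "Widgets", "Q2"): 130,
    ("East", "Gadgets", "Q2"): 70,
    ("West", "Widgets", "Q2"): 105,
    ("West", "Gadgets", "Q2"): 90,
}


def slice_cube(cube, dimension, value):
    """Hold one dimension constant and return the remaining lower-dimensional view."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: sales
        for key, sales in cube.items()
        if key[axis] == value
    }


def roll_up(cube, dimension):
    """Total the sales for each member of one dimension, summing over the others."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, sales in cube.items():
        totals[key[axis]] += sales
    return dict(totals)


q1_slice = slice_cube(cube, "quarter", "Q1")   # a 2-D region x product view
sales_by_region = roll_up(cube, "region")
print(q1_slice)
print(sales_by_region)
```

Holding `quarter` constant yields exactly the kind of region-by-product picture shown in the figure; a different quarter gives the next graph in the sequence.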


Relational OLAP (ROLAP)

• A large relational database server replaces the multidimensional one.

• The database contains both detailed and summarized data, allowing “drill down” techniques to be applied.

• SQL interfaces allow vendors to build tools, both portable and scalable.

• This does require databases with many relational tables (typically hundreds or more), which may lead to substantial processor overhead on complex joins.
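
A rough sketch of "drill down" over a relational store, using Python's built-in sqlite3 module. The two-table schema (`region`, `sale`) and every row in it are invented for illustration; a real ROLAP schema would be far larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (
        sale_id   INTEGER PRIMARY KEY,
        region_id INTEGER REFERENCES region(region_id),
        product   TEXT,
        amount    REAL
    );
    INSERT INTO region VALUES (1, 'Southeast'), (2, 'Northwest');
    INSERT INTO sale (region_id, product, amount) VALUES
        (1, 'Widgets', 100.0), (1, 'Gadgets', 40.0),
        (1, 'Widgets', 60.0),  (2, 'Widgets', 75.0);
""")

# Summary level: total sales per region (requires a join plus aggregation).
summary = conn.execute("""
    SELECT r.name, SUM(s.amount)
    FROM sale AS s JOIN region AS r ON r.region_id = s.region_id
    GROUP BY r.name ORDER BY r.name
""").fetchall()

# Drill down: break one region's total out by product.
detail = conn.execute("""
    SELECT s.product, SUM(s.amount)
    FROM sale AS s JOIN region AS r ON r.region_id = s.region_id
    WHERE r.name = ?
    GROUP BY s.product ORDER BY s.product
""", ("Southeast",)).fetchall()

print(summary)
print(detail)
conn.close()
```

Each level of drill-down is another join and GROUP BY, which is where the processor overhead mentioned above comes from as the number of tables grows.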


A Typical Relational Schema (ERD)


3-3: Techniques Used to Mine the Data

• Paralleling the popularity of data mining itself, the development of new techniques is exploding as well.

• Many innovations are vendor-specific (e.g., SAS EM, Cognos), which sometimes does little to advance the state of the art.

• Regardless, data-mining techniques tend to fall into four major categories:

1. classification

2. association

3. sequencing

4. clustering


Classification methods

• The goal is to discover rules that define whether an item belongs to a particular subset or class of data.

• For example, if we are trying to determine which households will respond to a direct mail campaign, we will want rules that separate the “probables” from the “not probables”.

• These IF-THEN rules often are portrayed in a tree-like structure or diagram.
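
To make the IF-THEN idea concrete, here is a small hand-written rule tree for the direct mail example. The attributes (income band, age) and the thresholds are purely hypothetical; in practice a tree-growing tool would learn them from past campaign data.

```python
def classify_household(income: str, age: int) -> str:
    """Walk a tiny, hand-made classification tree (illustrative rules only)."""
    if income == "high":
        # IF income is high AND age >= 35 THEN probable
        if age >= 35:
            return "probable"
        return "not probable"
    # IF income is medium AND age >= 50 THEN probable
    if income == "medium" and age >= 50:
        return "probable"
    # Everything else falls through to the default class.
    return "not probable"


households = [("high", 42), ("high", 28), ("medium", 55), ("low", 60)]
labels = [classify_household(income, age) for income, age in households]
print(labels)
```

Each path from the first test to a return statement corresponds to one IF-THEN rule, which is why these models are usually drawn as trees.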


Association Methods

• These techniques search all transactions from a system for patterns of occurrence.

• A common method is market basket analysis (a.k.a. affinity analysis or association analysis), in which the sets of products purchased by thousands of consumers are examined.

• Results are then portrayed as percentages; for example, “30% of the people who buy steaks also buy charcoal”.


Sequencing Methods

• These methods are applied to time series data in an attempt to find hidden trends.

• If found, these can be useful predictors of future events (e.g., leading indicators).

• For example, customer groups that tend to purchase products tied-in with hit movies would be targeted with promotional campaigns timed to release dates.


Clustering Techniques

• Clustering techniques attempt to create partitions in the data according to some distance metric.

• The clusters formed are data grouped together simply by their similarity to their neighbors (compare factor and discriminant analysis).

• By examining the characteristics of each cluster, it may be possible to establish rules for classification.
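
One common way to partition data by a distance metric is k-means, sketched below with Euclidean distance. The slides do not name a specific algorithm, so k-means is a stand-in, and the customer points (two made-up numeric attributes each) and starting centroids are hypothetical.

```python
import math


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                # The new centroid is the coordinate-wise mean of the cluster.
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers described by two numeric attributes.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

Inspecting what the members of each resulting cluster have in common is the step that can suggest classification rules.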


Data Mining Technologies

• Statistics – the most mature of the data mining technologies, but often not applicable because statistical methods need clean data. In addition, many statistical procedures assume linear relationships, which limits their use [regression, correlation, ANOVA, etc.].

• Neural networks, genetic algorithms, fuzzy logic – these technologies are able to work with complicated and imprecise data. Their broad applicability has made them popular in the field.


Data Mining Technologies (cont.)

• Decision trees – these technologies are conceptually simple and have gained in popularity as better tree growing software was introduced. Because of the way they are used, they are perhaps better called “classification” trees.


The Knowledge Discovery [KD] Search Process

Table 3-2 contains a more detailed outline of the process, but the major steps are:

1. Define the business problem and obtain the data to study it.

2. Use data mining software to model the problem.

3. Mine the data to search for patterns of interest.

4. Review the mining results and refine them by respecifying the model.

5. Once validated, make the model available (publish) to other users of the DW.


Creating a (task-relevant) Data-Mining Model

Although syntax differs from vendor to vendor, building a model on top of a database is much like creating a table:

CREATE MODEL mail_list (
    Income  character input,
    Age     integer   input,
    Respond character input
)

To populate it with data, use an SQL INSERT:

INSERT INTO mail_list
SELECT income, age, respond
FROM client_list
WHERE region = 'Southeast'


Creating a Data-Mining Model (cont.)

The process automatically created additional views of the model (mail_list_UNDERSTAND and mail_list_PREDICT). These can be examined (MS OLAP pseudo-code):

SELECT *
FROM mail_list_UNDERSTAND
WHERE input_column_name = 'income'
  AND input_column_value = 'high'
  AND output_column_name = 'respond'
  AND output_column_value = 'yes'

Once these are created, they are treated as tables in the database so they can be viewed and joined by other users.


New Applications for Data Mining

As the technology matures, new applications emerge, especially in two new categories, text mining (AskSam) and web mining. Some text mining examples are:

– Distilling the meaning (abstract) of a text

– Accurate summarization of a text

– Explication of the text theme structure

– Clustering of texts


Web mining

• Web mining is a special case of text mining where the mining occurs over a website (e.g., Amazon.com).

• It enhances the website with intelligent behavior, such as suggesting related links or recommending new products.

• It allows you to unobtrusively learn the interests of the visitors and modify their user profiles in real time.

• It also allows you to match resources to the interests of the visitor.


3-4: Market Basket Analysis: The King of Algorithms

• This is the most widely used and, in many ways, most successful data mining algorithm.

• Also known as “Affinity” or “Association” Analysis.

• It essentially determines what products people purchase together.

• Stores can use this information to place these products in the same area.

• Direct marketers can use this information to determine which new products to offer to their current customers.

• Inventory policies can be improved if reorder points reflect the demand for the complementary products.


Association Rules for Market Basket Analysis

Rules are written in the form “left-hand side implies right-hand side” and an example is:

Yellow Peppers IMPLIES Red Peppers, Bananas, Bakery

To make effective use of a rule, three numeric measures about that rule must be considered:

(1) support

(2) confidence and

(3) lift


Measures of Predictive Ability

Yellow Peppers IMPLIES [Red Peppers, Bananas, Bakery]

1. Support is the percentage of all baskets in which the rule was true, i.e., both the left-hand and right-hand products were present in the basket (the intersection of both sides).

2. Confidence is the percentage of baskets containing the left-hand product that also contained the right-hand products.

e.g., of the baskets that contain peppers, what percentage also contained bananas? This is a smaller universe, so the numbers will be higher than support.

3. Lift measures how much more frequently the right-hand item is found in baskets with the left-hand item than in baskets overall.

Ratio: confidence divided by the percentage of all baskets that contain bananas. If bananas are in 50% of the baskets containing peppers and also in 50% of all baskets, the lift is 1.0, meaning peppers tell us nothing about bananas.
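
The three measures can be sketched in a few lines, using the standard definition in which lift is confidence divided by the overall frequency of the right-hand side. The ten baskets below are invented for illustration, and the function name `rule_metrics` is hypothetical.

```python
def rule_metrics(baskets, lhs, rhs):
    """Return (support, confidence, lift) for the rule `lhs IMPLIES rhs`."""
    lhs, rhs = set(lhs), set(rhs)
    n = len(baskets)
    n_lhs = sum(1 for b in baskets if lhs <= b)            # baskets with the left side
    n_rhs = sum(1 for b in baskets if rhs <= b)            # baskets with the right side
    n_both = sum(1 for b in baskets if (lhs | rhs) <= b)   # baskets with both sides
    if n_lhs == 0 or n_rhs == 0:
        raise ValueError("both sides of the rule must occur at least once")
    support = n_both / n
    confidence = n_both / n_lhs
    lift = confidence / (n_rhs / n)
    return support, confidence, lift


# Ten hypothetical baskets: peppers in 4, bananas in 5, both in 3.
baskets = [
    {"yellow peppers", "bananas"},
    {"yellow peppers", "bananas", "bakery"},
    {"yellow peppers", "bananas"},
    {"yellow peppers", "bakery"},
    {"bananas"},
    {"bananas", "bakery"},
    {"bakery"},
    {"milk"},
    {"milk", "bakery"},
    {"cola"},
]

support, confidence, lift = rule_metrics(baskets, ["yellow peppers"], ["bananas"])
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Here bananas appear in half of all baskets but in three quarters of the pepper baskets, so the lift is above 1: seeing peppers makes bananas more likely than they are in general.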


An Example

• The confidence suggests people buying any kind of pepper also buy bananas.

• Green peppers sell in about the same quantities as red or yellow, but are not as predictive.

Rule                             Lift    Support (%)   Confidence (%)
Green Peppers IMPLIES Bananas    1.37    3.77          85.96
Red Peppers IMPLIES Bananas      1.43    8.58          89.47
Yellow Peppers IMPLIES Bananas   1.17    22.12         73.09
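
As a quick consistency check on the three rules in this table, and assuming the standard definition lift = confidence / (share of all baskets containing bananas), dividing each confidence by its lift should recover roughly the same banana share every time. The numbers are taken from the table; the variable names are illustrative.

```python
# (lift, confidence %) for each rule, as listed in the table.
rules = {
    "Green Peppers IMPLIES Bananas": (1.37, 85.96),
    "Red Peppers IMPLIES Bananas": (1.43, 89.47),
    "Yellow Peppers IMPLIES Bananas": (1.17, 73.09),
}

# Rearranging lift = confidence / P(bananas) gives P(bananas) = confidence / lift.
implied_banana_share = {
    rule: confidence / lift for rule, (lift, confidence) in rules.items()
}

for rule, share in implied_banana_share.items():
    print(f"{rule}: bananas in about {share:.1f}% of all baskets")
```

All three rules imply that bananas are in a little over 62% of baskets, so the table's lift and confidence columns agree with one another under that definition.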


Market Basket Analysis Methodology

• We first need a list of transactions and what was purchased. This is pretty easily obtained these days from scanning cash registers.

• Next, we choose a list of products to analyze, and tabulate how many times each was purchased with the others.

• The diagonal of the table shows how often each product was purchased in any combination, and the off-diagonal cells show how often each pair of products was bought together.


A Convenience Store Example (5 transactions)

Consider the following simple example about five transactions at a convenience store:

Transaction 1: Frozen pizza, cola, milk

Transaction 2: Milk, potato chips

Transaction 3: Cola, frozen pizza

Transaction 4: Milk, pretzels

Transaction 5: Cola, pretzels

These need to be cross tabulated and displayed in a table.


A Convenience Store Example (5 transactions; Cross tabulated)

• Pizza and Cola sell together more often than any other combo; a cross-marketing opportunity?

• Milk sells well with everything – people probably come here specifically to buy it.

Product Bought   Pizza also   Milk also   Cola also   Chips also   Pretzels also
Pizza            2            1           2           0            0
Milk             1            3           1           1            1
Cola             2            1           3           0            1
Chips            0            1           0           1            0
Pretzels         0            1           1           0            2
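
The cross tabulation can be reproduced from the five transactions with a short script. This is only a sketch; the function name `cross_tabulate` and the shortened product labels are illustrative choices.

```python
from itertools import combinations

# The five convenience store transactions from the example.
transactions = [
    {"pizza", "cola", "milk"},
    {"milk", "chips"},
    {"cola", "pizza"},
    {"milk", "pretzels"},
    {"cola", "pretzels"},
]
products = ["pizza", "milk", "cola", "chips", "pretzels"]


def cross_tabulate(transactions, products):
    """Count how often each product appears (diagonal) and co-occurs with each other product."""
    counts = {a: {b: 0 for b in products} for a in products}
    for basket in transactions:
        for item in basket:
            counts[item][item] += 1          # diagonal: product bought at all
        for a, b in combinations(sorted(basket), 2):
            counts[a][b] += 1                # off-diagonal: bought together
            counts[b][a] += 1                # keep the table symmetric
    return counts


counts = cross_tabulate(transactions, products)
for product in products:
    print(f"{product:<9}", *(counts[product][other] for other in products))
```

Because co-occurrence is symmetric, the table is a mirror image across its diagonal, which is a handy check when tabulating by hand.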


Using the Results

• The tabulations can immediately be translated into association rules and the numerical measures computed.

• Comparing this week’s table to last week’s table can immediately show the effect of this week’s promotional activities.

• Some rules are going to be trivial (hot dogs and buns sell together) or inexplicable (toilet rings sell only when a new hardware store is opened).
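
As a sketch of how the tabulation translates into rules, the counts from the five-transaction convenience store table can be turned into support, confidence, and lift directly, with lift taken as confidence divided by the right-hand product's overall share. Only the rows needed here are copied from that table; the helper name `rule_from_table` is hypothetical.

```python
N_TRANSACTIONS = 5

# Selected cells of the cross tabulation: counts[a][b] = baskets containing both a and b.
counts = {
    "pizza": {"pizza": 2, "milk": 1, "cola": 2},
    "milk": {"pizza": 1, "milk": 3, "cola": 1},
    "cola": {"pizza": 2, "milk": 1, "cola": 3},
}


def rule_from_table(counts, n, lhs, rhs):
    """Compute (support, confidence, lift) for `lhs IMPLIES rhs` from table counts."""
    support = counts[lhs][rhs] / n
    confidence = counts[lhs][rhs] / counts[lhs][lhs]   # diagonal = baskets with lhs
    lift = confidence / (counts[rhs][rhs] / n)          # diagonal = baskets with rhs
    return support, confidence, lift


pizza_cola = rule_from_table(counts, N_TRANSACTIONS, "pizza", "cola")
milk_cola = rule_from_table(counts, N_TRANSACTIONS, "milk", "cola")
print("Pizza IMPLIES Cola:", pizza_cola)
print("Milk IMPLIES Cola: ", milk_cola)
```

Pizza IMPLIES Cola comes out with a lift above 1 and Milk IMPLIES Cola below 1, although with only five transactions neither figure is statistically meaningful.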


Limitations to Market Basket Analysis

• A large number of real transactions are needed to do an effective basket analysis, but the data’s accuracy is compromised if all the products do not occur with similar frequency. Sparse or “empty” cells yield statistically insignificant results.

• The analysis can sometimes capture results that were due to the success of previous marketing campaigns (and not natural tendencies of customers).


Performing Analysis with Virtual Items

• The sales data can be augmented with the addition of virtual items. For example, we could record that the customer was new to us, or had children.

• The transaction record might look like:

Item 1: Sweater
Item 2: Jacket
Item 3: New

• This might allow us to see what patterns new customers have versus old customers.
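
A small sketch of the virtual item idea: customer attributes are appended to the basket as pseudo-products so they can take part in rules like any other item. The `VIRTUAL:` prefix, the helper name, and the three transactions are assumptions made for illustration.

```python
def with_virtual_items(items, is_new_customer=False, has_children=False):
    """Return the basket as a set, augmented with virtual items for customer traits."""
    basket = set(items)
    if is_new_customer:
        basket.add("VIRTUAL:New")
    if has_children:
        basket.add("VIRTUAL:HasChildren")
    return basket


# Hypothetical transactions.
transactions = [
    with_virtual_items(["Sweater", "Jacket"], is_new_customer=True),
    with_virtual_items(["Sweater", "Scarf"], is_new_customer=True),
    with_virtual_items(["Jacket"]),
]

# Confidence of the rule "New IMPLIES Sweater".
new_baskets = [b for b in transactions if "VIRTUAL:New" in b]
sweater_share_new = sum("Sweater" in b for b in new_baskets) / len(new_baskets)
print(f"Share of new-customer baskets with a sweater: {sweater_share_new:.0%}")
```

Prefixing the virtual items keeps them from colliding with real product names and makes them easy to filter out of reports.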


Taxonomies

• The presence of items not purchased very frequently is an obstacle to a good market basket analysis [missing data].

• One way to deal with this is to eliminate products that occur with a frequency less than some threshold.

• A better idea would be to try to form groups of products that fall below the threshold. Four flavors of popsicle occur 9% of the time all together, but no more than 3% individually.
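
Grouping rare items under a taxonomy can be sketched like this. The baskets, the 25% threshold, and the item-to-category mapping are all hypothetical; items that are rare but have no category in the mapping are simply left alone.

```python
from collections import Counter


def roll_up_rare_items(baskets, taxonomy, threshold):
    """Replace items whose basket frequency is below `threshold` with their category."""
    n = len(baskets)
    frequency = Counter(item for basket in baskets for item in basket)
    rolled = []
    for basket in baskets:
        rolled.append({
            taxonomy.get(item, item) if frequency[item] / n < threshold else item
            for item in basket
        })
    return rolled


# Hypothetical taxonomy: individual flavors roll up to one category.
taxonomy = {
    "cherry popsicle": "popsicle",
    "grape popsicle": "popsicle",
    "orange popsicle": "popsicle",
}
baskets = [
    {"milk", "cherry popsicle"},
    {"milk", "grape popsicle"},
    {"orange popsicle", "bread"},
    {"milk", "bread"},
    {"milk"},
    {"bread"},
    {"milk"},
    {"milk", "bread"},
    {"cola"},
    {"milk"},
]

rolled = roll_up_rare_items(baskets, taxonomy, threshold=0.25)
popsicle_baskets = sum("popsicle" in basket for basket in rolled)
print(f"'popsicle' now appears in {popsicle_baskets} of {len(rolled)} baskets")
```

Each flavor alone falls below the threshold, but the combined category clears it, so rules involving popsicles can now be found instead of being discarded.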


Multidimensional Market Basket Analysis

• Rules can involve more than two items, for example Plant and Clay Pot IMPLIES Soil.

• These rules are built iteratively. First, pairs are found, then relevant sets of three or four.

• These are then pruned by removing those that occur infrequently.

• In an environment like a grocery store, where customers commonly buy over 100 items, rules could involve as many as 10 items.
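
The iterate-then-prune approach described above resembles the well-known Apriori procedure; the slides do not name an algorithm, so that label is an assumption. Below is a compact sketch over invented garden store baskets, with a hypothetical minimum support of 40%.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support, max_size=3):
    """Grow frequent itemsets level by level: singles, then pairs, then triples, and so on."""
    n = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    items = {item for basket in baskets for item in basket}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {}
    size = 1
    while current and size <= max_size:
        for itemset in current:
            result[itemset] = support(itemset)
        # Join step: combine frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size + 1}
        # Prune step: keep a candidate only if all of its smaller subsets are
        # frequent and the candidate itself occurs often enough.
        current = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size))
            and support(c) >= min_support
        }
        size += 1
    return result


baskets = [
    {"plant", "clay pot", "soil"},
    {"plant", "clay pot", "soil"},
    {"plant", "clay pot"},
    {"plant", "soil"},
    {"fertilizer"},
]
itemsets = frequent_itemsets(baskets, min_support=0.4)
triple = frozenset({"plant", "clay pot", "soil"})
print(len(itemsets), "frequent itemsets; triple support =", itemsets[triple])
```

A frequent triple such as {plant, clay pot, soil} is what allows a rule like Plant and Clay Pot IMPLIES Soil to be proposed; infrequent items such as fertilizer are dropped at the first level and never enter larger candidates.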


3-5: Current Limitations and Challenges to Data Mining

Despite its potential power and value, data mining is still a new field. Some things that have thus far limited advancement are:

– Identification of missing information – not all knowledge gets stored in a database

– Data noise and missing values – future systems need better ways to handle this

– Large databases and high dimensionality – future applications need ways to partition data into more manageable chunks


3-6: Data Visualization: “Seeing” the Data


Visual Presentation

• For any kind of high dimensional data set, displaying predictive relationships is a challenge.

• The picture on the previous slide uses 3-D graphics to portray the weather balloon data numbers in text Table 11-4. We learn very little from just examining the numbers.

• Shading is used to represent relative degrees of thunderstorm activity, with the darkest regions the heaviest activity.


A Bit of History

• An early effort used sequences of two-dimensional graphs to add depth.

• Current virtual reality programs allow the user to step through a data set. Try going to a realtor’s website and taking a tour of a house up for sale.

http://www.microsoft.com/solutions/bi/overview/visualization.asp


Data Visualization

Data visualization refers to presentation of data by technologies such as digital images, geographical information systems, graphical user interfaces, multidimensional tables and graphs, virtual reality, three-dimensional presentations, videos and animation.

• Multidimensionality Visualization: Modern data and information may have several dimensions.

– Dimensions:
    • Products
    • Salespeople
    • Market segments
    • Business units
    • Geographical locations
    • Distribution channels
    • Countries
    • Industries


Data Visualization Continued

Multidimensionality Visualization:

– Measures:
    • Money
    • Sales volume
    • Head count
    • Inventory profit
    • Actual versus forecasted results

– Time:
    • Daily
    • Weekly
    • Monthly
    • Quarterly
    • Yearly


Data Visualization Continued

• A geographical information system (GIS) is a computer-based system for capturing, storing, checking, integrating, manipulating, and displaying data using digitized maps. Every record or digital object has an identified geographical location. It employs spatially oriented databases.

• Visual interactive modeling (VIM) uses computer graphic displays to represent the impact of different management or operational decisions on objectives such as profit or market share.

• Virtual reality (VR) is interactive, computer-generated, three-dimensional graphics delivered to the user. These artificial sensory cues cause the user to “believe” that what they are doing is real.


Human Visual Perception and Data Visualization

• Data visualization is so powerful because the human visual cortex converts objects into information so quickly.

• The next three slides show: (1) usage of global private networks,

(2) flow through natural gas pipelines, and

(3) a risk analysis report that permits the user to draw an interactive yield curve.

• All three use height or shading to add additional dimensions to the figure.


Global Private Network Activity

High Activity

Low Activity


Natural Gas Pipeline Analysis

Note: Height shows total flow through compressor stations.


An “Enlivened” Risk Analysis Report


Geographical Information Systems (GIS)

A GIS is a special-purpose database that contains a spatial coordinate system. A comprehensive GIS requires:

1. Data input from maps, aerial photos, etc.

2. Data storage, retrieval and query

3. Data transformation and modeling

4. Data reporting (maps, reports and plans)


Image from mapquest.com

The Power of Visualization: Driving directions

1. Start out going Southwest on ELLSWORTH AVE towards BROADWAY by turning right.

2. Turn RIGHT onto BROADWAY.

3. Turn RIGHT onto QUINCY ST.

4. Turn LEFT onto CAMBRIDGE ST.

5. Turn SLIGHT RIGHT onto MASSACHUSETTS AVE.

6. Turn RIGHT onto RUSSELL ST.


Images from yahoo.com

Visualization Success Stories


The Special Capabilities of a GIS

• In general, a GIS contains two types of data:

Spatial data: these elements correspond to a uniquely defined location on earth. They could be in point, line, or polygon form.

Attribute data: these are the data that will be portrayed at the geographic references established by spatial data.

• Example: Data from an opinion poll is displayed for multiple regions in the United States. Clicking on an area allows the user to drill down to the results for smaller areas.
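
The split between spatial data and attribute data can be illustrated with a toy example: polygons define the regions, a separate table holds the poll results, and a point is located with the standard ray-casting test. The two square regions, their coordinates, and the poll figures are invented; a real GIS would rely on a spatial database rather than hand-rolled geometry.

```python
# Spatial data: each region is a polygon given as (x, y) vertices (hypothetical).
regions = {
    "north": [(0, 5), (10, 5), (10, 10), (0, 10)],
    "south": [(0, 0), (10, 0), (10, 5), (0, 5)],
}

# Attribute data: values portrayed at those locations (hypothetical poll results).
poll_results = {
    "north": {"approve": 0.55},
    "south": {"approve": 0.48},
}


def point_in_polygon(x, y, polygon):
    """Ray casting: count how many edges a ray running right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside                    # each crossing flips the state
    return inside


def lookup(x, y):
    """Return (region name, attribute record) for the clicked point, or None."""
    for name, polygon in regions.items():
        if point_in_polygon(x, y, polygon):
            return name, poll_results[name]
    return None


print(lookup(3, 7))
print(lookup(3, 2))
print(lookup(20, 20))
```

A click on the map is resolved against the spatial layer first, and the matching region's key is then used to fetch what should be displayed from the attribute layer.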


Telephone Polling Results

Note: On the “live” map, clicking on an area allows the user to drill down and see results for smaller areas.


3-7: “Siftware” Technologies

Although data visualization product vendors seem to enter or leave the market with great frequency, several firms are beginning to develop significant brand loyalty.

Red Brick – Helped category managers at H.E.B. in San Antonio to determine which products to put in which stores. Another application was the consolidation of three old data warehouses at Hewlett-Packard.


Siftware -- Continued

SAS – A large suite of statistical analysis software, which allows detailed analysis of large volumes of data. With its add-on product, Enterprise Miner, SAS holds the largest share of the data analysis/mining marketplace.

Cognos – A sophisticated and widely used 3-D visualization software package.


Siftware -- Continued

Oracle – A large suite of connectivity products allows transparent access to mainframe databases. Some major customers include John Alden Insurance, ShopKo Stores and Pacific Bell.

Informix – Associated Grocers uses Informix data warehousing products at the heart of its three-tier client-server system.


Siftware -- Continued

Sybase – Sybase Warehouse WORKS is an integrated system designed around the four key functions in data warehousing.

Silicon Graphics – Data mining software is mated to 3-D visualization tools to allow users to fly through data.

IBM – provides a number of decision support tools in its Information Warehouse Solutions.


Visualization in the Aftermath of 9/11


Six Degrees of Separation of Mohamed Atta

http://business2.com/articles/mag/0,1640,35253,FF.html


U.S. Presidential Election 2004

Red Counties = Bush

Blue Counties = Kerry


U.S.A. City Population by decade

U.S. Census Bureau

Two Different Primary Goals: Two Different Types of Visualizations

Explore/Calculate

• Analyze

• Reason about Information

Communicate

• Explain

• Make Decisions

• Reason about Information