Chapter 7: The Future of Data Mining, Warehousing, and Visualization

Modern Data Warehousing, Mining, and Visualization: Core Concepts

by George M. Marakas

© 2003, Prentice-Hall

Types of Data Mined in 2001

[Bar chart: share of responses by data type; counts in parentheses]

web clickstream (45): 14%
web content (44): 14%
text (55): 17%
knowledge bases (26): 8%
XML data (24): 7%
images (11): 3%
audio/video (5): 2%
CAD/CAM data (3): 1%
time series (53): 16%
other complex data (59): 18%
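
The percentages on this slide appear to be each category's count divided by the sum of all counts. That is an inference from the numbers, not something the slide states. A minimal sketch that recomputes them from the printed counts:

```python
# Counts as printed next to each label on the slide.
counts = {
    "web clickstream": 45,
    "web content": 44,
    "text": 55,
    "knowledge bases": 26,
    "XML data": 24,
    "images": 11,
    "audio/video": 5,
    "CAD/CAM data": 3,
    "time series": 53,
    "other complex data": 59,
}


def shares(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


for name, pct in shares(counts).items():
    print(f"{name}: {pct}%")
```

The rounded results agree with the series values shown on the chart, which supports reading the bars in the same order as the labels.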

Areas of Data Mining in 2002

[Bar chart: share of responses by application area; counts in parentheses where they survive in the transcript]

Banking (56): 13%
Genetics: 8%
Direct Marketing: 11%
eBiz: 10%
Entertainment (3): 1%
Fraud Detection (46): 11%
Insurance (27): 6%
Investing: 4%
Mfg.: 4%
Pharmaceuticals (24): 6%
Retail (27): 6%
Science (25): 6%
Security (8): 2%
Telecom: 8%
Other (23): 5%

Current Data Mining Activities - Mid-Year 2002

[Bar chart: share of responses by application area; counts in parentheses where they survive in the transcript]

Banking (77): 13%
Genetics: 5%
Direct Marketing (42): 7%
eBiz (53): 9%
Entertainment (10): 2%
Fraud Detection (51): 8%
Insurance (36): 6%
Stocks (17): 3%
Mfg. (28): 5%
Pharmaceuticals (31): 5%
Retail (36): 6%
Scientific data (51): 8%
Security (14): 2%
Supply Chain (21): 3%
Telecom (56): 9%
Other (44): 7%
None (9): 1%

7-1: The Future of Data Warehousing

As a DW becomes a mature part of an organization, it is likely that it will become as “anonymous” as any other part of the IS.

One challenge is to come up with a workable set of rules that ensure privacy while still facilitating the use of large data sets.

Another is the need to store unstructured data such as multimedia, maps and sound.

The growth of the Internet allows integration of external data into a DW, but the varying quality of that data is likely to lead to the evolution of third-party intermediaries whose purpose is to rate data quality.

Integrated Architecture

Historically, market and business forces have moved organizations toward ineffective nonintegrated DW systems (next slide).

Far too often, a “silo” DW simply replaces a silo OLTP system.

To survive in a future world of low-cost, turnkey application systems, the transition to a federated architecture (two slides ahead) must be made.

Typical Nonintegrated Information Architecture

[Diagram. Source systems: i2 Supply Chain, Oracle Financials, Siebel CRM, 3rd Party Data. Data stores: Oracle Financial DW, Marketing DW, Supply Chain Data Mart, Subset Non-Architected Data Marts.]

Federated Integrated Information Architecture

[Diagram. Source systems: i2 Supply Chain, Oracle Financials, Siebel CRM, 3rd Party Data. Shared layer: Common Data Staging Area. Data stores: Federated Financial DW, Federated Marketing DW, Federated Supply Chain Data Mart, Subset Non-Architected Data Marts.]
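
One way to read the federated architecture is that every source system loads into the common data staging area, where records are conformed to shared keys before the individual warehouses and marts draw from them. The sketch below illustrates that reading only. The record layouts, the per-source key fields, and the key-normalization rule are all hypothetical and do not describe any of the named products.

```python
from collections import defaultdict

# Hypothetical extracts; each source system uses its own customer key format.
SOURCES = {
    "oracle_financials": [{"cust": "C-001", "invoice_total": 1200.0}],
    "siebel_crm": [{"customer_id": "c001", "segment": "retail"}],
    "i2_supply_chain": [{"cust_no": "001", "open_orders": 3}],
}

# Name of the customer key field in each source (an assumption for this sketch).
KEY_FIELDS = {
    "oracle_financials": "cust",
    "siebel_crm": "customer_id",
    "i2_supply_chain": "cust_no",
}


def conform_key(raw):
    """Normalize a source-specific customer key to one shared form such as 'C001'."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return f"C{int(digits):03d}"


def stage(sources):
    """Load every source into one staging area keyed by the conformed customer ID."""
    staging = defaultdict(dict)
    for system, rows in sources.items():
        key_field = KEY_FIELDS[system]
        for row in rows:
            key = conform_key(row[key_field])
            attrs = {k: v for k, v in row.items() if k != key_field}
            staging[key].update(attrs)
    return dict(staging)


staging_area = stage(SOURCES)

# Each federated warehouse or mart selects what it needs from the same staged records.
marketing_view = {k: v.get("segment") for k, v in staging_area.items()}
financial_view = {k: v.get("invoice_total") for k, v in staging_area.items()}
print(staging_area)
```

Because all three extracts resolve to the same conformed key, the marketing and financial views describe the same customer rather than three unrelated ones, which is the point of sharing a staging area.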

7-2: Alternate Storage and the Data Warehouse

Surprisingly, the future of data warehousing is not high-performance disk storage, but an array of alternative storage.

This involves two forms of storage. Near-line storage uses a silo in which tape cartridges are handled automatically.

Secondary storage, such as CD-ROMs and floppy disks, is slower and less expensive.

| Device | Capacity | Data Access Speed | Media Lifetime | Write Once (WO) or Write Many (WM) |
|---|---|---|---|---|
| DAT DDS2 | 4-8 Gbyte | 510 Kbyte/s | 10-25 Yrs | WM |
| DAT DDS3 | 12-24 Gbyte | 1 Mbyte/s | 10-25 Yrs | WM |
| CD-ROM | 640 Mbyte | X times 1.5 Mbits/s to read | 10 Yrs Plus | WO |
| CD-RW | 640 Mbyte | X times 1.5 Mbits/s to read | 10 Yrs Plus | WM |
| Exabyte | 20-40 Gbyte | 3-6 Mbyte/s | 10-25 Yrs | WM |
| DLT | 35 Gbyte | 5 Mbyte/s | 30 Yrs | WM |
| DVD | up to 15 Gbyte | Not Known | Not Known | WO |
| DTF | 42 Gbyte | 12 Mbyte/s | 10-25 Yrs | WM |
| Data D3 | 50 Gbyte | 12 Mbyte/s | 10-25 Yrs | WM |
| DVD-RAM | up to 3 Gbyte | Not Known | Not Known | WM |
| Magneto-optical | 2.6-5.6 Gbyte | Not Known | Not Known | WM |

Speed and Capacity of Various Near-Line Storage Media
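
Capacity and access speed together imply how long a full sequential read of one medium takes, which matters for a DW because its queries often stream long runs of data. A rough calculation, assuming decimal units (1 Gbyte = 1000 Mbyte) and sustained transfer at the listed rate, with no mount or seek time included:

```python
def full_read_hours(capacity_gbyte, rate_mbyte_per_s):
    """Hours needed to stream an entire medium at a sustained transfer rate."""
    seconds = capacity_gbyte * 1000 / rate_mbyte_per_s
    return seconds / 3600


# Figures from the table; only devices listed with a single capacity and speed.
media = {"DLT": (35, 5), "DTF": (42, 12), "Data D3": (50, 12)}

for name, (capacity, rate) in media.items():
    print(f"{name}: {full_read_hours(capacity, rate):.2f} h")
```

Under these assumptions a full DLT cartridge takes a little under two hours to read end to end, so near-line media suit bulk sequential access far better than frequent small lookups.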

Typical Near-Line Tape Storage Silo

[Photographs: Main View of the Tape Silo; View Through Silo Entry Door; Close-up of Tape Storage Carousel; Robotic Tape Retrieval Arm]

Why Use Alternative Storage?

1. The data in a DW are stable. They are placed there once and left alone, so they do not need to be updated at high speed.

2. The queries that operate on DW data often require long streams of data stored sequentially. Operational access, by contrast, requires different units of data from different storage areas.

3. The DW is of indeterminate size and is always increasing in volume, requiring flexible capacity.

4. As data ages and is accessed less often, it can be moved to secondary storage, making access to newer data more efficient.

To make this two-level storage work, we need both an activity monitor (shown here) and a cross-media storage monitor.
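
A minimal sketch of what an activity monitor might do: record the last access time of each table or partition, then flag those that have gone unused long enough to be candidates for migration to secondary storage. The class name, the 180-day threshold, and the partition names are hypothetical; the slides do not say how the monitor is implemented.

```python
from datetime import datetime, timedelta


class ActivityMonitor:
    """Tracks the last access per data unit and suggests which units to migrate."""

    def __init__(self, idle_threshold=timedelta(days=180)):
        self.idle_threshold = idle_threshold  # assumed cutoff, not from the source
        self.last_access = {}

    def record_access(self, unit, when):
        # Keep only the most recent access time for each unit.
        previous = self.last_access.get(unit)
        if previous is None or when > previous:
            self.last_access[unit] = when

    def migration_candidates(self, now):
        """Units idle for longer than the threshold, least recently used first."""
        idle = [
            (accessed, unit)
            for unit, accessed in self.last_access.items()
            if now - accessed > self.idle_threshold
        ]
        return [unit for accessed, unit in sorted(idle)]


monitor = ActivityMonitor()
monitor.record_access("sales_1999", datetime(2001, 3, 1))
monitor.record_access("sales_2001", datetime(2002, 5, 20))
monitor.record_access("sales_2000", datetime(2001, 9, 15))
print(monitor.migration_candidates(now=datetime(2002, 6, 1)))
```

A cross-media storage monitor would then act on this list by moving the flagged units and remembering where each one now lives; that part is not sketched here.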

7-3: Trends in Data Warehousing

Customer interaction and learning relationships require capturing information “everywhere” and massive scalability.

Enterprise applications generate data that is doubling every 9-12 months.

The time available for working with data is shrinking and the need for 24×7 access is becoming the norm.

Fast implementation and ease of management are becoming more and more important.

In the future, more organizations will build Web applications that operate in conjunction with the DW.
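
To put the 9-12 month doubling figure in perspective: if volume doubles every T months, then after t months it has grown by a factor of 2^(t/T). A small sketch; the 36-month horizon is an arbitrary choice for illustration.

```python
def growth_factor(months_elapsed, doubling_months):
    """Multiplicative growth after months_elapsed when volume doubles every doubling_months."""
    return 2 ** (months_elapsed / doubling_months)


for doubling in (9, 12):
    factor = growth_factor(36, doubling)
    print(f"doubling every {doubling} months: x{factor:.0f} after 3 years")
```

Over three years the two ends of the range give an 8-fold versus a 16-fold increase, so the difference between 9 and 12 months matters a great deal for capacity planning.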

7-4: The Future of Data Mining

As promising as the field may be, it has pitfalls:

The quality of data can make or break the data mining effort.

In order to mine the data, companies first have to integrate, transform and cleanse it.

To obtain value from data mining, organizations must be able to change their mode of operation and maintain the effort.

Finally, there are concerns about privacy.

Personalization versus Privacy

Companies that use data mining for target marketing walk a tightrope between personalization and privacy.

Implementing the recent FTC guidelines on information practices can be a problem, since companies often do not know ahead of time how they will use information.

Further, technology appears to create new ways to acquire information faster than the legal system can handle the ethical and property issues.

Nonetheless, many view information as a natural resource that should be managed as such.

Concept of Personal and Corporate Information as a National Resource

[Diagram. Surviving label: Corporate Data Warehouse and Affiliated Data Marts]

7-5: Using Data Mining to Protect Privacy

While Internet use has grown, so have the problems of network intrusion.

One current intrusion detection technique is misuse detection – scanning for malicious activity patterns known by signatures.

Another technique is anomaly detection, which attempts to identify malicious activity based on deviations from norms.

Most intrusion detection systems use the signature approach.
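
The two techniques can be contrasted with a toy example. Misuse detection matches events against known signatures; anomaly detection flags events that deviate from a learned norm. In this sketch the signature strings, the requests-per-minute feature, and the three-standard-deviation cutoff are all illustrative assumptions, not taken from any real intrusion detection system.

```python
from statistics import mean, stdev

KNOWN_SIGNATURES = ["/etc/passwd", "cmd.exe", "' OR 1=1"]  # hypothetical signature list


def misuse_detect(payload):
    """Signature approach: flag a payload that contains any known malicious pattern."""
    return any(signature in payload for signature in KNOWN_SIGNATURES)


def anomaly_detect(value, baseline, k=3.0):
    """Anomaly approach: flag a value more than k standard deviations from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(value - mu) > k * sigma


# Requests per minute observed during normal operation (made-up numbers).
normal_rates = [20, 22, 19, 21, 23, 20, 18, 22]

print(misuse_detect("GET /index.html"))        # no signature present
print(misuse_detect("GET /../../etc/passwd"))  # contains a known signature
print(anomaly_detect(21, normal_rates))        # close to the norm
print(anomaly_detect(95, normal_rates))        # far from the norm
```

The signature check can only recognize what is already on its list, while the anomaly check needs no list but depends entirely on how well the baseline represents normal behavior.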

Shortfalls of Current Detection Schemes

Variants – although signature lists are updated frequently, minor changes in the exploit code can produce a “new” intruder.

False positives – a detection system may be too conservative and declare an intrusion when there is none.

False negatives – an intrusion won’t be detected until a signature has been identified.

Data overload – as traffic grows, finding new hacks becomes harder and harder.

How Can Data Mining Help?

Data mining can help mainly through its ability to identify patterns of valid network activity.

Variants – anomalies can be detected by comparing connection attempts to lists of known traffic.

False positives – data mining can be used to identify recurring patterns of false alarms.

False negatives – if valid activity patterns are identified, invalid activity will be easier to spot.

Data overload – data reduction is one of the major features of data mining.
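
Two of the ideas on this slide lend themselves to a short sketch: comparing connection attempts against a list of known-valid traffic, and counting recurring alert patterns so that repeated false alarms stand out. The tuple layout, the `KNOWN_TRAFFIC` set, and the alert records are hypothetical.

```python
from collections import Counter

# Known-valid (source network, destination port) pairs, e.g. mined from historical logs.
KNOWN_TRAFFIC = {("10.0.0.0/24", 443), ("10.0.0.0/24", 25), ("10.0.1.0/24", 22)}


def unfamiliar(connections):
    """Return connection attempts that do not match any known-valid pattern."""
    return [conn for conn in connections if conn not in KNOWN_TRAFFIC]


def recurring_alerts(alerts, min_count=3):
    """Return (rule, source) patterns seen at least min_count times, most frequent first."""
    counts = Counter((alert["rule"], alert["source"]) for alert in alerts)
    return [(pattern, n) for pattern, n in counts.most_common() if n >= min_count]


attempts = [("10.0.0.0/24", 443), ("10.0.2.0/24", 6667), ("10.0.1.0/24", 22)]
print(unfamiliar(attempts))

alerts = (
    [{"rule": "port-scan", "source": "backup-server"}] * 4
    + [{"rule": "sql-injection", "source": "203.0.113.7"}]
)
print(recurring_alerts(alerts))
```

A pattern that fires repeatedly from the same internal host is a candidate for review as a recurring false alarm, and collapsing many raw alerts into a few counted patterns is a simple form of the data reduction mentioned above.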

7-6: Trends Affecting the Future of Data Mining

While the available data increases exponentially, the number of new data analysts graduating each year has been fairly constant. Either a lot of data will go unanalyzed or automatic procedures will be needed.

Increases in hardware speed and capacity make it possible to analyze data sets that were too large just a few years ago.

The next-generation Internet will connect sites 100 times faster than current speeds.

To be more profitable, businesses will need to react more quickly and offer better service, and do it all with fewer people and at a lower cost.

7-7: The Future of Data Visualization

Weapons performance and safety – data visualization coupled with simulation models can show how weapons perform under typical conditions and the effect of weapons aging.

Medical trauma treatment – today’s surgeons use computer vision to assist in surgery. This trend suggests that, in the future, local medical personnel can also be assisted from afar by specialists through telepresence.

Visualization of a Simulated Warhead Impact

Augmented-reality Headset Worn by Surgeon

Surgery Being Conducted Via Telepresence

7-8: Components of Future Visualization Applications

The data visualization environment links the critical components and enables the smooth flow of information among them.

In the future, the boundaries between computers, graphics and human knowledge will become more blurred.

Many advances in technology will be needed to handle the visualization environment of the future.

Intelligent file systems and data management software will contend with thousands of coupled storage devices.

Conceptual Mapping of an Information Architecture

[Diagram. Labels as extracted: Enterprise Metadatabase; Visualization Environment; Visual Interpreter; Visualization Interface; Management System; Enterprise Metadata System; Metadata Browser; Global Query System; System Simulation; Information Modeler; Enterprise Network]