Cash Registers & Satellites Briefing to the 2006 NOAATech Conference November 2, 2005 Stan Cutler Stanley.cutler@noaa.gov 301-457-5210 x 163 Mitretek Systems/NESDIS/OSD


Purpose

Improve communication between NOAA’s developers and the wider community of data management professionals

– Introduce vocabulary

– Identify NOAA applications that can be described using common vocabulary

Agenda

Universal Data Management Challenges

Notional Data Warehouse Architecture

Data Modeling Approaches

– Relational

– Dimensional

I. Universal Data Management Challenges

Data Mining Example: “Market Basket Analysis”

Decisions:

1) Move beer display closer to the diaper display

2) On Thursdays, sell beer & diapers at full price

Rationale:

1) When men bought diapers on Thursdays and Saturdays, they also tended to buy beer

2) Men typically did their weekly grocery shopping on Saturdays

3) On Thursdays, they only bought a few items
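The kind of association behind the beer-and-diapers decision can be illustrated with a small co-occurrence calculation. This is a minimal sketch and not the retailer's actual method: the baskets are invented for illustration, and `pair_stats` is a hypothetical helper that reports support, confidence, and lift for one pair of items.

```python
# Invented example baskets; these are NOT data from the briefing.
BASKETS = [
    {"diapers", "beer"},
    {"diapers", "beer", "chips"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
    {"diapers", "beer", "bread"},
    {"milk", "chips"},
    {"bread"},
]


def pair_stats(baskets, a, b):
    """Return support, confidence, and lift for the rule 'a implies b'."""
    n = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_ab = sum(1 for basket in baskets if a in basket and b in basket)

    # Support: share of all baskets that contain both items.
    support = count_ab / n
    # Confidence: share of baskets with `a` that also contain `b`.
    confidence = count_ab / count_a if count_a else 0.0
    # Lift: confidence relative to how common `b` is overall.
    lift = confidence / (count_b / n) if count_b else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


stats = pair_stats(BASKETS, "diapers", "beer")
print(stats)
```

A lift above 1 suggests the two items appear together more often than their individual popularity would predict, which is the sort of signal that could motivate moving the displays closer together.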

Many Disciplines Mine Their Data

Law Enforcement – Optimal Deployment

Health Care – Coverage Risks

E-Commerce – Pop-up/Link Selection

Medicine – Gene/Disease Associations

Etc.

Data Management Goal

Develop systems in which the data and procedures are configured to answer questions that are important to the enterprise

NOAA’s Future

Integrating Global (Environmental Observations) and Data Management

Ensuring Sound, State-of-the-Art (Research)

Developing, Valuing, and Sustaining a World-Class Workforce

We are not unique. Any enterprise that collects large amounts of data has the same kinds of challenges and goals.

We can find valuable expertise outside the NOAA community

Ask the same kinds of questions as those challenged with similar problems

Understand the constructs and vocabulary

– Architectures

– Data Modeling

II. Notional Data Warehouse Architecture

“Hub and Spoke Architecture”

[Diagram: External Data and Internal Data feed a Data Staging Area (Transform & “Cleanse”), which loads the application-neutral Data Warehouse; application-specific “Data Marts” that use “OLAP” technologies draw from the warehouse.]

“ETL” = Extract, Transform and Load

“OLAP” = Online Analytical Processing
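The hub-and-spoke flow can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`extract`, `transform`, `load`, `build_mart`) and the sample records are hypothetical, and a real warehouse would use a database and an ETL tool rather than in-memory lists.

```python
def extract():
    """Extract: gather raw records from internal and external sources."""
    internal = [
        {"store": " Store A ", "item": "Beer", "amount": "12.50"},
        {"store": "store b", "item": "diapers", "amount": "20.00"},
    ]
    external = [
        {"store": "STORE A", "item": "beer", "amount": "7.50"},
        {"store": "store b", "item": "diapers", "amount": "n/a"},  # bad value
    ]
    return internal + external


def transform(rows):
    """Transform and cleanse: normalize text, drop rows with unusable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleanse: skip records that cannot be parsed
        clean.append({
            "store": row["store"].strip().lower(),
            "item": row["item"].strip().lower(),
            "amount": amount,
        })
    return clean


def load(warehouse, rows):
    """Load: append cleansed rows to the application-neutral warehouse (hub)."""
    warehouse.extend(rows)


def build_mart(warehouse, key):
    """Spoke: an application-specific summary derived from the warehouse."""
    totals = {}
    for row in warehouse:
        totals[row[key]] = totals.get(row[key], 0.0) + row["amount"]
    return totals


warehouse = []
load(warehouse, transform(extract()))
sales_by_store = build_mart(warehouse, "store")
sales_by_item = build_mart(warehouse, "item")
print(sales_by_store, sales_by_item)
```

The warehouse holds one cleansed copy of the data, and each mart is a different view of it, which is the sense in which the hub is application neutral and the spokes are application specific.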

Retail Application: Hub and Spoke Architecture

[Diagram: External data, Customer Lists, and Sales Data feed a Data Staging Area (Transform & Cleanse), which loads the application-neutral Data Warehouse.]

OLAP Data Marts (Application Specific):

– Marketing

– Floor Management

– Human Resources

– Real Estate

– Accounting

Notional NOAA Hub and Spoke Architecture

[Diagram: Other Satellite Archives and CLASS feed a Data Staging Area (Rich Inventory?) with Transform & Cleanse, which loads the application-neutral Data Warehouse. Also labeled on the diagram: ESPC, Data Centers.]

NOAA Applications (Data Marts using OLAP):

– Climate Prediction

– Weather Forecast

– Ecosystems Management

– Commerce & Transportation

– External Customers

III. Data Modeling Approaches

“Relational” Vocabulary

“Relational” technologies

– Relational Data Base Management Systems (RDBMS)

  • COTS Products (INFORMIX, DB2, ORACLE, MS/SS, etc.)

  • Proprietary data management/manipulation software

– RDBMS Extensions (Most COTS products built on an RDBMS)

  • GUIs, CASE Tools, COOP, Application Generators, Security, etc.

“Relational” Data Models - Evolutionary approach to data base design

  • Conceptual Entity Relationship Diagrams (ERD) used to identify data requirements, relationships, rules

    – Diagrams

    – Data Dictionaries

  • Logical ERDs used to normalize (eliminate redundancies)

  • Physical models are the Table Schema entered into the RDBMS

Online Transaction Processing (OLTP) – e.g., CLASS
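To make the step from a normalized logical model to a physical table schema concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `customer_order` tables are hypothetical examples, not taken from CLASS or any NOAA system. The customer's name is stored once (normalization), and the foreign key lets the RDBMS reject an order that points at a customer that does not exist (referential integrity).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Physical model: the table schema entered into the RDBMS.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    total       REAL NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Example Customer')")
conn.execute("INSERT INTO customer_order VALUES (100, 1, 25.0)")

# An order for a non-existent customer violates referential integrity.
try:
    conn.execute("INSERT INTO customer_order VALUES (101, 999, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM customer_order").fetchone()[0]
print(orphan_rejected, order_count)
```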

Entity Relationship Diagram (ERD)

[Diagram: entity classes, each listing a key and its attributes, connected by relationships that are annotated with cardinality (1, Many, or 0).]

The foundation of all OLTP systems, such as CLASS

Attributes, entities, and relationships are described in the data dictionary

Object Models “inherit” ERD constructs

[Diagram: an object class with a key and attributes, as in an ERD entity, plus a “Behavior” section.]

Pros & Cons of systems based on Relational models

Strengths

– Referential integrity

– Data locking

– Fast Look-up and Retrieval

– GUIs

Weaknesses

– Entity proliferation

– Users don’t understand them

– Complex code must be written to accumulate multiple instances (Hard to use for Data Mining)

Dimensional Data Models

Fact

– An instance of numeric data

Dimension

– Foreign key

Fact Table

– Key is a concatenation of foreign keys (dimensions)

– An instance can have dozens of foreign keys

– Millions of instances (rows) often required

Programmers’ revenge on Data Base Administrators

– Break many relational “rules”

– Re-invented often

A “Dimensional” Data Model for Retailing

Who (buys, sells)

– Customer (age, gender, marital status, occupation, etc.)

– Sales person (age, gender, training, etc.)

– Cash Register

What (products)

– Brand, color, size, type, etc.

When

– Time of day, day of week, season

Where

– Store (location, size, type), Shelf

Why

– Promotions, advertising, discounts, economic trends

How much (was spent)

– Per product, per total sale

Classical Star Schema: Point of Sale

FACT table: Time_key, Customer_key, Store_key, Clerk_key, Promo_key, Product_key, Register_key, Dollars Sold, Units Sold, Dollars Cost

Clerk Dimension: Clerk_key, ClerkName, JobGrade, Etc.

Register Dimension: Register_key, Location, Type, Etc.

Promo Dimension: Promo_key, PromoName, PriceType, AdType, Etc.

Product Dimension: Product_key, Description, Brand, Sub Category, Category, Dept, Flavor, Package Type

Time Dimension: Time_key, DayofWeek, Fiscal period

Customer Dimension: Customer_key, CustomerName, Purchase Profile, Etc.

Store Dimension: Store_key, StoreName, Address, FloorType, Etc.
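A cut-down version of this star schema can be built with Python's `sqlite3` module to show how a dimensional query reads. This is a sketch under assumptions: only three of the seven dimensions are included, the column names are lower-cased adaptations of the slide's labels, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, day_of_week TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY,
                          description TEXT, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT);

-- The fact table's key is the concatenation of its dimension foreign keys.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    store_key    INTEGER REFERENCES store_dim(store_key),
    dollars_sold REAL,
    units_sold   INTEGER,
    PRIMARY KEY (time_key, product_key, store_key)
);
""")

conn.executemany("INSERT INTO time_dim VALUES (?, ?)",
                 [(1, "Thursday"), (2, "Saturday")])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "Lager six-pack", "Beer"), (2, "Diapers size 3", "Baby")])
conn.executemany("INSERT INTO store_dim VALUES (?, ?)", [(1, "Store A")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 18.0, 2),
    (1, 2, 1, 24.0, 1),
    (2, 1, 1, 45.0, 5),
    (2, 2, 1, 48.0, 2),
])

# Pick dimensions to group by; the facts are summed along them.
rows = conn.execute("""
    SELECT t.day_of_week, p.category, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN time_dim    AS t ON t.time_key = f.time_key
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY t.day_of_week, p.category
    ORDER BY t.day_of_week, p.category
""").fetchall()
print(rows)
```

Every query has the same shape: join the fact table to the chosen dimensions, group by their attributes, and aggregate the numeric facts.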

Snowflake Schema: Point of Sale

FACT table: Time_key, Customer_key, Store_key, Clerk_key, Promo_key, Product_key, Register_key, Dollars Sold, Units Sold, Dollars Cost

Register Dimension: Register_key, Location, Type, Etc.

Clerk Dimension: Clerk_key, ClerkName, JobGrade, Etc.

Promo Dimension: Promo_key, PromoName, PriceType, AdType, Etc.

Time Dimension: Time_key, DayofWeek, Fiscal period

Customer Dimension: Customer_key, CustomerName, Purchase Profile, Etc.

Store Dimension: Store_key, StoreName, Address, FloorType, Etc.

Product Dimension: Product_Type_PK, Product_Type_Desc, with outlying tables:

– Sub-Type_PK, Sub-Type-Desc

– Model-Num_PK, Model-Desc

– Brand-ID_PK, Maker-Desc
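In a snowflake schema a dimension is itself normalized into further tables, so a query has to walk an extra join to reach the outer attributes. The sketch below assumes one possible hierarchy (each model belongs to one brand); the slide does not spell out how its sub-type, model, and brand tables link together, and all rows here are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outlying table: brand attributes are stored once, not on every model row.
CREATE TABLE brand (brand_id INTEGER PRIMARY KEY, maker_desc TEXT);
CREATE TABLE model (
    model_num  INTEGER PRIMARY KEY,
    model_desc TEXT,
    brand_id   INTEGER REFERENCES brand(brand_id)
);
CREATE TABLE sales_fact (
    model_num    INTEGER REFERENCES model(model_num),
    dollars_sold REAL
);
""")

conn.executemany("INSERT INTO brand VALUES (?, ?)",
                 [(1, "Maker X"), (2, "Maker Y")])
conn.executemany("INSERT INTO model VALUES (?, ?, ?)",
                 [(10, "Model 10", 1), (11, "Model 11", 1), (20, "Model 20", 2)])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [(10, 5.0), (11, 7.0), (20, 4.0), (10, 3.0)])

# Two joins are needed to reach the maker: fact -> model -> brand.
sales_by_maker = conn.execute("""
    SELECT b.maker_desc, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN model AS m ON m.model_num = f.model_num
    JOIN brand AS b ON b.brand_id = m.brand_id
    GROUP BY b.maker_desc
    ORDER BY b.maker_desc
""").fetchall()
print(sales_by_maker)
```

Compared with a star schema, this saves redundant storage in the dimension at the price of longer join chains in every query that needs the outer attributes.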

Metadata in Dimensional Modeling

NOAA usage:

– If it’s not a fact

– If it’s not a key

– It’s metadata

Conventional Dimensional usage:

– If it’s not a fact

– If it’s not a key

– It’s documentation

BUT

– If it’s a key

– It’s metadata (because it describes the fact)

Dimensional Models for NOAA

Which

– Satellite

– Instrument

When

– Orbit, UTC, Season, decade, epoch, etc.

Where

– Geospatial coordinates

Who

– User affiliation

– Developer affiliation

FACT: How much?

– Temperature, moisture, radiance, color, etc.

A NOAA Star Schema?

FACT TABLE: Time_key (fk), Location_key (fk), Altitude_key (fk), Product_key (fk), Satellite_key (fk), Instrument_key (fk), Temperature

Altitude Dimension: Altitude_key, Distance above SL, Etc.

Satellite Dimension: Satellite_key, Name, Position

Instrument Dimension: Instrument_key, Name, Description

Product Dimension: Product_key, Product Name, Description, System, Sub System, Etc.

Time Dimension: Time_key, UTC of Obs’n, UTC of receipt, LocalT of Obs’n, Orbit_Id, Etc.

Location Dimension: Location_key, Geo-Coordinates of Obs’n, Etc.
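The notional NOAA star schema can be exercised the same way as the retail one. The sketch below uses Python's `sqlite3` module and makes several assumptions: the altitude and product dimensions are left out for brevity, the satellite and instrument names are placeholders, and the temperatures are invented values rather than real observations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE satellite_dim  (satellite_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE instrument_dim (instrument_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time_dim       (time_key INTEGER PRIMARY KEY,
                             utc_of_obs TEXT, orbit_id INTEGER);
CREATE TABLE location_dim   (location_key INTEGER PRIMARY KEY,
                             latitude REAL, longitude REAL);

-- One row per observation; the key concatenates the dimension keys.
CREATE TABLE observation_fact (
    time_key       INTEGER REFERENCES time_dim(time_key),
    location_key   INTEGER REFERENCES location_dim(location_key),
    satellite_key  INTEGER REFERENCES satellite_dim(satellite_key),
    instrument_key INTEGER REFERENCES instrument_dim(instrument_key),
    temperature    REAL,
    PRIMARY KEY (time_key, location_key, satellite_key, instrument_key)
);
""")

conn.executemany("INSERT INTO satellite_dim VALUES (?, ?)",
                 [(1, "SAT-A"), (2, "SAT-B")])  # placeholder names
conn.executemany("INSERT INTO instrument_dim VALUES (?, ?)", [(1, "Sounder")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "2005-11-02T00:00:00Z", 100),
                  (2, "2005-11-02T06:00:00Z", 101)])
conn.executemany("INSERT INTO location_dim VALUES (?, ?, ?)", [(1, 38.9, -77.0)])
conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, 280.0),
    (2, 1, 1, 1, 282.0),
    (1, 1, 2, 1, 275.0),
])

# "Which satellite" is the chosen dimension; temperature is the fact.
avg_by_satellite = conn.execute("""
    SELECT s.name, AVG(f.temperature)
    FROM observation_fact AS f
    JOIN satellite_dim AS s ON s.satellite_key = f.satellite_key
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(avg_by_satellite)
```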

Pros & Cons of systems based on dimensional models

Strengths

– Very few “entity types” needed

– Decision Support Systems (DSS)

  • End-Users construct complex queries by selecting dimensions from a GUI

  • Statistical analysis of very large data bases

– Artificial Intelligence (AI)

  • Automated scheduling of continuous executions

  • System identifies (“discovers”) new relationships

  • Discoveries shape successive execution

Weaknesses

– Development Cost

– Storage

– Operational Cost - Requires much “care and feeding”

False Dichotomy: Relational “vs.” Dimensional

Relational and dimensional systems are not mutually exclusive

– Data warehouses usually extract fact tables from relational data bases

– Data warehouse capabilities are extensions in RDBMSs

Depends on the business

– Feasibility: Is the application data good enough for ETL?

– ROI: Does the business benefit outweigh the cost?
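The point that fact tables are usually extracted from relational data bases can be shown inside a single RDBMS. In this sketch, which uses Python's `sqlite3` module, the normalized `orders` and `order_line` tables stand in for an OLTP system and `sales_fact` is the extracted fact table; all table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized OLTP tables (the relational side).
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    store_id   INTEGER,
    order_date TEXT
);
CREATE TABLE order_line (
    order_id   INTEGER REFERENCES orders(order_id),
    product_id INTEGER,
    quantity   INTEGER,
    unit_price REAL
);
-- Fact table (the dimensional side), keyed by its dimensions.
CREATE TABLE sales_fact (
    order_date   TEXT,
    store_id     INTEGER,
    product_id   INTEGER,
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (order_date, store_id, product_id)
);
""")

conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, "2005-11-03"), (2, 1, "2005-11-03"), (3, 2, "2005-11-05")])
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                 [(1, 7, 2, 4.0), (2, 7, 1, 4.0), (2, 8, 1, 10.0), (3, 7, 3, 4.0)])

# Extract and load in one statement: roll transactions up to fact rows.
conn.execute("""
    INSERT INTO sales_fact
    SELECT o.order_date, o.store_id, l.product_id,
           SUM(l.quantity), SUM(l.quantity * l.unit_price)
    FROM order_line AS l
    JOIN orders AS o ON o.order_id = l.order_id
    GROUP BY o.order_date, o.store_id, l.product_id
""")

facts = conn.execute("""
    SELECT * FROM sales_fact
    ORDER BY order_date, store_id, product_id
""").fetchall()
print(facts)
```

The transactional tables stay in place for day-to-day processing, while the fact table is a derived, query-friendly copy, so the two models coexist rather than compete.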

SUMMARY:

NOAA’s data mining challenge is similar to that of other enterprises

A world-wide community of IT professionals uses a particular vocabulary to address the challenge

Relational technologies & models are the essential first step

Dimensional technologies & models come next

Questions

Stan Cutler, Mitretek Systems/NESDIS/OSD, Stanley.cutler@noaa.gov, 301-457-5210 x 163