Agenda: 01/30/2014

Discuss class administration items:
• Website and WebCampus use
• Declaration of teams for Internal Data Project #1 (by 2/6)
• Structure of class: Skills (doing) and Concepts (knowing)

Today:
• Concepts: Data warehousing systems
• Skills: Review of transaction database design in preparation for learning data warehouse design next week

Purpose of the course readings

Business Intelligence: A Managerial Perspective (BIM):
• Introduce and explain concepts
• Provide examples

Delivering Business Intelligence (DBI):
• Learn how to use SQL Server business intelligence products
• Learn a little conceptual information about data warehousing systems

Readings on WebCampus (numbered):
• Learn specific skills not in BIM or DBI. For example, readings #1 and #2 focus on learning how to design a data warehouse.
• Learn concepts not in BIM or DBI. For example, reading #4 discusses information visualization methods.

A Business Intelligence “System”

A business intelligence system encompasses all processes, hardware and software necessary to extract data, transform it, integrate it, store it, and make the resulting information accessible to users to support decision making. Some people equate the terms “data warehousing system” and “BI system”. Others consider a data warehousing system to be a type of BI system, using “BI system” as the umbrella term.

What are the components of a BI system?

Data warehouse:
• Structured, unstructured, internal, external, transaction-level, and derived data.
• Data storage repository.

Extract, transform and load methods:
• Methods of loading accurate and consistent data into the data warehouse.
• Methods of integrating data from disparate sources.

Metadata repository:
• Data definitions and meanings.
• Business rules and process decisions.

Analytical tools:
• OLAP: Online Analytical Processing.
• Statistical analysis.
• Data mining.

Data visualization/end-user presentation tools:
• Dashboards.
• Graphics, tables, pictures.

[Figure: Generic BI system architecture. Data sources (ERP, legacy, POS, other OLTP/Web, external data) feed an ETL process (select, extract, transform, integrate, load) into an enterprise data warehouse (with metadata and replication). Through an API/middleware layer, users reach either data marts (engineering, marketing, finance, ...) or the warehouse directly (the no-data-marts option), using applications such as routine business reporting, data/text mining, OLAP/dashboards/Web, and custom-built applications (visualization).]

Our example data warehousing system is SQL Server 2012

SQL Server 2012 offers an integrated set of products for creating a data warehousing system:
• Data warehouse: SQL Server database (SSDB)
• ETL product: SQL Server Integration Services (SSIS)
• OLAP product: SQL Server Analysis Services (SSAS)
• Visualization product: SQL Server Reporting Services (SSRS)

[Figure: The same BI architecture, labeled with the SQL Server 2012 product that fills each role: Integration Services (SSIS), SQL Server database (SSDB), Analysis Services (SSAS), and Reporting Services (SSRS).]

What are the steps of the tutorials?
• Build the database through Management Studio.
• Populate the database through SSIS.
• Create a data mart “cube” with SSAS.
• Look at the “cube” with SSRS.
• Look at the “cube” with a pivot table in Excel.
• Create a visualization method with Tableau Software.
• Use the SQL Server data mining tools.

Data warehouse system architecture

Architecture is a continuum of choices. Basic architectural issues:
• Where is the data stored?
  Within the operational data stores, and then concatenated on the fly?
  Within a centralized data warehouse that is optimized for decision making?
  Within a series of data marts?
• When is the data cleaned (or is the data cleaned at all)?
• How is the data accessed?

[Figure: Data mart architecture. Two operational databases and an external data source feed extract, transform and load processes, which load three separate data marts used by the user departments.]

Two-tier data warehouse architecture

[Figure: Two operational databases and an external data source feed a transformation process that loads the data warehouse server, which holds the EDM and summarized data. User departments access the data warehouse server.]

Three-tier data warehouse architecture

[Figure: Two operational databases and an external data source feed a transformation process that loads the data warehouse server (EDM and summarized data). An extraction process then loads a data mart tier, and user departments access the data marts.]

What is an operational data store?

An operational data store consolidates data from multiple source systems and provides a near real-time, integrated view of volatile, current data. Its purpose is to provide integrated data for operational purposes. It supports add, change, and delete operations. Operational data stores are sometimes created to avoid a full-blown ERP implementation.

Factors that may affect the architectural decision

• Information interdependence among organizational units

• Upper management’s information needs

• Urgency of need for a data warehouse

• Nature of end-user tasks

• Identified role of the data warehouse prior to implementation

• Compatibility with existing systems

• Perceived ability of the in-house IT staff

• Social/political factors

Remember the five components?
• Data warehouse
• Extract, transform and load methods
• Metadata repository
• Analytical tools
• Data visualization/end-user presentation tools

What is a data warehouse?

A data warehouse is a collection of data gathered into a database specifically designed to support decision making. Types of decisions supported by data warehouses:
• Operational
• Short-term
• Long-term
• Strategic

An organization may have one data warehouse or multiple data warehouses designed to suit different applications and/or decisions.

Characteristics of a data warehouse
• Subject-oriented.
• Integrated.
• Time-variant.
• Non-volatile.
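
To make “time-variant” and “non-volatile” concrete, here is a minimal Python sketch contrasting an operational store, which changes a value in place (as an operational data store does with its add, change, and delete functionality), with a warehouse that only appends time-stamped rows. The customer, the credit limits, and all function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


def ods_update(ods: dict, customer_id: str, credit_limit: int) -> None:
    """Operational-style store: change the current value in place (history is lost)."""
    ods[customer_id] = credit_limit


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified (non-volatile)
class Snapshot:
    customer_id: str
    credit_limit: int
    as_of: date  # time-variant: every row is tied to a point in time


def dw_load(warehouse: list, customer_id: str, credit_limit: int, as_of: date) -> None:
    """Warehouse-style store: append a new time-stamped row; never update or delete."""
    warehouse.append(Snapshot(customer_id, credit_limit, as_of))


def limit_as_of(warehouse: list, customer_id: str, when: date) -> Optional[int]:
    """Answer a historical question that the overwrite-in-place store cannot."""
    rows = [r for r in warehouse if r.customer_id == customer_id and r.as_of <= when]
    return max(rows, key=lambda r: r.as_of).credit_limit if rows else None


ods: dict = {}
warehouse: list = []
history = [(1000, date(2013, 1, 1)), (2500, date(2013, 6, 1)), (4000, date(2014, 1, 1))]
for limit, day in history:
    ods_update(ods, "C001", limit)
    dw_load(warehouse, "C001", limit, day)

print(ods["C001"])                                        # only the current value: 4000
print(limit_as_of(warehouse, "C001", date(2013, 7, 15)))  # history is kept: 2500
```

The operational store can only report the latest limit, while the warehouse can answer “what was the limit in July 2013?” because no row is ever overwritten.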

Other potential characteristics of a data warehouse
• Summarized (or not...)
• Not normalized (or normalized...)
• Web-based (or not...)
• Real-time (or batch...)
• Single version of truth (or one of many...)
• Enterprise-wide (or not...)

What is an enterprise data warehouse?

A data warehouse that is created to encompass multiple subject areas. It:
• Is usually normalized.
• Can be used for decision making in multiple organizational areas.

What is a data mart?
• A data mart stores data for a limited number of subject areas.
• An “independent” data mart is loaded directly from operational data.
• A “dependent” data mart is loaded from an enterprise data warehouse.
• Usually not normalized.
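
A minimal sketch of a dependent data mart, using Python's built-in sqlite3 module as a stand-in for SQL Server: the mart is loaded from hypothetical, normalized enterprise data warehouse tables, keeps only one subject area (sales, not HR), and is denormalized into one wide table. All table names, column names, and rows are illustrative assumptions.

```python
import sqlite3

# Hypothetical normalized enterprise data warehouse covering two subject areas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY, customer_id INTEGER,
                       product_id INTEGER, sale_date TEXT, amount REAL);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT, dept TEXT);  -- HR subject area
INSERT INTO customer VALUES (1, 'Acme', 'West'), (2, 'Bolt', 'East');
INSERT INTO product  VALUES (10, 'Shorts', 'Apparel'), (11, 'Sandals', 'Footwear');
INSERT INTO sale     VALUES (100, 1, 10, '2013-07-01', 25.0),
                            (101, 2, 11, '2013-07-02', 40.0),
                            (102, 1, 11, '2013-08-15', 40.0);
INSERT INTO employee VALUES (1, 'Pat', 'HR');
""")

# Dependent data mart: loaded FROM the warehouse, limited to the sales subject area,
# and denormalized into a single wide table so analysts need no joins.
conn.executescript("""
CREATE TABLE sales_mart AS
SELECT s.sale_id, s.sale_date, s.amount,
       c.name AS customer_name, c.region,
       p.name AS product_name, p.category
FROM sale s
JOIN customer c ON c.customer_id = s.customer_id
JOIN product  p ON p.product_id  = s.product_id;
""")

for row in conn.execute(
        "SELECT region, SUM(amount) FROM sales_mart GROUP BY region ORDER BY region"):
    print(row)  # ('East', 40.0) then ('West', 65.0)
```

An independent data mart would run the same kind of load directly against the operational systems instead of against the warehouse tables.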

SQL Server Database (SSDB)
• Relational database management system.
• Aligns with the rules of a relational DBMS.
• Transact-SQL.
• Includes a metadata repository.
• SQL Server Management Studio.
• Accessible from UNR COB labs through remote desktop; a college resource rather than a university resource.
• Remote desktop server: sts.coba.unr.edu
• SQL Server instance: ISSQL\students

MaxMin and SQL Server BI (DBI pg. 21)

DBI: pg. 108

BI Tutorial #1: Building the database

You can use SQL CREATE statements or follow the wizard instructions in the book. Issues to be aware of:
• No constraints other than primary keys.
• Referential integrity is not maintained.
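
The sketch below uses SQLite (through Python's sqlite3 module) as a stand-in for SQL Server, with hypothetical table names instead of the book's MaxMin schema. It shows what “no constraints other than primary keys” means in practice: a duplicate key is rejected, but a row pointing at a nonexistent parent is accepted, so orphaned rows have to be found by query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE sale  (sale_id  INTEGER PRIMARY KEY, store_id INTEGER, amount REAL);
-- sale.store_id deliberately has no FOREIGN KEY ... REFERENCES store(store_id) clause
""")
conn.execute("INSERT INTO store VALUES (1, 'Reno')")
conn.execute("INSERT INTO sale VALUES (500, 1, 19.99)")  # valid store
conn.execute("INSERT INTO sale VALUES (501, 99, 5.00)")  # store 99 does not exist, yet it is accepted

# The primary key constraint is still enforced.
try:
    conn.execute("INSERT INTO sale VALUES (500, 1, 1.00)")  # duplicate sale_id
except sqlite3.IntegrityError as err:
    print("rejected:", err)

# Without referential integrity, orphaned rows must be detected with a query.
orphans = conn.execute("""
    SELECT s.sale_id
    FROM sale s
    LEFT JOIN store st ON st.store_id = s.store_id
    WHERE st.store_id IS NULL
""").fetchall()
print("orphaned sales:", orphans)  # [(501,)]
```

A query like the orphan check above is one way to audit a database built without referential integrity after it has been populated.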

Metadata repository

A metadata repository contains information about all data objects stored in the data warehouse. It contains the following segments:
• Business segment: describes the business definition of the data element. This is frequently the context/meaning for the data element.
• Technical segment: describes the computer-related technical properties of each element (size, data type, unit of measure, etc.).
• Process segment: describes how the element is processed before being placed in the data warehouse.
• Usage segment: describes the relative usage of the element, including who accesses it, how often, and in what manner. Used for performance tuning.
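
One way to picture a metadata repository entry is as a record with one group of fields per segment. The Python sketch below is a hypothetical layout, not the structure of SQL Server's or any other product's repository; the element name, definition, and users are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetadataEntry:
    element: str
    # Business segment: what the element means to the business.
    business_definition: str
    # Technical segment: computer-related properties of the element.
    data_type: str
    size: int
    unit_of_measure: Optional[str]
    # Process segment: how the element is processed before it is loaded.
    processing_steps: list[str]
    # Usage segment: who accesses the element and how often (for performance tuning).
    access_counts: dict[str, int] = field(default_factory=dict)

    def record_access(self, user: str) -> None:
        self.access_counts[user] = self.access_counts.get(user, 0) + 1


repository: dict[str, MetadataEntry] = {}
entry = MetadataEntry(
    element="sale_amount",
    business_definition="Total value of a sale after discounts, before tax.",
    data_type="DECIMAL",
    size=10,
    unit_of_measure="USD",
    processing_steps=["extracted from POS", "converted to USD", "rounded to cents"],
)
repository[entry.element] = entry

entry.record_access("marketing_analyst")
entry.record_access("marketing_analyst")
entry.record_access("finance_report")

# The usage segment supports tuning questions such as "who uses this element most?"
most_frequent = max(entry.access_counts, key=entry.access_counts.get)
print(most_frequent)  # marketing_analyst
```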

Extract, Transform, Load (ETL)

Extract:
• Take data from source systems. May require middleware to gather all necessary data.

Transform:
• Put data into a consistent format and content.
• Validate and fix data: check for accuracy and consistency using pre-defined and agreed-upon business rules.
• Convert data as necessary.

Load:
• Use a batch (bulk) update operation that keeps track of what is loaded, where, when, and how.
• Keep a detailed load log to audit updates to the data warehouse.
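
The three ETL steps can be sketched in Python under simple assumptions: two hypothetical source systems (a POS extract and a Web extract) with inconsistent formats, one invented business rule (sale amounts must be positive), and an in-memory list standing in for the warehouse table. The load step appends the whole batch at once and writes a load-log entry recording what was loaded, where, when, and how.

```python
from datetime import datetime, timezone

# Hypothetical source extracts with inconsistent formats.
pos_rows = [
    {"store": "reno ", "date": "01/28/2014", "amount": "19.99"},
    {"store": "Sparks", "date": "01/29/2014", "amount": "-5.00"},  # violates the business rule
]
web_rows = [{"store_code": "WEB", "sold_on": "2014-01-29", "total_cents": 4500}]


def extract():
    """Extract: gather rows from each source system, tagged with their origin."""
    for row in pos_rows:
        yield "POS", row
    for row in web_rows:
        yield "WEB", row


def transform(source, row):
    """Transform: convert to one consistent format, then validate with a business rule."""
    if source == "POS":
        month, day, year = row["date"].split("/")
        rec = {"store": row["store"].strip().title(),
               "sale_date": f"{year}-{month}-{day}",
               "amount": round(float(row["amount"]), 2)}
    else:
        rec = {"store": row["store_code"].title(),
               "sale_date": row["sold_on"],
               "amount": row["total_cents"] / 100}
    # Hypothetical agreed-upon business rule: sale amounts must be positive.
    if rec["amount"] <= 0:
        raise ValueError(f"non-positive amount {rec['amount']}")
    return rec


def load(warehouse, load_log):
    """Load: one batch operation plus a log entry (what, where, when, how)."""
    batch, rejected = [], []
    for source, row in extract():
        try:
            batch.append(transform(source, row))
        except ValueError as err:
            rejected.append((source, row, str(err)))
    warehouse.extend(batch)  # bulk append of the whole batch
    load_log.append({
        "loaded_at": datetime.now(timezone.utc).isoformat(),
        "target": "sales_fact",  # hypothetical table name
        "method": "batch append",
        "rows_loaded": len(batch),
        "rows_rejected": len(rejected),
        "rejections": rejected,
    })


warehouse, load_log = [], []
load(warehouse, load_log)
print(warehouse)
print(load_log[0]["rows_loaded"], load_log[0]["rows_rejected"])  # 2 1
```

Rejected rows are kept in the log rather than silently dropped, so the audit trail shows exactly which source records failed which rule.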

Online analytical processing (OLAP) tools
• Provide multi-dimensional data analysis techniques.
• Work primarily with data aggregation.
• Provide advanced statistical analysis.
• Provide advanced graphical output.
• Support access to very large databases.
• Provide enhanced query optimization algorithms.

The key objective of basic OLAP functionality is to speed up query processing.
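
OLAP tools speed up queries largely by pre-aggregating data along dimensions. A minimal pure-Python sketch of that idea, assuming hypothetical sales facts with three dimensions (region, product, quarter), computes totals for every combination of dimensions so that roll-up questions become lookups. Real OLAP servers such as SSAS use far more sophisticated storage and aggregation strategies; this only illustrates the principle.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales).
facts = [
    ("West", "Shorts", "Q1", 100), ("West", "Shorts", "Q2", 150),
    ("West", "Sandals", "Q1", 80), ("East", "Shorts", "Q1", 60),
    ("East", "Sandals", "Q2", 120), ("East", "Sandals", "Q1", 40),
]
DIMS = ("region", "product", "quarter")


def build_cube(rows):
    """Pre-aggregate sales for every combination of dimensions ('*' = all values).

    Computing these totals once lets later queries be answered by a lookup
    instead of a scan over the detail rows.
    """
    cube = defaultdict(int)
    for *members, sales in rows:
        for mask in range(2 ** len(DIMS)):  # each subset of dimensions to keep
            key = tuple(m if (mask >> i) & 1 else "*" for i, m in enumerate(members))
            cube[key] += sales
    return dict(cube)


cube = build_cube(facts)
print(cube[("*", "*", "*")])         # grand total: 550
print(cube[("West", "*", "*")])      # roll-up to region: 330
print(cube[("*", "Sandals", "Q1")])  # Sandals in Q1, all regions: 120
```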

Data mining tools

Data mining tools:
• analyze the data;
• uncover patterns hidden in the data;
• form computer models based on the findings; and
• use the models to predict business behavior.

They are proactive tools, used for prediction and discovery of behavior.
• Some are based on standard statistical tools of correlation, regression, factor analysis and structural equation modeling.
• Most are based on artificial intelligence software such as decision trees, neural networks, fuzzy logic systems, inductive nets and classification networking.
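
As an example of the “standard statistical tools” route, the sketch below fits a simple least-squares regression line to hypothetical customer history (store visits versus amount spent) and uses it to predict a new customer's spending. It illustrates only the analyze, model, and predict cycle; it is not one of the AI-based techniques (decision trees, neural networks) that most data mining tools use.

```python
# Hypothetical history: (store visits last year, amount spent this year).
history = [(2, 150.0), (4, 260.0), (6, 370.0), (8, 480.0), (10, 590.0)]


def fit_line(points):
    """Ordinary least squares for y = a + b*x: form a model from the data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    b = sxy / sxx           # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


a, b = fit_line(history)
print(a, b)  # 40.0 55.0

# Use the model to predict behavior for a customer with 7 visits.
predicted = a + b * 7
print(predicted)  # 425.0
```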

Contrast between OLAP and data mining decision support questions

OLAP: Which customers spent the most with us in the past year?
Data mining: Which types of customers are likely to spend the most with us in the coming year?

OLAP: How much did the bank lose from loan defaulters within the past two years?
Data mining: What are the characteristics of the customers most likely to default on their loans before the year is over?

OLAP: What were the highest-selling fashion items in our San Diego stores?
Data mining: What additional products are most likely to be sold to customers who buy shorts?

OLAP: Which store/location made the highest sales in the past year?
Data mining: In which area should we open a new store next year?

Data visualization/end-user presentation

Text portrayal of data:
• Tables.

Graphical portrayal of data. Graphics include:
• Standard graphics (bar chart, pie chart, line chart, etc.)
• Pictures.
• Scatter diagrams combined with pictures.
• Animation.
• Cool 3-D images…
• Audio.
• Video.

Tools
• Excel!!
• Query generators
• Report writers
• Dashboards (we will use Tableau Software)
• A very long list of possibilities from a very long list of ever-evolving vendors!