Building a Data Warehouse...Bring in the Sheaves

January 13, 2004, EDUCAUSE Mid-Atlantic Conference, Baltimore, Maryland

Ella Smith, U.S. Department of Agriculture
Alan Harmon, U.S. Naval Academy


Page 1: Building a Data Warehouse...Bring in the Sheaves January 13, 2004 EDUCAUSE Mid-Atlantic Conference Baltimore, Maryland Ella Smith U.S. Department of Agriculture

Building a Data Warehouse...Bring in the Sheaves

January 13, 2004
EDUCAUSE Mid-Atlantic Conference
Baltimore, Maryland

Ella Smith
U.S. Department of Agriculture

Alan Harmon
U.S. Naval Academy

Page 2: Copyright

Copyright Ella Smith and Alan Harmon, 2004. This work is the intellectual property of the authors. Permission is granted for this material to be shared for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the authors. To disseminate otherwise or to republish requires written permission from the authors.

Page 3: Agenda for this Session

• Definition
• Overview of the Initial Process (Proof-of-Concept)
• Organizational Ownership
• Data Warehouse Architecture
• Project Team Composition
• The Process
• Wrap Up
• Questions

Page 4: What is a Data Warehouse?

Definition: a repository of data derived from operational systems or external sources; NOT an archive

Purpose: collect and report data in a consistent, centralized manner; mechanism for conducting longitudinal analysis

Strategy: Target key applications (Admissions, Registrar, Frozen Files), clean data and load.

Page 5: Benefits of a Data Warehouse

Cost Savings by reducing the amount of manual time and effort required to compile, organize, and report the data.

Data Consistency among the different areas since the data will be synchronized upon entry into the data warehouse.

Access to the information will be faster since the process will be automated and available online (versus paper reports).

Page 6: Loading and Cleaning Data

Opportunity to Integrate, Correct, and Validate Data

[Diagram. Sources: Flat File Data, Data in Databases, Live Data Sources, Applications (SAP, PeopleSoft, Oracle Apps). Target: Data Warehouse.]

Data Extraction and Cleaning (can be very complex)

• Integrate multiple data sources
• Correct data problems (cleanse)
• Validate Data
• Summarize and roll-up data
• Update Metadata
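The extraction and cleaning steps named on this slide (integrate, cleanse, validate, summarize, update metadata) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' implementation: the sample records, the column names (`student_id`, `class_year`, `gender`, `major`), and every function name are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical flat-file extract; the columns are illustrative only.
FLAT_FILE = """student_id,class_year,gender
1001,2004,F
1002,2004,m
1003,,M
"""

# Hypothetical rows that already live in a database table.
DB_ROWS = [
    {"student_id": "1001", "major": "Physics"},
    {"student_id": "1002", "major": "History"},
    {"student_id": "1003", "major": "Physics"},
]


def extract(flat_file_text):
    """Read the flat-file source into a list of dicts."""
    return list(csv.DictReader(io.StringIO(flat_file_text)))


def integrate(flat_rows, db_rows):
    """Integrate the two sources by joining on student_id."""
    majors = {row["student_id"]: row["major"] for row in db_rows}
    return [dict(row, major=majors.get(row["student_id"])) for row in flat_rows]


def cleanse(rows):
    """Correct a known data problem: inconsistent gender codes."""
    return [dict(row, gender=row["gender"].strip().upper()) for row in rows]


def validate(rows):
    """Split rows into accepted rows and rows rejected for missing values."""
    good, rejected = [], []
    for row in rows:
        (good if row["class_year"] and row["major"] else rejected).append(row)
    return good, rejected


def summarize(rows):
    """Roll up: count students per (class_year, major)."""
    return Counter((row["class_year"], row["major"]) for row in rows)


def run_load():
    """Run the steps in order and record simple load metadata."""
    rows = cleanse(integrate(extract(FLAT_FILE), DB_ROWS))
    good, rejected = validate(rows)
    metadata = {
        "rows_read": len(rows),
        "rows_loaded": len(good),
        "rows_rejected": len(rejected),
    }
    return good, summarize(good), metadata
```

Calling `run_load()` returns the accepted rows, the roll-up counts, and a small metadata record of how many rows were read, loaded, and rejected. In this sample, the row with a missing class year is rejected rather than loaded.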

Page 7: Online Analytical Processing

Fast and Selective Access to Summarized Data

[Diagram: data cubes with axis labels PROD, MARKET, TIME and MAJORS, CLASS YEAR, TIME; views labeled REGISTRAR View, FINANCIAL View, ADMISSIONS View, and Ad Hoc View; label STUDENTS.]
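The cube on this slide can be read as student counts indexed by major, class year, and time. The sketch below shows the two basic operations such a cube supports, rolling up over dimensions and slicing on one value. It is a plain-Python illustration, not an OLAP product: the fact rows are invented, and `roll_up` and `slice_cube` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical fact rows: (major, class_year, academic_year, students).
FACTS = [
    ("Physics", 2004, 2002, 40),
    ("Physics", 2005, 2002, 45),
    ("History", 2004, 2002, 30),
    ("Physics", 2004, 2003, 38),
    ("History", 2004, 2003, 33),
]
DIMENSIONS = ("major", "class_year", "time")


def roll_up(facts, keep):
    """Sum the students measure over every dimension not named in keep."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the rows where one dimension equals a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in facts if row[i] == value]


# Students per major across all class years and academic years.
by_major = roll_up(FACTS, ("major",))

# The same roll-up restricted to one academic year.
by_major_2003 = roll_up(slice_cube(FACTS, "time", 2003), ("major",))
```

A department "view" in the sense of the slide would then be a fixed choice of slice and roll-up over the same underlying facts.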

Page 8: DW Development Strategy

• Think & Plan Big
  – Build In Small Steps; Don’t build a BARN! Not an archive system
• Identify your audience
• Use DW to address new areas, add new capabilities, and fix existing problems
• Retain existing transactional systems
• Iterative development approach
  – Address key needs
  – Rapidly deliver capability to users
  – Lower risk

Page 9: Strategy (continued)

• Evolve system in manageable phases
  – Identify questions you need to answer OR
  – Look at data to determine questions you can answer
• Strategy
  – Develop an overall plan
  – Develop common metadata standards
  – Implement needed pieces mindful of integration and expansion

Page 10: Initial Considerations

• Vision
• Proof-of-Concept / Phased Approach
• Benefits
• Strategy
• Timeline
• Cost
• Issues
  – Data
  – Political constraints
  – Organizational Factors

Page 11: 1st Step: Proof-of-Concept

• Develop a Stand Alone Proof-of-Concept
• Develop model to demonstrate use of new tools to end users.
• Provide benchmarks for future planning.
• Low cost way to “test the waters”
• Exposes YOUR data and ability to deal with it
• Define number of tasks and deliverables.

Page 12: Proof-of-Concept Timeline

• 6-8 Weeks for each increment
  – Requirements: gather and document
  – Data: identify source, construct model, extract data, cleanse data, transport data to database
  – Data Access: user interface, security, training, documentation

Page 13: Proof-of-Concept Timeline

[Gantt chart: tasks scheduled across Week 1 through Week 6; the schedule bars did not survive extraction.]

#    Task
1    Project Planning/Management
2    Development Environment
3    Software Installation
4    Data Inventory & Quality Assessment
5    Database Design
6    Application Design
7    Data Scrubbing & Loading
8    Application Development
9    Testing
11   Documentation
12   Rollout Completed Project

Page 14: Proof-of-Concept Logical Data Model

ADMISSION DATA: SSN, CLASS YEAR
DEMOGRAPHIC DATA: SSN, CLASS YEAR, ETHNICITY, GENDER, HIGH SCHOOL, REGION
REGISTRAR DATA: SSN, CLASS YEAR, GPA, MAJOR
TIME: #ACYEAR, CLASS YEAR
STUDENT_FACT: SATVHI, SATMHI, H.S. RANK, H.S. CLASS SIZE

[Diagram: ADMISSION DATA, DEMOGRAPHIC DATA, and REGISTRAR DATA each link to STUDENT_FACT on SSN; TIME links to STUDENT_FACT on ACYR.]

Page 15: Proof-of-Concept Physical Data Model

ADMISSION DATA: #ADMISSION_SSN, SCORE_CLASS
DEMOGRAPHIC DATA: #DEMO_SSN, DEMO_CLASS, ETHNICITY, GENDER, HIGH SCHOOL, REGION
REGISTRAR DATA: #REGISTRAR_SSN, CLASS, GPA, MAJOR
TIME: #ACYEAR, CLASS_YEAR
STUDENT_FACT: #ADMISSION_SSN, #DEMO_SSN, #REGISTRAR_SSN, #ACYEAR, SATVHI, SATMHI, H.S. RANK, H.S. CLASS SIZE

[Diagram: ADMISSION DATA, DEMOGRAPHIC DATA, and REGISTRAR DATA each link to STUDENT_FACT on SSN; TIME links to STUDENT_FACT on ACYR.]
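As a rough illustration of how this physical model could be declared and queried, the sketch below builds the five tables in an in-memory SQLite database using Python's standard `sqlite3` module. Several details are assumptions rather than anything stated in the slides: the lower-case table and column names, the column types, the name `time_dim` for the TIME table, the composite primary key on the fact table, and all sample rows (the identifiers are made up).

```python
import sqlite3

# REFERENCES clauses document the links between tables; SQLite enforces
# them only when PRAGMA foreign_keys is switched on.
DDL = """
CREATE TABLE admission_data (
    admission_ssn TEXT PRIMARY KEY, score_class INTEGER);
CREATE TABLE demographic_data (
    demo_ssn TEXT PRIMARY KEY, demo_class INTEGER, ethnicity TEXT,
    gender TEXT, high_school TEXT, region TEXT);
CREATE TABLE registrar_data (
    registrar_ssn TEXT PRIMARY KEY, class INTEGER, gpa REAL, major TEXT);
CREATE TABLE time_dim (
    acyear INTEGER PRIMARY KEY, class_year INTEGER);
CREATE TABLE student_fact (
    admission_ssn TEXT REFERENCES admission_data(admission_ssn),
    demo_ssn TEXT REFERENCES demographic_data(demo_ssn),
    registrar_ssn TEXT REFERENCES registrar_data(registrar_ssn),
    acyear INTEGER REFERENCES time_dim(acyear),
    satvhi INTEGER, satmhi INTEGER, hs_rank INTEGER, hs_class_size INTEGER,
    PRIMARY KEY (admission_ssn, demo_ssn, registrar_ssn, acyear));
"""


def build_demo():
    """Create the schema and load three made-up students."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    ids = ["000-00-0001", "000-00-0002", "000-00-0003"]  # invented identifiers
    majors = ["Physics", "Physics", "History"]
    satv = [600, 700, 640]
    conn.execute("INSERT INTO time_dim VALUES (2003, 2007)")
    for ssn, major, score in zip(ids, majors, satv):
        conn.execute("INSERT INTO admission_data VALUES (?, 2007)", (ssn,))
        conn.execute(
            "INSERT INTO demographic_data VALUES (?, 2007, NULL, NULL, NULL, NULL)",
            (ssn,),
        )
        conn.execute(
            "INSERT INTO registrar_data VALUES (?, 2007, 3.0, ?)", (ssn, major)
        )
        conn.execute(
            "INSERT INTO student_fact VALUES (?, ?, ?, 2003, ?, 600, 10, 200)",
            (ssn, ssn, ssn, score),
        )
    return conn


def avg_satvhi_by_major(conn):
    """Join the fact table to one dimension and aggregate a measure."""
    sql = """
        SELECT r.major, AVG(f.satvhi)
        FROM student_fact AS f
        JOIN registrar_data AS r ON r.registrar_ssn = f.registrar_ssn
        GROUP BY r.major
        ORDER BY r.major
    """
    return conn.execute(sql).fetchall()
```

The query shape is the point of the example: measures sit in the fact table, and each report joins out to whichever dimension supplies the grouping attribute.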

Page 16: Post-PoC: DW Architecture

• Many types of architecture
  – Star schema, Snowflake, Hybrid
• Depends on:
  – Types of queries
  – Size of database
  – Capability of hardware and software
• Basic Components:
  – Logical Model
  – Physical Model

Page 17: Physical Data Warehouse Topology

[Diagram labels: Database Server; Web Server; General Public; campus offices (Admissions, Academic Affairs, Dean of Students, Finance Office, HR, President, Instit Research, Public Affairs); Remote Laptop1 and Remote Laptop2, for remote connectivity.]

Page 18: MetaData

Definition: Information about your data

• Centralized description of business rules
  – Describes data and transformations within DW
  – Captures changes in business rules over time to provide a level playing field for comparing data
• Audit trail for data authentication
• Bottom line
  – Increased trust in DW-based analysis results
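One way to picture metadata that "captures changes in business rules over time" is a record per warehouse column that stores its source, its transformation, and a dated history of rule definitions. The sketch below is a hypothetical structure for illustration only; the class names, fields, and the example GPA rules are assumptions, not something described in the talk.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class BusinessRule:
    """One version of a rule, in force from effective_from until superseded."""
    definition: str
    effective_from: date


@dataclass
class ColumnMetadata:
    """Describes one warehouse column and the history of its business rules."""
    name: str
    source_system: str
    transformation: str
    rules: List[BusinessRule] = field(default_factory=list)

    def add_rule(self, definition: str, effective_from: date) -> None:
        """Record a new rule version, keeping the history in date order."""
        self.rules.append(BusinessRule(definition, effective_from))
        self.rules.sort(key=lambda rule: rule.effective_from)

    def rule_on(self, when: date) -> Optional[BusinessRule]:
        """Return the rule version in force on a given date, if any."""
        current = None
        for rule in self.rules:
            if rule.effective_from <= when:
                current = rule
        return current


# Hypothetical example: the definition of GPA changes partway through.
gpa = ColumnMetadata("gpa", "Registrar", "rounded to two decimals")
gpa.add_rule("includes transfer credit", date(2000, 1, 1))
gpa.add_rule("excludes transfer credit", date(2003, 7, 1))
```

Looking up `gpa.rule_on(...)` for the date a figure was produced tells an analyst which definition applied, which is what allows values from different years to be compared on equal terms.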

Page 19: Project Team Composition

• Types of Personnel and Level of Skill
  – Analysis & Design (HIGH)
  – Implementation (MED)
  – Test & Quality Assurance (LOW)
• Skill = $$$
• Vary Skill by Task to control cost

Page 20: The Project Model “Roles and Responsibilities”

[Organization chart. Boxes: Steering Committee; Project Manager; Quality Assurance; Prgmr; Modeler; DBA; Tool Prgmrs; End User Liaison; Documentation. Annotations: Planning, Reporting, Certification; Joint Client and Consultant; Test and Map to Requirements.]

Page 21: The Project Model “Roles and Responsibilities”

Roles: Prgmr, Modeler, DBA, Tool Prgmrs, End User Liaison, Documentation

[Matrix of activities by phase; cells are listed in extracted order, and their alignment to the roles above is not recoverable from the transcript.]

Analysis Phase: Scoping, Scoping, Infrastructure, Infrastructure, Scoping, Scoping
Architecture Phase: Modeling, Cleaning, Capacity Planning, Prototyping, Modeling, Modeling
Implementation Phase: ETL, Implementation, Implementation, Building, Building, Documentation
Transition Phase: QA, QA, QA, QA, QA / Training, Training

Page 22: The Harvest!

• Review requirements and results periodically
  – At end of each phase
  – Annually, taken as a whole
• Optimize data warehouse
  – Response based on queries and load
  – Bring in-line with operational systems
• Review and Adjust the DW mission as institutional mandates change

Page 23: Cost Control

• Start small and develop in phases
• Bring in skill sets as needed
  – remember: $$$ = (Skills) x (period of time)
• Institutional staff should know the data
• Organizational issues need to be resolved by the Project Manager and Steering Committee

Page 24: Accountability

• MUST show results (standard or ad hoc reports)
• Ensure complete documentation to maintain responsibility and association of data to departments
• Establish a Return-on-Investment (ROI) whether tangible (number of reports) or intangible (executive support/decision making)

Page 25: Issues

• Security
• Performance
• Managing the metadata
• Managing the data warehouse
• Hardware/software configuration
• Resources
• Staying in the loop!

Page 26: Building a Data Warehouse