Data Warehousing:
Jeffrey T. Edgell
Assistant Executive Director
ManTech Advanced Systems International, Inc.
Decision Support
In order to make correct decisions, accurate, meaningful information about business environments, external issues, and internal workings must be available in a timely fashion.
A Need for New Technology
• Government and industrial entities have been collecting data in electronic format since the 1960s.
• Today, organizations collect millions of pieces of information about every aspect of their operation on a daily basis.
• Data is obtained from multiple disparate sources.
A Need for New Technology
• Often information is replicated, leading to confusion.
• Related data is often retained in seemingly heterogeneous and incompatible platforms.
• Common data attributes are represented in nonstandard formats and naming constructs across systems.
A Need for New Technology
• Most systems are built for data collection (transaction based).
• Designed to support On-Line Transaction Processing (OLTP).
• Designed to support day-to-day business operations.
• Very specific applications built to support interaction with the data.
A Need for New Technology
• Perform best when handling small specific volumes of data.
• Do not readily accept information from dissimilar sources.
• Are not constructed to handle analysis of large amounts of data efficiently.
A Need for New Technology
• Capable of answering questions of a specific nature and time frame.
– How many items do I have in stock today?
– How many tickets were sold on a specific date?
– What is the current price of an item?
A Need for New Technology
• Transaction based systems experience great difficulty in answering analytical and decision support questions.
• Analysis takes a long time, interfering with:
– transaction performance
– daily operations
• The nature of the data is dynamic and dispersed.
A Need for New Technology
Most organizations have created a “spider web” of systems and data sources.
A Need for New Technology
All of this has created “data overload” and “data confusion”.
– What do I do with all of this data?
– What does it mean?
– Do I really need this data?
– I am overwhelmed with the amount of data I am confronted with.
– I cannot make a timely decision (too much data from too many sources).
Data Warehousing 101
Data warehousing is: A large historical database designed to accept key analytical data from the multiple and disparate sources that handle the day-to-day management of enterprise data. Furthermore, the role of the warehouse is to transform transaction data into corporate information. The warehouse is provided to the user in a read-only fashion.
Data Warehousing 101
A data warehouse will provide:
The ability to ask business analysis questions in a real-time, iterative fashion, obtaining decision support information readily and quickly.
Data Warehousing 101
• A data warehouse is not: A repository for all corporate data.
• A data warehouse will not: Single-handedly solve all of the problems associated with an enterprise.
Data Warehousing 101
Key components include the following:
– data model
– data storage architecture (relational, proprietary)
– data access/replication/transport
– data transformation and scrubbing
– staging and publication
– metadata
– warehouse hardware and software
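The transformation and scrubbing component can be illustrated with a minimal Python sketch. The two source systems, their field names (`cust_nm`, `CustomerName`, and so on), and the target format are hypothetical, chosen only to show nonstandard names and formats from different systems being mapped to one common representation.

```python
from datetime import datetime

# Hypothetical field-name mappings for two source systems (illustration only).
FIELD_MAPS = {
    "legacy_a": {"cust_nm": "customer_name", "sale_dt": "sale_date", "amt": "amount"},
    "legacy_b": {"CustomerName": "customer_name", "Date": "sale_date", "Total": "amount"},
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {"legacy_a": "%m/%d/%Y", "legacy_b": "%Y-%m-%d"}


def transform(record, source):
    """Rename fields to the common names and scrub values into standard formats."""
    mapping = FIELD_MAPS[source]
    row = {mapping[key]: value for key, value in record.items() if key in mapping}
    # Collapse stray whitespace and normalize capitalization.
    row["customer_name"] = " ".join(row["customer_name"].split()).title()
    # Convert every date to ISO 8601.
    parsed = datetime.strptime(row["sale_date"], DATE_FORMATS[source])
    row["sale_date"] = parsed.date().isoformat()
    # Strip currency formatting and store the amount as a number.
    cleaned = str(row["amount"]).replace("$", "").replace(",", "")
    row["amount"] = round(float(cleaned), 2)
    return row


a = transform({"cust_nm": "  acme   corp ", "sale_dt": "03/15/1998", "amt": "$1,200.50"}, "legacy_a")
b = transform({"CustomerName": "ACME CORP", "Date": "1998-03-15", "Total": 1200.5}, "legacy_b")
print(a == b)
```

Both source records describe the same sale, so after transformation they compare equal and can be staged into the warehouse in one format.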
Data Warehousing 101
[Figure: Data Warehouse Architecture. Components shown: Legacy Data (two sources), External Data Source, Data Transformation, Data Warehouse, Information Directory Repository, Data Warehouse Management Layer.]
Data Warehousing 101
How a warehouse deals with aging data:
– 1993: quarterly data
– 1994: quarterly data
– 1995: quarterly data
– 1996: monthly data
– 1997: monthly data
– 1998: detail data
Older data is retained at a higher level of aggregation.
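The aging scheme can be sketched in a few lines of Python. The year bands follow the slide (1998 kept as detail, 1996 and 1997 summarized by month, earlier years by quarter); the record layout and the function names `grain_for` and `age_data` are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def grain_for(day):
    """Return (grain, label) for a sale date, using the year bands from the slide."""
    if day.year >= 1998:
        return ("detail", None)
    if day.year >= 1996:
        return ("month", f"{day.year}-{day.month:02d}")
    quarter = (day.month - 1) // 3 + 1
    return ("quarter", f"{day.year}-Q{quarter}")


def age_data(sales):
    """Keep recent rows as detail; collapse older rows into monthly or quarterly totals."""
    detail = []
    summary = defaultdict(float)
    for day, amount in sales:
        grain, label = grain_for(day)
        if grain == "detail":
            detail.append((day.isoformat(), amount))
        else:
            summary[(grain, label)] += amount
    return detail, dict(summary)


sales = [
    (date(1993, 2, 1), 10.0),
    (date(1993, 3, 9), 5.0),
    (date(1996, 7, 4), 20.0),
    (date(1996, 7, 30), 1.0),
    (date(1998, 5, 2), 7.5),
]
detail, summary = age_data(sales)
print(detail)
print(summary)
```

The two 1993 rows collapse into a single quarterly total and the two 1996 rows into a single monthly total, while the 1998 row survives unchanged.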
Data Warehousing 101
Model concepts:
– Fact table(s)
• A table containing multiple measurable descriptors relating to a specific area of business
• Each fact can be viewed, calculated, and aggregated against various defining areas of the business (time, geography, customer)
Data Warehousing 101
Model concepts:
– Dimension Table(s)
• Retains information (product description, geography description, customer description) that is descriptive and remains moderately constant over time
Data Warehousing 101
Data Warehouse Modeling
– Special modeling techniques must be applied to provide rapid response to queries on large volumes of data.
– OLTP systems are built with update operations in mind, resulting in normalization and greatly reduced browse performance.
Data Warehousing 101
Common data model techniques are as follows:
– star schema
– snowflake
– fact constellation
– relational
Data Warehousing 101
[Figure: Sample Star Schema Model. A central SALES fact table (Sales Facts) surrounded by the dimensions TIME, STORE, CUSTOMER, and GEOGRAPHY.]
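A star schema like the sample can be sketched with Python's built-in `sqlite3` module. The table names mirror the slide (SALES, TIME, STORE, CUSTOMER, GEOGRAPHY); the column names and the sample rows are assumptions made for illustration, not part of the original model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive data that changes slowly.
CREATE TABLE time_dim      (time_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE store_dim     (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE customer_dim  (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE geography_dim (geo_id INTEGER PRIMARY KEY, region TEXT);
-- Fact table: the measures plus one foreign key per dimension.
CREATE TABLE sales_fact (
    time_id     INTEGER REFERENCES time_dim(time_id),
    store_id    INTEGER REFERENCES store_dim(store_id),
    customer_id INTEGER REFERENCES customer_dim(customer_id),
    geo_id      INTEGER REFERENCES geography_dim(geo_id),
    units       INTEGER,
    revenue     REAL
);
INSERT INTO time_dim VALUES (1, 'Jan', 1998), (2, 'Feb', 1998);
INSERT INTO store_dim VALUES (1, 'Store A'), (2, 'Store B');
INSERT INTO customer_dim VALUES (1, 'Customer X');
INSERT INTO geography_dim VALUES (1, 'East'), (2, 'West');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 3, 30.0),
    (1, 2, 1, 2, 1, 10.0),
    (2, 1, 1, 1, 2, 20.0);
""")

# Aggregate a fact (revenue) against two dimensions (geography and time).
query = """
SELECT g.region, t.month, SUM(f.revenue)
FROM sales_fact AS f
JOIN geography_dim AS g ON g.geo_id = f.geo_id
JOIN time_dim AS t ON t.time_id = f.time_id
GROUP BY g.region, t.month
ORDER BY g.region, t.month
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Every analytical query takes the same shape: join the fact table to the dimensions of interest, then group by the descriptive columns.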
Data Warehousing 101
[Figure: Sample Snowflake Model. The SALES fact table (Sales Facts) with the dimensions TIME, STORE, CUSTOMER, and GEOGRAPHY; dimensions are further broken out into sub-levels labeled North, South, East, West; Month, Qtr, Year; East Region, West Region.]
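In a snowflake model, a dimension is normalized into a chain of tables, one per level of its hierarchy. The sketch below does this for the time dimension (Month, Qtr, Year) using `sqlite3`; the table names, column names, and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked time dimension: each level of the hierarchy gets its own table.
CREATE TABLE year_dim    (year_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE quarter_dim (quarter_id INTEGER PRIMARY KEY, quarter TEXT,
                          year_id INTEGER REFERENCES year_dim(year_id));
CREATE TABLE month_dim   (month_id INTEGER PRIMARY KEY, month TEXT,
                          quarter_id INTEGER REFERENCES quarter_dim(quarter_id));
CREATE TABLE sales_fact  (month_id INTEGER REFERENCES month_dim(month_id),
                          revenue REAL);
INSERT INTO year_dim VALUES (1, 1998);
INSERT INTO quarter_dim VALUES (1, 'Q1', 1), (2, 'Q2', 1);
INSERT INTO month_dim VALUES (1, 'Jan', 1), (2, 'Feb', 1), (3, 'Apr', 2);
INSERT INTO sales_fact VALUES (1, 30.0), (2, 20.0), (3, 5.0);
""")

# Summarizing by quarter now walks the chain month -> quarter -> year.
rows = conn.execute("""
SELECT y.year, q.quarter, SUM(f.revenue)
FROM sales_fact AS f
JOIN month_dim AS m ON m.month_id = f.month_id
JOIN quarter_dim AS q ON q.quarter_id = m.quarter_id
JOIN year_dim AS y ON y.year_id = q.year_id
GROUP BY y.year, q.quarter
ORDER BY q.quarter
""").fetchall()
print(rows)
```

Compared with a star schema, the snowflake form stores each hierarchy level once, at the cost of extra joins per query.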
Data Warehousing 101
[Figure: Sample Fact Constellation Model. Three fact tables (Regional Sales, District Sales, Store Sales) shown with the dimensions TIME, STORE, CUSTOMER, and GEOGRAPHY.]
Data Marting 101
Data marting is: A functional segment of an enterprise restricted for purposes of security, locality, performance, or business necessity, using modeling and information delivery techniques identical to data warehousing.
Data Marting 101
Why build a data mart?
– Allows an organization to visualize the large but focus on the small and attainable.
– Provides a platform for rapid delivery of an operational system.
– Minimizes risk.
– A corporate warehouse can be constructed from the union of the enterprise data marts.
Data Marting 101
[Figure: Data from transaction sources feeds the data warehouse; the Financial, Logistics, and Contract data marts are updated from the warehouse.]
The data warehouse populates the data marts.
Data Marting 101
[Figure: Data from transaction sources feeds the Financial, Logistics, and Contract data marts; the data warehouse is updated from the data marts.]
The data marts populate the data warehouse.
Data Marting 101
[Figure: Virtual Data Warehouse. Data from transaction sources feeds the Financial, Logistics, and Contract data marts, which sit beneath an abstract data warehouse access layer.]
Data is moved through the abstract layer on demand.
The data warehouse layer manages the data marts as a warehouse.
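One way to picture a virtual data warehouse is an access layer that stores nothing itself and forwards each request to the data marts. The classes `DataMart` and `VirtualWarehouse` below are hypothetical stand-ins written for this sketch, with made-up rows; they do not represent any particular product.

```python
class DataMart:
    """A stand-in for one data mart; rows live only inside the mart."""

    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def query(self, predicate):
        return [row for row in self._rows if predicate(row)]


class VirtualWarehouse:
    """Abstract access layer: holds no data, pulls from the marts on demand."""

    def __init__(self, marts):
        self._marts = marts

    def query(self, predicate):
        results = []
        for mart in self._marts:
            for row in mart.query(predicate):
                # Tag each row with the mart it came from.
                results.append({"mart": mart.name, **row})
        return results


financial = DataMart("Financial", [{"item": "widget", "cost": 4.0}])
logistics = DataMart("Logistics", [{"item": "widget", "on_hand": 12},
                                   {"item": "gear", "on_hand": 3}])
contract = DataMart("Contract", [{"item": "gear", "vendor": "V1"}])

warehouse = VirtualWarehouse([financial, logistics, contract])
widget_rows = warehouse.query(lambda row: row["item"] == "widget")
print(widget_rows)
```

The user sees one query interface, while each row is fetched from its owning mart only when a request arrives.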
OLAP 101
• OLAP is a powerful graphics-oriented tool used to access the data warehouse
• OLAP supports
– Business analysis queries
– Data visualization
– Trend analysis
– Scenario analysis
– User defined queries
OLAP 101
• Drill Down
– Move from summary to detail
• Roll Up
– Move from detail to summary
• Slice and Dice
– Look at a specific interest of the business
OLAP 101
• Pivot and Rotate
– Looking at data from varying perspectives
• Drill Through
– Move to a near transaction level of detail
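The OLAP operations can be sketched over a small in-memory data set. This is plain Python rather than an OLAP tool, and the rows, column names, and helper functions (`aggregate`, `slice_rows`) are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, quarter, region, product, revenue).
ROWS = [
    (1998, "Q1", "East", "Radio", 100.0),
    (1998, "Q1", "West", "Radio", 40.0),
    (1998, "Q2", "East", "Radio", 60.0),
    (1998, "Q2", "East", "Phone", 25.0),
]
COLUMNS = ("year", "quarter", "region", "product")


def aggregate(rows, by):
    """Sum revenue grouped by the named columns."""
    idx = [COLUMNS.index(name) for name in by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_rows(rows, column, value):
    """Keep only the rows where one dimension has a fixed value."""
    i = COLUMNS.index(column)
    return [row for row in rows if row[i] == value]


# Roll up: move from detail to a yearly summary.
rolled_up = aggregate(ROWS, ["year"])
# Drill down: add a finer level (quarter) beneath the year.
drilled_down = aggregate(ROWS, ["year", "quarter"])
# Slice: look only at the East region.
east_only = aggregate(slice_rows(ROWS, "region", "East"), ["quarter"])
# Pivot: the same totals viewed with the axes swapped.
by_region_quarter = aggregate(ROWS, ["region", "quarter"])
by_quarter_region = aggregate(ROWS, ["quarter", "region"])
# Drill through: return the near transaction-level rows behind one cell.
east_q2_rows = slice_rows(slice_rows(ROWS, "region", "East"), "quarter", "Q2")

print(rolled_up, drilled_down, east_only, sep="\n")
```

Each operation is the same aggregation applied at a different level or over a different subset, which is why an OLAP tool can move between them interactively.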
OLAP 101
• The flavors of OLAP
– Multidimensional On-Line Analytical Processing (MOLAP)
– Relational On-Line Analytical Processing (ROLAP)
– Hybrid On-Line Analytical Processing (HOLAP)
OLAP 101
• MOLAP
– Produces a hypercube
– Pre-aggregated and pre-calculated
– Rapid response times
– Limited in the amount of data that can be managed
OLAP 101
• ROLAP
– Data remains in a relational format
– Some degree of aggregation
– Slower response times
– Scales to large amounts of data
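The trade-off between the MOLAP and ROLAP approaches can be sketched in Python. This is a toy model under stated assumptions, not how any OLAP engine is implemented: `build_cube` precomputes a total for every combination of dimensions, while `query_on_demand` scans the stored rows at query time. The facts and names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

FACTS = [
    {"region": "East", "product": "Radio", "revenue": 100.0},
    {"region": "East", "product": "Phone", "revenue": 25.0},
    {"region": "West", "product": "Radio", "revenue": 40.0},
]
DIMENSIONS = ("region", "product")


def build_cube(facts):
    """MOLAP style: precompute totals for every combination of dimensions."""
    cube = defaultdict(float)
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            for fact in facts:
                key = (dims, tuple(fact[d] for d in dims))
                cube[key] += fact["revenue"]
    return dict(cube)


def query_on_demand(facts, dims, values):
    """ROLAP style: scan the stored rows and aggregate at query time."""
    return sum(
        fact["revenue"]
        for fact in facts
        if all(fact[d] == v for d, v in zip(dims, values))
    )


cube = build_cube(FACTS)
molap_answer = cube[(("region",), ("East",))]                  # a single lookup
rolap_answer = query_on_demand(FACTS, ("region",), ("East",))  # a full scan
print(molap_answer, rolap_answer, len(cube))
```

Three fact rows already produce eight precomputed cells; that growth is the reason the pre-aggregated approach answers quickly but is limited in the amount of data it can manage, while the scan-based approach is slower per query but stores only the rows.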
OLAP 101
• HOLAP
– Can manage data both as ROLAP and MOLAP
– Currently evolving
– MOLAP vendors are finding it easier to move into the HOLAP market space
Data Mining 101
As defined by the Gartner Group in 1995, data mining is:
“…the process of discovering meaningful new correlations, patterns, and trends by sifting through large amounts of data stored in a repository, using pattern recognition technologies and statistical and mathematical techniques.”
Data Mining 101
• Data mining requires an analyst who is familiar with the domain to appropriately model scenarios.
• Data mining assists analysts in uncovering nontrivial data relationships.
• Analysis must be conducted to determine the meanings of these newly identified relationships.
Why Use a Data Warehouse?
• Data warehousing is a must for anyone who uses multiple data sources to make decisions and understand business (trends, forecasting).
• Those who do not move to warehousing will not be capable of responding to problems and business conditions, thus falling behind the competition.
Why Use a Data Warehouse?
• For organizations wanting to minimize costs and maximize productivity, warehousing is a must.
• Individuals who spend time gathering data instead of analyzing data require the assistance of a warehouse.
• Organizations that collect data but have difficulty determining meanings and impacts need a data warehouse.
Making the Warehouse a Reality
• Think big but work small.
• Match technology to requirements.
• Build for the future (scalability).
• Work closely with the users.
– Requirements
– Rapid Application Development (RAD)
– Periodic releases to the user community
Real World Success Stories
• Radio Shack
– Sales and stocking analysis
– Marketing (regionalized mailings)
• Wal-Mart
– Sales and stock analysis
– Trend analysis
– Vendor analysis
Real World Success Stories
• Naval Surface Warfare Center (NSWC)
– Procurement
– Supply
– Workload
• Harris Semiconductor
– Yield
– Product
– Personnel productivity
Real World Success Stories
• Defense Logistics Agency (DLA)/ManTech
– Trend analysis
– Problem identification
– Procurement support
– Enterprise data analysis
A Few Observations About Data Warehouses
• Industry and our experience indicate that:
– Warehouses that succeed average an ROI of 400%, with the top end being as much as 600% in the first year.
– The incremental approach is most successful (build the warehouse one functional area at a time).
– The average time to gather requirements, perform a design, and deploy a warehouse increment is six months.
– New tools may be required that differ from the transaction environment.
• Software oriented toward intelligent analysis and query of the data warehouse
• Hardware oriented to support the massive storage requirements and analytical queries
Keys to Success
• Do you understand why you are building the warehouse?
• Have you identified both technical and business professionals that you will need to build the warehouse?
• Do you have a strong management sponsor?
• Are you managing the expectations of the users?
Careers in Data Warehousing
• System Administration
• DW Architect
• Data Architect
• DW Manager
• DW Administrator
• Decision Support Analysts
• DBA
• Application Developer
• Data Cleansing/Transformation Analyst
• Business Analyst
• Management
A Real Architecture
[Figure: Future data warehouse architecture. Labels shown: a DLSC data warehouse on an MPP or SMP cluster holding all key DLSC analytical data; HP V2500 systems at DISC, DSCC, DSCR, C & T, and Medical, each with DSD, WSDB, and other data; DISA SAMMS, FLIS, and another data source; T1 links; data movement using MQSeries; DISC, DSCC, DSCR, C & T, and Medical users; NIPRNET; Internet and other users.]