sales data warehouse advisor : dr. irwin levinstein presented by : kalyan yadavalli
TRANSCRIPT
Contents
Mission of the project Need For a Data Warehouse Data Warehouse - Overview Sales Data Warehouse Conclusion
Mission of the project
The mission of this project is to provide strategic and tactical support to the Marketing-Sales and Advertising departments of a media company through the acquisition and analysis of data pertaining to their customers and markets.
This project helps to identify areas of readership and marketing through creation of a Data Warehouse that will provide a media company with a better understanding of its customers and markets.
Need For a Data Warehouse
To provide an environment where a relatively small amount of knowledge of the technical aspects of database technology is required to write and maintain queries and reports .
To provide a means to speed up the writing and maintaining of queries and reports by technical personnel.
For example, a query that requests the total sales income and quantity sold for a range of products in a specific geographical region for a specific time period can typically be answered in a few seconds or less regardless of how many hundreds of millions of rows of data are stored in the data warehouse database.
To make it easier, on a regular basis, to query and report data from multiple transaction processing systems ,external data sources for querying or reporting purposes.
To prevent persons who only need to query and report transaction processing system data from having any access whatsoever to transaction processing system databases and logic used to maintain those databases.
Data Warehouse – Overview
A data warehouse is a copy of data combined from different data sources specifically structured for querying and reporting.
Data warehouses support business decisions by collecting, consolidating, and organizing data for reporting and analysis with tools such as online analytical processing (OLAP) and data mining.
Dimensional Modeling VS Entity-Relationship Modeling
An OLTP system requires a normalized structure to minimize redundancy, provide validation of input data, and support a high volume of fast transactions. A transaction usually involves a single business event, such as placing an order or posting an invoice payment. An OLTP model often looks like a spider web of hundreds or even thousands of related tables.
In contrast, a typical dimensional model uses a star design that is easy to understand and relate to business needs, supports simplified business queries, and provides superior query performance by minimizing table joins.
Dimensional Modeling VS Entity-Relationship Modeling
This project uses Dimensional modeling, which is the name of the logical design technique often used for data warehouses. It is different from entity-relationship modeling.
Entity relationship modeling is a logical design technique that seeks to eliminate data redundancy while Dimensional modeling seeks to present data in a standard framework that is intuitive and allows for high-performance access.
For example, a query that requests the total sales income and
quantity sold for a range of products in a specific geographical region for a specific time period can typically be answered in a few seconds or less regardless of how many hundreds of millions of rows of data are stored in the data warehouse database.
Entity–Relationship Modeling
Customer
Campaign History
Payment
Salesperson
City
Demographics
SalesConditions
CustomerSubscriptions
District
Zones
Channel
Carrier HistoryCarrier MasterCampaign Offer
Dimensional Modeling
Subscription Sales
EffectiveDateKeyCustomerKeySubscriptionsKeyPaymentKeyCampaignKeySalesPersonKeyRouteKeyDemographics KeyUnitsSoldDollarsSoldDiscountCostPremiumCost
Customer
Campaign
Payment
Salesperson
Route
Subscriptions
Date
Demographics
Dimensions DimensionsFact Table
TechnicalArchitecture
Design
TechnicalArchitecture
Design
ProductSelection &Installation
ProductSelection &Installation
End-UserApplication
Specification
End-UserApplication
Specification
End-UserApplication
Development
End-UserApplication
Development
ProjectPlanningProject
Planning
Business
Requirement
Definition
Business
Requirement
Definition
DeploymentDeploymentMaintenance
andGrowth
Maintenanceand
Growth
Project ManagementProject Management
DimensionalModeling
DimensionalModeling
PhysicalDesign
PhysicalDesign
Data StagingDesign &
Development
Data StagingDesign &
Development
Kimball- Dimensional life cycle diagram
Sales Data Warehouse
Business Users Requirements Technical Architecture Product [ Software] Selection Dimensional Modeling Logical Design Data Staging Design & Development Building Data Cube using SQL Analysis Services End User Application Specification & Development Deployment Maintenance & Growth
Requirements Gathering
This phase involves the following steps: Collect some business questions the users want an answer for. Gather details/requirements from the business users Get user sign off on the business questions.
Business Questions: Can we profile our "best subscribers" to pull lists of "like" non-subscribers
that we could touch in some way? Who exists in the marketplace and have we touched them? Can we build a loyalty model based on a subscriber's payment history?
ServicesTransform from source to target.Maintain conformed dimensions.
Data StorageFlat files or relational tables
Design GoalsStaging throughput.Integrity and consistency.
Subscription SalesDimensional.Atomic and summary data.Business process.Design GoalsEase-of-use.Query performance.
Dimensional Bus:Conformed facts and dimensions
SQL Reporting Services
Excel
AccessLoad Access
Marketing/SalesData
Name Phone Data
Demographics
Extract
Extract
Extract
Source Systems
Sales Data Warehouse High Level Technical ArchitectureSales Data Warehouse High Level Technical Architecture
Data Staging Area Presentation Area Data Access Tools
Product Selection
Hardware Specs: AMD Opteron Processor 252 2.6 GHz, 3.83 GB RAM Operating System: Windows Server 2003
Software Specs: Kimball Data Warehouse Tool [Create staging and production databases] Microsoft ® SQL Server™ 2000 [ETL { Extract Transform Load} ] Microsoft ® SQL Server™ 2005 Integration Services [ Nightly Automation] Microsoft ® SQL Server™ 2005 Analysis Services [ Create OLAP Data
Cube] Microsoft ® SQL Server™ 2005 Reporting Services[ End User Reports] Internet Information Services [ IIS 6.0] [ Web Server to Host the Reports]
Dimensional Modeling
Design Dimensions Attributes of the dimension Hierarchy in the dimension
Dimensional Bus Matrix Design Fact Tables
Dimension Hierarchy-Subscriptions
SubscriptionName
Rate Service Term Publication
Rate Year Rate Area Rate Type
DiscountCategory
FrequencyGroups
Term LengthGroups
Short or LongTerm
BusinessGroup
Dimensions Date Demographics Sales Conditions
Salesperson Campaign……
Business
Processes Subscription Sales( starts)
X X X X X
Subscription Tracking
X X X X
Complaints X X X
Stops X X X
Upgrades AndDowngrades
X X X X X
Dimensional Bus Matrix
SubscriptionSales Fact Table
EffectiveDateKeyCampaignKeySalespersonKeyCustomerKeyDempgraphicsKeySubscriptionKey………………..Grain: Each subscription soldFactsUnits SoldNumber of Sales (=1)Dollars SoldDiscount CostPremium Cost
Design Fact Tables•Choose the Business Process as the fact table•Declare the grain•Choose the dimensions•Choose the facts
SalesConditions
Campaign
Demographics
Address
Route
Salesperson
Subscription
Loyalty-Payment
Customer
Date
Subscr.Sales
(starts)
Subscription Sales
Grain: Each subscription sold
MeasuresUnits Sold
Number of Sales (=1)Dollars Sold
Discount CostPremium Cost
Dimension Model: SubscriptionSales
Logical Design
Fact Table Design
Dimension Table Design
Slowly Changing Dimensions Type 1: The new record replaces the original record. No trace of the old
record exists. Type 2: A new record is added into the customer dimension table.
Therefore, the customer is treated essentially as two people.
Column Names Data Type
NULL?
Key? FK TO Dimension Description
EffectiveDateKey int N FK DimDate Key of effective date
EnteredDateKeyint N FK DimDate
Key of date entered in the system
CustomerKey int N FK DimCustomer Key of customer
LoyaltyKey int N FK DimLoyalty Key of loyalty score
PaymentKey int N FK DimPaymentHistory Key of payment behavior
SalesPersonKeyint N FK DimSalesPerson
Key of sales person for the change
CampaignKey int N FK DimCampaign Key of campaign
SalesConditionsKey int N FK DimSalesConditions Key of sales conditions
SubscriptionKey int N FK DimSubscription Key of subscription
PersonKeyint N FK dimPerson
Key of person on the subscription
Fact Table -SubscriptionSales
Column Names Data Type
NULL? Key?
FK TO Dimension Description
Slowly changing dimension type
CustomerSubscriptionKey int N PK ID Surrogate Primary Key
AddressNum int Y BKBusiness key of the subscription summary record address
SubscriptionNum int Y BK `Business key of the numbered subscription at the address
BusinessKey int Y BK Concatenated business key
CustomerID int N Unique identifier for this customer 1
BillingMethod int The method of delivery for the customers bill 2
OriginalStartDateKey int FK DimDateThe earliest start date on record for this customer 1
StartDateKey int FK DimDateThe start date of this customer’s current subscription 2
StopDateKey int FK DimDate The most recent stop date for this customer 2
ExpireDateKey int FK DimDateThe expiration date of this customer’s current or most recent subscription 2
Dimension Table - CustomerSubscriptions
Data Staging
The following are the sub processes of Data Staging process.
Extracting : Reading and understanding the source data, copying the parts that are needed to the staging area.
Transforming: Possible transformation steps in the data staging area Cleaning the data – correct misspellings, deal with missing data elements,
parsing into standard formats. Purging selected data which is not required Combining data sources, by matching exactly on key values or performing fuzzy
matches on non-key attributes. Creating surrogate keys for each dimensional record. Building aggregates to boost performance of common queries.
Loading – Loading the transformed data into the production database.
Data Staging -ETL Architecture
Common Source Database
Source SchemaCreates the tables for the common source database
Kimball Data Modeling toolCreates the staging and production database tables and metadata.
Production Data Warehouse Database
Source QueriesCreates views (queries) that feed data to production
Staging Database
Setup files on SQL server that is running the DTS packages.
Setup files on SQL server that is running the DTS packages.
Database Configuration FileDatabase connection information for the ETL process
Dates ConfigurationFileImportant date info for the ETL process
DTSData Transformation Services Package
DTSData Transformation Services Package
Data Staging –Development for Sales Data Warehouse
The ETL packages perform the following work. Extract the full sets of dimension rows Most transformation logic occurs in the extract query, using SQL Extracted rows are stored in a staging table until the ETL package is run. There are steps for the staged rows to be fixed up, via SQL statements.
There’s a statement for deleting bogus rows, and a separate statement for updating rows.
Find rows that are new; insert them into the target table. Use a checksum to find rows that have seen a Type 1 change. Update the
appropriate columns in the target table. Use a checksum to find rows that have seen a Type 2 change. Propagate a
new dimension row. Log the number of rows extracted, staged, deleted and updated from the
staging tables, inserted into target, Type 1 and Type 2 rows updated in target.
Building Data Cube using SQL Analysis Services
SQL Server Analysis Services 2005 provides tools for developing OLAP applications
OLAP [ Online Anlytical Processing ] organizes data warehouse data into multidimensional cubes based on the dimensional model, and then preprocesses these cubes to provide maximum performance for queries that summarize data in various ways.
Build the cube using SQL Analysis Services and deploy it to SQL Analysis Services Server.
End User Application Development
Reporting Services - Uses the Analysis Services Data Cube as Data Source.
Sample Report Screenshots
Solicitor SalesSales Type Sales
ChannelSales Agent Number Of
SalesCost Per
Unit Retention
Carrier Sales
Carrier
Sales
Carrier Sales 2 $0.00 50.0% DM Sales DM Sales DM Sales 124 $0.13 13.7% Non-Solicited
Other 2005 THISISHAMPRDS
FREE2WKSAMP8 $0.00 12.5%
AD CONTRACT START 8 $0.17 62.5% ALLCONNECT 1,914 $0.04 43.5% CAN'T AFFORD DM "2 WKS
FREE"7 $0.10 71.4%
COLLECTIONS TEAM 15 $0.14 73.3% COOLSAVINGS.COM 9,415 $21.55 7.1% CUSTOMER SERVICE
PROMO STARTS1,934 $0.15 73.7%
DATA ENTRY STARTS 49 $0.14 59.2%
Sales Agent
Source Name
Phone Address St City ZIP
# Sales
Units
Sold
$ Sold
Discount
Cost
Premium
Cost
AD CONTRACT START
ULTIMATE TAN OF SMITHFIELD
(757) 365-9400
13412 BENNS CHURCH BLVD
VA SMITHFIELD
23430
1 260 $30.99
$37.91 $5.00
AD CONTRACT START
CHOREY & ASSOCIATE
(757) 539-7451
330 W CONSTANCE RD # 100
VA SUFFOLK
23434
1 260 $30.99
$37.91 $5.00
AD CONTRACT START
CHOREY AND ASSOCIATE
(757) 539-7454
804 W WASHINGTON ST
VA SUFFOLK
23434
1 260 $30.99
$37.91 $5.00
AD CONTRACT START
VIRGINIA STAGE CO
0 254 GRANBY ST VA NORFOLK
23510
1 260 $30.99
$37.91 $5.00
AD CONTRACT START
SPINE & ORTHAPEDIC CTR, PC
0 6160 KEMPSVILLE CIR # 303A
VA NORFOLK
23502
1 260 $30.99
$37.91 $5.00
AD CONTRACT START
COUNTRYWIDE HOME LOAN
0 3000 WOODLAWN DR
VA SUFFOLK
23434
1 260 $30.99
$37.91 $5.00
AD CONTRACT START
WHITE, E.D. 0 730 10TH ST VA VIRGINIA BEACH
23451
1 260 $30.99
$37.91 $5.00
AD CONTRACT START
PERMANENT COATING SOLUTIONS IN
(757) 539-4366
434 N MAIN ST # D
VA SUFFOLK
23434
1 260 $30.99
$37.91 $5.00
8 2,080 $247.94
$303.26 $40.00
Sales Agent Details
Deployment
Deploy the Reports to SQL Reporting Services 2005 server.
Give Access to the users to view the reports.
Desktop Installation – Dot Net Framework 2.0 For access to Report Builder.
ConclusionThe reports generated from the data warehouse answered the following questions collected form the business users during the requirement gathering phase of the project.
Identify their best customers/loyal customers [ customer subscriptions /subscription sales]
Non-subscribers who can be reached Contact history of customers in market place [ Demographic data]
Benefits to Marketing Increased telemarketing close rates and increased direct mail response rates Reduced cost and use of outside telemarketing services and reduced print and
mailing costs Identification of new product bundling and distribution opportunities Increased acquisition and retention rates, and reduced cost of acquisitions
Benefits to Advertising An increase in the annual rate of revenue growth. Increase in new advertisers Improved targeting capabilities
.