DMDW Lesson 02 - Basics with Adventure Works
STATE-RECOGNIZED UNIVERSITY OF APPLIED SCIENCES
Author I: M.Sc. Johannes Hofmeister
Author II: Dipl.-Inf. (FH) Johannes Hoppe
Date: 25.02.2011
Basics – Adventure Works
Resources
› Microsoft Visual Studio 2008 (NOT 2010)
› SQL Server 2008 (NOT Express Edition)
› MSSQL Server Community Projects & Samples: http://www.codeplex.com/SqlServerSamples
› Adventure Works Databases for SQL Server 2008: http://msftdbprodsamples.codeplex.com/
› Adventure Works Sample Data Warehouse Documentation: http://technet.microsoft.com/en-us/library/ms124623(SQL.90).aspx
› SQL Authority Adventure Works Tutorial: http://blog.sqlauthority.com/2008/08/10/sql-server-2008-download-and-install-samples-database-adventureworks-2005-detail-tutorial/
Slide 5
Adventure Works
› Example database of a fictional company named "Adventure Works"
› SSAS Integration (SQL Server Analysis Services)
› Finance
› Franchises
› Currency Rates (daily exchange rates)
› Sales
› Reseller
› Contracts
Slide 6
Available Scenarios
› DM/DW Scenarios
› Mining Scenarios
› Forecasting Bikes by Region/Time
› Targeted Mailing Campaign: algorithms for demographic data (age, region, volume, etc.)
› Market Basket Analysis ("suggesting a product")
› Sequence Clustering
Slide 7
Available Scenarios
› OLAP Scenarios
› Financial Reporting
› Actual versus Budget
› Product Profitability Analysis
› Sales Force Performance
› Trend/Growth Analysis
› Promotion Effectiveness
Source: http://msdn.microsoft.com/en-us/library/ms124623.aspx
Slide 8
Adventure Works Data Warehouse
› Data from OLTP DB + additional "external" data source
› Synchronization via available SSIS packages
› Copy of actual (live) data
› Can be changed and merged for mining
Data Mining Applied with AW DB
› Read and try it out!
› Preparation
› 1. Get Visual Studio 2008
› 2. Get SQL Server 2008
› 3. Install Adventure Works Database (DW)
Homework: http://msdn.microsoft.com/en-us/library/ms167167.aspx
Data Mining Applied with AW DB
Don't get confused!*
"SQL Server Business Intelligence Development Studio" is the combination of
Microsoft Visual Studio 2008
+ SQL Server 2008 (not Express)
+ the "Business Intelligence" feature
(* Everybody is confused here the first time!)
Slide 13
A look into the database
› Adventure Works 2008
› AdventureWorksDW2008
ProductCategory
vDMPrep
vTargetMail
Slide 14
Table: ProductCategory
Id  Name         rowguid              Modified
--- ------------ -------------------- ----------
1   Bikes        CFBDA25C-DF71-[...]  1998-06-01
2   Components   C657828D-D808-[...]  1998-06-01
3   Clothing     10A7C342-CA82-[...]  1998-06-01
4   Accessories  2BE3BE36-D9A2-[...]  1998-06-01
Slide 15
View: vTargetMail
-- vTargetMail supports targeted mailing data model
-- Uses vDMPrep to determine if a customer buys a bike and joins to DimCustomer
CREATE VIEW [dbo].[vTargetMail]
AS
    SELECT c.[CustomerKey],
           -- [...]
           CASE x.[Bikes] WHEN 0 THEN 0 ELSE 1 END AS [BikeBuyer]
    FROM [dbo].[DimCustomer] c
    INNER JOIN (
        SELECT [CustomerKey], [Region], [Age],
               Sum(CASE [EnglishProductCategoryName] WHEN 'Bikes' THEN 1 ELSE 0 END) AS [Bikes]
        FROM [dbo].[vDMPrep]
        GROUP BY [CustomerKey], [Region], [Age]
    ) AS [x]
        ON c.[CustomerKey] = x.[CustomerKey];
GO
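The BikeBuyer flag in the view above can be tried out without an AdventureWorksDW installation. The following is a minimal sketch, assuming an in-memory SQLite database and a handful of hypothetical toy rows standing in for vDMPrep. It reproduces only the inner aggregation and the CASE expression; the join to DimCustomer and the elided columns are left out.

```python
import sqlite3

# Hypothetical toy rows standing in for vDMPrep (one row per purchased item).
# These values are invented for illustration and are not AdventureWorks data.
rows = [
    (1, "Europe", 34, "Bikes"),
    (1, "Europe", 34, "Accessories"),
    (2, "Pacific", 51, "Clothing"),
    (3, "North America", 45, "Bikes"),
    (3, "North America", 45, "Bikes"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vDMPrep (CustomerKey INTEGER, Region TEXT, Age INTEGER, "
    "EnglishProductCategoryName TEXT)"
)
conn.executemany("INSERT INTO vDMPrep VALUES (?, ?, ?, ?)", rows)

# Count bike purchases per customer, then collapse the count to a 0/1 flag.
query = """
SELECT x.CustomerKey,
       CASE x.Bikes WHEN 0 THEN 0 ELSE 1 END AS BikeBuyer
FROM (SELECT CustomerKey, Region, Age,
             SUM(CASE EnglishProductCategoryName WHEN 'Bikes' THEN 1 ELSE 0 END) AS Bikes
      FROM vDMPrep
      GROUP BY CustomerKey, Region, Age) AS x
ORDER BY x.CustomerKey
"""
bike_buyer = dict(conn.execute(query).fetchall())
print(bike_buyer)
```

Customer 3 bought two bikes but still gets the flag value 1, which is the point of the outer CASE: the mining model sees a yes/no attribute, not a count.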
Slide 16
Create Project
› Add Data Source
› Add Data Source View
› Add Mining Structure
› Add Models (Algorithms)
› Decision Trees
› (Clustering)
› (Naive Bayes)
Slide 18
Algorithm Overview
› Used to identify relationships
› Column 1, Column 2, Column 3
› Most cases: 4 steps
› 1. Analyze
› 2. Create model (training)
› 3. Verify model (testing)
› 4. Predict future data
Slide 19
Decision Trees
› Also: Classification Trees
› Partition data
› Can detect non-linear relationships
› Machine learning technique
› Separate the data into a training set and a test set
› The training set is used to create the model based on certain criteria
› The test set is used to verify the model
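The training/test separation can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the Microsoft Decision Trees algorithm: the "model" is a one-level tree (a decision stump) that learns the majority label per value of a single attribute, and the records, the attribute name `commute`, and all function names are hypothetical.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and cut it into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_stump(train, feature):
    """Learn the majority label for each value of one categorical feature."""
    counts = {}
    for row in train:
        per_value = counts.setdefault(row[feature], {})
        per_value[row["label"]] = per_value.get(row["label"], 0) + 1
    return {value: max(labels, key=labels.get) for value, labels in counts.items()}

def accuracy(model, feature, test, default=0):
    """Share of test rows whose predicted label matches the actual label."""
    hits = sum(model.get(row[feature], default) == row["label"] for row in test)
    return hits / len(test)

# Hypothetical data: the label depends only on the invented attribute "commute".
records = (
    [{"commute": "short", "label": 1} for _ in range(10)]
    + [{"commute": "long", "label": 0} for _ in range(10)]
)

train, test = split_train_test(records)          # training and test set
model = train_stump(train, "commute")            # create model (training)
test_accuracy = accuracy(model, "commute", test) # verify model (testing)
print(len(train), len(test), model, test_accuracy)
```

Because the model never sees the test rows during training, the accuracy measured on them is an honest check of how well the learned rule generalizes.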
Slide 20
Decision Trees: Example
Trained Tree:
Overall: 2.6 % response rate
    Male: 3.0 %
        Income > $30 000: 3.6 %
        Income < $30 000: 2.3 %
    Female: 2.9 %
        Age < 40: 3.2 %
        Age > 40: 3.8 %
Result: response rate > 3.5 % for males with income > $30 000 and for females aged 40+
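The example tree can also be written down as a data structure and queried. The sketch below assumes the hierarchy implied by the slide figures (gender at the first level, then income for males and age for females); the dictionary layout and the function name `leaves_above` are hypothetical.

```python
# Response rates in percent, taken from the slide. The nesting is an assumption
# based on which segments the slide names as exceeding 3.5 %.
tree = {
    "label": "All", "rate": 2.6,
    "children": [
        {"label": "Male", "rate": 3.0, "children": [
            {"label": "Income > $30 000", "rate": 3.6, "children": []},
            {"label": "Income < $30 000", "rate": 2.3, "children": []},
        ]},
        {"label": "Female", "rate": 2.9, "children": [
            {"label": "Age < 40", "rate": 3.2, "children": []},
            {"label": "Age > 40", "rate": 3.8, "children": []},
        ]},
    ],
}

def leaves_above(node, threshold, path=()):
    """Return (segment, rate) for every leaf whose rate exceeds the threshold."""
    path = path + (node["label"],)
    if not node["children"]:
        if node["rate"] > threshold:
            return [(" / ".join(path[1:]), node["rate"])]
        return []
    found = []
    for child in node["children"]:
        found.extend(leaves_above(child, threshold, path))
    return found

targets = leaves_above(tree, 3.5)
print(targets)
```

The two segments returned match the result box on the slide: males with an income above $30 000 and females over 40.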
Slide 21
Pros and Cons of Decision Trees
› Pros
› Very flexible, white-box model
› Occam's Razor: KISS (Keep it simple, stupid!)
› Little preparation and few resources needed
› Cons
› Can be tuned to death
› Takes a long time to build
› Training data must be selected wisely
› Bad training data yields bad results
› A big tree might require disk swapping