STATE-RECOGNIZED UNIVERSITY OF APPLIED SCIENCES

Author I: M.Sc. Johannes Hofmeister
Author II: Dipl.-Inf. (FH) Johannes Hoppe
Date: 25.02.2011

STUDY AND TAKE OFF.

Basics – Adventure Works

01 Adventure Works

Slide 3

Resources

› Microsoft Visual Studio 2008 (NOT 2010)

› SQL Server 2008 (NOT Express Edition)

› MSSQL Server Community Projects & Samples: http://www.codeplex.com/SqlServerSamples

› Adventure Works Databases for SQL Server 2008: http://msftdbprodsamples.codeplex.com/

› Adventure Works Sample Data Warehouse Documentation: http://technet.microsoft.com/en-us/library/ms124623(SQL.90).aspx

› SQL Authority Adventure Works Tutorial: http://blog.sqlauthority.com/2008/08/10/sql-server-2008-download-and-install-samples-database-adventureworks-2005-detail-tutorial/

Slide 5

Adventure Works

› Example database of a fictional company named "Adventure Works"
› SSAS integration (SQL Server Analysis Services)

› Finance
› Franchises
› Currency rates (daily exchange rates)

› Sales
› Reseller
› Contracts

Slide 6

Available Scenarios

› DM/DW scenarios
› Mining scenarios

› Forecasting bikes by region/time

› Targeted mailing campaign: algorithms for demographic data (age, region, volume, etc.)

› Market basket analysis: "suggesting a product"

› Sequence clustering

Slide 7

Available Scenarios

› OLAP Scenarios
    › Financial Reporting
    › Actual versus Budget
    › Product Profitability Analysis
    › Sales Force Performance
    › Trend/Growth Analysis
    › Promotion Effectiveness

Source: http://msdn.microsoft.com/en-us/library/ms124623.aspx

Slide 8

Adventure Works Data Warehouse

› Data from the OLTP DB plus an additional "external" data source
› Synchronization via the available SSIS packages
› Copy of the actual (live) data
› Can be changed and merged for mining

02 Simple Data Mining with a View

Slide 9

Homework!

Slide 10

Data Mining Applied with AW DB

› Read and try it out!
› Preparation

    › 1. Get Visual Studio 2008
    › 2. Get SQL Server 2008
    › 3. Install the Adventure Works Database (DW)

Homework: http://msdn.microsoft.com/en-us/library/ms167167.aspx

Data Mining Applied with AW DB

Don't get confused*
"SQL Server Business Intelligence Development Studio" is the combination of
Microsoft Visual Studio 2008
+ SQL Server 2008 (not Express)
+ the feature "Business Intelligence"

(* Everybody is confused here the first time!)

Slide 13

A look into the database

› Adventure Works 2008
› AdventureWorksDW2008

ProductCategory

vDMPrep

vTargetMail

Slide 14

Table: ProductCategory

Id  Name         rowguid              Modified
--- ------------ -------------------- -----------
1   Bikes        CFBDA25C-DF71-[...]  1998-06-01
2   Components   C657828D-D808-[...]  1998-06-01
3   Clothing     10A7C342-CA82-[...]  1998-06-01
4   Accessories  2BE3BE36-D9A2-[...]  1998-06-01

Slide 15

View: vTargetMail

-- vTargetMail supports targeted mailing data model
-- Uses vDMPrep to determine if a customer buys a bike and joins to DimCustomer

CREATE VIEW [dbo].[vTargetMail]
AS
    SELECT
        c.[CustomerKey],
        -- [...]
        CASE x.[Bikes] WHEN 0 THEN 0 ELSE 1 END AS [BikeBuyer]
    FROM [dbo].[DimCustomer] c
    INNER JOIN (
        SELECT
            [CustomerKey], [Region], [Age],
            SUM(CASE [EnglishProductCategoryName] WHEN 'Bikes' THEN 1 ELSE 0 END) AS [Bikes]
        FROM [dbo].[vDMPrep]
        GROUP BY [CustomerKey], [Region], [Age]
    ) AS [x]
        ON c.[CustomerKey] = x.[CustomerKey];
GO
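To try the BikeBuyer logic without a SQL Server installation, the view can be imitated on toy data. The following is only a sketch: it uses Python's built-in sqlite3 module instead of SQL Server, models vDMPrep as a plain table rather than a view, and all rows are invented for illustration.

```python
import sqlite3

# Toy stand-ins for DimCustomer and vDMPrep; every row here is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY);
    CREATE TABLE vDMPrep (
        CustomerKey INTEGER,
        Region TEXT,
        Age INTEGER,
        EnglishProductCategoryName TEXT
    );
    INSERT INTO DimCustomer VALUES (1), (2), (3);
    INSERT INTO vDMPrep VALUES
        (1, 'Europe', 34, 'Bikes'),
        (1, 'Europe', 34, 'Accessories'),
        (2, 'Pacific', 51, 'Clothing'),
        (3, 'Europe', 45, 'Bikes'),
        (3, 'Europe', 45, 'Bikes');

    -- Same shape as vTargetMail: count bike purchases per customer,
    -- then reduce the count to a 0/1 flag.
    CREATE VIEW vTargetMail AS
    SELECT c.CustomerKey,
           CASE x.Bikes WHEN 0 THEN 0 ELSE 1 END AS BikeBuyer
    FROM DimCustomer c
    INNER JOIN (
        SELECT CustomerKey, Region, Age,
               SUM(CASE EnglishProductCategoryName WHEN 'Bikes' THEN 1 ELSE 0 END) AS Bikes
        FROM vDMPrep
        GROUP BY CustomerKey, Region, Age
    ) AS x ON c.CustomerKey = x.CustomerKey;
""")

rows = conn.execute(
    "SELECT CustomerKey, BikeBuyer FROM vTargetMail ORDER BY CustomerKey"
).fetchall()
print(rows)
```

Customers 1 and 3 bought at least one bike and get BikeBuyer = 1; customer 2 only bought clothing and gets 0. Customer 3 shows why the CASE is needed: two bike purchases still map to a flag of 1.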

Slide 16

Create Project

› Add Data Source
› Add Data Source View
› Add Mining Structure

› Add Models (Algorithms)
    › Decision Trees
    › (Clustering)
    › (Naive Bayes)

03 Algorithm: Decision Tree

Slide 17

Slide 18

Algorithm Overview

› Used to identify relationships
› Column 1, Column 2, Column 3
› Most cases: 4 steps

    › Analyze
    › Create model (training)
    › Verify model (testing)
    › Predict future data

Slide 19

Decision Trees

› Also: classification trees
› Partition data
› Can detect non-linear relationships
› Machine learning technique

› Separate the data into a training set and a testing set
› The training set is used to create the model based on certain criteria
› The test set is used to verify the model
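The split into training and testing set can be sketched in a few lines. This is an illustration only, not the Microsoft Decision Trees algorithm: the "model" is a single split with a fixed threshold (a real tree learner searches for the best splits itself), and the function names, the 30 % test fraction and the data are assumptions made up for this example.

```python
import random

def split_train_test(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (training, testing) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def train_stump(training, feature, threshold):
    """Train a one-split 'tree': majority label on each side of the threshold."""
    def majority(labels):
        return int(sum(labels) * 2 >= len(labels)) if labels else 0
    left = [r["BikeBuyer"] for r in training if r[feature] < threshold]
    right = [r["BikeBuyer"] for r in training if r[feature] >= threshold]
    return {"feature": feature, "threshold": threshold,
            "left": majority(left), "right": majority(right)}

def predict(model, row):
    """Return the label stored on the side of the split the row falls on."""
    side = "left" if row[model["feature"]] < model["threshold"] else "right"
    return model[side]

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the known label."""
    hits = sum(predict(model, r) == r["BikeBuyer"] for r in rows)
    return hits / len(rows)

# Invented data: customers aged 40+ buy a bike, younger ones do not.
data = [{"Age": age, "BikeBuyer": int(age >= 40)} for age in range(20, 70)]

training, testing = split_train_test(data)          # separate the data
model = train_stump(training, "Age", threshold=40)  # create model (training)
print(accuracy(model, testing))                     # verify model (testing)
print(predict(model, {"Age": 55}))                  # predict future data
```

The testing rows are never shown to train_stump, so the accuracy printed here measures how well the model does on data it has not seen.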

Slide 20

Decision Trees: Example

Trained tree:

Overall: 2.6 % response rate
    Male: 3.0 %
        Income > $30,000: 3.6 %
        Income < $30,000: 2.3 %
    Female: 2.9 %
        Age < 40: 3.2 %
        Age > 40: 3.8 %

Males with income above $30,000 and females aged 40+: response rate > 3.5 %
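The example tree can also be written down as a nested structure and searched for the segments worth targeting. A minimal sketch: the rates are the ones from the slide, but the nesting (income split under Male, age split under Female) is an assumption inferred from the slide's summary, and the helper name leaves_above is made up for this example.

```python
# Response rates in percent, taken from the slide. The nesting (income split
# under Male, age split under Female) is an assumption inferred from the
# slide's summary of the high-response segments.
tree = {
    "label": "All", "rate": 2.6,
    "children": [
        {"label": "Male", "rate": 3.0, "children": [
            {"label": "Income > $30,000", "rate": 3.6, "children": []},
            {"label": "Income < $30,000", "rate": 2.3, "children": []},
        ]},
        {"label": "Female", "rate": 2.9, "children": [
            {"label": "Age < 40", "rate": 3.2, "children": []},
            {"label": "Age > 40", "rate": 3.8, "children": []},
        ]},
    ],
}

def leaves_above(node, min_rate, path=()):
    """Return (path, rate) for every leaf whose rate exceeds min_rate."""
    path = path + (node["label"],)
    if not node["children"]:
        return [(" / ".join(path), node["rate"])] if node["rate"] > min_rate else []
    found = []
    for child in node["children"]:
        found.extend(leaves_above(child, min_rate, path))
    return found

targets = leaves_above(tree, 3.5)
for segment, rate in targets:
    print(f"{segment}: {rate} %")
```

With a threshold of 3.5 % only two leaves remain: males with an income above $30,000 and females older than 40.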

Slide 21

Pros and Cons of Decision Trees

› Pros
    › Very flexible, white-box model
    › Occam's Razor: KISS (keep it simple, stupid!)
    › Little preparation and few resources needed

› Cons
    › Can be tuned to death
    › Long time to build
    › Training data must be selected wisely

False training yields false results
A big tree might require disk swapping

THANK YOU FOR YOUR ATTENTION

Slide 22

