SQL Server 2008 Data Mining

SSAS 2008 Data Mining Lynn Langit/MSDN Developer Evangelist Microsoft http://blogs.msdn.com/SoCalDevGal

Upload: llangit

Post on 07-Nov-2014


TRANSCRIPT

Page 1: SQL Server 2008 Data Mining

SSAS 2008 Data Mining

Lynn Langit / MSDN Developer Evangelist, Microsoft
http://blogs.msdn.com/SoCalDevGal

Page 2: SQL Server 2008 Data Mining

Session Prerequisites
• Working SQL Server 2008 Developer
• Understanding of OLAP concepts
• Working SQL Server Analysis Server 2005 Developer
• Interest in or basic knowledge of Data Mining concepts

Page 3: SQL Server 2008 Data Mining

Objectives and Agenda
• Understand what, why, when & how of SQL Server 2008 Data Mining
• Examine the core functionality of the Data Mining Extensions
• Hear about the new and/or advanced functionality of Data Mining

Page 4: SQL Server 2008 Data Mining

What and Why Data Mining?

[Diagram: Predictive Analytics. Axes: Role of Software and Business Insight. Stage labels: Presentation, Exploration, Discovery. Role labels: Passive, Interactive, Proactive. Plotted items: Canned reporting, Ad-hoc reporting, OLAP, Data mining.]

Page 5: SQL Server 2008 Data Mining

Cubes vs. Data Mining

Page 6: SQL Server 2008 Data Mining

DM - Scenarios to Tasks

Page 7: SQL Server 2008 Data Mining

Tasks to Techniques

Page 8: SQL Server 2008 Data Mining

BI for Everyone

• Individual – Excel
• Project – SharePoint

Page 9: SQL Server 2008 Data Mining

Microsoft’s Predictive Analytics

[Diagram labels]
• Microsoft SQL Server 2008 Data Mining
• Microsoft SQL Server 2008 Analysis Services
• SQL Server 2008 Business Intelligence Development Studio
• Data Mining SQL extensions (DMX)
• Data Mining Add-ins for the 2007 Microsoft Office system
• Microsoft Dynamics CRM Analytics Foundation
• Custom Algorithms
• SQL Services (Azure)
• Audiences: Information Worker, BI Analyst, Data Mining Specialist, Application Developer

Page 10: SQL Server 2008 Data Mining

Data Mining Add-ins for Office 2007
• Table Analysis Tools for Excel 2007
• Data Mining Template for Visio 2007
• Data Mining Client for Excel 2007

Audiences: Information Worker, BI Analyst, Data Mining Specialist

Page 11: SQL Server 2008 Data Mining

Microsoft Data Mining Lifecycle (CRISP-DM)

Phases (arranged around the data):
• Business Understanding
• Data Understanding
• Data Preparation
• Modeling
• Evaluation
• Deployment

Tool labels shown alongside the phases:
• SSAS (Data Mining), Excel
• SSAS (DSV), Query, Excel
• SSIS, SSAS, SSRS, Excel, Your Apps
• SSIS, SSAS, Excel

www.crisp-dm.org

Page 12: SQL Server 2008 Data Mining

Understand & Prepare specifics

Page 13: SQL Server 2008 Data Mining

Demo

1 – Explore / Clean / Partition Data
2 – Prepare Data

Page 14: SQL Server 2008 Data Mining

Modeling Specifics

Page 15: SQL Server 2008 Data Mining

Demo

3 – Select algorithm
4 – Create model

Page 16: SQL Server 2008 Data Mining

Evaluation Specifics

Page 17: SQL Server 2008 Data Mining

Demo

5 – Evaluate model
6 – Deploy model
7 – Update model
8 – Query model

Page 18: SQL Server 2008 Data Mining

Data Mining – Logical Model

Training:
• Training Data (DB data, client data, application data)
• Data Mining Engine
• Algorithm
• Mining Model

Prediction:
• To Predict (DB data, client data, application data, “just one row”)
• Mining Model
• Data Mining Engine
• Predicted Data

Page 19: SQL Server 2008 Data Mining

Data Mining – Physical Model

[Diagram labels]
• Analysis Services Server: Mining Model, Data Mining Algorithm, Data Source
• Your Application, connecting via OLE DB / ADOMD / XMLA
• BI Dev Studio (Visual Studio), used to deploy
• App Data

Page 20: SQL Server 2008 Data Mining

Data Mining Interfaces – APIs

Analysis Server (msmdsrv.exe)

OLAP Data Mining

Server ADOMD.NET

.Net Stored Procedures Microsoft Algorithms Third Party Algorithms

XMLA over TCP/IP

OLEDB for OLAP/DM ADO/DSO

XMLA over HTTP

Any Platform, Any Device

C++ App VB App .Net App

AMO

Any App

ADOMD.NET

WAN

DM Interfaces

Page 21: SQL Server 2008 Data Mining

Configuration & Deployment

Model Creation/Management Database Administrators Session Mining Models

Model Application Permissions on models Permissions on data sources

• Browse
• Copy to Excel
• Drillthrough

• Query
• Default
• Advanced

• Excel Services
• Manage models and structures

• Export/Import
• Rename

• Connection
• Database
• Trace

Page 22: SQL Server 2008 Data Mining

Data Mining Extensions (DMX)

-- 1. Define the model: columns, content types, and the predictable column.
CREATE MINING MODEL CreditRisk
(
    CustID     LONG KEY,
    Gender     TEXT DISCRETE,
    Income     LONG CONTINUOUS,
    Profession TEXT DISCRETE,
    Risk       TEXT DISCRETE PREDICT
)
USING Microsoft_Decision_Trees

-- 2. Train the model from existing customers.
-- Slide shorthand: a full DMX statement typically supplies the source rows
-- through OPENQUERY against a data source.
INSERT INTO CreditRisk
    (CustID, Gender, Income, Profession, Risk)
SELECT CustomerID, Gender, Income, Profession, Risk
FROM Customers

-- 3. Predict risk, and its probability, for new customers.
SELECT NewCustomers.CustomerID,
       CreditRisk.Risk,
       PredictProbability(CreditRisk.Risk)
FROM CreditRisk PREDICTION JOIN NewCustomers
ON  CreditRisk.Gender = NewCustomers.Gender
AND CreditRisk.Income = NewCustomers.Income
AND CreditRisk.Profession = NewCustomers.Profession
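The prediction query above joins the model to a whole table of new customers. For the "just one row" case shown on the logical-model slide, DMX also accepts a singleton query, where the input case is written in-line and matched to model columns by name with NATURAL PREDICTION JOIN. The Python sketch below only builds the DMX text for such a query. The helper name `build_singleton_query` and the sample values are assumptions made for illustration, and sending the text to Analysis Services (for example through ADOMD.NET or XMLA) is not shown.

```python
def build_singleton_query(model, target, case):
    """Return DMX text that predicts `target` for one in-line case.

    `case` maps model column names to the values of the single row.
    """
    def literal(value):
        # Assumption: string literals are single-quoted, with embedded
        # quotes doubled (SQL style). Numbers are written as-is.
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"
        return str(value)

    # One "value AS [column]" pair per input attribute.
    columns = ", ".join(
        f"{literal(value)} AS [{name}]" for name, value in case.items()
    )
    return (
        f"SELECT [{model}].[{target}], "
        f"PredictProbability([{model}].[{target}])\n"
        f"FROM [{model}]\n"
        f"NATURAL PREDICTION JOIN\n"
        f"(SELECT {columns}) AS t"
    )


# Hypothetical customer; column names follow the CreditRisk model above.
query = build_singleton_query(
    "CreditRisk",
    "Risk",
    {"Gender": "Female", "Income": 52000, "Profession": "Engineer"},
)
print(query)
```

Because the join is NATURAL, no ON clause is needed as long as the in-line column aliases match the model's column names.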

Page 23: SQL Server 2008 Data Mining

DMX Column Expressions

• Predictable Columns
• Source Data Columns
• Functions
  • Predict: the “workhorse”
    • Discrete scalar values
    • Continuous scalar values
    • Associative nested tables
    • Sequence nested tables
    • Time Series
    • Overloaded to PredictAssociation, PredictSequence, PredictTimeSeries
  • PredictProbability, PredictSupport, PredictHistogram
  • Cluster, ClusterProbability
  • GetNodeId, IsInNode
• Arithmetic operators
• Stored Procedure
• Subselect: select from nested tables

Page 24: SQL Server 2008 Data Mining

Demo – Data Mining & Excel 2007 integration

Page 25: SQL Server 2008 Data Mining

Excel Functions*

DMPREDICTTABLEROW ( Connection, ModelName, PredictionResult, TableRowRange[, string CommaSeparatedColumnNames])

DMPREDICT ( Connection, Model, PredictionResult, Value1, Name1, [...,Value32, Name32])

DMCONTENTQUERY (Connection, Model, PredictionResult[, WhereClause])

Page 26: SQL Server 2008 Data Mining

DM in the Cloud

Test Data Types
• Relational
• CSV
• SQL Services (Azure Services)

Page 27: SQL Server 2008 Data Mining

Try it in the cloud…

Page 28: SQL Server 2008 Data Mining

Analysis Results in the Cloud…

Page 29: SQL Server 2008 Data Mining

Calling the Cloud…(from Excel 2007)

Page 30: SQL Server 2008 Data Mining

New to SQL Server 2008 DM

• Microsoft Time Series algorithm improved
  • ARIMA plus ARTxp method, and a blending algorithm = better results
  • New prediction mode allows adding new data to time series models
• Holdout support added
  • Easily partition data into training and test sets that are stored in the mining structure and available to query after processing
• Ability to build mining models based on filtered subsets added
  • Results in fewer structures, i.e., you can just filter an existing one
• Drillthrough functionality extended
  • Makes all mining structure columns available, not just columns included in the model
  • Allows you to build more compact models
• Cross-validation added
  • Allows users to quickly validate their modeling approach by automatically building temporary models and evaluating accuracy measures across K folds. The feature is available through a new cross-validation tab under Accuracy Charts in BIDS, and programmatically via a stored procedure call.
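The holdout and cross-validation features listed above both come down to how cases are partitioned. The Python sketch below illustrates the general idea only; it is not how Analysis Services implements either feature. The function names, the 30% holdout fraction, and the choice of 5 folds are assumptions picked for the example.

```python
import random


def holdout_split(cases, holdout_fraction=0.3, seed=42):
    """Shuffle cases reproducibly and split them into (training, test)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * holdout_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(cases, k):
    """Yield (training, validation) pairs for K-fold cross-validation.

    Every case lands in the validation part of exactly one fold.
    """
    cases = list(cases)
    for i in range(k):
        validation = cases[i::k]
        training = [c for j, c in enumerate(cases) if j % k != i]
        yield training, validation


# Stand-in data: 100 case IDs instead of real customer rows.
cases = list(range(100))

# Holdout: reserve a test set that is never used for training.
training, test = holdout_split(cases)

# Cross-validation: rotate a validation slice through the training set.
folds = list(k_folds(training, k=5))

print(len(training), len(test), len(folds))  # prints: 70 30 5
```

In a real workflow a temporary model would be trained on each fold's training part and scored on its validation part, and the accuracy measures would then be compared across the K folds.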

Page 31: SQL Server 2008 Data Mining

Summary

• Data Mining in SQL Server 2008 is mature, powerful and accessible

• Can use Excel 2007
  • Familiar client for BI – OLAP cubes AND Data Mining models
  • Model Creators / Users
  • Excel Data or Server Data

• SSAS and Excel both support the full DM Cycle
  • Data Understanding & Data Preparation
  • Modeling, Validation & Deployment

• SQL Services Incubations available now
  • Data Mining from the Cloud
  • More

Page 32: SQL Server 2008 Data Mining

DM Webcasts

• Fri, 02 Nov 2007, MSDN Webcast: Build Smart Web Applications with SQL Server Data Mining (Level 200)
• Thu, 08 Nov 2007, MSDN Webcast: Building Adaptive Applications with SQL Server Data Mining (Level 300)
• Mon, 19 Nov 2007, MSDN Webcast: Extending and Customizing SQL Server Data Mining (Level 300)
• Fri, 30 Nov 2007, MSDN Webcast: Creating Visualizations for SQL Server Data Mining (Level 300)
• Thu, 01 Nov 2007, TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 1 of 3): Your First Project with SQL Server Data Mining (Level 200)
• Thu, 15 Nov 2007, TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 2 of 3): Understand SQL Server Data Mining Add-ins for the 2007 Office System (Level 200)
• Thu, 29 Nov 2007, TechNet Webcast: Deliver Actionable Insight Throughout Your Organization with Data Mining (Part 3 of 3): Use Predictive Intelligence to Create Smarter KPIs (Level 200)

Page 33: SQL Server 2008 Data Mining

DM Resources

Technical Communities, Webcasts, Blogs, Chats & User Groups
http://www.microsoft.com/communities/default.mspx

Microsoft Developer Network (MSDN) & TechNet
http://microsoft.com/msdn
http://microsoft.com/technet

Trial Software and Virtual Labs
http://www.microsoft.com/technet/downloads/trials/default.mspx

Microsoft Learning and Certification
http://www.microsoft.com/learning/default.mspx

SQL Server Data Mining
http://www.sqlserverdatamining.com
http://www.microsoft.com/bi/bicapabilities/data-mining.aspx

Page 34: SQL Server 2008 Data Mining

BI Resources from Lynn Langit

http://blogs.msdn.com/SoCalDevGal

“How Do I…BI?” screencast series on MSDN

“Smart Business Intelligence Solutions with Microsoft SQL Server 2008”, MSPress, Feb 2009

“Foundations of SQL Server 2005 Business Intelligence”, APress, April 2007

Page 35: SQL Server 2008 Data Mining