

Page 1: Building Data Integration Solutions with Integration Services Donald Farmer Group Program Manager Microsoft Corporation

Building Data Integration Solutions with Integration Services

Donald Farmer
Group Program Manager
Microsoft Corporation

Page 2:

Agenda

• Integration Services Overview
• Building Packages Demo
• SSIS Lifecycle
  • Tools, management, security, deployment
• Managing Packages Demo
• Troubleshooting
  • Log data, error flows
• Summary

Page 3:

Integration Services: Breakthrough ETL Capabilities

• Enterprise ETL platform
  • High performance
  • High scale
• Best in class usability
  • Rich development environment
  • Source control
  • Visual debugging of control flow and data
• Great range of transforms out-of-the-box
• Highly extensible
  • Custom tasks
  • Custom enumerations
  • Custom transformations
  • Custom data sources

Page 4:

Data Integration Architecture: Before Integration Services

[Diagram labels: Call centre data (semi-structured); Legacy data (binary files); Application database; Hand coding; Text mining; Cleansing & ETL; Staging and ETL (repeated as separate stages); Warehouse; Reports; Mobile data; Data mining; Alerts and escalation]

• Integration and warehousing require separate, staged operations.
• Preparation of data requires different, often incompatible, tools.
• Reporting and escalation is a slow process, delaying smart responses.
• Heavy data volumes make this scenario increasingly unworkable.

Page 5:

Data Integration Architecture: With Integration Services

[Diagram labels: Call centre (semi-structured data); Legacy data (binary files); Application database; SQL Server Integration Services, containing text mining components, custom source, standard sources, data cleansing components, merges, and data mining components; Warehouse; Reports; Mobile data; Alerts and escalation]

• Integration and warehousing are a seamless, manageable operation.
• Source, prepare, and load data in a single, auditable process.
• Reporting and escalation can be parallelized with the warehouse load.
• Scales to handle heavy and complex data requirements.

Page 6:

How SQL Server Integration Services Works

• Data sources can be diverse, including custom or scripted adapters.
• Transformation components shape and modify data in many ways.
• Data is routed by rules or error conditions for cleansing and conforming.
• Flows can be as complex as your business rules, but highly concurrent.
• And finally, data can be loaded in parallel to many varied destinations.

Page 7:

SQL Server Integration Services: New Paradigm for the ETL Platform

• Data Cleansing
  • Provides data mining and AI expertise
  • Domain-independent data cleansing
• Fuzzy lookup
  • Lookup on approximate matches
  • Tune for best match
• De-duplication
  • Eliminate approximate duplicates
  • "Windows XP", "WinXP", etc.
  • Tune for confidence
• Managing Slowly Changing Dimensions
  • E.g. Sales organization changes
  • E.g. Customer movement
  • E.g. Product category changes
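
The idea behind de-duplicating approximate matches such as "Windows XP" and "WinXP" can be illustrated with a small sketch. This is not how the SSIS fuzzy components are implemented; it swaps in Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure, and the function names and the threshold value are assumptions chosen for illustration. The threshold plays the role of the "tune for confidence" setting.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and whitespace."""
    def normalize(s: str) -> str:
        return "".join(s.lower().split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_duplicates(values, threshold=0.7):
    """Greedily place each value in the first group whose representative
    (the group's first member) is at least `threshold` similar to it."""
    groups = []
    for value in values:
        for group in groups:
            if similarity(value, group[0]) >= threshold:
                group.append(value)
                break
        else:
            # No existing group was close enough: start a new one.
            groups.append([value])
    return groups


if __name__ == "__main__":
    products = ["Windows XP", "WinXP", "windows xp", "Linux"]
    for group in group_duplicates(products):
        print(group)
```

Raising the threshold makes the grouping more conservative (fewer false merges); lowering it merges more aggressively, so it needs tuning against real data.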

Page 8:

Building Packages

Page 9:

SSIS Life Cycle Tools: Feature Summary

• Design the SSIS 'Package'
  • Business Intelligence Development Studio (Visual Studio)
  • Migration wizard for pre-SQL 2005 packages
  • Visual SourceSafe integration
• Deployment/Execution
  • Deployment Utility to copy packages
  • Command-line execution (dtexec.exe and dtexecui.exe)
  • Flexible configuration options
• Supportability
  • Rich per-package logging (Log Providers)
  • SQL Management Studio for monitoring running packages and organizing stored packages (using the SSIS Windows service)
  • Checkpoint-restart ability
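
As an illustration of command-line execution, the sketch below builds a `dtexec` argument list that a scheduler script could pass to `subprocess.run`. It assumes the `/File`, `/ConfigFile`, and `/Reporting` options of `dtexec`; the helper name and the example paths are hypothetical, and the sketch only constructs the command rather than running it.

```python
def build_dtexec_args(package_path, config_file=None, reporting=None):
    """Build an argument list for dtexec (assumed to be on the PATH).

    package_path: path to a .dtsx package stored on the file system.
    config_file:  optional package configuration file.
    reporting:    optional reporting level string, e.g. "EW" for
                  errors and warnings.
    """
    args = ["dtexec", "/File", package_path]
    if config_file:
        args += ["/ConfigFile", config_file]
    if reporting:
        args += ["/Reporting", reporting]
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    command = build_dtexec_args(
        r"C:\packages\LoadSales.dtsx",
        config_file=r"C:\packages\prod.dtsConfig",
        reporting="EW",
    )
    print(command)
    # To actually run it on a machine with SSIS installed:
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids quoting problems when paths contain spaces.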

Page 10:

Sample Server Layout

[Diagram labels: Source data; Source flat files; Destination data; SSIS package error rows; SSIS package logging; SSIS packages stored in SQL; Packages on file system; SSIS package execution; SSIS support servers]

Page 11:

Logging and Log Providers

• Log entries are a blend of status and result messages.
• Can select what 'details' per control flow object within each package (e.g. OnError, OnWarning, OnPreExecute)
• Can select what fields (e.g. computer, operator, ExecutionID…)
• Can define multiple log providers (SQL, text file, Windows Event..) per package
• BIDS has a Log Events window to see the log entries that are headed for the log provider
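
The selection model described above (choose events, choose fields, send to several providers) can be pictured with a small sketch. This is only a conceptual model, not the SSIS logging API: `log_entry` and the list-backed "providers" are hypothetical stand-ins, while the event and field names are the ones mentioned on the slide.

```python
def log_entry(entry, selected_events, selected_fields, providers):
    """Send `entry` to every provider if its event is selected.

    entry:           dict with an "event" key plus arbitrary fields.
    selected_events: set of event names to log (e.g. {"OnError"}).
    selected_fields: field names to keep in the logged record.
    providers:       callables that each accept one record dict.
    Returns True if the entry was logged, False if it was filtered out.
    """
    if entry["event"] not in selected_events:
        return False
    record = {"event": entry["event"]}
    record.update({name: entry.get(name) for name in selected_fields})
    for write in providers:
        write(record)
    return True


if __name__ == "__main__":
    sql_log, text_log = [], []  # stand-ins for two log providers
    entry = {
        "event": "OnError",
        "computer": "ETL01",
        "operator": "svc_ssis",
        "ExecutionID": "run-1",
        "message": "Conversion failed",
    }
    log_entry(entry, {"OnError", "OnWarning"},
              ["computer", "ExecutionID"],
              [sql_log.append, text_log.append])
    print(sql_log)
```

Keeping the event filter and the field list separate mirrors the two independent choices on the slide: which details to log, and which fields each entry carries.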

Page 12:

SSIS Windows Service

• Monitors and manages running as well as stored packages, via SQL Management Studio
• Service is installed when you install SSIS
• Service is not required to design or execute packages
• Stored Packages tree is based on an XML configuration file
• You can customize the file contents, name, and location
• Windows events for service (service start, service failed to start, package started, package stopped…)

Page 13:

Overview of SSIS Security

'Security' comprises several layers to support both SQL and file system based scenarios.

• Packages can be encrypted
• Packages can be digitally signed
• Packages can be stored in a SQL DB and protected with SQL roles

Page 14:

Overview of SSIS Security

[Diagram labels, with two paths marked A and B]
• OS Permissions
  • File and folder access control for file based packages
  • View/stop running packages
• Package Protection Level Property
  • Encrypt or clear sensitive properties
• SQL DB Roles
  • Reader and Writer roles for packages stored in MSDB
• Package Signature

(A) A package can be encrypted via the Package Protection Level property and stored inside MSDB, where access to the package is limited by SQL database roles.

(B) A package can be encrypted via the Package Protection Level property, with access to the package file controlled by folder and file permissions at the operating system level.

Page 15:

Deployment Flow

Tools to organize and 'copy' packages and supporting files

BI Studio
• Design Package
• Add Configurations
• Add Miscellaneous files
• Set Project Deployment properties
• Build Project

You
• Copy/Move Deployment folder\files

Installation Wizard
• Execute manifest file
• Choose Destination (SQL or file system)
• Modify protection level
• Choose location of supporting files
• Change configurations

SQL Agent
• Create desired agent jobs

Page 16:

Deploying SSIS Packages

• Designer can build a deployment file set which includes a project's package(s), configuration files, and installer files
• You move the installer file set to another server\environment and execute it to install packages to SQL or the file system.
• Deployment is not a version sync\check tool such as SMS
• Installer can 'copy' miscellaneous files, such as custom components, but will not 'install' them.

Page 17:

SQL Management Studio

• Requires the SSIS service
• Allows monitoring and stopping of currently executing packages
• Maintain stored package structure and set roles for SQL stored packages
• You can connect to and view multiple SSIS servers at one time
• Ad hoc package execution from folder tree

Page 18:

Log Data for Troubleshooting

• Logging and Error Flow data are core for troubleshooting
• Can save\load logging detail templates
• Child packages bubble entries up to the parent
• Package IDs need to be unique. You can regenerate the ID via BIDS or dtutil.exe
• Example: this batch-file loop regenerates the package ID for every package in a folder, be it 2 or 200:

    for %%f in (C:\_work\SSISPackages\_quick\Notepad\*.dtsx) do dtutil.exe /i /File "%%f"

Page 19:

Error Flows in Data Flow Task

• Error flows can be controlled per field per row (fail component, redirect, ignore)
• Just another flow…to your destination
• Error flows can all be directed to a central location for centralized operations
• Includes error code and column id
• Can couple with a Row Sampling transform
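
The three error dispositions named above (fail component, redirect, ignore) can be sketched for a single conversion step. This is a conceptual model rather than SSIS code: the function and constant names are hypothetical, the exception class name stands in for a numeric error code, and passing the row on with an empty value under "ignore" is an assumption of this sketch.

```python
FAIL, REDIRECT, IGNORE = "fail", "redirect", "ignore"


def convert_rows(rows, column, convert, disposition=REDIRECT):
    """Apply `convert` to one column of each row.

    Returns (good_rows, error_rows). Depending on `disposition`, a row
    whose conversion fails either stops processing (FAIL), is sent to
    the error output with an error code and column id (REDIRECT), or
    continues on the main output with None in that column (IGNORE).
    """
    good, errors = [], []
    for row in rows:
        try:
            good.append({**row, column: convert(row[column])})
        except (ValueError, TypeError) as exc:
            if disposition == FAIL:
                raise
            if disposition == REDIRECT:
                errors.append({**row,
                               "ErrorCode": type(exc).__name__,
                               "ErrorColumn": column})
            else:
                good.append({**row, column: None})
    return good, errors


if __name__ == "__main__":
    source = [{"id": "1"}, {"id": "x"}, {"id": "3"}]
    good, errors = convert_rows(source, "id", int)
    print(good)
    print(errors)
```

Because the error output is just another list of rows, it can be loaded to a central table or sampled in the same way as any other flow.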

Page 20:

Other Troubleshooting Ideas

• Performance counters (SQLServer:SSISPipeline and SQLServer:SSISService)
• Integration with Microsoft® Operations Manager
• SQL Agent has log data as well, and proxies to limit package execution
• General package design to log row counts, multicast, and save variables
• Webcast: Scalability, Performance and Optimization in SSIS

Page 21:

Summary

• SQL Server Integration Services is an exceptionally high performance integration and transformation tool
• Some processes benefit more from parallelism, some from memory
• 32-bit performance and scale is best increased by parallelism
• 64-bit enables highly scalable memory operations

Page 22:

For More Information

• Analysis Services TechCenter
  http://www.microsoft.com/technet/prodtechnol/sql/2005/technologies/ssisvcs.mspx
• Developer Center
  http://msdn.microsoft.com/sql/bi/integration/default.aspx
• Great information available at www.sqlis.com
• Project Real
  http://www.microsoft.com/sql/solutions/bi/projectreal.mspx
• On-demand Webcasts
  http://www.microsoft.com/events/series/sqlserverbi.mspx