Building Data Integration Solutions with Integration Services
Donald Farmer, Group Program Manager, Microsoft Corporation
Agenda
• Integration Services overview
• Building packages (demo)
• SSIS lifecycle: tools, management, security, deployment
• Managing packages (demo)
• Troubleshooting: log data, error flows
• Summary
Integration Services: Breakthrough ETL Capabilities
• Enterprise ETL platform: high performance, high scale
• Best-in-class usability: rich development environment, source control, visual debugging of control flow and data flow
• Great range of transforms out of the box
• Highly extensible: custom tasks, custom enumerators, custom transformations, custom data sources
Data Integration Architecture: Before Integration Services
[Diagram: call centre data (semi-structured), legacy data (binary files), and an application database pass through separate hand-coding, staging, text mining, cleansing, and ETL steps before reaching the warehouse; reports, mobile data, data mining, and alerts and escalation sit downstream.]
Integration and warehousing require separate, staged operations. Preparing data requires different, often incompatible, tools. Reporting and escalation are slow, delaying smart responses. Heavy data volumes make this scenario increasingly unworkable.
Data Integration Architecture: With Integration Services
[Diagram: standard and custom sources, including semi-structured call centre data, legacy binary files, and an application database, feed a single SQL Server Integration Services flow with text mining, data cleansing, merge, and data mining components, which loads the warehouse, reports, mobile data, and alerts and escalation.]
Integration and warehousing are a seamless, manageable operation. Data is sourced, prepared, and loaded in a single, auditable process. Reporting and escalation can be parallelized with the warehouse load. The process scales to handle heavy and complex data requirements.
How SQL Server Integration Services Works
Data sources can be diverse, including custom or scripted adapters. Transformation components shape and modify data in many ways. Data is routed by rules or error conditions for cleansing and conforming. Flows can be as complex as your business rules require, yet remain highly concurrent. Finally, data can be loaded in parallel to many different destinations.
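The flow just described (diverse sources, transformations, routing by rules or error conditions, parallel loading to several destinations) can be pictured with a small Python sketch. It is a conceptual illustration only, not the SSIS engine or its API; the rows, column names, validation rule, and destination names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def source_rows():
    # Hypothetical source; in SSIS this role is played by a source adapter.
    yield {"id": 1, "name": " Alice ", "amount": "10.5"}
    yield {"id": 2, "name": "Bob", "amount": "n/a"}
    yield {"id": 3, "name": "carol", "amount": "7"}

def transform(row):
    # Shape the data: trim and title-case names, convert amounts to numbers.
    out = dict(row)
    out["name"] = row["name"].strip().title()
    out["amount"] = float(row["amount"])  # raises ValueError on bad data
    return out

def route(rows):
    # Good rows continue; rows that fail conversion go to an error flow
    # that carries the error message for later cleansing.
    good, errors = [], []
    for row in rows:
        try:
            good.append(transform(row))
        except ValueError as exc:
            errors.append({**row, "error": str(exc)})
    return good, errors

def load(destination, rows):
    # Stand-in for a destination adapter: just report how many rows arrived.
    return destination, len(rows)

good, errors = route(source_rows())
# Load both flows to their (hypothetical) destinations in parallel.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(lambda args: load(*args),
                            [("warehouse", good), ("error_table", errors)]))
print(results)  # {'warehouse': 2, 'error_table': 1}
```

Routing failures into a separate list rather than stopping the run mirrors the idea that error conditions are just another path through the flow.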
SQL Server Integration Services: New Paradigm for the ETL Platform
• Data cleansing: brings data mining and AI expertise; domain-independent data cleansing
• Fuzzy lookup: look up values on approximate matches; tune for the best match
• De-duplication: eliminate approximate duplicates ("Windows XP", "WinXP", etc.); tune for confidence
• Managing slowly changing dimensions: e.g. sales organization changes, customer movement, product category changes
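SSIS's Fuzzy Lookup and Fuzzy Grouping transformations use their own similarity algorithms. As a rough stand-in for the idea of matching on approximate values with a tunable threshold, here is a minimal Python sketch built on the standard library's difflib.SequenceMatcher; the normalization rule, the reference list, and the 0.6 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def normalize(s):
    # Lower-case and drop spaces so "Windows XP" and "WinXP" compare more closely.
    return "".join(s.lower().split())

def similarity(a, b):
    # SequenceMatcher stands in for SSIS's own similarity scoring here.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def fuzzy_lookup(value, reference, min_similarity=0.6):
    # Return (best reference match, score), or None if below the threshold.
    best = max(reference, key=lambda ref: similarity(value, ref))
    score = similarity(value, best)
    return (best, score) if score >= min_similarity else None

def fuzzy_dedupe(values, min_similarity=0.6):
    # Greedy grouping: each value joins the first group whose first member
    # is similar enough; otherwise it starts a new group.
    groups = []
    for v in values:
        for g in groups:
            if similarity(v, g[0]) >= min_similarity:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups

refs = ["Windows XP", "Windows Server 2003", "Office 2003"]
match = fuzzy_lookup("WinXP", refs)
print(match)  # best match is 'Windows XP' with a score of about 0.71
groups = fuzzy_dedupe(["Windows XP", "WinXP", "Office 2003", "Office2003"])
print(groups)  # [['Windows XP', 'WinXP'], ['Office 2003', 'Office2003']]
```

Raising `min_similarity` trades recall for confidence, which is the same tuning decision the slide describes.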
Building Packages
SSIS Lifecycle Tools: Feature Summary
• Designing the SSIS package: Business Intelligence Development Studio (BIDS, built on Visual Studio); migration wizard for pre-SQL Server 2005 (DTS) packages; Visual SourceSafe integration
• Deployment and execution: deployment utility to copy packages; command-line execution with dtexec.exe, or the dtexecui.exe execution utility; flexible configuration options
• Supportability: rich per-package logging (log providers); SQL Server Management Studio for monitoring running packages and organizing stored packages (through the SSIS Windows service); checkpoints for restartability
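In SSIS, checkpoints (configured with package properties such as CheckpointFileName, CheckpointUsage, and SaveCheckpoints) let a failed package restart from the task that failed instead of from the beginning, and the checkpoint file is removed once the package succeeds. The Python below only imitates that behaviour with a JSON progress file; it is not SSIS's checkpoint format, and the task names and file name are hypothetical.

```python
import json
import os
import tempfile

def run_with_checkpoint(tasks, checkpoint_path):
    """Run (name, callable) tasks in order, skipping tasks already recorded as done.

    Progress is saved after each successful task. If a task raises, the
    checkpoint file stays behind so the next run resumes from that task.
    After a fully successful run the checkpoint file is deleted.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    executed = []
    for name, task in tasks:
        if name in done:
            continue  # completed during an earlier, failed run
        task()
        executed.append(name)
        done.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)
    return executed

# Usage: the first run fails at "load"; the second run resumes there.
path = os.path.join(tempfile.mkdtemp(), "package.chk")
state = {"fail": True}

def load():
    if state["fail"]:
        raise RuntimeError("destination unavailable")

tasks = [("extract", lambda: None), ("load", load), ("report", lambda: None)]
try:
    run_with_checkpoint(tasks, path)
except RuntimeError:
    pass
state["fail"] = False
resumed = run_with_checkpoint(tasks, path)
print(resumed)  # ['load', 'report']
```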
Sample Server Layout
[Diagram: sample layout showing source data and source flat files, an SSIS package execution server, destination data, and SSIS support servers holding packages stored in SQL Server or on the file system, package logging, and error rows.]
Logging and Log Providers
• Log entries are a blend of status and result messages.
• You can select which details (events) to log for each control flow object within each package (e.g. OnError, OnWarning, OnPreExecute).
• You can select which fields to log (e.g. Computer, Operator, ExecutionID…).
• You can define multiple log providers per package (SQL Server, text file, Windows Event Log…).
• BIDS has a Log Events window that shows the log entries headed for the log providers.
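To make the logging model concrete (events selected per control flow object, fields selected per entry, and several providers receiving the same entries), here is a conceptual Python sketch. It is not the SSIS object model: the class names, the object name "Load Customers", and the in-memory providers are hypothetical stand-ins for real log providers such as the text file or SQL Server provider.

```python
from dataclasses import dataclass, field

@dataclass
class LogEntry:
    event: str         # e.g. "OnError", "OnWarning", "OnPreExecute"
    source: str        # control flow object that raised the event
    computer: str
    operator: str
    execution_id: str
    message: str

@dataclass
class ListProvider:
    """Hypothetical log provider that keeps records in memory."""
    name: str
    entries: list = field(default_factory=list)

    def write(self, record):
        self.entries.append(record)

class PackageLogger:
    def __init__(self, events_by_source, fields, providers):
        self.events_by_source = events_by_source  # events logged per object
        self.fields = fields                      # fields kept in each record
        self.providers = providers                # every provider gets each record

    def log(self, entry):
        if entry.event not in self.events_by_source.get(entry.source, set()):
            return  # event not selected for this object
        record = {f: getattr(entry, f) for f in self.fields}
        for provider in self.providers:
            provider.write(record)

text_file, sql_table = ListProvider("text file"), ListProvider("SQL Server")
logger = PackageLogger(
    events_by_source={"Load Customers": {"OnError", "OnPreExecute"}},
    fields=["event", "source", "execution_id", "message"],
    providers=[text_file, sql_table],
)
logger.log(LogEntry("OnPreExecute", "Load Customers", "SRV01", "etl_user", "run-42", "Starting"))
logger.log(LogEntry("OnWarning", "Load Customers", "SRV01", "etl_user", "run-42", "Truncation"))
print(len(text_file.entries), len(sql_table.entries))  # 1 1
```

The OnWarning entry is dropped because it was not selected for that object, and the Computer and Operator fields are omitted because they were not selected.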
SSIS Windows Service
• Monitors and manages running as well as stored packages, via SQL Server Management Studio.
• The service is installed when you install SSIS.
• The service is not required to design or execute packages.
• The Stored Packages tree is based on an XML configuration file; you can customize the file's contents, name, and location.
• The service writes Windows events (service started, service failed to start, package started, package stopped…).
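In SQL Server 2005 the XML file behind the Stored Packages tree is usually MsDtsSrvr.ini.xml in the DTS\Binn folder. The sample below reproduces, from memory, the general shape of the default file (one top-level folder for MSDB and one for the file system), so treat its values as an illustration rather than a copy of any installed file. The Python script lists the top-level folders it defines.

```python
import xml.etree.ElementTree as ET

XSI = "{http://www.w3.org/2001/XMLSchema-instance}"

# Illustrative service configuration; check your own installation's file.
SAMPLE_CONFIG = r"""<?xml version="1.0" encoding="utf-8"?>
<DtsServiceConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown>
  <TopLevelFolders>
    <Folder xsi:type="SqlServerFolder">
      <Name>MSDB</Name>
      <ServerName>.</ServerName>
    </Folder>
    <Folder xsi:type="FileSystemFolder">
      <Name>File System</Name>
      <StorePath>..\Packages</StorePath>
    </Folder>
  </TopLevelFolders>
</DtsServiceConfiguration>"""

def top_level_folders(xml_text):
    """Return (name, folder type, location) for each top-level folder."""
    # Parse bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    folders = []
    for folder in root.iter("Folder"):
        kind = folder.get(XSI + "type")
        # SQL Server folders name a server; file system folders name a path.
        location = folder.findtext("ServerName") or folder.findtext("StorePath")
        folders.append((folder.findtext("Name"), kind, location))
    return folders

folders = top_level_folders(SAMPLE_CONFIG)
for name, kind, location in folders:
    print(f"{name}: {kind} -> {location}")
```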
Overview of SSIS Security
Security consists of several layers that support both SQL Server-based and file-system-based scenarios:
• Packages can be encrypted.
• Packages can be digitally signed.
• Packages can be stored in a SQL Server database (MSDB) and protected with SQL Server roles.
Security layers:
• OS permissions: file and folder access control for file-based packages; permission to view and stop running packages
• Package Protection Level property: encrypt or clear sensitive properties
• Package signature
• SQL Server database roles: reader and writer roles for packages stored in MSDB
(A) A package can be encrypted via the Package Protection Level property and stored inside MSDB, where access to the package is limited by SQL Server database roles.
(B) A package can be encrypted via the Package Protection Level property and saved to the file system, where access to the package file is controlled by folder and file permissions at the operating-system level.
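The "clear" option corresponds to the DontSaveSensitive protection level, under which SSIS does not save sensitive values, such as the password in a connection string, with the package. The Python function below only imitates that effect on a semicolon-separated connection string; the function name, the set of keys treated as sensitive, and the sample string are assumptions, and nothing here performs encryption.

```python
# Assumption: these keys are the ones treated as sensitive in this sketch.
SENSITIVE_KEYS = {"password", "pwd"}

def strip_sensitive(connection_string):
    """Drop sensitive key=value pairs from a semicolon-separated connection string.

    Simplified: quoted values that themselves contain ';' are not handled.
    """
    kept = []
    for part in connection_string.split(";"):
        if not part.strip():
            continue  # skip empty segments, e.g. after a trailing ';'
        key, sep, _value = part.partition("=")
        if sep and key.strip().lower() in SENSITIVE_KEYS:
            continue  # cleared, as DontSaveSensitive would do on save
        kept.append(part)
    return ";".join(kept) + ";"

saved = strip_sensitive(
    "Data Source=SRV01;User ID=etl;Password=secret;Initial Catalog=Staging;Provider=SQLNCLI.1;"
)
print(saved)  # Data Source=SRV01;User ID=etl;Initial Catalog=Staging;Provider=SQLNCLI.1;
```

A package saved this way needs the password supplied again at run time, for example through a configuration.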
Deployment Flow
Tools to organize and ‘copy’ packages and supporting files:
• BI Studio: design the package; add configurations; add miscellaneous files; set project deployment properties; build the project
• You: copy or move the deployment folder and files
• Installation Wizard: execute the manifest file; choose the destination (SQL Server or file system); modify the protection level; choose the location of supporting files; change configurations
• SQL Agent: create the desired Agent jobs
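Configurations are what let the Installation Wizard, and later runs, change property values without editing the package. An XML configuration file (.dtsConfig) maps a property path to a value. The sketch below parses a small hand-written example in that layout and applies it to a dictionary standing in for package properties; the connection manager name, variable name, server names, and values are hypothetical, and the dictionary is not the SSIS runtime.

```python
import xml.etree.ElementTree as ET

# Hand-written example in the .dtsConfig layout; names and values are hypothetical.
CONFIG_XML = r"""<DTSConfiguration>
  <Configuration ConfiguredType="Property"
      Path="\Package.Connections[Staging].Properties[ConnectionString]"
      ValueType="String">
    <ConfiguredValue>Data Source=PRODSQL01;Initial Catalog=Staging;Provider=SQLNCLI.1;Integrated Security=SSPI;</ConfiguredValue>
  </Configuration>
  <Configuration ConfiguredType="Property"
      Path="\Package.Variables[User::BatchSize].Properties[Value]"
      ValueType="Int32">
    <ConfiguredValue>5000</ConfiguredValue>
  </Configuration>
</DTSConfiguration>"""

# Only the value types used in this example are converted; others stay strings.
CONVERTERS = {"String": str, "Int32": int}

def apply_configuration(properties, config_xml):
    """Return a copy of `properties` with each configured property path overridden."""
    updated = dict(properties)
    for item in ET.fromstring(config_xml).findall("Configuration"):
        if item.get("ConfiguredType") != "Property":
            continue  # this sketch only handles property configurations
        convert = CONVERTERS.get(item.get("ValueType"), str)
        updated[item.get("Path")] = convert(item.findtext("ConfiguredValue"))
    return updated

CONN = r"\Package.Connections[Staging].Properties[ConnectionString]"
BATCH = r"\Package.Variables[User::BatchSize].Properties[Value]"
design_time = {
    CONN: "Data Source=DEVSQL01;Initial Catalog=Staging;Provider=SQLNCLI.1;Integrated Security=SSPI;",
    BATCH: 1000,
    r"\Package.Properties[MaxConcurrentExecutables]": -1,
}
production = apply_configuration(design_time, CONFIG_XML)
print(production[CONN])
print(production[BATCH])  # 5000
```

Properties without a configuration entry keep their design-time values, which is what makes one package deployable to several environments.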
Deploying SSIS Packages
• The designer can build a deployment file set, which includes a project's package(s), configuration files, and installer files.
• You move the installer file set to another server or environment and execute it to install the packages to SQL Server or the file system.
• Deployment is not a version synchronization or checking tool such as Systems Management Server (SMS).
• The installer can copy miscellaneous files, such as custom components, but will not install them.
SQL Server Management Studio
• Requires the SSIS service.
• Allows monitoring and stopping of currently executing packages.
• Lets you maintain the stored package structure and set roles for packages stored in SQL Server.
• You can connect to and view multiple SSIS servers at one time.
• Supports ad hoc package execution from the folder tree.
Log Data for Troubleshooting
• Logging and error flow data are core to troubleshooting.
• You can save and load logging detail templates.
• Child packages bubble log entries up to the parent.
• Package IDs need to be unique: packages copied from the same original share its ID, so their log entries cannot be told apart. You can regenerate the ID via BIDS or dtutil.exe.
Example: this batch-file loop regenerates the package IDs for all packages in a folder, be it 2 or 200 (at an interactive command prompt, write %f instead of %%f):
    for %%f in (C:\_work\SSISPackages\_quick\Notepad\*.dtsx) do dtutil.exe /I /FILE "%%f"
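The same loop can be written in Python when package folders are handled by a larger build script. This is a sketch: it uses only the dtutil options from the batch example above (/I to regenerate the ID, /FILE to name the package), the function name is hypothetical, and with `dry_run` left on it only returns the commands instead of running them.

```python
import subprocess
import tempfile
from pathlib import Path

def regenerate_package_ids(folder, dry_run=True):
    """Build (and optionally run) one 'dtutil /I /FILE <package>' call per .dtsx file."""
    commands = [["dtutil.exe", "/I", "/FILE", str(path)]
                for path in sorted(Path(folder).glob("*.dtsx"))]
    if not dry_run:
        for cmd in commands:
            # Argument lists avoid quoting problems with spaces in paths;
            # check=True stops at the first package dtutil fails to update.
            subprocess.run(cmd, check=True)
    return commands

# Dry-run demo against a throwaway folder with two packages and one other file.
demo = Path(tempfile.mkdtemp())
for name in ("b.dtsx", "a.dtsx", "notes.txt"):
    (demo / name).touch()
cmds = regenerate_package_ids(demo)
for cmd in cmds:
    print(" ".join(cmd))
```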
Error Flows in the Data Flow Task
• Error handling can be controlled per column for each row (fail component, redirect row, ignore failure).
• An error flow is just another flow to your destination.
• Error flows can all be directed to a central location for centralized operations.
• Error rows include the error code and column ID.
• Error flows can be coupled with a Row Sampling transform.
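Per-column error handling can be pictured as a dispatch table: for each column, a conversion failure either fails the component, redirects the row to the error flow (tagged with an error code and the failing column, as SSIS error outputs do with their ErrorCode and ErrorColumn columns), or is ignored, leaving a NULL. The Python below is a conceptual sketch, not SSIS's implementation; the columns, dispositions, and the placeholder error code are hypothetical, and `random.sample` stands in for the Row Sampling transform.

```python
import random

FAIL, REDIRECT, IGNORE = "fail component", "redirect row", "ignore failure"

def convert_rows(rows, converters, dispositions, error_code=-1):
    """Convert columns row by row, applying each column's error disposition.

    error_code is a hypothetical placeholder, not a real SSIS error code.
    """
    good, errors = [], []
    for row in rows:
        out, failed_column = dict(row), None
        for column, convert in converters.items():
            try:
                out[column] = convert(row[column])
            except (TypeError, ValueError):
                if dispositions[column] == FAIL:
                    raise  # stop the whole flow
                if dispositions[column] == REDIRECT:
                    failed_column = column
                    break
                out[column] = None  # IGNORE: keep the row with a NULL value
        if failed_column is None:
            good.append(out)
        else:
            errors.append({**row, "ErrorCode": error_code, "ErrorColumn": failed_column})
    return good, errors

rows = [{"qty": "3", "price": "9.99"},
        {"qty": "three", "price": "1.00"},
        {"qty": "2", "price": "?"}]
good, errors = convert_rows(rows, {"qty": int, "price": float},
                            {"qty": REDIRECT, "price": IGNORE})
# Keep a small random sample of error rows for inspection.
sample = random.sample(errors, k=min(1, len(errors)))
print(len(good), len(errors))  # 2 1
```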
Other Troubleshooting Ideas
• Performance counters (SQLServer:SSIS Pipeline and SQLServer:SSIS Service)
• Integration with Microsoft® Operations Manager
• SQL Server Agent also keeps log data, and its proxies can limit the credentials under which packages execute
• General package design: log row counts, use Multicast, and save variable values
• Webcast: Scalability, Performance and Optimization in SSIS
Summary
• SQL Server Integration Services is an exceptionally high-performance integration and transformation tool.
• Some processes benefit more from parallelism, some from memory.
• On 32-bit systems, performance and scale are best increased through parallelism.
• 64-bit enables highly scalable in-memory operations.
For More Information
• Integration Services TechCenter: http://www.microsoft.com/technet/prodtechnol/sql/2005/technologies/ssisvcs.mspx
• Developer Center: http://msdn.microsoft.com/sql/bi/integration/default.aspx
• Great information available at www.sqlis.com
• Project REAL: http://www.microsoft.com/sql/solutions/bi/projectreal.mspx
• On-demand webcasts: http://www.microsoft.com/events/series/sqlserverbi.mspx