Integration Services Best Practices
Itay Braun
BI and SQL Server Consultant
Email: [email protected]
Blog: http://blogs.microsoft.co.il/blogs/itaybraun/
BI User Group Messages
New website for SQL Server in Hebrew: www.sqlserver.co.il
Twingo is looking for experienced BI / SQL Server developers with at least two years of experience. Please contact [email protected] for more details.
If you are looking for employees or looking for a job, please contact Yossi Elkayam [email protected]
Agenda
If it moves – Log it!
Establishing performance baseline
Package Configuration
Lookup Optimization
Data Profiling
Other tips and tricks
If it moves – Log it!
SSIS Log Providers
Event handlers
Analyzing the data
Don’t forget the jobs
SSIS Log Providers
Used to capture run-time information about a package
Helps to audit and troubleshoot a package every time it is run
Integration Services includes the following log providers:
The Text File log provider (CSV)
The SQL Server Profiler log provider
The SQL Server log provider (sysssislog table)
The Windows Event log provider
The XML File log provider
SSIS Log Providers
All tasks share the same basic events
Each task also has unique events
Custom Logging Using Event Handlers
Build the log table and the events manually
Allows better control over the collected data
For example:
Row count
An important step has finished
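A minimal sketch of what such a hand-built log table could hold. It uses Python's built-in sqlite3 as a stand-in for SQL Server, and the table name, column names, package name, and step names are all hypothetical. In a real package the insert would typically be issued from a task inside an event handler.

```python
import sqlite3
from datetime import datetime, timezone


def create_log_table(conn):
    # Hypothetical custom log table: one row per logged event.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS etl_custom_log (
            log_id       INTEGER PRIMARY KEY AUTOINCREMENT,
            package_name TEXT NOT NULL,
            step_name    TEXT NOT NULL,
            event_name   TEXT NOT NULL,
            row_count    INTEGER,
            logged_at    TEXT NOT NULL
        )
        """
    )


def log_event(conn, package_name, step_name, event_name, row_count=None):
    # Record one event; row_count stays NULL when it does not apply.
    conn.execute(
        "INSERT INTO etl_custom_log "
        "(package_name, step_name, event_name, row_count, logged_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (package_name, step_name, event_name, row_count,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
create_log_table(conn)
log_event(conn, "LoadSales", "Load fact table", "StepFinished", row_count=125000)
log_event(conn, "LoadSales", "Process partition", "OnError")

logged = conn.execute(
    "SELECT step_name, event_name, row_count FROM etl_custom_log ORDER BY log_id"
).fetchall()
print(logged)
```

Keeping the row count as its own column makes it easy to chart load volumes over time without parsing message text.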
Event Handlers
A simple SSIS package within the package
Mostly used to respond to OnError events
Logging and sending email
Analyzing the Data
SQL 2008 – sysssislog table: http://technet.microsoft.com/en-us/library/ms186984.aspx
SQL 2005 – sysdtslog90 table: http://msdn.microsoft.com/en-us/library/ms186984(SQL.90).aspx
Analyze:
Total execution time
SSAS partition processing time
Errors and warnings
Time elapsed between PackageStart and PackageEnd
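The elapsed-time and error analysis above can be sketched as follows. This assumes the log rows have already been fetched into dictionaries keyed by the sysssislog column names event, executionid, starttime, and endtime. The sample rows and function names are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def package_durations(log_rows):
    """Return {executionid: seconds between PackageStart and PackageEnd}."""
    starts, ends = {}, {}
    for row in log_rows:
        if row["event"] == "PackageStart":
            starts[row["executionid"]] = row["starttime"]
        elif row["event"] == "PackageEnd":
            ends[row["executionid"]] = row["endtime"]
    # Executions without a PackageEnd row (still running or crashed) are skipped.
    return {
        eid: (ends[eid] - starts[eid]).total_seconds()
        for eid in starts
        if eid in ends
    }


def problem_counts(log_rows):
    """Count OnError and OnWarning events across all executions."""
    return Counter(
        row["event"] for row in log_rows if row["event"] in ("OnError", "OnWarning")
    )


def make_row(event, executionid, when):
    # Helper for the sample data only; real rows come from the log table.
    return {"event": event, "executionid": executionid,
            "starttime": when, "endtime": when}


rows = [
    make_row("PackageStart", "run-1", datetime(2008, 6, 1, 1, 0, 0)),
    make_row("OnWarning", "run-1", datetime(2008, 6, 1, 1, 3, 0)),
    make_row("OnError", "run-1", datetime(2008, 6, 1, 1, 5, 0)),
    make_row("PackageEnd", "run-1", datetime(2008, 6, 1, 1, 12, 30)),
    make_row("PackageStart", "run-2", datetime(2008, 6, 2, 1, 0, 0)),
]

durations = package_durations(rows)
problems = problem_counts(rows)
print(durations)  # {'run-1': 750.0}
print(problems)
```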
Don’t forget the jobs
Don’t forget to monitor the execution of the ETL jobs. Use Reporting Services to write simple reports about the ETL execution process.
Performance Counters
Understanding resource utilization
CPU Bound
Memory Bound
I/O Bound
Network Bound
Performance Counters - CPU
Processor time
Process / % Processor Time (Total)
sqlservr.exe and dtexec.exe
Do the tasks run in parallel?
Performance Counters – Memory
Process / Private Bytes (DTEXEC.exe) – The amount of memory currently in use by Integration Services.
Process / Working Set (DTEXEC.exe) – The total amount of memory allocated by Integration Services.
SQL Server: Memory Manager / Total Server Memory – The total amount of memory allocated by SQL Server.
Memory / Page Reads / sec – Represents the total memory pressure on the system.
If this consistently goes above 500, the system is under memory pressure.
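A small sketch of how the 500 threshold could be checked against collected counter samples. The slide does not define "consistently"; the sketch assumes it means at least 90% of the samples exceed the threshold. The function name and the sample values are hypothetical.

```python
def consistently_above(samples, threshold, min_fraction=0.9):
    """True when at least min_fraction of the samples exceed the threshold.

    min_fraction=0.9 is an assumed reading of "consistently", not a
    documented rule.
    """
    if not samples:
        return False
    above = sum(1 for value in samples if value > threshold)
    return above / len(samples) >= min_fraction


# Hypothetical Memory / Page Reads / sec samples taken during an ETL run.
busy_run = [620, 580, 710, 540, 655, 690, 505, 730, 610, 590]
quiet_run = [120, 900, 80, 95, 110]

busy_under_pressure = consistently_above(busy_run, 500)
quiet_under_pressure = consistently_above(quiet_run, 500)
print(busy_under_pressure, quiet_under_pressure)  # True False
```

A single spike, as in the second sample set, does not trip the check, which matches the intent of looking for sustained pressure rather than momentary peaks.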
Performance Counters - Memory
SSIS Pipeline / Buffers in use – The number of pipeline buffers in use throughout the pipeline.
SSIS Pipeline / Buffers spooled – The number of buffers spooled to disk. It has an initial value of 0. When it goes above 0, it indicates that the engine has started memory swapping.
SSIS Pipeline / Rows read – The total number of rows read from all data sources.
SSIS Pipeline / Rows written – The total number of rows written to all data destinations.
Performance Counters – I/O
To keep Integration Services disk writes to a minimum, SSIS should hit the disk only when it reads from the source and writes to the target.
For SAN / NAS storage, use the vendor's applications.
Performance Counters - Network
SSIS moves data as fast as the network is able to handle it.
Network Interface / Current Bandwidth: This counter provides an estimate of current bandwidth.
Network Interface / Bytes Total / sec: The rate at which bytes are sent and received over each network adapter.
Network Interface / Transfers/sec: Tells how many network transfers per second are occurring.
If it is approaching 40,000 IOPs, then get another NIC and use teaming between the NICs.
Package Configuration
The package needs to know where it is moving data from and where it is moving data to
Integration Services packages are typically built in a different environment from the one where they run in production
Package Configuration
Objects that can be configured:
Tasks
Containers
Variables
Connection Managers
Data Flow Components
Configuration Types
XML Configuration File
Most popular configuration type
Easy deployment
Disadvantage: the path to the .dtsconfig file must be hard-coded within the package
Environment Variable Configuration
Takes the value for a property from whatever is stored in a named environment variable
Stores the property path inside the package and the value outside the package
Configuration Types
Parent Package Configuration
Fetches a value from a variable in a calling package
Stores the property path inside the package and the value outside the package
Registry Configuration
The value to be applied to a package property is stored in a registry entry
Stores the property path inside the package and the value outside the package
Configuration Types
SQL Server Configuration
Values are stored in a SQL Server table
The table can have any name you like and can be in any database on any server that you like
Configuration Best Practices
Consider command-line options as an alternative to configurations
The /SET option is used to apply a value to a property in the package that is being run
The /CONFIGFILE option is used to tell the package to use an XML configuration file, even if one has not been defined in the package
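A sketch of how a scheduler script might assemble a dtexec command line with these two options. The helper name, file paths, and variable name are hypothetical. The sketch only builds the argument list and does not run dtexec.

```python
def build_dtexec_args(package_path, set_values=None, config_file=None):
    """Build a dtexec argument list; nothing is executed here."""
    args = ["dtexec", "/FILE", package_path]
    if config_file:
        # Apply an XML configuration file at run time.
        args += ["/CONFIGFILE", config_file]
    for property_path, value in (set_values or {}).items():
        # /SET takes "propertyPath;value" as a single argument.
        args += ["/SET", f"{property_path};{value}"]
    return args


# Hypothetical package, configuration file, and variable.
args = build_dtexec_args(
    r"C:\ETL\LoadSales.dtsx",
    set_values={r"\Package.Variables[User::BatchDate].Properties[Value]": "2008-06-01"},
    config_file=r"C:\ETL\Prod.dtsConfig",
)
print(args)
```

Returning a list rather than one string lets the caller hand it to subprocess.run without worrying about quoting paths that contain spaces.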
Configure Only the ConnectionString Property for Connection Managers
Do this instead of configuring ServerName, InitialCatalog, UserName, and Password separately
Don’t save the password in XML files
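A sketch of an XML configuration that sets only the ConnectionString property and relies on integrated security, so no password ends up in the file. The element and attribute names follow the usual layout of a designer-generated .dtsConfig file but should be checked against a real one. The connection manager name, server, and database are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_dtsconfig(connection_name, connection_string):
    """Return .dtsConfig-style XML that configures one ConnectionString."""
    root = ET.Element("DTSConfiguration")
    config = ET.SubElement(
        root,
        "Configuration",
        {
            "ConfiguredType": "Property",
            "Path": f"\\Package.Connections[{connection_name}]"
                    ".Properties[ConnectionString]",
            "ValueType": "String",
        },
    )
    ET.SubElement(config, "ConfiguredValue").text = connection_string
    return ET.tostring(root, encoding="unicode")


# Hypothetical server and database; Integrated Security avoids a stored password.
xml_text = build_dtsconfig(
    "SalesDW",
    "Data Source=PRODSQL01;Initial Catalog=SalesDW;Integrated Security=SSPI;",
)
print(xml_text)
```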
Lookup Optimization
Use the NOLOCK or TABLOCK hints to remove locking overhead
To optimize memory usage, SELECT only the columns you actually need
If possible, perform datetime conversions at the source or target database, because they are more expensive to perform within Integration Services
SQL Server 2008 Integration Services adds a new shared lookup cache feature
Lookup Optimization
Commit size 0 is fastest on heap bulk targets, because only one transaction is committed
If commit size = 0 is not possible, use the highest possible commit size to reduce the overhead of multiple-batch writing
Commit size = 0 is a bad idea when inserting into a B-tree, because all incoming rows must be sorted at once into the target B-tree
Lookup Optimization
Batchsize = 0 is ideal for inserting into a heap.
For an indexed destination, I recommend testing between 100,000 and 1,000,000 as batch size.
Use a commit size of <5000 to avoid lock escalation when inserting
Use partitions and the partition SWITCH command
More info here: Getting Optimal Performance with Integration Services Lookups.
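To make the commit size semantics concrete, here is a toy loader in which a commit size of 0 means one commit for the whole load and any other value commits after that many rows. It uses sqlite3 as a stand-in and is not how the SSIS destination is implemented. The table and function names are hypothetical.

```python
import sqlite3


def fresh_target():
    # In-memory table without a key, loosely playing the role of a heap.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER, amount REAL)")
    return conn


def load_rows(conn, rows, commit_size):
    """Insert rows and return how many transactions were committed.

    commit_size == 0 means a single commit at the end of the load.
    """
    commits = 0
    pending = 0
    for row in rows:
        conn.execute("INSERT INTO target (id, amount) VALUES (?, ?)", row)
        pending += 1
        if commit_size and pending == commit_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        # Commit the final partial batch (or the whole load when commit_size is 0).
        conn.commit()
        commits += 1
    return commits


rows = [(i, i * 1.5) for i in range(10)]
single_commit = load_rows(fresh_target(), rows, commit_size=0)
batched_commits = load_rows(fresh_target(), rows, commit_size=4)
print(single_commit, batched_commits)  # 1 3
```

The commit count is the per-batch overhead the slides refer to: the smaller the commit size, the more transactions the same load needs.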
Data Profiling
New feature in SSIS 2008
Used to profile the data:
Null values
Value distribution
Column length
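To show what these three statistics look like for a single column, here is a plain-Python sketch. It is not the SSIS profiling feature itself, and the function name and sample values are made up.

```python
from collections import Counter


def profile_column(values):
    """Compute null count, value distribution, and length range for one column."""
    non_null = [v for v in values if v is not None]
    lengths = [len(str(v)) for v in non_null]
    null_count = len(values) - len(non_null)
    return {
        "null_count": null_count,
        "null_ratio": null_count / len(values) if values else 0.0,
        "distribution": Counter(non_null),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }


# Hypothetical city column with missing values.
cities = ["Haifa", "Tel Aviv", None, "Haifa", "Eilat", None, "Haifa", "Tel Aviv"]
profile = profile_column(cities)
print(profile)
```

The length range helps choose a narrow column type, and the null ratio shows whether a column is reliable enough to use as a lookup key.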
Other tips
Make data types as narrow as possible so you will allocate less memory for your transformation
Watch precision issues when using the money, float, and decimal types
money is faster than decimal, and money has fewer precision considerations than float
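The precision issues can be illustrated outside SQL Server with Python's float and Decimal types. This is only an analogy: Python's float is a binary floating-point type, and the four-decimal quantization below merely imitates money's fixed scale of four decimal places. SQL Server's own rounding behavior is not modeled.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 exactly, so errors accumulate.
float_total = sum([0.1] * 10)

# An exact decimal type adds the same values without drift.
decimal_total = sum([Decimal("0.1")] * 10)

# A fixed four-decimal scale loses digits on division.
four_places = Decimal("0.0001")
third = (Decimal("10") / Decimal("3")).quantize(four_places)
round_trip = third * 3

print(float_total == 1.0)             # False
print(decimal_total == Decimal("1"))  # True
print(third, round_trip)              # 3.3333 9.9999
```

The last line is the kind of drift to watch for when amounts stored with a fixed scale are divided and multiplied back.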
Other Tips
Do not sort within Integration Services unless it is absolutely necessary.
In order to perform a sort, Integration Services allocates the memory space of the entire data set that needs to be transformed
There are times where using Transact-SQL will be faster than processing the data in SSIS.
As a general rule, any and all set-based operations will perform faster in Transact-SQL.
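A small illustration of the difference between row-by-row processing and a set-based statement, plus pushing a sort down to the database. It uses sqlite3 as a stand-in for SQL Server, with a hypothetical table and data. It shows only that the results match and makes no timing claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 10), ("South", 5), ("North", 7), ("South", 3), ("East", 1)],
)

# Row-by-row: every row is pulled out of the database and aggregated in code.
totals_loop = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals_loop[region] = totals_loop.get(region, 0) + amount

# Set-based: the database aggregates in one statement and returns one row per region.
totals_set = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Sorting at the source instead of sorting in the pipeline.
ordered_amounts = [
    row[0] for row in conn.execute("SELECT amount FROM sales ORDER BY amount")
]

print(totals_loop == totals_set)  # True
print(ordered_amounts)            # [1, 3, 5, 7, 10]
```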
Other Tips
To perform delta detection, you can use a change detection mechanism such as the new SQL Server 2008 Change Data Capture (CDC) functionality
Resources
Custom logging using event handlers - http://blogs.conchango.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-Event-Handlers.aspx
Best Practices for Integration Services Configurations - http://msdn.microsoft.com/en-us/library/cc671628.aspx
Other best practices - http://bi-polar23.blogspot.com/2007/11/ssis-best-practices-part-1.html
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.