SSIS Best Practices, Israel BI User Group, Itay Braun

Post on 24-Dec-2014

Integration Services Best Practices

Itay Braun

BI and SQL Server Consultant

Email: itay@twingo.co.il
Blog: http://blogs.microsoft.co.il/blogs/itaybraun/

BI User Group Messages

- New website for SQL Server in Hebrew: www.sqlserver.co.il
- Twingo is looking for experienced BI / SQL Server developers. At least two years experience. Please contact itay@twingo.co.il for more details
- If you are looking for employees or looking for a job, please contact Yossi Elkayam yelkayam@microsoft.com

Agenda

- If it moves – Log it!
- Establishing performance baseline
- Package Configuration
- Lookup Optimization
- Data Profiling
- Other tips and tricks

If It moves – Log it!

- SSIS Log Providers
- Event handlers
- Analyzing the data
- Don’t forget the jobs

SSIS Log Providers

- Used to capture run-time information about a package
- Helps to audit and troubleshoot a package every time it is run
- Integration Services includes the following log providers:
  - The Text File log provider (CSV)
  - The SQL Server Profiler log provider
  - The SQL Server log provider (sysssislog table)
  - The Windows Event log provider
  - The XML File log provider

SSIS Log Providers

- All tasks share the same basic events
- Each task also has unique events

Custom Logging Using Event Handlers

- Build the table and the events manually
- Allows better control over the collected data
- For example:
  - Row count
  - An important step has finished

Event Handlers

- A simple SSIS package within the package
- Mostly used to respond to OnError events
  - Logging and sending email

Analyzing the Data

- SQL 2008: sysssislog table, http://technet.microsoft.com/en-us/library/ms186984.aspx
- SQL 2005: sysdtslog90 table, http://msdn.microsoft.com/en-us/library/ms186984(SQL.90).aspx
- Analyze:
  - Total execution time
  - SSAS partition processing time
  - Errors and warnings
  - Time elapsed between PackageStart and PackageEnd
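The elapsed time between PackageStart and PackageEnd can be derived by pairing the two events for each execution. The sketch below is a minimal Python illustration. It assumes the log rows have already been fetched from the log table as (executionid, event, starttime) tuples; the function name `package_durations` and the sample rows are hypothetical.

```python
from datetime import datetime

def package_durations(rows):
    """Return {executionid: seconds between PackageStart and PackageEnd}.

    rows: iterable of (executionid, event, starttime) tuples, for example
    the result of selecting those three columns from the log table.
    """
    starts, ends = {}, {}
    for execution_id, event, start_time in rows:
        if event == "PackageStart":
            starts[execution_id] = start_time
        elif event == "PackageEnd":
            ends[execution_id] = start_time
    # Only executions that logged both events have a measurable duration.
    return {
        execution_id: (ends[execution_id] - starts[execution_id]).total_seconds()
        for execution_id in starts
        if execution_id in ends
    }

# Hypothetical sample rows: the second execution never logged PackageEnd.
sample_rows = [
    ("exec-1", "PackageStart", datetime(2009, 1, 1, 2, 0, 0)),
    ("exec-1", "OnWarning", datetime(2009, 1, 1, 2, 3, 0)),
    ("exec-1", "PackageEnd", datetime(2009, 1, 1, 2, 12, 30)),
    ("exec-2", "PackageStart", datetime(2009, 1, 2, 2, 0, 0)),
]
durations = package_durations(sample_rows)
print(durations)
```

Executions that appear in the start map but not in the result are worth a separate look, since a missing PackageEnd usually means the run was interrupted.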

Don’t forget the jobs

Don’t forget to monitor the execution of the ETL jobs. Use Reporting Services to write simple reports about the ETL execution process.

Performance Counters

- Understanding resource utilization:
  - CPU bound
  - Memory bound
  - I/O bound
  - Network bound

Performance Counters - CPU

- Processor time
- Process / % Processor Time (Total)
  - sqlservr.exe and dtexec.exe
- Do the tasks run in parallel?

Performance Counters – Memory

- Process / Private Bytes (DTEXEC.exe): the amount of memory currently in use by Integration Services.
- Process / Working Set (DTEXEC.exe): the total amount of memory allocated by Integration Services.
- SQL Server: Memory Manager / Total Server Memory: the total amount of memory allocated by SQL Server.
- Memory / Page Reads / sec: represents the total memory pressure on the system.
  - If this consistently goes above 500, the system is under memory pressure.
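The word "consistently" matters here: a single spike above 500 is not the same as sustained pressure. As a rough Python sketch, the check below flags pressure only when most samples exceed the threshold. The 90% cut-off (`min_fraction`), the function name, and the sample values are assumptions made for illustration; only the threshold of 500 comes from the slide.

```python
def is_under_memory_pressure(samples, threshold=500, min_fraction=0.9):
    """Return True when at least min_fraction of the samples exceed threshold.

    samples: Memory / Page Reads / sec values collected at regular intervals.
    min_fraction: assumed reading of "consistently" (not from the slides).
    """
    if not samples:
        return False
    above = sum(1 for value in samples if value > threshold)
    return above / len(samples) >= min_fraction

# Hypothetical counter samples.
single_spike = [100, 900, 120, 80, 110]
sustained = [620, 710, 580, 655, 690]

print(is_under_memory_pressure(single_spike))
print(is_under_memory_pressure(sustained))
```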

Performance Counters - Memory

- SSIS Pipeline / Buffers in use: the number of pipeline buffers in use throughout the pipeline.
- SSIS Pipeline / Buffers spooled: the number of buffers spooled to disk. It has an initial value of 0. When it goes above 0, it indicates that the engine has started memory swapping.
- Rows read: the number of rows read from all data sources in total.
- Rows written: the number of rows written to all data destinations in total.

Performance Counters – I/O

- Integration Services should write to disk as little as possible: SSIS should only hit the disk when it reads from the source and writes to the target.
- For SAN / NAS, use the vendor's applications.

Performance Counters - Network

- SSIS moves data as fast as the network is able to handle it.
- Network Interface / Current Bandwidth: this counter provides an estimate of current bandwidth.
- Network Interface / Bytes Total / sec: the rate at which bytes are sent and received over each network adapter.
- Network Interface / Transfers/sec: tells how many network transfers per second are occurring.
  - If it is approaching 40,000 IOPs, then get another NIC card and use teaming between the NIC cards.

Package Configuration

- The package needs to know where it is moving data from and where it is moving data to.
- Typically, Integration Services packages are built in a different environment from the one where they are executed in production.

Package Configuration

Objects that can be configured:
- Tasks
- Containers
- Variables
- Connection Managers
- Data Flow Components

Configuration Types

- XML Configuration File
  - Most popular configuration type
  - Easy deployment
  - Disadvantage: the path to the .dtsconfig file must be hard-coded within the package
- Environment Variable Configuration
  - Takes the value for a property from whatever is stored in a named environment variable
  - Stores the property path inside the package and the value outside the package
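For the XML configuration file type, it can help to see what such a file holds: a property path and the value to apply to it. The Python sketch below builds a minimal file of that shape. The element and attribute names follow the layout the designer typically generates, but treat them as an assumption and compare against a file produced by the Package Configurations wizard. The connection manager name `Staging`, the server name, and the function name `build_dtsconfig` are hypothetical.

```python
import xml.etree.ElementTree as ET

def build_dtsconfig(property_path, value, value_type="String"):
    """Return the text of a minimal XML configuration for one property."""
    root = ET.Element("DTSConfiguration")
    configuration = ET.SubElement(
        root,
        "Configuration",
        {"ConfiguredType": "Property", "Path": property_path, "ValueType": value_type},
    )
    ET.SubElement(configuration, "ConfiguredValue").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical connection manager name and connection string.
# Integrated security keeps any password out of the file.
property_path = r"\Package.Connections[Staging].Properties[ConnectionString]"
connection_string = "Data Source=ETLSERVER;Initial Catalog=Staging;Integrated Security=SSPI;"

xml_text = build_dtsconfig(property_path, connection_string)
print(xml_text)
```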

Configuration Types

- Parent Package Configuration
  - Fetches a value from a variable in a calling package
  - Stores the property path inside the package and the value outside the package
- Registry Configuration
  - The value to be applied to a package property is stored in a registry entry
  - Stores the property path inside the package and the value outside the package

Configuration Types

- SQL Server Configuration
  - Stored in a SQL Server table
  - The table can have any name you like, and can be in any database on any server that you like

Configuration Best Practices

- Consider command-line options as an alternative to configurations
  - The /SET option applies a value to a property in the package that is being run
  - The /CONFIGFILE option tells the package to use an XML configuration file, even if one has not been defined in the package
- Configure only the ConnectionString property for connection managers
  - Instead of ServerName, InitialCatalog, UserName, Password
- Don’t save the password in XML files
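The /SET and /CONFIGFILE options belong to the dtexec command line. As a hedged sketch, the Python helper below assembles such a command as an argument list without running it. The package path, configuration file path, variable name `User::BatchDate`, and the helper name `build_dtexec_args` are all hypothetical. Values that contain spaces or semicolons need extra quoting that this sketch does not handle.

```python
def build_dtexec_args(package_file, config_file=None, overrides=None):
    """Build a dtexec argument list for a package stored in the file system.

    overrides: mapping of property path -> value, one /SET option each.
    """
    args = ["dtexec", "/FILE", package_file]
    if config_file:
        args += ["/CONFIGFILE", config_file]
    for property_path, value in (overrides or {}).items():
        # /SET takes the property path and the value separated by a semicolon.
        args += ["/SET", f"{property_path};{value}"]
    return args

# Hypothetical package, configuration file, and variable.
args = build_dtexec_args(
    r"C:\ETL\LoadSales.dtsx",
    config_file=r"C:\ETL\Production.dtsConfig",
    overrides={r"\Package.Variables[User::BatchDate].Properties[Value]": "2009-01-01"},
)
print(args)
```

On a machine where dtexec is installed, the list could be passed to `subprocess.run(args, check=True)`.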

Lookup Optimization

- Use the NOLOCK or TABLOCK hints to remove locking overhead.
- To optimize memory usage, SELECT only the columns you actually need.
- If possible, perform datetime conversions at the source or target databases, as it is more expensive to perform them within Integration Services.
- In SQL Server 2008 Integration Services, there is a new feature of the shared lookup cache.

Lookup Optimization

- Commit size 0 is fastest on heap bulk targets, because only one transaction is committed.
- If commit size = 0 is not possible, use the highest possible value of commit size, to reduce the overhead of multiple-batch writing.
- Commit size = 0 is a bad idea if inserting into a B-tree, because all incoming rows must be sorted at once into the target B-tree.
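The effect of commit size on the number of transactions is simple arithmetic, sketched below in Python. It assumes, as the slide states, that commit size 0 means the whole load is committed once. The function name and the row counts are hypothetical.

```python
import math

def transactions_needed(row_count, commit_size):
    """Number of commits needed to load row_count rows."""
    if commit_size == 0:
        return 1  # commit size 0: the whole load is one transaction
    return math.ceil(row_count / commit_size)

# Hypothetical load of ten million rows at three commit sizes.
rows = 10_000_000
single = transactions_needed(rows, 0)
large_batches = transactions_needed(rows, 1_000_000)
small_batches = transactions_needed(rows, 5_000)
print(single, large_batches, small_batches)
```

The gap between the large and small batch counts is the multiple-batch overhead that a higher commit size reduces.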

Lookup Optimization

- Batch size = 0 is ideal for inserting into a heap.
- For an indexed destination, I recommend testing between 100,000 and 1,000,000 as batch size.
- Use a commit size of <5000 to avoid lock escalation when inserting.
- Use partitions and the partition SWITCH command.
- More info here: Getting Optimal Performance with Integration Services Lookups.

Data Profiling

- New feature in SSIS 2008
- Used to profile the data:
  - Null values
  - Value distribution
  - Column length
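To make the three statistics concrete, the Python sketch below computes them for a single column held in memory. This is not the SSIS profiling feature itself, only an illustration of what null counts, value distribution, and column length mean. The function name `profile_column` and the sample data are hypothetical.

```python
from collections import Counter

def profile_column(values):
    """Return null count, value distribution, and length range for a column.

    values: list of column values, with None standing in for NULL.
    """
    non_null = [value for value in values if value is not None]
    lengths = [len(str(value)) for value in non_null]
    return {
        "null_count": len(values) - len(non_null),
        "distribution": Counter(non_null),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
    }

# Hypothetical country-code column with NULLs and one over-long value.
country_codes = ["IL", "US", None, "IL", "GBR", None]
profile = profile_column(country_codes)
print(profile)
```

A maximum length larger than expected, as with the three-character code here, is the kind of finding that profiling is meant to surface before the load runs.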

Other tips

- Make data types as narrow as possible so you will allocate less memory for your transformation.
- Watch precision issues when using the money, float, and decimal types.
  - money is faster than decimal, and money has fewer precision considerations than float
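The precision issue with float can be shown in a few lines of Python, whose `float` is binary floating point. In this sketch `Decimal` with four decimal places stands in for a fixed-scale type such as money. That mapping is an assumption made for illustration, and the example says nothing about the relative speed of the SQL Server types.

```python
from decimal import Decimal

# Add ten payments of 0.1 using binary floating point.
float_total = sum([0.1] * 10)

# Add the same payments using exact decimal values with four decimal places.
decimal_total = sum([Decimal("0.1000")] * 10)

# The float total drifts away from 1.0; the decimal total does not.
print(float_total == 1.0)
print(decimal_total == Decimal("1.0000"))
```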

Other Tips

Do not sort within Integration Services unless it is absolutely necessary.

In order to perform a sort, Integration Services allocates the memory space of the entire data set that needs to be transformed

There are times where using Transact-SQL will be faster than processing the data in SSIS.

As a general rule, any and all set-based operations will perform faster in Transact-SQL.

Other Tips

To perform delta detection, you can use a change detection mechanism such as the new SQL Server 2008 Change Data Capture (CDC) functionality

