SSIS Exploring Scalability, Performance and Deployment
Vinod Kumar M
Technology Evangelist – DB and BI
Microsoft
www.ExtremeExperts.com
Objectives and Takeaways
• A high-level view
• Design considerations
• How to measure performance
• Performance implications of architecture
• Manageability aspects of SSIS
• Deployment tips
Out of scope
• Prescriptive guidance for specific situations
Agenda
• Quick Introduction
• Understanding Buffers and Memory
• OVAL Concept Detailed
• Component Specific Notes
• Manageability Features
• Deployment Considerations
Introduction
SSIS Life Cycle tools
Design the SSIS Package
• Business Intelligence Development Studio (Visual Studio)
• Migration wizard for pre-SQL Server 2005 packages
• Version control integration (VSS)
Deployment/Execution
• Deployment utility to copy packages
• Command-line execution (dtexec.exe) and its graphical counterpart (dtexecui.exe)
• Flexible configuration options
Supportability
• Rich per-package logging
• SQL Server Management Studio for monitoring running packages and organizing stored packages
• Checkpoint restartability
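As a small illustration of the command-line execution mentioned above, the following Python sketch only builds a dtexec argument list. It assumes the `/File` and `/ConfigFile` options of dtexec; the function name and the file names in the usage note are hypothetical.

```python
def build_dtexec_command(package_path, config_path=None):
    """Return an argument list for dtexec; nothing is executed here."""
    args = ["dtexec", "/File", package_path]
    if config_path:
        # Optional configuration file applied when the package loads.
        args += ["/ConfigFile", config_path]
    return args
```

On a machine where SSIS is installed, a list such as `build_dtexec_command("LoadCustomers.dtsx", "prod.dtsConfig")` could be handed to `subprocess.run`.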
Deep dive - Performance
Buffers and Memory
• Buffers are based on design-time metadata
  • The width of a row determines the size of the buffer
  • Smaller rows = more rows in memory = greater efficiency
• Memory copies are expensive!
  • A buffer might have placeholder columns filled by downstream components
  • Pointer magic where possible
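The row-width point above can be made concrete with a rough calculation. This Python sketch assumes the Data Flow task defaults of 10 MB for DefaultBufferSize and 10,000 for DefaultBufferMaxRows; the function name and the simple `min()` model are illustrative only, and the engine's real sizing logic considers more than this.

```python
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # bytes, assumed DefaultBufferSize default
DEFAULT_BUFFER_MAX_ROWS = 10_000        # assumed DefaultBufferMaxRows default


def estimated_rows_per_buffer(row_width_bytes,
                              buffer_size=DEFAULT_BUFFER_SIZE,
                              max_rows=DEFAULT_BUFFER_MAX_ROWS):
    """Simplified model: as many rows as fit in the buffer, capped at max_rows."""
    if row_width_bytes <= 0:
        raise ValueError("row width must be positive")
    return min(max_rows, buffer_size // row_width_bytes)
```

Under this model a 100-byte row reaches the row cap, while a 5,000-byte row fits only about 2,000 rows per buffer, which is why trimming row width pays off.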
Component Types
Row based (synchronous outputs)
• Logically works at a row level
• Buffer reused
• Data Conversion, Derived Column
Partially blocking (asynchronous outputs)
• May logically work at a row level
• Data copied to new buffers
• Merge, Merge Join, Union All
Blocking (asynchronous outputs)
• Needs all input buffers before producing any output rows
• Data copied to new buffers
• Aggregate, Sort
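The difference between a row-based and a blocking component can be sketched in Python, with lists of dicts standing in for buffers. This is an analogy, not SSIS code: the function names and the `qty`, `price` and `total` columns are hypothetical.

```python
def derived_column(buffers):
    """Row based (synchronous): each buffer is updated in place and passed on."""
    for buf in buffers:
        for row in buf:
            row["total"] = row["qty"] * row["price"]  # fills a placeholder column
        yield buf  # the same buffer object is reused, no copy


def sort_rows(buffers, key, buffer_rows=2):
    """Blocking (asynchronous): consumes every input buffer before any output."""
    all_rows = [row for buf in buffers for row in buf]
    all_rows.sort(key=lambda r: r[key])
    for i in range(0, len(all_rows), buffer_rows):
        yield all_rows[i:i + buffer_rows]  # rows are placed in new buffers
```

The synchronous function can hand each buffer downstream as soon as it is done with it, whereas the sort holds all rows in memory first.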
CPU Utilization
Execution Tree
• Starts from a source or an async output
• Ends at a destination or an input that has no sync outputs
Each Execution Tree can get a worker thread
• MaxEngineThreads to control parallelism
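The start and end rules for execution trees can be illustrated with a small Python sketch. It handles only a single linear path of components and treats each one as a `(name, has_async_output)` pair; the function name and the component list are hypothetical, and real data flows with branches need a proper graph walk.

```python
def execution_trees(pipeline):
    """Split a linear data flow into execution trees.

    pipeline: list of (name, has_async_output) in data-flow order.
    A tree ends at the input of a component with an async output,
    and a new tree starts at that component's output.
    """
    trees, current = [], []
    for name, has_async_output in pipeline:
        if has_async_output and current:
            current.append(name + " (input)")
            trees.append(current)
            current = [name + " (output)"]
        else:
            current.append(name)  # source, sync component, or destination
    if current:
        trees.append(current)
    return trees
```

A flow of source, Derived Column, Sort, Data Conversion and destination yields two trees under this model, split at the Sort, so up to two worker threads could be used.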
Performance Strategy
Use OVAL to identify the factors affecting data integration performance…
Operations
• What logic should be applied to the data?
Volume
• How much data must be processed?
Application
• Which app is best suited to these operations on this volume of data? For example, should SQL Server or SSIS sort the data?
Location
• Where should the app run? For example, on a shared server, or on a standalone machine?
An OVAL Example: Loading a Text File
Simple scenario…
Interesting performance considerations!
• Text file on Server 1
• SQL Server on Server 2
Operations
Understand all operations performed
1. Open a transaction on SQL Server
2. Read data from the text file
3. Load data into the SSIS data flow
4. Load the data into SQL Server
5. Commit the transaction
Beware of hidden operations
• Data conversion in either step 3 or 4
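The five steps above can be sketched in Python to show where a hidden conversion creeps in. This is an analogy only: `sqlite3` stands in for SQL Server, and the `customers` table, its columns and the function name are hypothetical.

```python
import csv
import io
import sqlite3


def load_text(text_stream, conn):
    """Load 'id,name' lines into the customers table inside one transaction."""
    reader = csv.reader(text_stream)  # step 2: read the text file
    # Steps 1 and 5: sqlite3 opens a transaction implicitly at the first INSERT;
    # the 'with' block commits on success and rolls back on error.
    with conn:
        # Step 3: rows enter the flow; int() is the easily overlooked conversion.
        rows = ((int(rec[0]), rec[1]) for rec in reader)
        # Step 4: load the rows into the database.
        conn.executemany("INSERT INTO customers(id, name) VALUES (?, ?)", rows)
```

Moving the `int()` call, or letting the database convert the value on insert, changes which step pays for the conversion, which is the kind of detail the slide asks you to look for.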
Volume
Reduce where possible
• Don’t push unneeded columns
• Conditional split for filtering rows
• Do not parse or convert columns unnecessarily
  • In a fixed-width format you can combine adjacent unneeded columns into one
  • Leave unneeded columns as strings
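The volume-reduction advice above can be illustrated with a short Python sketch over a CSV stream. The column names (`id`, `amount`) and the threshold are hypothetical; the point is that only the column needed for filtering is converted, rows are filtered early, and everything else is dropped or left as a string.

```python
import csv
import io


def reduce_volume(text_stream, min_amount=100.0):
    """Yield only the needed columns of the rows that pass the filter."""
    for rec in csv.DictReader(text_stream):
        amount = float(rec["amount"])  # convert only what the filter needs
        if amount < min_amount:
            continue  # filter early, like a conditional split
        # 'id' stays a string; all other columns are not pushed downstream.
        yield {"id": rec["id"], "amount": amount}
```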
Application
Is SSIS right for this?
• Overhead of starting up an SSIS package may offset any performance gain over BCP for small data sets
• Is BCP good enough?
• Is the greater manageability and control of SSIS needed?
• Bulk Insert Task vs. Data Flow
Location
Consider the following configuration …
• Text file on Server 1
• SQL Server on Server 2
Where should SSIS run? (Licensing issues aside)
Measuring Performance
OVAL does not provide prescriptive guidance
• Too many variables
Improve performance by applying OVAL and measuring
• SSIS Logging
• Performance counters
• SQL Server Profiler
  • For extract queries, lookups and loading
Parallelism
• Focus on critical path
• Utilize available resources
Memory Constrained Reader and CPU Constrained
Let it rip! Optimize the slowest
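The "measure, then optimize the slowest" idea can be sketched as a tiny timing harness in Python. This is not how SSIS logging or performance counters work; the stage names and functions are hypothetical stand-ins for extract, lookup and load steps.

```python
import time


def time_stages(stages, data):
    """Run (name, func) stages in order and return {name: elapsed_seconds}."""
    timings = {}
    for name, func in stages:
        start = time.perf_counter()
        data = func(data)  # each stage receives the previous stage's output
        timings[name] = time.perf_counter() - start
    return timings


def slowest(timings):
    """Return the name of the stage with the largest elapsed time."""
    return max(timings, key=timings.get)
```

Whatever `slowest` reports is the stage on the critical path and the first candidate for tuning; the measurement should be repeated after every change.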
Moving Ahead
Manageability Features
• Logging and Log Providers
• Checkpoint Restartability
• Precedence Constraints
• Configurations
• SSIS Service
Checkpointing
[Diagram: Package Loads (Checkpoint File Created) → Data Flow Task (Write Checkpoint) → Data Flow Task (Write Checkpoint) → Send Mail Task (Write Checkpoint) → Package Completes (Checkpoint File Deleted)]
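The checkpoint life cycle shown above can be sketched in Python. This is a simplified analogy, not the SSIS checkpoint file format: the JSON file, the task names and the function name are all hypothetical, and here the file first appears after the first completed task rather than at package load.

```python
import json
import os


def run_with_checkpoint(tasks, checkpoint_path):
    """Run (name, callable) tasks in order, skipping those already checkpointed."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)  # restart: pick up the previous run's progress
    for name, task in tasks:
        if name in done:
            continue
        task()  # an exception here leaves the checkpoint file in place
        done.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)  # write checkpoint after each completed task
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # deleted once the whole run completes
```

If the second task fails, rerunning the same call skips the first task and resumes at the failed one, which is the restartability the slide describes.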
Configuration Scenario
Multiple Configurations
• Dev: Dev DB
• Test: Test DB
• Production: Prod DB
Machines where packages are being designed / tested / executed
Configuration updates package on load with DB locations (and mail server, file share locations…)
Package Handoff
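The configuration scenario above boils down to overriding a few package properties at load time, depending on where the package runs. A minimal Python sketch follows, assuming hypothetical server names and property keys; real SSIS configurations come from XML files, environment variables, registry entries or SQL Server tables rather than a dictionary.

```python
# Hypothetical per-environment settings; none of these names come from the deck.
CONFIGURATIONS = {
    "dev": {"db_server": "DEV-DB", "mail_server": "dev-mail"},
    "test": {"db_server": "TEST-DB", "mail_server": "test-mail"},
    "production": {"db_server": "PROD-DB", "mail_server": "prod-mail"},
}


def load_package(package, environment, configurations=CONFIGURATIONS):
    """Return a copy of the package properties with the environment's overrides."""
    configured = dict(package)  # the designed package itself is left untouched
    configured.update(configurations[environment])
    return configured
```

The same package definition can then be handed from development to test to production without editing it, because only the configuration differs.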
Precedence constraints
• Direct flow from object to object…
• Basically, ‘when do I move on’
• Success, Failure, Completion or one of those plus an expression (condition)
[Diagram: Dataflow Task → SendMail Task, with constraint types Success, Completion, Failure, and Success & expression]
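The "when do I move on" rule can be written down as a small Python function. It is a sketch of the logic only: the string values, the function name and the `row_count` variable in the usage note are hypothetical, and it covers the "constraint plus expression" combination described above.

```python
def constraint_satisfied(constraint, upstream_result, expression=None, variables=None):
    """Decide whether the downstream task may run.

    constraint: "Success", "Failure" or "Completion"
    upstream_result: "Success" or "Failure"
    expression: optional callable taking a dict of variables and returning a bool
    """
    if constraint == "Completion":
        outcome_ok = True  # runs regardless of how the upstream task ended
    else:
        outcome_ok = constraint == upstream_result
    if expression is None:
        return outcome_ok
    return outcome_ok and bool(expression(variables or {}))
```

For example, a "Success & expression" constraint such as `constraint_satisfied("Success", "Success", lambda v: v["row_count"] > 0, {"row_count": 0})` evaluates to `False`, so a mail task would be skipped when no rows were loaded.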
Tackle the basics …
Manageability …
Deployment Flow
Tools to organize and ‘copy’ packages and supporting files
BI Studio
• Design Package
• Add Configurations
• Add Miscellaneous files
• Set Project Deployment properties
• Build
User
• Copy/Move Deployment folder\files
• Execute Installation Wizard
  • Choose Destination (SQL Server or File System)
  • Modify protection level
  • Choose location of supporting files
  • Change configurations
SQL Agent
• Create desired agent jobs
SQL Server Management Studio
• Utilizes the SSIS service
• Allows monitoring of currently executing packages
• Maintains stored package structure
• Ad hoc package execution
Simple flow …
Deployment …
SSIS: Summary
Fast!
• Data flows process large volumes of data efficiently, even through complex operations
• Exceptional price / performance on multi-core
Feature Rich
• Many pre-built adapters and transformations reduce hand coding
• Extensible object model enables specialized custom or scripted components
• Highly productive visual environment speeds development and debugging
• Integral part of a complete BI stack (IS-AS-RS)
Beyond ETL
• Enables integration of XML, RSS and Web Services data
• Data cleansing features enable “difficult” data to be handled during loading
• Data and text mining allow “smart” handling of data for imputation of incomplete data, conditional processing of potential problems, or smart escalation of issues such as fraud detection
Your Feedback is Important!
Please fill out the feedback form
Questions !!!
Thank you (shown on the slide in Hindi, Gujarati, Bengali, Punjabi, Odia, Tamil, Telugu, Kannada and Malayalam)
question & answer
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.