Copying, Managing, and Transforming Data With DTS

Upload: cecilia-barber

Post on 23-Dec-2015

Defining Bulk Insert Task Functionality

Quickly Loads Data from a File into SQL Server

Encapsulates the Transact-SQL Bulk Insert Statement

Supports Table or View Destinations in SQL Server

Loads Data with No Applied Transformations

Supports Format Files to Specify File Layout

Requires Membership in the sysadmin or bulkadmin Fixed Server Role

The Bulk Insert Task is One of Three Ways to Run SQL Server Bulk Copy Operations

Sidebar: SQL Server Bulk Copy Operations

Ways to Access Bulk Copy Operations

1) bcp Utility

2) Bulk Insert Task or T-SQL Bulk Insert Statement

3) Bulk Copy APIs for OLE DB, ODBC, DB-Library Applications

What Do Bulk Copy Operations Offer?

Allow Fast Loading of Data into SQL Server

Configure Data Load Batches

Allow You to Control Logging Operations

Defining the Sales_stage Table Load

[Diagram: Tab Delimited File, DTS, Polaris]

Using the Bulk Insert Task to Load Tab-delimited File Data into Sales_stage

Loading Sales_stage with Data Bound for Sales_fact
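As a rough illustration of the idea behind this load (tab-delimited rows copied into a staging table with no transformations), here is a minimal Python sketch using the standard `csv` and `sqlite3` modules. It is not DTS and not the Transact-SQL Bulk Insert statement; the `bulk_load` helper, the three-column `sales_stage` layout, and the sample rows are assumptions made only for this example.

```python
import csv
import io
import sqlite3

def bulk_load(conn, table, tab_delimited_text):
    """Load tab-delimited rows into `table` unchanged; returns rows loaded."""
    rows = list(csv.reader(io.StringIO(tab_delimited_text), delimiter="\t"))
    if not rows:
        return 0
    placeholders = ", ".join("?" for _ in rows[0])
    # No per-column transformations: every field is copied as-is.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
# Hypothetical staging layout; the slides do not list the columns here.
conn.execute(
    "CREATE TABLE sales_stage (customer_id TEXT, product_id TEXT, quantity_sales TEXT)"
)
sample = "ALFI\t123\t400\nBONAP\t456\t25\n"  # made-up file contents
loaded = bulk_load(conn, "sales_stage", sample)
```

Because nothing is transformed on the way in, any cleanup or key lookup has to happen in a later step, which is why the data lands in a staging table first.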

Defining Execute SQL Task Functionality

Executing SQL Statements

Source database must understand SQL syntax

SQL statement determines task performance

Task supports single or multiple SQL statements

You can create queries in the DTS Query Designer

Running Parameterized Queries

Input parameters

Output parameters

Using Parameterized Queries

Understanding Global Variable Basics

User-defined storage locations

Information is shared across package steps

Using Parameters with Global Variables

Assign global variable values to query input parameters

Store query results to a global variable with output parameters

Creating Dynamic Queries

? Question Marks Represent Query Parameters

SELECT *
FROM product_dim
WHERE product_name = ?
AND category_name = ?

The Parameter’s Position in the Query Determines Its Name

Parameter 1: first ? (product_name)

Parameter 2: second ? (category_name)

Global Variables Provide Data to Input Parameters

ProductName: Parameter 1

CategoryName: Parameter 2
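The positional binding described on this slide can be sketched in Python, whose `sqlite3` module also uses `?` placeholders bound by position. This is an analogy, not DTS itself: the `global_variables` dictionary stands in for DTS global variables, and the `product_dim` columns and sample rows are assumptions for illustration.

```python
import sqlite3

# Stand-in for DTS global variables: named values shared across steps.
global_variables = {"ProductName": "Chai", "CategoryName": "Beverages"}

conn = sqlite3.connect(":memory:")
# Hypothetical table layout and rows.
conn.execute(
    "CREATE TABLE product_dim (product_key INTEGER, product_name TEXT, category_name TEXT)"
)
conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?, ?)",
    [(25, "Chai", "Beverages"), (26, "Chang", "Beverages"), (27, "Tofu", "Produce")],
)

query = "SELECT * FROM product_dim WHERE product_name = ? AND category_name = ?"
# Parameters are bound by position: the first value fills the first ?,
# the second value fills the second ?.
params = (global_variables["ProductName"], global_variables["CategoryName"])
rows = conn.execute(query, params).fetchall()
```

Changing the values in `global_variables` changes the result without editing the query text, which is the sense in which the query is dynamic.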

Storing Query Results

Store Query Results in Global Variables

Storing Row Values

SELECT begin_date, end_date
FROM financial_period
WHERE quarter = 1

Output Parameter begin_date: Global Variable BeginDate

Output Parameter end_date: Global Variable EndDate

Storing Entire Rowsets

SELECT *
FROM product

Entire Rowset: Global Variable Product
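The two storage modes on this slide (individual row values versus a whole rowset) might look like this in a Python sketch. The `global_variables` dictionary again stands in for DTS global variables, and the table layouts and sample data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layouts and sample rows.
conn.execute("CREATE TABLE financial_period (quarter INTEGER, begin_date TEXT, end_date TEXT)")
conn.execute("INSERT INTO financial_period VALUES (1, '2000-01-01', '2000-03-31')")
conn.execute("CREATE TABLE product (product_id INTEGER, product_name TEXT)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [(123, "Chai"), (124, "Chang")])

global_variables = {}

# Row values: each output column of the first row goes to its own variable.
row = conn.execute(
    "SELECT begin_date, end_date FROM financial_period WHERE quarter = 1"
).fetchone()
global_variables["BeginDate"], global_variables["EndDate"] = row

# Entire rowset: the whole result set is kept under a single variable.
global_variables["Product"] = conn.execute(
    "SELECT * FROM product ORDER BY product_id"
).fetchall()
```

Later steps can then read `BeginDate` and `EndDate` as inputs, or iterate over the stored `Product` rows.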

Defining the Time_dim Data Load

[Diagram: DTS, Time_dim_build Stored Procedure]

Input Parameters

- @p_start_date

- @p_end_date
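The slides do not show the body of Time_dim_build, so the following Python sketch is only a guess at the kind of work such a procedure could do: generate one row per calendar day between the two input dates. The column set (key, date, year, quarter, month) is hypothetical.

```python
from datetime import date, timedelta

def time_dim_build(p_start_date, p_end_date):
    """Return one (time_key, date, year, quarter, month) row per day, inclusive."""
    rows = []
    current = p_start_date
    key = 1
    while current <= p_end_date:
        quarter = (current.month - 1) // 3 + 1
        rows.append((key, current.isoformat(), current.year, quarter, current.month))
        current += timedelta(days=1)
        key += 1
    return rows

# First quarter of 2000, used only as sample input.
time_dim = time_dim_build(date(2000, 1, 1), date(2000, 3, 31))
```

Passing the date range in as parameters means the same step can be reused for any period without editing the procedure.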

Defining the DTS Data Pump

DTS Mechanism for Moving and Transforming Data

Allows for High-speed Batch Copying of Data

Contains Supplied Data Transformations

Can Also Define ActiveX Script Transformations

Provides an Extensible COM-based Architecture That Allows for Custom Transformations (C++)

Permits the Application of Transformation Logic to Specific Phases of a Data Pump Operation

Multiphase Data Pump

Understanding How the Data Pump Processes Data

[Diagram: Source (OLE DB, ODBC) feeds In to the DTS Data Pump, which writes Out to the Destination (OLE DB, ODBC). Transformations (Xforms) inside the pump: ActiveX Script, Copy, Trim String, …, Custom]

1. Connects to the source and destination

2. Reads OLE DB metadata about source and destination columns

3. Gathers data transformation definitions

4. Implements the transformation

5. Writes completed record to the destination
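The read, transform, write cycle in the steps above can be sketched as a small Python loop. This is a conceptual model only, not the DTS data pump: connections and OLE DB metadata are replaced by in-memory lists and dictionaries, and the `data_pump` function and column names are hypothetical.

```python
def data_pump(source_rows, transformations):
    """Apply each column's transformation to every source row.

    `transformations` maps destination column -> (source column, function).
    """
    destination = []
    for row in source_rows:                      # read one source row
        out = {}
        for dest_col, (src_col, func) in transformations.items():
            out[dest_col] = func(row[src_col])   # apply the transformation
        destination.append(out)                  # write the completed record
    return destination

source = [{"CompanyName": "  Alfreds Futterkiste ", "CustomerID": "alfki"}]
transformations = {
    "customer_name": ("CompanyName", str.strip),  # similar in spirit to Trim String
    "customer_id": ("CustomerID", str.upper),     # similar in spirit to Uppercase String
}
result = data_pump(source, transformations)
```

The transformation definitions are gathered once, before the loop, and then reused for every row; only the row data changes from one iteration to the next.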

Defining the Tasks That Transform Data

The Transform Data Task

Inserts

The Data Driven Query Task

Inserts

Updates

Deletes

The ParallelDataPumpTask

Processes hierarchical rowsets

Defining the Transform Data Task

Data Movement and Transformation Functionality

Copying data between heterogeneous data sources

Applying optional column level transformations

Extended Data Transfer Functionality

Supporting batch processing of data

Providing error-handling capabilities

Containing optimization settings for SQL Server destinations

Selecting Transformation Types

Transformation     Description
ActiveX Script     Invokes user-defined ActiveX scripts.
Copy Column        Copies data from source to destination.
DateTime String    Converts a date to a new destination format.
Lowercase String   Converts a string to lowercase characters.
Uppercase String   Converts a string to uppercase characters.
Middle of String   Extracts a substring of source data.
Trim String        Removes white space from a source string.
Read File          Copies contents of a file to a destination column. File path is specified by a source column.
Write File         Copies contents of a source column to a file. File path is specified by a second source column.
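To make the string-oriented transformations in the table concrete, here is a small Python sketch of roughly equivalent functions. The names and signatures are illustrative, not the DTS object model; in particular the 1-based start position in `middle_of_string` and the format strings passed to `datetime_string` are assumptions.

```python
from datetime import datetime

def middle_of_string(value, start, length):
    """Extract `length` characters beginning at 1-based position `start`."""
    return value[start - 1:start - 1 + length]

def datetime_string(value, source_format, destination_format):
    """Re-express a date string in a new format."""
    return datetime.strptime(value, source_format).strftime(destination_format)

# Single-argument transformations keyed by the names used in the table.
TRANSFORMS = {
    "Copy Column": lambda v: v,
    "Lowercase String": str.lower,
    "Uppercase String": str.upper,
    "Trim String": str.strip,
}

trimmed = TRANSFORMS["Trim String"]("  Chai  ")
middle = middle_of_string("Alfreds Futterkiste", 1, 7)
converted = datetime_string("1/1/2000", "%m/%d/%Y", "%Y-%m-%d")
```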

Defining Column Mappings

One-to-One Mappings

Symmetric Many-to-Many Mappings

Asymmetric Mappings

Creating Efficient Column Mappings

Minimizing the Number of Column Mappings

Using Many-to-Many Mappings When Possible

Grouping Common Transformations Together
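One way to picture why fewer mappings can be cheaper: if every mapping costs one transformation invocation per row, a single many-to-many copy does the same work as several one-to-one copies with fewer calls. The Python sketch below only counts invocations in a toy model; it does not measure DTS, and the `copy_columns` helper and column names are hypothetical.

```python
calls = {"count": 0}

def copy_columns(row, source_cols, dest_cols):
    """One copy-style transformation: copies N source columns to N destination columns."""
    calls["count"] += 1
    return {d: row[s] for s, d in zip(source_cols, dest_cols)}

row = {"id": 1, "name": "Chai", "price": 18}

# Three one-to-one mappings: the transformation is invoked three times per row.
calls["count"] = 0
one_to_one = {}
for col in ("id", "name", "price"):
    one_to_one.update(copy_columns(row, [col], [col]))
one_to_one_calls = calls["count"]

# One many-to-many mapping: a single invocation per row covers all columns.
calls["count"] = 0
many_to_many = copy_columns(row, ["id", "name", "price"], ["id", "name", "price"])
many_to_many_calls = calls["count"]
```

Both approaches produce the same destination row; in this model the only difference is the number of invocations per row.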

Loading Customer_dim

Northwind OLTP SQL Server Database

Performance Settings

Enabling Fast Load

Using high-speed bulk copy processing

Accepting batches of transformed data

Only applies to SQL Server destinations

Using a Table Lock

Configuring Batch Size

Configuring Batch Size

Assembling Records into Groups

DTS commits records to database as a group

Insert batch size sets the number of records in the group

Understanding Default Behavior

Insert batch size is 0

DTS assigns one batch for all records

Setting the Insert Batch Size

Value between 0 and 9999

Setting value can improve performance
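The batching behavior on this slide can be sketched in Python with `sqlite3`: rows are inserted in groups and each group is committed together, with a batch size of 0 treated as one batch for all records. The `load_in_batches` helper and the two-column table are assumptions for illustration, not DTS settings.

```python
import sqlite3

def load_in_batches(conn, rows, insert_batch_size=0):
    """Insert rows, committing once per batch; returns the number of commits.

    A batch size of 0 mirrors the default described above: one batch for all rows.
    """
    size = insert_batch_size or len(rows) or 1
    commits = 0
    for start in range(0, len(rows), size):
        conn.executemany(
            "INSERT INTO sales_stage VALUES (?, ?)", rows[start:start + size]
        )
        conn.commit()  # the whole group becomes durable together
        commits += 1
    return commits

conn = sqlite3.connect(":memory:")
# Hypothetical two-column staging table.
conn.execute("CREATE TABLE sales_stage (product_id INTEGER, quantity_sales INTEGER)")
rows = [(i, i * 10) for i in range(10)]
commits_default = load_in_batches(conn, rows)     # batch size 0: one commit
commits_batched = load_in_batches(conn, rows, 4)  # groups of 4, 4, and 2 rows
```

Smaller batches limit how much work is lost if a load fails partway through, at the cost of more commits; the best value depends on the data and the server.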

Defining SQL Solutions

You Can Use the Source Query of the Transform Data Task to Implement Data Transformations

The Source SQL Statement Must Be Understood by the Source Database

The Performance of the Source Query Depends on the SQL Statement

You Can Use Parameters in the Source Query to Create Dynamic Source SQL Statements

If You Use the Source Query to Manipulate Data, You Can Use the Copy Column Transformation to Load Data into the Destination

Applying SQL Solutions to Load Fact Tables

Using the Source Query to Join Staging Table Data to Dimension Tables

Retrieving Primary Key Values to Store as Foreign Keys on the Fact Table

Using a Copy Column Transformation in the Transform Data Task

Configuring Fast Load for SQL Server Destinations

Loading the Fact Table

Source Data

customer id   product id   order date   quantity_sales   amount_sales
ALFI          123          1/1/2000     400              10,789

Dimension Tables

customer_dim: 201, ALFI, Alfreds
product_dim: 25, 123, Chai
time_dim: 134, 1/1/2000

Sales Fact Data

cust_key   prod_key   time_key   quantity_sales   amount_sales
201        25         134        400              10,789

Identifying Dimension Application Key Values in the Fact Table Source Data

Retrieving Primary Keys from Each Dimension Table to Assign Foreign Keys

Loading Sales_fact

Extracting Data from the Sales_stage Table

Assigning Foreign Keys by Retrieving Primary Keys from the Product_dim, Customer_dim, and Time_dim Dimensions
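The key lookup described here, joining staging rows to each dimension to pick up its primary key, can be sketched with a single source query. The Python example below uses `sqlite3` as a stand-in for SQL Server; the sample values follow the example row from the fact-table slide, while the dimension column names (`customer_id`, `product_id`, `date_value`) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (cust_key INTEGER, customer_id TEXT, customer_name TEXT);
CREATE TABLE product_dim  (prod_key INTEGER, product_id INTEGER, product_name TEXT);
CREATE TABLE time_dim     (time_key INTEGER, date_value TEXT);
CREATE TABLE sales_stage  (customer_id TEXT, product_id INTEGER, order_date TEXT,
                           quantity_sales INTEGER, amount_sales INTEGER);
CREATE TABLE sales_fact   (cust_key INTEGER, prod_key INTEGER, time_key INTEGER,
                           quantity_sales INTEGER, amount_sales INTEGER);
INSERT INTO customer_dim VALUES (201, 'ALFI', 'Alfreds');
INSERT INTO product_dim  VALUES (25, 123, 'Chai');
INSERT INTO time_dim     VALUES (134, '1/1/2000');
INSERT INTO sales_stage  VALUES ('ALFI', 123, '1/1/2000', 400, 10789);
""")

# The source query does the key lookup; the load itself is a plain column copy.
conn.execute("""
INSERT INTO sales_fact
SELECT c.cust_key, p.prod_key, t.time_key, s.quantity_sales, s.amount_sales
FROM sales_stage AS s
JOIN customer_dim AS c ON c.customer_id = s.customer_id
JOIN product_dim  AS p ON p.product_id  = s.product_id
JOIN time_dim     AS t ON t.date_value  = s.order_date
""")
fact_rows = conn.execute("SELECT * FROM sales_fact").fetchall()
```

Because the joins resolve every key in the query, no per-row script is needed and each result column maps straight to a fact table column.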

Best Practices - Performing Inserts

Bulk Insert Task

Accessing data in files

Loading data into SQL Server destinations

Copying data with no transformations

Transform Data Task

Accessing any source

Loading to any destination

Creating data transformations

Using input parameters in the source query

Applying custom logic to phases of the data pump

Best Practices - Performance Settings

Tuning the Transform Data Task

Fast load for SQL Server destinations

Batch size

Table lock

Tuning the Bulk Insert Task

Sort order for clustered indexes

Batch size

Table lock

Best Practices - Executing Flexible Queries

The Data Driven Query Task

Execute flexible queries on a row-by-row basis

Meet flexibility needs that outweigh performance needs

Perform non-insert queries

The Execute SQL Task

Execute SQL statements and extended SQL statements

Perform parameterized queries

Assign query outputs to global variables
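The row-by-row behavior attributed to the Data Driven Query task above can be sketched as a loop that picks one of several prepared queries for each source row. This Python example illustrates the pattern only; the `choose_query` rule, the `deleted` flag, and the table layout are hypothetical and not taken from DTS.

```python
import sqlite3

QUERIES = {
    "insert": "INSERT INTO customer_dim (customer_id, customer_name) VALUES (:id, :name)",
    "update": "UPDATE customer_dim SET customer_name = :name WHERE customer_id = :id",
    "delete": "DELETE FROM customer_dim WHERE customer_id = :id",
}

def choose_query(row, existing_ids):
    """Decide, for one source row, which query to run (hypothetical rule)."""
    if row.get("deleted"):
        return "delete"
    return "update" if row["id"] in existing_ids else "insert"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (customer_id TEXT PRIMARY KEY, customer_name TEXT)")
conn.executemany(
    "INSERT INTO customer_dim VALUES (?, ?)", [("ALFI", "Alfreds"), ("OLD", "Old Co")]
)

source_rows = [
    {"id": "ALFI", "name": "Alfreds Futterkiste"},     # exists: update
    {"id": "NEW", "name": "New Co"},                   # missing: insert
    {"id": "OLD", "name": "Old Co", "deleted": True},  # flagged: delete
]
for row in source_rows:  # one query per row: flexible, but slower than a bulk load
    existing = {r[0] for r in conn.execute("SELECT customer_id FROM customer_dim")}
    conn.execute(QUERIES[choose_query(row, existing)], {"id": row["id"], "name": row["name"]})
conn.commit()
final = conn.execute("SELECT * FROM customer_dim ORDER BY customer_id").fetchall()
```

Issuing a separate statement for every row is what makes this approach slower than a batched insert, which matches the slide's advice to use it when flexibility matters more than performance.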

Best Practices - Using Custom Tasks

Creating Reusable Functions and Utilities

Adding Functionality to DTS Package Designer

Implementing a Faster Alternative to ActiveX Script Tasks

Best Practices - Creating Efficient Column Mappings

Minimizing the Number of Column Mappings

Using Many-to-Many Mappings When Possible

Grouping Common Transformations Together

Best Practices - The Right Transformation Type

Using Supplied Transformations When Possible

Minimizing ActiveX Script Transformations When Performance Outweighs Flexibility

Using SQL Solutions with Copy Column Transformations

Developing Custom Transformations as a Faster Alternative to ActiveX Script Transformations