9 chapter nine extracting and transforming data with sql server 2000

55
9 Chapter Nine Extracting and Transforming Data with SQL Server 2000

Upload: coleen-fox

Post on 27-Dec-2015

224 views

Category:

Documents


0 download

TRANSCRIPT

9Chapter NineChapter Nine

Extracting and Transforming Data with SQL Server 2000

9 ObjectivesObjectives

• Understand the option in SQL Server 2000 for inserting from and extracting to external data sources

• Use the BULK INSERT statement and the BCP utility in SQL Server 2000 to move records into and out of tables and views

• Use the Data Transformation Services (DTS) interface to import and export data

9 ObjectivesObjectives

• Create and save DTS packages to perform transformations and add workflow to packages

• Identify and implement an appropriate strategy for replicating data among instances of SQL Server 2000

• Understand and configure SQL Server 2000 XML support

9 The BULK INSERT Statement

The BULK INSERT Statement

• BULK INSERT statement– Used in SQL Server 2000 to insert data into a

table or view from a data file• Architecture

– The OLE DB provider in SQL Server 2000 interprets the BULK INSERT statement natively• As a result, it is possible to use the bulk copy API

from ODBC, OLE DB, SQL-DMO and DB-Library-based applications

9 The BULK INSERT Statement

The BULK INSERT Statement

• The BULK INSERT statement uses a simple syntax for inserting data from a file into a table– The destination table name, qualified by the

database name, and the source data file are the only two parameters that must be specified for each BULK INSERT operation to succeed

9 BULK INSERT Statement Usage Scenarios

BULK INSERT Statement Usage Scenarios

• The BULK INSERT statement is a high-performance way to insert data from a file into SQL Server 2000 using a standard T-SQL statement– Best used when the data to be imported exist in a tabular,

non-hierarchical format, such as a comma-delimited file– Can be used within a stored procedure as part of a data

feed and can be initiated by scheduling a job that executes the stored procedure

– Can be used if an event or program external to SQL Server 2000 initiates the data import

9 BULK INSERT Statement Performance

BULK INSERT Statement Performance

• BULK INSERT statement and BCP are two high-performance ways to insert data into database tables in SQL Server 2000– There is an overhead associated with logging the

insert of records using these methods– SQL Server 2000 incurs this overhead to ensure

that records inserted can be rolled back and rolled forward

9 BULK INSERT Statement Performance

BULK INSERT Statement Performance

• To optimize the performance of a bulk insert, it is possible to cause SQL Server 2000 not to log the inserts of records– To do this, ensure the following are true:

• Database option “select into/bulkcopy” is set to true• Target table has no indexes or if the table has indexes, it

is empty when the bulk copy starts• Target table is not being replicated• TABLOCK hint is specified

9 Bulk Copy ProgramBulk Copy Program

• Bulk Copy Program– Used to move data into and out of SQL Server

2000• BCP syntax

– BCP utility requires a slightly different syntax than the BULK INSERT statement but uses similar parameters• A key difference in syntax is that it is necessary to

specify user name, password, database name and server name when using the BCP utility

9 BCP in ActionBCP in Action

Figure 9-1: The BCP command to export order data

9 BCP Usage ScenariosBCP Usage Scenarios

• The same requirements that lead to the selection of the BULK INSERT statement for inserting large sets of data into SQL Server 2000 can be used to select the BCP utility– In addition, you should select the BCP utility when

you need to export data from a SQL Server table into a data file and extensive transformations are not necessary

9 BCP PerformanceBCP Performance

• Like BULK INSERT statement, BCP utility is a high-performance, batch-oriented mechanism for transferring large sets of records– Because both implement the Bulk Copy API, the

performance of BCP will not differ much from the performance of the BULK INSERT statement

9 Data Transformation Services

Data Transformation Services

• Data Transformation Services (DTS)– Set of features bundled together for use in

Enterprise Manager• Interface

– SQL Server 2000 provides an intuitive, easy-to-use interface to the Data Transformation Services in the format of the Data Transformation Services Import/Export Wizard

9 InterfaceInterface

Figure 9-2: Invoking the DTS Wizard from Enterprise Manager

9 InterfaceInterface

Figure 9-3: Invoking the DTS Wizard from the program group

9 Executing a Transformation Using DTS

Executing a Transformation Using DTS

• The first two steps when executing a transformation using DTS are to correctly specify the source and target data repositories– Subsequently, it is necessary to specify several

options about the data transfer and customer transformations• At a minimum, the user executing a transformation needs

to select one or more tables or views to copy from the source to the target data repositories

9 Choosing a Data Source/DestinationChoosing a Data

Source/Destination• It is necessary to identify source and target data

repositories involved in the data transfer/transformation after invoking DTS Import/Export Wizard

• To identify each repository, specify the following:– Type of data held in each repository– Physical location of the repository– Any information required to gain access to the repository

9 Choosing a Data Source/DestinationChoosing a Data

Source/Destination

Figure 9-4: The Choose a Data Source dialog box in the DTS Wizard

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-5: The Specify Table Copy or Query dialog box

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-6: The Select Source Tables and Views dialog box

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-7: The Type SQL Statement dialog box

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-8: Using Query Builder to create SQL statements

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-9: Column Mappings tab in the Column Mappings and Transformation dialog box, which allows customization of destination column mappings

9 Specifying Table Copy or Query

Specifying Table Copy or Query

Figure 9-10: Transformation tab in the Column Mappings and Transformations dialog box, which allows custom transformations to the source data

9 Copying Objects and Data between SQL Server Databases

Copying Objects and Data between SQL Server Databases

Figure 9-11: The Select Objects to Copy dialog box

9 Saving DTS PackagesSaving DTS Packages

Figure 9-12: The Save, schedule and replicate package dialog box

9 Saving DTS PackagesSaving DTS Packages

• SQL Server– With this default save option, you can store a package as a table

in the SQL Server msdb database• SQL Server Meta Data Services

– With this save option, you can maintain historical information about the data manipulated by the package

• Structured Storage File– With this save option, you can copy, move and send a package

across the network without storing the file a SQL Server database• Visual Basic File

– With this save option, you can programmatically customize a package created in DTS Designer or DTS Import/Export Wizard

9 Editing DTS PackagesEditing DTS Packages

Figure 9-13: Invoking the DTS Designer from Enterprise Manager

9 Editing DTS PackagesEditing DTS Packages

Figure 9-14: The DTS Designer interface

9 ReplicationReplication

• Replication– Process of synchronizing copies of data stored in

multiple sites

• Uses of replication:– Allowing multiple locations to keep copies of the same

data– Separating transaction processing applications from

analytical processing applications– Allowing greater autonomy

9 ReplicationReplication

• Uses of replication (cont.):– Creating a scalable application that is based on

strategically locating data that will be accessed– Bringing data closer to individuals or groups– Using replication as part of a customized

standby server strategy

9 Replication OverviewReplication Overview

• Publisher– Server that makes data available for replication to

other servers

• Subscribers– Servers that receive replicated data

• Distributor– Server that contains the distribution database and

stores meta-data, history data and/or transactions

9 Snapshot ReplicationSnapshot Replication

• Snapshot replication– Periodically

copies entire publications as they exist at a moment in time

Figure 9-15: The components involved in snapshot replication

9 Usage Scenarios for Snapshot ReplicationUsage Scenarios for Snapshot Replication

• Snapshot replication is the best choice when the following are true:– Most of the data are static and do not change often– It is acceptable to have copies of data that are out

of data for some period of time– The total size of data publications to replicate is

small– The sites are often disconnected, eliminating the

possibility to send constant updates of data as in transactional replication

9 Transactional ReplicationTransactional Replication

• Transactional replication– Incremental

changes made on the Publisher are sent to each Subscriber

Figure 9-16: The components involved in transactional replication

9 Usage Scenarios for Transactional Replication

Usage Scenarios for Transactional Replication

• Transactional replication is an excellent choice when:– Data modification should be propagated to Subscribers

within seconds of when they occur– Transactions should be atomic– Subscribers are connected to the Publisher most of the

time– The application cannot tolerate out-of-date data on the

subscribers that receive the changes

9 Merge ReplicationMerge Replication

• Merge replication– Data

modifications can be made at each site

Figure 9-17: The components involved in merge replication

9 Usage Scenarios for Merge Replication

Usage Scenarios for Merge Replication

• Merge replication is the best selection when:– Multiple Subscribers need to update data at

various times and propagate those changes to the Publisher and to other Subscribers

– Subscribers need to receive data, make changes offline and synchronize changes later with the Publisher and other Subscribers

– The ability for each site to operate independently is critical

9 Replication ToolsReplication Tools

• SQL Server 2000 provides several easy-to-use tools for establishing and configuring replication– Each of these tools is available through Enterprise

Manager– Most take the form of wizards that allow the

definition of Publishers, Distributors, Subscribers, subscriptions and their properties

9 Replication WizardsReplication Wizards

• Configure Publishing and Distribution Wizard– Typically the first wizard that is invoked to establish replication

• Create Publication Wizard– Allows creation of publication

• Push Subscription Wizard– Allows the creation of a subscription to an existing publication

• Pull Subscription Wizard– Allows the creation of a subscription to a publication requested by

the Subscriber on the SQL Server specified in this wizard• Disable Publishing and Distribution Wizard

– Allows removal of replication publications and distribution service from the SQL Server machine

9 Filtering Published DataFiltering Published Data

• By filtering published data, it is possible to:– Minimize the amount of data sent over the network– Reduce the amount of storage space required at

the Subscriber– Customize publications and applications based on

individual Subscriber requirements– Avoid or reduce conflicts that could result if the

same data were sent multiple Subscribers

9 Replication Performance Optimization

Replication Performance Optimization

• There are several application-specific and network-specific techniques for optimizing the performance of replication– Simple way is to set minimum amount of memory allocated

to SQL Server 2000 by using the Min Server Memory option– Using a separate disk drive for the transaction log on any

server involved in replication will help to avoid the disk I/O bottleneck that can result from logging many operations

– Ensuring that the data that is replicated is then filtered as much as possible will also help to avoid unnecessary performance problems

9 Heterogeneous QueriesHeterogeneous Queries

• A heterogeneous query retrieves, inserts or updates data by joining together sets of data from multiple data source types– It is necessary to initiate a heterogeneous query from a

SQL Server database that is linked to another data source

– A linked server configuration allows remote server access and allows access of non-SQL Server data sources to be treated the same way as the access of SQL Server data sources

9 Configuring a Linked Server

Configuring a Linked Server

• To configure a linked server definition, a SQL Server administrator specifies an OLE DB provider and OLE DB data source

Table 9-1: OLE DB Providers and Data Sources

9 Configuring a Linked Server

Configuring a Linked Server

Figure 9-18: Linked server configuration

9 Managing a Linked Server Definition

Managing a Linked Server Definition

• When setting up a linked server, register connection information and data source with SQL Server 2000– After registration is accomplished, it is possible to refer to

the data source with a single logical name

• You can create or delete a linked server definition with stored procedures or through Enterprise Manager– It it possible to use stored procedures to manage a linked

server definition

9 ArgumentsArguments

• Table 9-2 describes data returned by sp_linkedservers system-stored procedure

Table 9-2: Returned Data

9 XML Support in SQL Server 2000

XML Support in SQL Server 2000

• Standard data format eXtensible Markup Language (XML) is a format for marking up data with extensible tags that can be used by applications to determine the purpose of the data– In SQL Server 2000, it is possible to make a request for relational

data in the form of XML and have the SQL OLE DB provider in SQL Server construct an XML document from the resulting data

– It is possible to insert, update or delete data in SQL Server using XML syntax to describe the data that should be inserted, updated or deleted

– Much of this functionality is available through stored procedures and general T-SQL statements that are executed against SQL Server 2000

9 XML Support in SQL Server 2000

XML Support in SQL Server 2000

• There are features that can be used directly from a browser and do not require writing T-SQL code, including:– Executing a T-SQL Query in the URL window of a browser to

retrieve the results as XML– Executing a query against a database object using the

Xpath query syntax– Defining an XML template with a T-SQL statement that can

be executed to return a set of data in the form of XML– Defining an XML View that uses XML Data Reduced (XDR)

syntax instead of T-SQL syntax

9 Configuring Internet Information Services for XML Support

Configuring Internet Information Services for XML Support

Figure 9-19: The MMC snap-in to configure SQL SML support in IIS

9 Virtual DirectoriesVirtual Directories

• sqlisapi.dll– Implementation of the Internet Services API (ISAPI) filter that

provides SQL-specific functionality to Internet Information Services (IIS)

• To allow IIS to differentiate between request files that exist in conventional virtual directories and XML requests, it is necessary to create SQL virtual directories

9 Virtual DirectoriesVirtual Directories

Figure 9-20: The settings tab of the Demos Properties dialog box

9 Virtual NamesVirtual Names

• Virtual names– Sub-namespace

in IIS whose HTTP requests are forwarded to SQL Server 2000 as a specific type of request

Figure 9-21: The Virtual Names tab of the Demos Properties dialog box

9 Virtual NamesVirtual Names

• Dbobject– Allows access to objects in underlying database of the

Allow XPath option is selected for the virtual directory

• Schema– Allows access to data in the underlying database

without the use of a T-SQL statement

• Template– Allows access to an XML file that acts as a template if

the “Allow template queries” option is selected for the virtual directory

9 Chapter SummaryChapter Summary

• There are a variety of data formats and methods for moving data into and out of SQL Server 2000

• For one-time importing and exporting of data, the bulk insert task and BCP utility can be executed using T-SQL statements and Windows command-line statements

• To apply user-defined transformations to data imports and exports, DTS packages can be managed in Enterprise Manager

• If it is necessary to retrieve or to insert data in XML format, the FOR XML clause can be used in T-SQL statements