TRANSCRIPT
DB2 Data Movement Utilities: A Comparison
Speaker: Jeyabarathi (JB) Chakrapani, NASCO
Session Code: D09
Wed, May 06, 2015 (08:00 AM - 09:00 AM) : Hancock | Platform: DB2 LUW II
Agenda
Learn the various tools that are available with DB2 for achieving efficient data movement within the database environment.
Get a brief introduction to each utility, including the DB2 online table move procedure (ADMIN_MOVE_TABLE).
Learn the enhancements offered in each DB2 version for each of these utilities.
Understand how to use the different utilities, with examples. Learn what it takes to maximize the performance of your chosen data movement utility, along with useful tricks and tips.
Introduction to DB2 data movement utilities
What are the available tools and options for data movement?
• Load utility
• Export utility
• Import utility
• Ingest utility
• db2move tool
• Restore utility
• ADMIN_COPY_SCHEMA
• ADMIN_MOVE_TABLE
• Split mirror
• IBM replication tools
LOAD UTILITY
Load utility
Required input for Load:
• The path and the name of the input file, named pipe, or device.
• The name or alias of the target table.
• The format of the input source: DEL, ASC, PC/IXF, or CURSOR.
• Whether the input data is to be appended to the table or is to replace the existing data in the table.
• A message file name, if the utility is invoked through the application programming interface (API), db2Load.
Load
Load phases:
• Load
• Build
• Delete
• Index Copy
Load modes:
• Insert
• Replace
• Restart
• Terminate
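The four modes above can be sketched as follows; the table and input file names are hypothetical:

```sql
-- INSERT: append rows to whatever is already in the table
LOAD FROM staff.del OF DEL INSERT INTO myschema.staff;
-- REPLACE: delete the existing rows, then load the input
LOAD FROM staff.del OF DEL REPLACE INTO myschema.staff;
-- RESTART: resume a load that was interrupted
LOAD FROM staff.del OF DEL RESTART INTO myschema.staff;
-- TERMINATE: roll back a failed load and leave the table usable
LOAD FROM staff.del OF DEL TERMINATE INTO myschema.staff;
```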
Load options Include:
• If the load utility is invoked from a remotely connected client, the data file must be on the client. XML and LOB data are always read from the server, even if you specify the CLIENT option.
• The method to use for loading the data: column location, column name, or relative column position.
• How often the utility is to establish consistency points.
• The names of the table columns into which the data is to be inserted.
• Whether or not preexisting data in the table can be queried while the load operation is in progress.
• Whether the load operation should wait for other utilities or applications to finish using the table, or force the other applications off before proceeding.
Option categories: client options, method, consistency points, access level, paths, table space, statistics, recovery, COPY NO/YES
Load options Include:
• An alternate system temporary table space in which to build the index.
• The paths and the names of the input files in which LOBs are stored.
• A message file name.
• Whether the utility should modify the amount of free space available after a table is loaded.
• Whether statistics are to be gathered during the load process. This option is only supported if the load operation is running in REPLACE mode.
• Whether to keep a copy of the changes made, to enable rollforward recovery of the database.
• The fully qualified path to be used when creating temporary files during a load operation. The name is specified by the TEMPFILES PATH parameter of the LOAD command.
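A sketch combining several of the options above in one command; all paths, file names, and table names are hypothetical:

```sql
LOAD FROM /data/table1.del OF DEL
  LOBS FROM /db/lob1 MODIFIED BY lobsinfile
  SAVECOUNT 10000
  MESSAGES /tmp/load.msg
  TEMPFILES PATH /db/tmp
  REPLACE INTO myschema.table1
  STATISTICS USE PROFILE
  COPY YES TO /db/loadcopy;
```

STATISTICS USE PROFILE is shown with REPLACE because statistics collection is only supported in REPLACE mode, and COPY YES keeps a copy of the loaded data for rollforward recovery.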
Load restrictions:
• Loading data into nicknames is not supported.
• Loading data into typed tables, or tables with structured type columns, is not supported.
• Loading data into declared temporary tables and created temporary tables is not supported.
• XML data can only be read from the server side; if you want to have the XML files read from the client, use the import utility.
• You cannot create or drop tables in a table space that is in Backup Pending state.
• If an error occurs during a LOAD REPLACE operation, the original data in the table is lost. Retain a copy of the input data to allow the load operation to be restarted.
• Triggers are not activated on newly loaded rows. Business rules associated with triggers are not enforced by the load utility.
• Loading encrypted data is not supported.
Restriction areas: nicknames, structured data types, temporary tables, XML support, backup pending, LOAD REPLACE, triggers, data encryption, partitioned tables
Load from cursor… Examples:
DECLARE mycurs CURSOR FOR SELECT * FROM abc.table1;
LOAD FROM mycurs OF cursor INSERT INTO abc.table2;

DECLARE C1 CURSOR FOR SELECT * FROM customers
  WHERE XMLEXISTS('$DOC/customer[income_level=1]');
LOAD FROM C1 OF CURSOR INSERT INTO lvl1_customers;

The ANYORDER file type modifier is supported for loading XML data into an XML column.
• Loads the results of a query directly into the target table; no intermediate export is necessary.
• XML data can be loaded with the cursor option.
• Nicknames can be referenced in the SQL query of the cursor.
• Load from a remote database using the DATABASE option.
Examples: Loading from a federated database:
Federation should be enabled and the data source cataloged.
CREATE NICKNAME myschema1.table1 FOR source.abc.table1;
DECLARE mycurs CURSOR FOR SELECT c1,c2,c3 FROM myschema1.table1;
LOAD FROM mycurs OF cursor INSERT INTO abc.table2;

Loading from a remote database:
The remote database should be cataloged.
DECLARE mycurs CURSOR DATABASE dbsource USER dsciaraf USING mypasswd
  FOR SELECT * FROM abc.table1;
LOAD FROM mycurs OF cursor INSERT INTO abc.table2;
Checking for Integrity violations….
Load puts the tables in check pending status when:
The table has check constraints or RI constraints.
The table has identity columns and a V7 or earlier client was used to load data.
The table has descendent immediate staging tables or MQT tables referencing it.
The table is a staging table or MQT table.
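In those cases the table must be taken out of set integrity pending state after the load. A minimal sketch, assuming a hypothetical table myschema.table1:

```sql
-- Tables in set integrity pending state have STATUS = 'C' in the catalog
SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES WHERE STATUS = 'C';
-- Validate the loaded rows and take the table out of the pending state
SET INTEGRITY FOR myschema.table1 IMMEDIATE CHECKED;
```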
Load Performance….
CPU_PARALLELISM - specifies the number of threads used by the load utility to parse, convert, and format data records
DISK_PARALLELISM - specifies the number of processes or threads used by the load utility to write data records to disk
DATA_BUFFER - total amount of memory, in 4 KB units, allocated to the load utility as a buffer
NONRECOVERABLE – does not put the table in backup pending state.
SAVECOUNT – specifies consistency points.
STATISTICS USE PROFILE – collects statistics after the load.
FASTPARSE – use when the data is known to be valid.
NOROWWARNINGS – use when multiple warnings are expected.
PAGEFREESPACE, INDEXFREESPACE, TOTALFREESPACE – specify these to reduce the need for reorg.
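A sketch of a performance-oriented load combining several of these knobs; the file and table names are hypothetical and the parallelism and buffer values are illustrative only, to be tuned for the system:

```sql
LOAD FROM /data/bigfile.del OF DEL
  MODIFIED BY FASTPARSE NOROWWARNINGS
  SAVECOUNT 100000
  REPLACE INTO myschema.bigtable
  STATISTICS USE PROFILE
  NONRECOVERABLE        -- skip copy/logging; table is not put in backup pending
  DATA BUFFER 8192      -- 8192 x 4 KB pages for the load buffer
  CPU_PARALLELISM 4
  DISK_PARALLELISM 4;
```

NONRECOVERABLE trades recoverability for speed: the load is not rolled forward, so the data must be reloadable from the source if needed.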
EXPORT
EXPORT UTILITY
• Required input:
Pathname for the output file.
Format of the output file: IXF or DEL.
Specification of the data to be extracted, using a SELECT statement.
• Additional options:
Subset of columns to be extracted, using the METHOD option.
XML TO, XMLFILE, XML SAVESCHEMA – to export and store XML data in different ways.
The SELECT statement used for extracting data can be optimized the same way any SQL query can be optimized, to improve export performance.
The MESSAGES option allows messages generated by the export utility to be written to a file.
Data extraction using SQL query or XQuery statements
EXPORT UTILITY
Examples…
• EXPORT TO table1.ixf OF IXF MESSAGES msg.txt SELECT * FROM myschema.table1
This is a simple export command that exports all rows to the IXF file.
• EXPORT TO table1export.del OF DEL XML TO /db/xmlpath XMLFILE xmldocs XMLSAVESCHEMA SELECT * FROM myschema.table1
• EXPORT TO table1.del OF DEL LOBS TO /db/lob1, /db/lob2/ MODIFIED BY lobsinfile SELECT * FROM myschema.table1
IMPORT
IMPORT
• Required input for Import:
The path and the name of the input file
The name or alias of the target table or view
The format of the data in the input file
The method by which the data is to be imported
The traverse order, when importing hierarchical data
The subtable list, when importing typed tables
• Additional options:
MODIFIED BY clause
ALLOW WRITE ACCESS – import acquires a non-exclusive lock
ALLOW NO ACCESS – import acquires an exclusive lock, waiting for other work to complete until it can acquire the lock
COMMITCOUNT – commits after the specified number of rows
MESSAGES
Data append/update using SQL query or XQuery statements
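The options above can be combined as in this sketch; the input file, message file, and table names are hypothetical:

```sql
IMPORT FROM table1.ixf OF IXF
  ALLOW WRITE ACCESS
  COMMITCOUNT 1000
  MESSAGES /tmp/import.msg
  INSERT INTO myschema.table1;
```

With ALLOW WRITE ACCESS, other applications can keep updating the table while the import runs; COMMITCOUNT 1000 commits every 1,000 rows so locks and log space are released periodically.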
Import
• Import support
Import supports IXF, ASC, and DEL data formats.
Used with file type modifiers to customize the import operation.
Used to move hierarchical data and typed tables.
Import logs all activity, updates indexes, verifies constraints, and fires triggers.
Allows you to specify the names of the columns within the table or view into which the data is to be inserted.
• Import modes
INSERT – adds data to the existing table without changing existing data.
INSERT_UPDATE – updates rows with matching primary key values; otherwise inserts.
REPLACE – deletes existing data and inserts new data.
CREATE – creates the target table and its index definitions.
REPLACE_CREATE – deletes existing data and inserts new data; if the target table does not exist, it is created.
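Two of the modes above sketched as commands; the file and table names are hypothetical (CREATE requires IXF input):

```sql
-- INSERT_UPDATE: rows whose primary key matches are updated, all others inserted
IMPORT FROM table1.ixf OF IXF COMMITCOUNT 500 INSERT_UPDATE INTO myschema.table1;
-- CREATE: build the target table and its indexes from the IXF file, then insert
IMPORT FROM table1.ixf OF IXF CREATE INTO myschema.table1_copy;
```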
IMPORT restrictions
• If the table has a primary key that is referenced by a foreign key, data can only be appended to it.
• You cannot perform an import replace operation into an underlying table of a materialized query table defined in refresh immediate mode.
• You cannot import data into a system table, a summary table, or a table with a structured type column.
• You cannot import data into declared temporary tables.
• Views cannot be created through the import utility.
• Cannot import encrypted data.
• Referential constraints and foreign key definitions are not preserved when creating tables from PC/IXF files. (Primary key definitions are preserved if the data was previously exported by using SELECT *.)
• Because the import utility generates its own SQL statements, the maximum statement size of 2 MB might, in some cases, be exceeded.
• You cannot re-create a partitioned table or a multidimensional clustered table (MDC) by using the CREATE or REPLACE_CREATE import parameters.
• Cannot re-create tables containing XML columns.
• Does not honor the NOT LOGGED INITIALLY clause.
IMPORT Restrictions …
Remote import is not allowed if
• The application and database code pages are different.
• The file being imported is a multiple-part PC/IXF file.
• The method used for importing the data is either column name or relative column position.
• The target column list provided is longer than 4 KB.
• The LOBS FROM clause or the lobsinfile modifier is specified.
• The NULL INDICATORS clause is specified for ASC files.
IMPORT performance…
If the workload is mostly insert, consider altering the table to APPEND ON.
To avoid transaction log full condition, consider an appropriate ‘commit count’ value.
Enable DB2_PARALLEL_IO registry variable.
Review logbuffer db cfg value and increase it as necessary.
Review utility heap db cfg value and increase as needed.
Review num_ioservers, num_iocleaners parameters.
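A hedged sketch of the first two tips; the database, table, and COMMITCOUNT values are illustrative:

```sql
-- Append new rows at the end of the table instead of searching for free space
ALTER TABLE myschema.table1 APPEND ON;
-- Commit periodically so the transaction log does not fill up
IMPORT FROM table1.del OF DEL COMMITCOUNT 5000 INSERT INTO myschema.table1;
ALTER TABLE myschema.table1 APPEND OFF;
```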
INGEST
INGEST
• INGEST characteristics
• Fast – multithreaded design to process in parallel.
• Available – uses row-level locking, so tables remain available for concurrent access.
• Continuous – can continuously ingest data streams from pipes or files.
• Robust – handles unexpected failures; can be restarted from the last commit point.
• Flexible and functional – supports different input formats and target table types, and has rich data manipulation capabilities.
INGEST
Supported table types
• multidimensional clustering (MDC) and insert time clustering (ITC) tables
• range-partitioned tables
• range-clustered tables (RCT)
• materialized query tables (MQTs) that are defined as MAINTAINED BY USER, including summary tables
• temporal tables
• updatable views (except typed views)

Supported data formats
• delimited text
• positional text and binary
• columns in various orders and formats
Ingest
[Architecture diagram: multiple input files or pipes feed one or more transporter threads; formatter threads parse the records and hash them by database partition; flusher threads perform array inserts through SQL into DB partitions 1 through n.]
Main components: transporter, formatter, flusher
INGEST
• Transporter: reads from the data source and writes to the formatter queues. For INSERT and MERGE operations, there is one transporter thread for each input source. For UPDATE and DELETE operations, there is only one transporter thread.
• Formatter: parses each record, converts the data into the format that DB2 requires, and writes each formatted record to one of the flusher queues for that record's partition. The num_formatters configuration parameter specifies the number of formatter threads. The default is (number of logical CPUs)/2.
INGEST
• Flusher:
The flushers issue the SQL statements to perform the operations on the DB2 tables. The number of flushers for each partition is specified by the num_flushers_per_partition configuration parameter. The default is max(1, ((number of logical CPUs)/2)/(number of partitions) ).
INGEST Examples
INGEST FROM FILE my_file.del FORMAT DELIMITED INSERT INTO my_table;
Input records are sent over a named pipe
INGEST FROM PIPE my_pipe FORMAT DELIMITED INSERT INTO my_table;
Input records delimited by CRLF; fields are delimited by vertical bar
INGEST FROM FILE my_file.del FORMAT DELIMITED BY '|' INSERT INTO my_table;
INGEST Examples
INGEST FROM FILE input_file.txt
  FORMAT DELIMITED
  ( $key1  INTEGER EXTERNAL,
    $data1 CHAR(8),
    $data2 CHAR(32),
    $data3 DECIMAL(5,2) EXTERNAL )
  MERGE INTO target_table
    ON (key1 = $key1)
    WHEN MATCHED THEN
      UPDATE SET (data1, data2, data3) = ($data1, $data2, $data3)
    WHEN NOT MATCHED THEN
      INSERT VALUES($key1, $data1, $data2, $data3);
INGEST – Examples…
Ingest configuration:
connect to mydb user <username> using <password>;
INGEST SET num_flushers_per_partition 1;
INGEST SET num_formatters 12;
INGEST SET shm_max_size 12 GB;
INGEST SET commit_count 20000;
INGEST FROM FILE /mydir/file1 FORMAT DELIMITED BY ',' RESTART OFF INSERT INTO myschema.tab1;
INGEST – Restart ..
Restart information is stored in a separate table (SYSTOOLS.INGESTRESTART) and it is created once.
To create the restart table on DB2 10.1
CALL SYSPROC.SYSINSTALLOBJECTS('INGEST', 'C', NULL, NULL);
The table contains some counters to keep track of which records have been ingested.
INGEST - Restart
RESTART CONTINUE to restart a previously failed job (and clean up the restart data)
RESTART TERMINATE to clean up the restart data from a failed job you don't plan to restart
RESTART OFF to suppress saving of restart information (in which case the ingest job is not restartable)
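A sketch of a restartable ingest; the job id 'job001' and all names are hypothetical:

```sql
-- Start a restartable ingest job with an explicit job id
INGEST FROM FILE my_file.del FORMAT DELIMITED
  RESTART NEW 'job001'
  INSERT INTO my_table;
-- After a failure, rerun with RESTART CONTINUE to pick up from the last commit
INGEST FROM FILE my_file.del FORMAT DELIMITED
  RESTART CONTINUE 'job001'
  INSERT INTO my_table;
```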
INGEST – Additional features
Commit by time or number of rows - Commit_count or commit_period configuration parameter
Support for copying rejected records to a file or table - DUMPFILE or EXCEPTION TABLE parameter
Support for restart and recovery - retry_count ingest configuration parameter.
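A sketch of routing rejected records; the dump file path and exception table are hypothetical, and the exception table must already exist with a definition matching the target:

```sql
INGEST FROM FILE my_file.del FORMAT DELIMITED
  DUMPFILE /tmp/badrows.del
  EXCEPTION TABLE myschema.tab1_exc
  INSERT INTO myschema.tab1;
```

Records that fail parsing or formatting go to the dump file; rows rejected by DB2 (for example, constraint violations) go to the exception table.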
INGEST - Monitoring
INGEST LIST and INGEST GET STATS commands
Reads information that the utility maintains in shared memory.
Must be run in a separate window on the same machine as the INGEST command.
Can display detailed information
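A sketch of a monitoring session run from a second window on the same machine while the ingest job is active; the job id 4 is illustrative (take it from the INGEST LIST output):

```sql
-- List all running ingest jobs and their ids
INGEST LIST;
-- Show statistics for job 4, refreshed every 10 seconds
INGEST GET STATS FOR 4 EVERY 10 SECONDS;
```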
INGEST and LOAD
• INGEST
• The table needs to remain available for concurrent access during the load.
• You need only some fields from the input file to be loaded.
• You need to specify an SQL statement other than INSERT.
• You need to be able to use an SQL expression (to construct a column value from field values).
• You need to recover and continue when the utility gets a recoverable error.
• LOAD
• The table does not need to remain available for concurrent access.
• XML and LOB data are to be loaded.
• Load from a cursor, or load from a device.
• The input source file is in IXF format.
• Load a GENERATED ALWAYS column or SYSTEM_TIME column from the input file.
• Use SYSPROC.ADMIN_CMD.
• Invoke the utility through an API.
• You don't want the INSERTs to be logged.
INGEST - Performance
• Field type and column type
• Define fields to be the same type as their corresponding column types.
• Materialized query tables (MQTs)
• If you run the ingest utility against a base table of an MQT defined as refresh immediate, performance can degrade significantly due to the time required to update the MQT.
• Row size
• Increase the commit_count setting for tables with a smaller row size, and reduce it for tables with a larger row size.
• Other workloads
• If multiple workloads are running along with the ingest, consider increasing the locklist database configuration parameter and reducing the commit_count ingest configuration parameter.
Comparison between Import, Load and Ingest
Table type | IMPORT | LOAD | INGEST
Created global temporary table | no | no | no
Declared global temporary table | no | no | no
Detached table with a dependent table where SET INTEGRITY has not been run (detached table has SYSCAT.TABLES.TYPE = 'L') | no (SQL20285N, reason code 1) | no (SQL20285N, reason code 1) | no
Multidimensional clustering (MDC) table | yes | yes | yes
Materialized query table (MQT) that is maintained by user | yes | yes | yes
Nickname | yes (relational, except ODBC) | no (SQL2305N) | yes
Range-clustered table (RCT) | yes | no | yes
Range-partitioned table | yes | yes | yes
Summary table | no | yes | yes
Typed table | yes | no (SQL3211N) | no
Typed view | yes | no (SQL2305N) | no
Untyped (regular) table | yes | yes | yes
Updatable view | yes | no (SQL2305N) | yes
Comparison to IMPORT and LOAD – Column types
Column data type | IMPORT | LOAD | INGEST
Numeric: SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, DECFLOAT | yes | yes | yes
Character: CHAR, VARCHAR, NCHAR, NVARCHAR, plus corresponding FOR BIT DATA types | yes | yes | yes
Graphic: GRAPHIC, VARGRAPHIC | yes | yes | yes
Long types: LONG VARCHAR, LONG VARGRAPHIC | yes | yes | yes
Date/time: DATE, TIME, TIMESTAMP(p) | yes | yes | yes
DB2SECURITYLABEL | yes | yes | yes
LOBs from files: BLOB, CLOB, DBCLOB, NCLOB | yes | yes | no
Inline LOBs | yes | yes | no
XML from files | yes | yes | no
Inline XML | no | no | no
Distinct type (note 1) | yes | yes | yes
Structured type | no | no | no
Reference type | yes | yes | yes
Comparison to IMPORT and LOAD – Input Types and Formats
Input type | IMPORT | LOAD | INGEST
Cursor | no | yes | no
Device | no | yes | no
File | yes | yes | yes
Pipe | no | yes | yes
Multiple input files, multiple pipes, etc. | no | yes | yes

Input format | IMPORT | LOAD | INGEST
ASC (including binary) | yes, except binary | yes | yes
DEL | yes | yes | yes
IXF | yes | yes | no
WSF (worksheet format) | yes, but discontinued in DB2 10.1 | no | no
Comparison to IMPORT and LOAD – Other features
Feature | IMPORT | LOAD | INGEST
Can other applications update the table while the utility is loading it? | yes | no | yes
Can use SQL expressions? | no | no | yes
Support for REPLACE | yes | yes | yes
Support for UPDATE, MERGE, and DELETE | update only | no | yes
Can update GENERATED ALWAYS and SYSTEM_TIME columns? | no | yes | no
Performance for a large number of input records | slow | best | comparable to a load into a staging table followed by multiple concurrent inserts from the staging table to the target table
API | yes | yes | no (planned for a fix pack)
SYSPROC.ADMIN_CMD support | no | yes | no
Inserts and updates are logged? | yes | no | yes (cannot be turned off, and no support for NOT LOGGED INITIALLY)
Error recovery | no | no | yes
Restart | no | yes | yes
ADMIN_MOVE_TABLE Procedure
Can be done online or offline.
A shadow copy of the source table is taken.
Source table changes are captured and applied through triggers.
The source table is taken offline briefly to rename the shadow copy and its indexes to the source table name.
ADMIN_MOVE_TABLE Procedure
Call the stored procedure once, specifying at least the schema name and the table name:
CALL SYSPROC.ADMIN_MOVE_TABLE ('schema name','source table', '','','','','','','','','MOVE')
Or call the procedure multiple times, once for each operation of the move:
CALL SYSPROC.ADMIN_MOVE_TABLE ('schema name','source table', '','','','','','','','','operation name')
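A sketch of the multi-call form using the individual operations (the schema and table names are hypothetical; the empty strings take defaults as in the single-call example above):

```sql
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','','','','','','','','','INIT');
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','','','','','','','','','COPY');
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','','','','','','','','','REPLAY');
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','','','','','','','','','SWAP');
```

Breaking the move into INIT, COPY, REPLAY, and SWAP lets you control when the brief offline SWAP step happens, for example scheduling it in a low-traffic window.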
ADMIN_MOVE_TABLE Procedure
Moving range-partitioned tables:
CREATE TABLE "SCHEMA1"."T1" ("I1" INTEGER, "I2" INTEGER)
  DISTRIBUTE BY HASH("I1")
  PARTITION BY RANGE("I1")
  (PART "PART0" STARTING(0) ENDING(100) IN "TS1",
   PART "PART1" STARTING(101) ENDING(MAXVALUE) IN "TS2");

Move the T1 table from schema SCHEMA1 to the TS3 table space, leaving the first partition in TS1:
DB2 "CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','TS3','TS3','TS3','','','(I1) (STARTING 0 ENDING 100 IN TS1 INDEX IN TS1 LONG IN TS1, STARTING 101 ENDING MAXVALUE IN TS3 INDEX IN TS3 LONG IN TS3)', '','','MOVE')"
IBM Replication tools
Q replication
Q Capture and Q Apply components. Q Capture reads the DB2 recovery logs and translates committed data into WebSphere MQ messages. Q Apply reads the messages from the queue and translates them into SQL statements that can be applied to the target server.
SQL replication
Capture and Apply components. Capture reads DB2 log data and writes it to change-data tables. Apply reads the change-data tables and replicates the changes to the target tables.
DB2move utility and ADMIN_COPY_SCHEMA
Use the ADMIN_COPY_SCHEMA procedure to copy a single schema within the same database.
Options: DDL, COPY, COPYNO.
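A sketch of a COPY-mode call; all schema, table space, and error-table names here are hypothetical:

```sql
CALL SYSPROC.ADMIN_COPY_SCHEMA('SRCSCHEMA', 'TGTSCHEMA', 'COPY',
     NULL, 'SRCTS', 'TGTTS', 'ERRSCHEMA', 'ERRTAB');
```

COPY recreates the objects and loads the data with COPY NO semantics; DDL recreates the objects only, and any objects that cannot be copied are recorded in the named error table.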
Use the db2move utility with the COPY action and -co copy options to copy a single schema or multiple schemas from a source database to a target database. E.g.:
db2move <dbname> COPY -sn schema1 -co TARGET_DB <target db>
  SCHEMA_MAP "((schema1,schema2))"
  TABLESPACE_MAP "((TS1, TS2),(TS3, TS4), SYS_ANY)"
  -u userid -p password
DB2 redirected restore utility
Perform redirected restores to build partial or full database images.
db2 restore db test from <directory/tsm> taken at <timestamp> redirect generate script redirect.sql

Transport a set of table spaces, storage groups, and SQL schemas from a database backup image to a database using the TRANSPORT option (in DB2 Version 9.7 Fix Pack 2 and later fix packs):
db2 restore db <sourcedb> tablespace (mydata1) schema(schema1,schema2) from <Media_Target_clause> taken at <date-time> transport into <targetdb> redirect
db2 list tablespaces
db2 set tablespace containers for <tablespace ID for mydata1> using (path '/db2DB/data1')
Suspended I/O and online split mirror
For large databases, make copies from a mirrored image by using suspended I/O and the split mirror function. This approach also:
Eliminates backup operation overhead from the production machine
Represents a fast way to clone systems.
Represents a fast implementation of idle standby failover.
Disk mirroring is the process of writing data to two separate hard disks at the same time. One copy of the data is called a mirror of the other. Splitting a mirror is the process of separating the two copies.
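A hedged sketch of the command sequence; the database name mydb is hypothetical, and the storage-level split itself happens outside DB2:

```sql
-- On the primary: suspend writes while the storage mirror is split
db2 set write suspend for database
-- (split the mirror at the storage level here)
db2 set write resume for database
-- On the clone machine: initialize the split image
db2inidb mydb as snapshot
```

db2inidb also accepts AS STANDBY and AS MIRROR, depending on whether the split image is a rollforward standby or will be used to restore the primary.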
Summary
Load – This utility is best suited to situations where performance is your primary concern.
Ingest – This utility strikes a good balance between performance and availability; if availability is more important to you, choose the ingest utility instead of the load utility.
Import – The import utility can be a good alternative to the load utility in the following situations:
- the target table is a view;
- the target table has constraints and you don't want the target table to be put in the Set Integrity Pending state;
- the target table has triggers and you want them fired.
References
IBM Redbook on DB2 Data Movement
IBM Knowledge Center for DB2 V9.7 and V10.1
IBM developerWorks Technical Library
IDUG technical archives
JEYABARATHI [email protected]
Session D09Title: DB2 Data Movement Utilities : A comparison
Please fill out your session evaluation before leaving!