TRANSCRIPT
DB2 Data Movement Utilities: A Comparison
Speaker: Jeyabarathi (JB) Chakrapani, NASCO
Session Code: D09
Wed, May 06, 2015 (08:00 AM - 09:00 AM) : Hancock | Platform: DB2 LUW II
Agenda
Learn the various tools that are available with DB2 for achieving efficient data movement within the database environment.
Get a brief introduction to each utility, including the DB2 online table move procedure (ADMIN_MOVE_TABLE).
Learn the enhancements offered in each DB2 version for each of these utilities.
Understand how to use the different utilities, with examples. Learn what it takes to maximize the performance of your chosen data movement utility, along with useful tricks and tips.
Introduction to DB2 data movement utilities
What are the available tools and options for data movement?
• Load utility
• Export utility
• Import utility
• Ingest utility
• db2move tool
• Restore utility
• ADMIN_COPY_SCHEMA
• ADMIN_MOVE_TABLE
• Split mirror
• IBM replication tools
LOAD UTILITY
Load utility
Required input for Load:
• The path and the name of the input file, named pipe, or device.
• The name or alias of the target table.
• The format of the input source: DEL, ASC, PC/IXF, or CURSOR.
• Whether the input data is to be appended to the table or is to replace the existing data in the table.
• A message file name, if the utility is invoked through the application programming interface (API), db2Load.
Load
Load phases:
• Load
• Build
• Delete
• Index Copy
Load modes:
• Insert
• Replace
• Restart
• Terminate
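The four modes above can be sketched as follows; the table and input file names are hypothetical:

```sql
-- INSERT: append rows to whatever is already in the table
LOAD FROM staff.del OF DEL INSERT INTO myschema.staff;
-- REPLACE: delete the existing rows, then load the input
LOAD FROM staff.del OF DEL REPLACE INTO myschema.staff;
-- RESTART: resume a load that was interrupted
LOAD FROM staff.del OF DEL RESTART INTO myschema.staff;
-- TERMINATE: roll back a failed load and leave the table usable
LOAD FROM staff.del OF DEL TERMINATE INTO myschema.staff;
```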
Load options Include:
• If the load utility is invoked from a remotely connected client, the data file must be on the client. XML and LOB data are always read from the server, even if you specify the CLIENT option.
• The method to use for loading the data: column location, column name, or relative column position.
• How often the utility is to establish consistency points.
• The names of the table columns into which the data is to be inserted.
• Whether or not preexisting data in the table can be queried while the load operation is in progress.
• Whether the load operation should wait for other utilities or applications to finish using the table, or force the other applications off before proceeding.
Option categories: client options, method, consistency points, access level, paths, table space, statistics, recovery, COPY NO/YES
Load options Include:
• An alternate system temporary table space in which to build the index.
• The paths and the names of the input files in which LOBs are stored.
• A message file name.
• Whether the utility should modify the amount of free space available after a table is loaded.
• Whether statistics are to be gathered during the load process. This option is only supported if the load operation is running in REPLACE mode.
• Whether to keep a copy of the changes made, to enable rollforward recovery of the database.
• The fully qualified path to be used when creating temporary files during a load operation. The name is specified by the TEMPFILES PATH parameter of the LOAD command.
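A sketch combining several of the options above in one command; all paths, file names, and table names are hypothetical:

```sql
LOAD FROM /data/table1.del OF DEL
  LOBS FROM /db/lob1 MODIFIED BY lobsinfile
  SAVECOUNT 10000
  MESSAGES /tmp/load.msg
  TEMPFILES PATH /db/tmp
  REPLACE INTO myschema.table1
  STATISTICS USE PROFILE
  COPY YES TO /db/loadcopy;
```

STATISTICS USE PROFILE is shown with REPLACE because statistics collection is only supported in REPLACE mode, and COPY YES keeps a copy of the loaded data for rollforward recovery.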
Load restrictions:
• Loading data into nicknames is not supported.
• Loading data into typed tables, or tables with structured type columns, is not supported.
• Loading data into declared temporary tables and created temporary tables is not supported.
• XML data can only be read from the server side; if you want to have the XML files read from the client, use the import utility.
• You cannot create or drop tables in a table space that is in Backup Pending state.
• If an error occurs during a LOAD REPLACE operation, the original data in the table is lost. Retain a copy of the input data to allow the load operation to be restarted.
• Triggers are not activated on newly loaded rows. Business rules associated with triggers are not enforced by the load utility.
• Loading encrypted data is not supported.
Restriction areas: nicknames, structured data types, temporary tables, XML support, backup pending, LOAD REPLACE, triggers, data encryption, partitioned tables
Load from cursor… Examples:
DECLARE mycurs CURSOR FOR SELECT * FROM abc.table1;
LOAD FROM mycurs OF cursor INSERT INTO abc.table2;

DECLARE C1 CURSOR FOR SELECT * FROM customers
  WHERE XMLEXISTS('$DOC/customer[income_level=1]');
LOAD FROM C1 OF CURSOR INSERT INTO lvl1_customers;

The ANYORDER file type modifier is supported for loading XML data into an XML column.
• Loads the results of a query directly into the target table; no intermediate export is necessary.
• XML data can be loaded with the cursor option.
• Nicknames can be referenced in the SQL query of the cursor.
• Load from a remote database using the DATABASE option.
Examples: Loading from a federated database:
Federation should be enabled and the data source cataloged.
CREATE NICKNAME myschema1.table1 FOR source.abc.table1;
DECLARE mycurs CURSOR FOR SELECT c1,c2,c3 FROM myschema1.table1;
LOAD FROM mycurs OF cursor INSERT INTO abc.table2;

Loading from a remote database:
The remote database should be cataloged.
DECLARE mycurs CURSOR DATABASE dbsource USER dsciaraf USING mypasswd
  FOR SELECT * FROM abc.table1;
LOAD FROM mycurs OF cursor INSERT INTO abc.table2;
Checking for Integrity violations….
Load puts the tables in check pending status when:
The table has check constraints or RI constraints.
The table has identity columns and a V7 or earlier client was used to load data.
The table has descendent immediate staging tables or MQT tables referencing it.
The table is a staging table or MQT table.
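In those cases the table must be taken out of set integrity pending state after the load. A minimal sketch, assuming a hypothetical table myschema.table1:

```sql
-- Tables in set integrity pending state have STATUS = 'C' in the catalog
SELECT TABSCHEMA, TABNAME FROM SYSCAT.TABLES WHERE STATUS = 'C';
-- Validate the loaded rows and take the table out of the pending state
SET INTEGRITY FOR myschema.table1 IMMEDIATE CHECKED;
```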
Load Performance….
CPU_PARALLELISM - specifies the number of threads used by the load utility to parse, convert, and format data records
DISK_PARALLELISM - specifies the number of processes or threads used by the load utility to write data records to disk
DATA_BUFFER - total amount of memory, in 4 KB units, allocated to the load utility as a buffer
NONRECOVERABLE – does not put the table in backup pending state.
SAVECOUNT – specifies consistency points.
STATISTICS USE PROFILE – collects statistics after the load.
FASTPARSE – use when the data is known to be valid.
NOROWWARNINGS – use when multiple warnings are expected.
PAGEFREESPACE, INDEXFREESPACE, TOTALFREESPACE – specify these to reduce the need for reorg.
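A sketch of a performance-oriented load combining several of these knobs; the file and table names are hypothetical and the parallelism and buffer values are illustrative only, to be tuned for the system:

```sql
LOAD FROM /data/bigfile.del OF DEL
  MODIFIED BY FASTPARSE NOROWWARNINGS
  SAVECOUNT 100000
  REPLACE INTO myschema.bigtable
  STATISTICS USE PROFILE
  NONRECOVERABLE        -- skip copy/logging; table is not put in backup pending
  DATA BUFFER 8192      -- 8192 x 4 KB pages for the load buffer
  CPU_PARALLELISM 4
  DISK_PARALLELISM 4;
```

NONRECOVERABLE trades recoverability for speed: the load is not rolled forward, so the data must be reloadable from the source if needed.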
EXPORT
EXPORT UTILITY
• Required input:
Pathname for the output file.
Format of the output file: IXF or DEL.
Specification of the data to be extracted, using a SELECT statement.
• Additional options:
Subset of columns to be extracted, using the METHOD option.
XML TO, XMLFILE, XML SAVESCHEMA – to export and store XML data in different ways.
The SELECT statement used for extracting data can be optimized the same way any SQL query can be optimized, to improve export performance.
The MESSAGES option allows messages generated by the export utility to be written to a file.
Data extraction using SQL query or XQuery statements
EXPORT UTILITY
Examples…
• EXPORT TO table1.ixf OF IXF MESSAGES msg.txt SELECT * FROM myschema.table1
This is a simple export command that exports all rows to the IXF file.
• EXPORT TO table1export.del OF DEL XML TO /db/xmlpath XMLFILE xmldocs XMLSAVESCHEMA SELECT * FROM myschema.table1
• EXPORT TO table1.del OF DEL LOBS TO /db/lob1, /db/lob2/ MODIFIED BY lobsinfile SELECT * FROM myschema.table1
IMPORT
IMPORT
• Required input for Import:
The path and the name of the input file
The name or alias of the target table or view
The format of the data in the input file
The method by which the data is to be imported
The traverse order, when importing hierarchical data
The subtable list, when importing typed tables
• Additional options:
MODIFIED BY clause
ALLOW WRITE ACCESS – import acquires a non-exclusive lock
ALLOW NO ACCESS – import acquires an exclusive lock, waiting for other work to complete until it can acquire the lock
COMMITCOUNT – commits after the specified number of rows
MESSAGES
Data append/update using SQL query or XQuery statements
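The options above can be combined as in this sketch; the input file, message file, and table names are hypothetical:

```sql
IMPORT FROM table1.ixf OF IXF
  ALLOW WRITE ACCESS
  COMMITCOUNT 1000
  MESSAGES /tmp/import.msg
  INSERT INTO myschema.table1;
```

With ALLOW WRITE ACCESS, other applications can keep updating the table while the import runs; COMMITCOUNT 1000 commits every 1,000 rows so locks and log space are released periodically.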
Import
• Import support
Import supports IXF, ASC, and DEL data formats.
Used with file type modifiers to customize the import operation.
Used to move hierarchical data and typed tables.
Import logs all activity, updates indexes, verifies constraints, and fires triggers.
Allows you to specify the names of the columns within the table or view into which the data is to be inserted.
• Import modes
INSERT – adds data to the existing table without changing existing data.
INSERT_UPDATE – updates rows with matching primary key values; otherwise inserts.
REPLACE – deletes existing data and inserts new data.
CREATE – creates the target table and its index definitions.
REPLACE_CREATE – deletes existing data and inserts new data; if the target table does not exist, it is created.
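Two of the modes above sketched as commands; the file and table names are hypothetical (CREATE requires IXF input):

```sql
-- INSERT_UPDATE: rows whose primary key matches are updated, all others inserted
IMPORT FROM table1.ixf OF IXF COMMITCOUNT 500 INSERT_UPDATE INTO myschema.table1;
-- CREATE: build the target table and its indexes from the IXF file, then insert
IMPORT FROM table1.ixf OF IXF CREATE INTO myschema.table1_copy;
```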
IMPORT restrictions
• If the table has a primary key that is referenced by a foreign key, data can only be appended to it.
• You cannot perform an import replace operation into an underlying table of a materialized query table defined in refresh immediate mode.
• You cannot import data into a system table, a summary table, or a table with a structured type column.
• You cannot import data into declared temporary tables.
• Views cannot be created through the import utility.
• Cannot import encrypted data.
• Referential constraints and foreign key definitions are not preserved when creating tables from PC/IXF files. (Primary key definitions are preserved if the data was previously exported by using SELECT *.)
• Because the import utility generates its own SQL statements, the maximum statement size of 2 MB might, in some cases, be exceeded.
• You cannot re-create a partitioned table or a multidimensional clustered table (MDC) by using the CREATE or REPLACE_CREATE import parameters.
• Cannot re-create tables containing XML columns.
• Does not honor the NOT LOGGED INITIALLY clause.
IMPORT Restrictions …
Remote import is not allowed if
• The application and database code pages are different.
• The file being imported is a multiple-part PC/IXF file.
• The method used for importing the data is either column name or relative column position.
• The target column list provided is longer than 4 KB.
• The LOBS FROM clause or the lobsinfile modifier is specified.
• The NULL INDICATORS clause is specified for ASC files.
IMPORT performance…
If the workload is mostly insert, consider altering the table to APPEND ON.
To avoid transaction log full condition, consider an appropriate ‘commit count’ value.
Enable DB2_PARALLEL_IO registry variable.
Review logbuffer db cfg value and increase it as necessary.
Review utility heap db cfg value and increase as needed.
Review num_ioservers, num_iocleaners parameters.
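A hedged sketch of the first two tips; the database, table, and COMMITCOUNT values are illustrative:

```sql
-- Append new rows at the end of the table instead of searching for free space
ALTER TABLE myschema.table1 APPEND ON;
-- Commit periodically so the transaction log does not fill up
IMPORT FROM table1.del OF DEL COMMITCOUNT 5000 INSERT INTO myschema.table1;
ALTER TABLE myschema.table1 APPEND OFF;
```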
INGEST
INGEST
• INGEST characteristics
• Fast – multithreaded design to process in parallel.
• Available – uses row-level locking, so tables remain available for concurrent access.
• Continuous – can continuously ingest data streams from pipes or files.
• Robust – handles unexpected failures; can be restarted from the last commit point.
• Flexible and functional – supports different input formats and target table types, and has rich data manipulation capabilities.
INGEST
Supported table types
• multidimensional clustering (MDC) and insert time clustering (ITC) tables
• range-partitioned tables
• range-clustered tables (RCT)
• materialized query tables (MQTs) that are defined as MAINTAINED BY USER, including summary tables
• temporal tables
• updatable views (except typed views)

Supported data formats
• delimited text
• positional text and binary
• columns in various orders and formats
Ingest
[Architecture diagram: multiple input files or pipes feed one or more transporter threads; formatter threads parse the records and hash them by database partition; flusher threads perform array inserts through SQL into DB partitions 1 through n.]
Main components: transporter, formatter, flusher
INGEST
• Transporter: reads from the data source and writes to the formatter queues. For INSERT and MERGE operations, there is one transporter thread for each input source. For UPDATE and DELETE operations, there is only one transporter thread.
• Formatter: parses each record, converts the data into the format that DB2 requires, and writes each formatted record to one of the flusher queues for that record's partition. The num_formatters configuration parameter specifies the number of formatter threads. The default is (number of logical CPUs)/2.
INGEST
• Flusher:
The flushers issue the SQL statements to perform the operations on the DB2 tables. The number of flushers for each partition is specified by the num_flushers_per_partition configuration parameter. The default is max(1, ((number of logical CPUs)/2)/(number of partitions) ).
INGEST Examples
INGEST FROM FILE my_file.del FORMAT DELIMITED INSERT INTO my_table;
Input records are sent over a named pipe
INGEST FROM PIPE my_pipe FORMAT DELIMITED INSERT INTO my_table;
Input records delimited by CRLF; fields are delimited by vertical bar
INGEST FROM FILE my_file.del FORMAT DELIMITED BY '|' INSERT INTO my_table;
INGEST Examples
INGEST FROM FILE input_file.txt
  FORMAT DELIMITED
  ( $key1  INTEGER EXTERNAL,
    $data1 CHAR(8),
    $data2 CHAR(32),
    $data3 DECIMAL(5,2) EXTERNAL )
  MERGE INTO target_table
    ON (key1 = $key1)
    WHEN MATCHED THEN
      UPDATE SET (data1, data2, data3) = ($data1, $data2, $data3)
    WHEN NOT MATCHED THEN
      INSERT VALUES($key1, $data1, $data2, $data3);
INGEST – Examples…
Ingest configuration:
connect to mydb user <username> using <password>;
INGEST SET num_flushers_per_partition 1;
INGEST SET num_formatters 12;
INGEST SET shm_max_size 12 GB;
INGEST SET commit_count 20000;
INGEST FROM FILE /mydir/file1 FORMAT DELIMITED BY ',' RESTART OFF INSERT INTO myschema.tab1;
INGEST – Restart ..
Restart information is stored in a separate table (SYSTOOLS.INGESTRESTART) and it is created once.
To create the restart table on DB2 10.1
CALL SYSPROC.SYSINSTALLOBJECTS('INGEST', 'C', NULL, NULL);
The table contains some counters to keep track of which records have been ingested.
INGEST - Restart
RESTART CONTINUE to restart a previously failed job (and clean up the restart data)
RESTART TERMINATE to clean up the restart data from a failed job you don't plan to restart
RESTART OFF to suppress saving of restart information (in which case the ingest job is not restartable)
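A sketch of a restartable ingest; the job id 'job001' and all names are hypothetical:

```sql
-- Start a restartable ingest job with an explicit job id
INGEST FROM FILE my_file.del FORMAT DELIMITED
  RESTART NEW 'job001'
  INSERT INTO my_table;
-- After a failure, rerun with RESTART CONTINUE to pick up from the last commit
INGEST FROM FILE my_file.del FORMAT DELIMITED
  RESTART CONTINUE 'job001'
  INSERT INTO my_table;
```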
INGEST – Additional features
Commit by time or number of rows - Commit_count or commit_period configuration parameter
Support for copying rejected records to a file or table - DUMPFILE or EXCEPTION TABLE parameter
Support for restart and recovery - retry_count ingest configuration parameter.
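A sketch of routing rejected records; the dump file path and exception table are hypothetical, and the exception table must already exist with a definition matching the target:

```sql
INGEST FROM FILE my_file.del FORMAT DELIMITED
  DUMPFILE /tmp/badrows.del
  EXCEPTION TABLE myschema.tab1_exc
  INSERT INTO myschema.tab1;
```

Records that fail parsing or formatting go to the dump file; rows rejected by DB2 (for example, constraint violations) go to the exception table.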
INGEST - Monitoring
INGEST LIST and INGEST GET STATS commands
Reads information that the utility maintains in shared memory.
Must be run in a separate window on the same machine as the INGEST command.
Can display detailed information
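A sketch of a monitoring session run from a second window on the same machine while the ingest job is active; the job id 4 is illustrative (take it from the INGEST LIST output):

```sql
-- List all running ingest jobs and their ids
INGEST LIST;
-- Show statistics for job 4, refreshed every 10 seconds
INGEST GET STATS FOR 4 EVERY 10 SECONDS;
```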
INGEST and LOAD
• INGEST
• The table needs to remain available for concurrent access during the load.
• You need only some fields from the input file to be loaded.
• You need to specify an SQL statement other than INSERT.
• You need to be able to use an SQL expression (to construct a column value from field values).
• You need to recover and continue when the utility gets a recoverable error.
• LOAD
• The table does not need to remain available for concurrent access.
• XML and LOB data are to be loaded.
• Load from a cursor, or load from a device.
• The input source file is in IXF format.
• Load a GENERATED ALWAYS column or SYSTEM_TIME column from the input file.
• Use SYSPROC.ADMIN_CMD.
• Invoke the utility through an API.
• You don't want the INSERTs to be logged.
INGEST - Performance
• Field type and column type
• Define fields to be the same type as their corresponding column types.
• Materialized query tables (MQTs)
• If you run the ingest utility against a base table of an MQT defined as refresh immediate, performance can degrade significantly due to the time required to update the MQT.
• Row size
• Increase the commit_count setting for tables with a smaller row size, and reduce it for tables with a larger row size.
• Other workloads
• If multiple workloads are running along with the ingest, consider increasing the locklist database configuration parameter and reducing the commit_count ingest configuration parameter.
Comparison between Import, Load and Ingest
Table type | IMPORT | LOAD | INGEST
Created global temporary table | no | no | no
Declared global temporary table | no | no | no
Detached table with a dependent table where SET INTEGRITY has not been run (detached table has SYSCAT.TABLES.TYPE = 'L') | no (SQL20285N, reason code 1) | no (SQL20285N, reason code 1) | no
Multidimensional clustering (MDC) table | yes | yes | yes
Materialized query table (MQT) that is maintained by user | yes | yes | yes
Nickname | yes (relational, except ODBC) | no (SQL2305N) | yes
Range-clustered table (RCT) | yes | no | yes
Range-partitioned table | yes | yes | yes
Summary table | no | yes | yes
Typed table | yes | no (SQL3211N) | no
Typed view | yes | no (SQL2305N) | no
Untyped (regular) table | yes | yes | yes
Updatable view | yes | no (SQL2305N) | yes
Comparison to IMPORT and LOAD – Column types
Column data type | IMPORT | LOAD | INGEST
Numeric: SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE, DECFLOAT | yes | yes | yes
Character: CHAR, VARCHAR, NCHAR, NVARCHAR, plus corresponding FOR BIT DATA types | yes | yes | yes
Graphic: GRAPHIC, VARGRAPHIC | yes | yes | yes
Long types: LONG VARCHAR, LONG VARGRAPHIC | yes | yes | yes
Date/time: DATE, TIME, TIMESTAMP(p) | yes | yes | yes
DB2SECURITYLABEL | yes | yes | yes
LOBs from files: BLOB, CLOB, DBCLOB, NCLOB | yes | yes | no
Inline LOBs | yes | yes | no
XML from files | yes | yes | no
Inline XML | no | no | no
Distinct type (note 1) | yes | yes | yes
Structured type | no | no | no
Reference type | yes | yes | yes
Comparison to IMPORT and LOAD – Input Types and Formats
Input type | IMPORT | LOAD | INGEST
Cursor | no | yes | no
Device | no | yes | no
File | yes | yes | yes
Pipe | no | yes | yes
Multiple input files, multiple pipes, etc. | no | yes | yes

Input format | IMPORT | LOAD | INGEST
ASC (including binary) | yes, except binary | yes | yes
DEL | yes | yes | yes
IXF | yes | yes | no
WSF (worksheet format) | yes, but discontinued in DB2 10.1 | no | no
Comparison to IMPORT and LOAD – Other features
Feature | IMPORT | LOAD | INGEST
Can other applications update the table while the utility is loading it? | yes | no | yes
Can use SQL expressions? | no | no | yes
Support for REPLACE | yes | yes | yes
Support for UPDATE, MERGE, and DELETE | update only | no | yes
Can update GENERATED ALWAYS and SYSTEM_TIME columns? | no | yes | no
Performance for a large number of input records | slow | best | comparable to a load into a staging table followed by multiple concurrent inserts from the staging table to the target table
API | yes | yes | no (planned for a fix pack)
SYSPROC.ADMIN_CMD support | no | yes | no
Inserts and updates are logged? | yes | no | yes (cannot be turned off, and no support for NOT LOGGED INITIALLY)
Error recovery | no | no | yes
Restart | no | yes | yes
ADMIN_MOVE_TABLE Procedure
Can be done online or offline.
A shadow copy of the source table is taken.
Source table changes are captured and applied through triggers.
The source table is taken offline briefly to rename the shadow copy and its indexes to the source table name.
ADMIN_MOVE_TABLE Procedure
Call the stored procedure once, specifying at least the schema name and the table name:
CALL SYSPROC.ADMIN_MOVE_TABLE ('schema name','source table', '','','','','','','','','MOVE')
Or call the procedure multiple times, once for each operation of the move:
CALL SYSPROC.ADMIN_MOVE_TABLE ('schema name','source table', '','','','','','','','','operation name')
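A sketch of the multi-call form using the individual operations (the schema and table names are hypothetical; the empty strings take defaults as in the single-call example above):

```sql
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','','','','','','','','','INIT');
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','','','','','','','','','COPY');
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','','','','','','','','','REPLAY');
CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','','','','','','','','','SWAP');
```

Breaking the move into INIT, COPY, REPLAY, and SWAP lets you control when the brief offline SWAP step happens, for example scheduling it in a low-traffic window.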
ADMIN_MOVE_TABLE Procedure
Moving range-partitioned tables:
CREATE TABLE "SCHEMA1"."T1" ("I1" INTEGER, "I2" INTEGER)
  DISTRIBUTE BY HASH("I1")
  PARTITION BY RANGE("I1")
  (PART "PART0" STARTING(0) ENDING(100) IN "TS1",
   PART "PART1" STARTING(101) ENDING(MAXVALUE) IN "TS2");

Move the T1 table from schema SCHEMA1 to the TS3 table space, leaving the first partition in TS1:
DB2 "CALL SYSPROC.ADMIN_MOVE_TABLE('SCHEMA1','T1','TS3','TS3','TS3','','','(I1) (STARTING 0 ENDING 100 IN TS1 INDEX IN TS1 LONG IN TS1, STARTING 101 ENDING MAXVALUE IN TS3 INDEX IN TS3 LONG IN TS3)', '','','MOVE')"
IBM Replication tools
Q replication
Q Capture and Q Apply components. Q Capture reads the DB2 recovery logs and translates committed data into WebSphere MQ messages. Q Apply reads the messages from the queue and translates them into SQL statements that can be applied to the target server.
SQL replication
Capture and Apply components. Capture reads DB2 log data and writes it to change-data tables. Apply reads the change-data tables and replicates the changes to the target tables.
DB2move utility and ADMIN_COPY_SCHEMA
Use the ADMIN_COPY_SCHEMA procedure to copy a single schema within the same database.
Options: DDL, COPY, COPYNO.
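A sketch of a COPY-mode call; all schema, table space, and error-table names here are hypothetical:

```sql
CALL SYSPROC.ADMIN_COPY_SCHEMA('SRCSCHEMA', 'TGTSCHEMA', 'COPY',
     NULL, 'SRCTS', 'TGTTS', 'ERRSCHEMA', 'ERRTAB');
```

COPY recreates the objects and loads the data with COPY NO semantics; DDL recreates the objects only, and any objects that cannot be copied are recorded in the named error table.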
Use the db2move utility with the COPY action and -co copy options to copy a single schema or multiple schemas from a source database to a target database. E.g.:
db2move <dbname> COPY -sn schema1 -co TARGET_DB <target db>
  SCHEMA_MAP "((schema1,schema2))"
  TABLESPACE_MAP "((TS1, TS2),(TS3, TS4), SYS_ANY)"
  -u userid -p password
DB2 redirected restore utility
Perform redirected restores to build partial or full database images.
db2 restore db test from <directory/tsm> taken at <timestamp> redirect generate script redirect.sql

Transport a set of table spaces, storage groups, and SQL schemas from a database backup image to a database using the TRANSPORT option (in DB2 Version 9.7 Fix Pack 2 and later fix packs):
db2 restore db <sourcedb> tablespace (mydata1) schema(schema1,schema2) from <Media_Target_clause> taken at <date-time> transport into <targetdb> redirect
db2 list tablespaces
db2 set tablespace containers for <tablespace ID for mydata1> using (path '/db2DB/data1')
Suspended I/O and online split mirror
For large databases, make copies from a mirrored image by using suspended I/O and the split mirror function. This approach also:
Eliminates backup operation overhead from the production machine
Represents a fast way to clone systems.
Represents a fast implementation of idle standby failover.
Disk mirroring is the process of writing data to two separate hard disks at the same time. One copy of the data is called a mirror of the other. Splitting a mirror is the process of separating the two copies.
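A hedged sketch of the command sequence; the database name mydb is hypothetical, and the storage-level split itself happens outside DB2:

```sql
-- On the primary: suspend writes while the storage mirror is split
db2 set write suspend for database
-- (split the mirror at the storage level here)
db2 set write resume for database
-- On the clone machine: initialize the split image
db2inidb mydb as snapshot
```

db2inidb also accepts AS STANDBY and AS MIRROR, depending on whether the split image is a rollforward standby or will be used to restore the primary.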
Summary
Load – This utility is best suited to situations where performance is your primary concern.
Ingest – This utility strikes a good balance between performance and availability; if availability is more important to you, choose the ingest utility instead of the load utility.
Import – The import utility can be a good alternative to the load utility in the following situations:
- the target table is a view;
- the target table has constraints and you don't want the target table to be put in the Set Integrity Pending state;
- the target table has triggers and you want them fired.
References
IBM Redbook on DB2 Data Movement
IBM Knowledge Center for DB2 V9.7 and V10.1
IBM developerWorks Technical Library
IDUG technical archives
JEYABARATHI [email protected]
Session D09Title: DB2 Data Movement Utilities : A comparison
Please fill out your session evaluation before leaving!