a colorful introduction to db2 udb for unix, linux, and windows_ overview of the database...

15
Sign in (or register) English Technical topics Evaluation software Community Events A colorful introduction to DB2 UDB, Version 8 for UNIX, Linux, and Windows: Overview of the Database Partitioning Feature Part 2 of 2 Raul Chong ([email protected] ), Database consultant, SDI Corp. Summary: This article provides an introduction to DB2 UDB for Linux, UNIX and Windows Version 8 with the Database Partitioning Feature (DPF). It is the second part of a two part series. Date: 18 Mar 2004 Level: Introductory Activity: 8329 views Comments: 0 (View | Add comment - Sign in) Average rating (29 votes) Rate this article Introduction In part 1 of this article series, you were introduced to the IBM® DB2® Universal Database™ (UDB) for Linux®, UNIX®, and Windows® environments. Instances, configuration parameters, connectivity information, and different database objects were described. In this second part of the series we introduce you to the Database Partitioning Feature (DPF) available on DB2 UDB Enterprise Server Edition (ESE). With DPF your database is scalable as you can add new machines and spread your database across them. This means more CPUs, more memory and more disks from each of the additional machines for your database!. DB2 UDB ESE with DPF is ideal to manage data warehousing, data mining and online analytical processing (OLAP) workloads. It can also work well with online transaction processing (OLTP) workloads. Note that there is no technical requirement to enable this feature other than purchasing the required license. From a user's perspective, you will connect to the database and issue queries as usual without a need to know the database is spread among several machines. What you learned in part 1 where we discussed single-partition environments is still applicable to a multi-partition one; all the concepts remain the same, though we need to point out some implementation differences. For example, we explain in which machine your configuration files, connectivity information and so on reside. We also introduce a few new concepts like partitions, partition groups, coordinator node, and so on which were not relevant in a single-partition environment. As in the first part, figures are used to visually explain concepts. In this article, I use DB2 to refer to DB2 Universal Database Enterprise Server Edition Version 8.1 for Linux, UNIX and Windows. I would suggest you print this article rather than reading it online, as I will reference many of the figures all along. Concepts will be explained briefly at first, and will then be tied in last section in a case study. Note: Prior to Version 8, DB2 UDB ESE with DPF was known as DB2 UDB Enterprise Extended Edition (EEE). Database Partitions When a database is partitioned, you split your database into different independent parts, each consisting of its own data, configuration files, indexes and transaction logs. Each of these parts is a database partition. You can assign multiple partitions to a single physical machine. These are sometimes called 'logical partitions' and they share the resources of the machine. A single-partition database is a database with only one partition. We talked about this type of environment in part 1 of this article series. A multi-partition database (also referred to as a partitioned database), is a database with two or more partitions. This type of database is what we discuss in this part of this article series. Depending on your hardware environment, there can be several configurations in which you can partition your database. Figure 1 shows configurations of physical partitions, one partition per machine. Figure 1 - Database partition configurations with one partition per machine 02/03/2011 A colorful introduction to DB2 UDB, V… ibm.com/…/dm-0403chong/ 1/15

Upload: marcos-alvarado

Post on 28-Jul-2015

97 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

Sign in (or register)English

Technical topics Evaluation software Community Events

A colorful introduction to DB2 UDB, Version 8 for UNIX, Linux, and

Windows: Overview of the Database Partitioning Feature

Part 2 of 2

Raul Chong ([email protected]), Database consultant, SDI Corp.

Summary: This article provides an introduction to DB2 UDB for Linux, UNIX and Windows Version 8 with the Database Partitioning Feature (DPF). It is the

second part of a two part series.

Date: 18 Mar 2004

Level: Introductory

Activity: 8329 views

Comments: 0 (View | Add comment - Sign in)

Average rating (29 votes)

Rate this article

Introduction

In part 1 of this article series, you were introduced to the IBM® DB2® Universal Database™ (UDB) for Linux®, UNIX®, and Windows® environments.

Instances, configuration parameters, connectivity information, and different database objects were described. In this second part of the series we introduce you to

the Database Partitioning Feature (DPF) available on DB2 UDB Enterprise Server Edition (ESE). With DPF your database is scalable as you can add new

machines and spread your database across them. This means more CPUs, more memory and more disks from each of the additional machines for your database!.

DB2 UDB ESE with DPF is ideal to manage data warehousing, data mining and online analytical processing (OLAP) workloads. It can also work well with online

transaction processing (OLTP) workloads. Note that there is no technical requirement to enable this feature other than purchasing the required license. From a

user's perspective, you will connect to the database and issue queries as usual without a need to know the database is spread among several machines.

What you learned in part 1 where we discussed single-partition environments is still applicable to a multi-partition one; all the concepts remain the same, though we

need to point out some implementation differences. For example, we explain in which machine your configuration files, connectivity information and so on reside.

We also introduce a few new concepts like partitions, partition groups, coordinator node, and so on which were not relevant in a single-partition environment.

As in the first part, figures are used to visually explain concepts. In this article, I use DB2 to refer to DB2 Universal Database Enterprise Server Edition Version 8.1

for Linux, UNIX and Windows. I would suggest you print this article rather than reading it online, as I will reference many of the figures all along. Concepts will be

explained briefly at first, and will then be tied in last section in a case study.

Note: Prior to Version 8, DB2 UDB ESE with DPF was known as DB2 UDB Enterprise Extended Edition (EEE).

Database Partitions

When a database is partitioned, you split your database into different independent parts, each consisting of its own data, configuration files, indexes and transaction

logs. Each of these parts is a database partition. You can assign multiple partitions to a single physical machine. These are sometimes called 'logical partitions' and

they share the resources of the machine.

A single-partition database is a database with only one partition. We talked about this type of environment in part 1 of this article series.

A multi-partition database (also referred to as a partitioned database), is a database with two or more partitions. This type of database is what we discuss in this

part of this article series. Depending on your hardware environment, there can be several configurations in which you can partition your database. Figure 1 shows

configurations of physical partitions, one partition per machine.

Figure 1 - Database partition configurations with one partition per machine

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 1/15

Page 2: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

Note: In Figure 1, the SMP systems can be replaced by uniprocessor systems. Uniprocessor systems are fairly rare now for environments using DB2 with DPF,

therefore we have omitted them from the figure.

Figure 2 shows more multi-partition configurations with several partitions per machine.

Figure 2 - Database partition configurations with multiple partitions per machine.

Note: Prior to Version 8, the term 'node' was used instead of 'partition'. With version 8 there may still be some concepts, commands and messages using this term.

Also note that the node directory concept has no relationship whatsoever to the partition concept.

In order to leverage the knowledge gained in part 1 , and to visualize how a DB2 environment is split in a DPF system, we show in Figure 3 a partial reproduction

of the figure in part 1 of the article series, and how it has been split into three physical partitions, one partition per SMP machine. Thus, we show how the DB2

environment changes from a single partition multi-processor configuration (shown in I of Figure 1) to a multi-partition multi-processor configuration (shown in II of

Figure 1).

We have changed the machine in the original figure to use the Linux operating system instead of the Windows operating system. DPF is more popular in

Linux/UNIX environments than in Windows environments, therefore we will focus our discussion on these platforms, though most concepts are the same in

Windows.

Figure 3 - The DB2 Environment in DB2 UDB ESE with DPF

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 2/15

Page 3: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

As you can see from Figure 3, the DB2 environment is 'split' so that it now resides in three different multi-processor machines running the same operating system

(Linux, for this example), and also running the same DB2 version (version 8.1) at the same fixpak level. Note it is not the purpose of this figure to illustrate how to

migrate from an single-partition database system to a multi-partition database system, but to visually show where files would be located on a new installation of a

multi-partition system.

It is also important to note, all of the machines participating in a DPF environment have to be interconnected by a communication facility which could be a network

using the TCPIP protocol. Several TCPIP ports are reserved on each machine for this 'interpartition' communication. For example, by default after installation, the

services file in Linux (/etc/services) is updated as follows (assuming you chose to create the db2inst1 instance):

DB2_db2inst1 60000/tcp

DB2_db2inst1_1 60001/tcp

DB2_db2inst1_2 60002/tcp

DB2_db2inst1_END 60003/tcp

db2c_db2inst1 50000/tcp

The ports 60000 through 60003 are reserved for inter-partition communication by default. You need to update the services file with the correct number of entries

depending on the number of partitions you are configuring.

When partitions within the same machine are involved, communication between the partitions still requires this setup. Only in AIX can you force inter-partition

communication to be performed in memory by setting the DB2 registry variable DB2_FORCE_FCM_BP to YES.

Port 50000 is used by default to connect to the db2inst1 instance. When a client needs to connect to a DPF system, it needs to issue some catalog commands as

described in part 1 of this article series, and also explained in the Connectivity Cheatsheet article. The host name to be used in these commands can be from any of

the machines participating in the DPF environment. The machine selected would become the coordinator node, unless the NODE option of the CONNECT

statement is used. The concept of coordinator node is described later in this article.

Note: Note: The DB2 entries in the services file for each of the participating machines in a DPF environment should be the same.

The node configuration file (db2nodes.cfg)

The db2nodes.cfg file contains information about the database partitions and the servers they reside on that will participate in an instance. The information of this file

is used in some cases to determine the command scope. The section issuing commands and SQL statements in a DPF environment' later in this article discuss in

more detail command scope.

Figure 4 shows an example of the db2nodes.cfg file for a cluster of four UNIX servers where two partitions on each server are created (option IV in Figure 2).

Figure 4 - The db2nodes.cfg for a cluster of four UNIX servers with 2 logical partitions per server

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 3/15

Page 4: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

The columns in a Linux/UNIX environment for the db2nodes.cfg file are:

Partition_number Hostname Logical_Port netname resourcename

The partition number, the first column in the db2nodes.cfg file, indicates the number that identifies the partition within DB2. In the figure, we see there will be 8

partitions in total starting from partition 0. The numbering of the partitions must be in ascending order, and can start from any number. Gaps between the numbers

are also allowed. Note that the numbering used will be important as it will be taken into consideration in commands or SQL statements as we will see in subsequent

sections of this article.

The second column is the TCP/IP hostname of the server where the partition is created.

The third column is optional, but must be used if you create more than one partition on the same server. This column specifies the logical port for the partition within

the server and must be unique within a server. For the example highlighted in Fgure 4, we can see the mapping between the db2nodes.cfg entry for server

'myserverb', and how the physical machine looks like.

The fourth column, the netname, is optional and is used to support a host that has more than one active TCP/IP interface, each with its own hostname.

The fifth column, resourcename, is optional an applies only to AIX, HP-UX, and Solaris operating systems. It specifies the operating system resource that the

partition should be started in.

On Windows, there is also another column, the computer name, which contains the computer name for the machine on which a partition resides. The order of the

columns is:

Partition_number Hostname Computer_name Logical_Port netname

The db2nodes.cfg file must be located as follows:

Under the SQLLIB directory for the instance owner on Linux and UNIX

Under the SQLLIB\<Instance name> directory on Windows

In Figure 3, it would be included in the Linux3 machine (We did not show it there because we did not want to introduce new concepts on that figure).

On Linux and UNIX you can edit the db2nodes.cfg file with any ASCII editor or using DB2 commands. On Windows, you can only use the db2ncrt and

db2ndrop commands to create and drop database partitions; the db2nodes.cfg file should not be edited directly.

You can also use the db2start command to add and or remove a database partition from the DB2 instance and the db2nodes.cfg file using the add dbpartitionnum

and drop dbpartitionnum clauses respectively.

Installing DB2 in a DPF environment

In a DPF environment each of the participating machines needs to have DB2 installed locally. To ease the installation of DB2 in other machines, we recommend

you to create a response file when you first install DB2 on the first machine. Once this response file is created, you can transfer the file to the other machines and

run an unattended installation using the command db2setup -r <response file> on each machine.

The DB2 install code from the installation CD can reside on a shared disk, from which you proceed to locally install DB2. For example, in Figure 3 let's assume the

Linux2 machine has an associated disk where the install code from the installation CD has been copied. This disk is set up so that it can be shared by the other two

machines, Linux1 and Linux3. Next, from each of the machines, the db2setup command can be executed to locally install DB2, therefore each machine will have its

own local directory /opt/IBM/db2/V8.1 containing the DB2 installed binaries. You may think an alternative would be to install DB2 locally only on the Linux2

machine, and share the installed binaries with Linux1 and Linux3, however this configuration is not supported.

Note: In Linux, the DB2 installed binaries are locally stored in directory /opt/IBM/db2/V8.1. In other platforms DB2 is installed in other directories. Use the

db2level command to determine where the installed code has been locally installed in your machine.

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 4/15

Page 5: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

An instance in the DPF environment

Partitioning is a concept that applies at the database level, not the instance level. Figure 3 may give you the impression that an instance can be partitioned. What in

fact happens in a DPF environment is that an instance is separately created on each machine, however, they all use the same characteristics: The same instance

name, same password, and a shared home directory.

In Linux and UNIX, an instance maps to an operating system user, therefore, when an instance is created, it will have its own home directory. For example, in most

installations /home/<user name> is used as the home directory. All the instances separately created on each of the participating machines of a DPF environment

must use the same name and password. In addition, you must specify the home directory of the corresponding operating system user to be the same directory for

all instances, which must be created in a shared disk (NFS share). Figure 5 shows an example.

Figure 5 - An instance in a partitioned environment

In Figure 5, the instance 'myInst' has been created separately in each of the three Linux machines (Linux1, Linux2 and Linux3). myInst maps to an operating system

user of the same name which in the figure has a home directory of /db2home/myInst. Note that /db2home/myInst is shared by all the three Linux machines as it

resides in a shared disk which is local to Linux3. Since the instance owner directory is locally stored in the Linux3 machine, this machine is considered to be the

DB2 Instance owning machine.

Figure 5 also shows that the Database Administration Server (DAS) user, db2as, is created locally on each participating machine of a DPF environment. There can

only be one DAS per physical machine regardless of the number of partitions that machine contains. The DAS user's home directory cannot be mounted on a

shared disk. Alternatively, different userid and passwords can be used to create the DAS in different machines.

Note: In each of the participating machines of a DPF environment, make sure the passwords for the instances are the same, otherwise the partitioned system will

look like it is hanging as the partitions are not able to talk with each other.

The instance owning machine

As indicated earlier, the DB2 Instance owning machine in a DPF environment is the one whose associated disk physically stores the instance home directory. The

instance owning machine as shown in Figure 3 has several files under the sqllib directory or its subdirectories in the home directory:

Instance level profile registry

Database Manager Configuration File (dbm cfg)

System db directory

Node directory

DCS directory

If you change the value of any of the parameters or connectivity information in these files, all participating machines will be able to see the change as only one

shared file is being updated. For example, if you update the dbm cfg parameter intra_parallel from NO to YES using a command like this from the Linux1 machine:

db2 update dbm cfg using intra_parallel yes

and then you issue the following command from the Linux2 machine:

db2 get dbm cfg

You are able to see from Linux2 that the change took effect as you are updating one single shared file physically stored on machine Linux3. For parameters that are

not configurable online, a db2stop/db2start is required. Issuing these commands from any partition will affect all partitions specified in the db2nodes.cfg file. In a

Linux/UNIX environment, you will receive messages from each of the partitions. The case study will demonstrate how this is performed.

Partitioning a database

In order to 'partition' your database, create a directory with the same name locally in each of the different participating machines. For figure 1 for example, let's

assume the directory /data has been created locally in each machine. Then, when you execute the command:

create database MYDB2 on /data

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 5/15

Page 6: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

the following directory structure is automatically built for you:

/data/<instance name>/NODExxxx/SQLyyyyy

where:

/data: Directory specified in the create database command. This directory must exist before executing the create database command.

<instance name>: Name of the instance. For example 'myInst'

NODExxxx: This is used to distinguish which partition you are working with, where 'xxxx' represents the number of the partition as specified in the db2nodes.cfg

file which we will discuss later in this article

SQLyyyyy: This identifies the database, where 'yyyyy' represents a number. If you only have one database created in your system, then yyyyy is equal to 00001, if

you have 3 databases in your system, you will have different directories as follows: SQL00001, SQL00002, SQL00003. In order to map the database names to

these directories, you can review the local database directory using the command:

db2 list db directory on /data

Inside the SQLyyyyy directories, there will be subdirectories for table spaces, and within them, files containing database data (assuming all table spaces are defined

as SMS). Figure 6 shows a visual example of a partitioned database.

Figure 6 - A partitioned database created in the /data directory

If you don't specify a path in your create database command, by default the database is created in the directory specfied by the dbm cfg parameter DFTDBPATH,

which defaults to the the instance owner home directory. The structure created under this scenario would be:

/db2home

/myInst

/NODE0000

/SQL00001

/NODE0001

/SQL00001

/NODE0002

/SQL00001

This partitioning is not optimal as all the database data would reside in one single disk which is shared by the other machines. Creating a database like this in a DPF

environment defeats the purpose of partitioning.

Note: Before creating a database, change the value of dbm cfg parameter DFTDBPATH or make sure to include a path in your create database command. In

either case, the DFTDBPATH or path should point to a directory created locally with the same name in all the participating machines of your DPF system. This

directory must be created ahead of time. Similarly, to create the SAMPLE database, specify the path in the command as in: db2sampl <path>

Configuration Files in a DPF environment

In a previous section we indicated that the instance level profile registry, the database manager configuration (dbm cfg) file, the system database directory, node

directory and DCS directory are all part of the instance owning machine, and are not partitioned. What about the other configuration files?

Environment variables:

Each participating machine in a partitioned environment can have different environment variables.

Global level profile registry variable:

This is stored in file default.env, in a subdirectory under /var. There is a different local copy of this file for each machine.

Database Configuration File (db cfg):

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 6/15

Page 7: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

The db cfg is stored in file SQLDBCON, in the corresponding directory for the database in question. In a partitioned database environment, a separate

SQLDBCON file exists for each database partition.

The local database directory:

The local database directory is stored in file SQLDBDIR in the corresponding directory for the database in question. It has the same name as the system db

directory, which is located under the instance directory. A separate SQLDBDIR file exists for each database partition.

Note: The values of the Global level profile registry variables, database configuration file parameters and local database directory entries may be different at each

database partition, however the recommendation is that each of the values on all partitions be the same.

Logs in a DPF environment

Each database partition should keep its logs separate from each other. Therefore the db cfg parameter 'Path to log files' in each of the partitions should not point to

a shared disk directory, but to a local directory. The default log path in each partition will include a NODE000x subdirectory as part of the log path. For example,

the value of this parameter in the DPF system shown in Figure 3 could be:

For Partition 0: /datalogs/db2inst1/NODE0000/SQL00001/SQLOGDIR/

For Partition 1: /datalogs/db2inst1/NODE0001/SQL00001/SQLOGDIR/

For Partition 2: /datalogs/db2inst1/NODE0002/SQL00001/SQLOGDIR/

If you were to manually update this path, use the db cfg parameter NEWLOGPATH. As we will see in a later section this command can be executed once and

affect all partitions using the db2_all command.

The Catalog Partition

When a database is created, several table spaces are created by default. One of them is SYSCATSPACE which contains the DB2 Catalog. In a DPF

environment, SYSCATSPACE cannot be partitioned, but must reside in one partition, known as the Catalog Partition. The partition from which the create

database command is issued becomes the Catalog partition for the new database. All access to system tables must go through this database partition.

Figure 3 shows SYSCATSPACE residing on machine Linux1, therefore the create database command was issued from this partition and this becomes the Catalog

partition. If you create several databases for a particular instance, make sure to issue the create database command from different partitions so that their

corresponding Catalog tables are created on different partitions.

For an existing database, you can determine which partition is the Catalog partition by issuing the command list db directory The output of this command has

the field 'Catalog database partition number' for each of the entries indicating the Catalog partition number of the given database. Alternatively, you can connect to

the database from each of the different machines and issue the list tablespaces show detail command. Only the Catalog partition will show the SYSCATSPACE

table space.

Partition Groups

A partition group is a logical layer that allows the grouping of one or more partitions to perform operations on all the partitions in the group. A database partition

can belong to more than one partition group. When a database is created, DB2 will create three default partition groups which cannot be dropped:

IBMDEFAULTGROUP: This is the default partition group for any table you create. It consists of all database partitions as defined in the db2nodes.cfg. This

partition group cannot be modified. Table space USERSPACE1 is created in this partition group.

IBMTEMPGROUP: This partition group is used by all system temporary tables. It also consists of all database partitions as defined in the db2nodes.cfg.

Table space TEMPSPACE1 is created in this partition.

IBMCATGROUP: This partition group contains the catalog tables (table space SYSCATSPACE), and thus, it only includes the database's catalog

partition. This partition group cannot be modified.

To create new database partition groups, you can use the create database partition group statement. This statement will create the database partition group within

the database, assign database partitions that you specified to the partition group, and then record the partition group definition in the database system catalog

tables.

For example, the following statement creates a partition group 'pgrpall' on all partitions specified in the db2nodes.cfg file:

create database partition group pgrpall on all dbpartitionnums

To create a database partition group 'pg23' consisting of partitions 2 and 3, issue this command:

create database partition group pg23 on dbpartitionnums (2,3)

Other relevant partition group statements/commands are:

alter database partition group, statement to add or drop a partition in the group

drop database partition group, statement to drop a partition group

list database partition group, command to list all your partition groups (IBMTEMPGROUP will not be listed)

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 7/15

Page 8: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

Buffer pools in a DPF environment

In Figure 3, we showed bufferpools split across the different partitions. Interpreting this figure for buffer pools, is different than for the other objects, because the

data cached in the bufferpools in not partitioned as the figure may imply. What is actually happening is that buffer pools in a DPF environment can be tailored to the

different partitions. Using the CREATE BUFFERPOOL statement with the DATABASE PARTITION GROUP clause, you can associate a bufferpool to a given

partition group. What this means is that you can have the flexibility to define the buffer pool to the specific partitions defined in the partition group. In addition, the

size of the buffer pool on each partition in the partition group can be different if desired. For example, the following statement will create buffer pool 'bpool_1' in

partition group 'pg1', which let's assume consists of partitions 2, 3 and 4:

CREATE BUFFERPOOL bpool_1 DATABASE PARTITION GROUP pg1 1

SIZE 10000

EXCEPT ON DBPARTITIONNUM (3 TO 4) SIZE 5000

Each partition in partition group pg1 will have a bufferpool bpool_1 defined with a size of 10000. Partition 3 and 4 will have a bufferpool of size 5000. Using an

analogy, think of it as if you were issuing the CREATE BUFFERPOOL statement on each partition separately where the buffer pool name for each partition was

the same, but indicating different sizes, that is:

In partition 2: CREATE BUFFERPOOL bpool_1 SIZE 10000

In partition 3: CREATE BUFFERPOOL bpool_1 SIZE 5000

In partition 4: CREATE BUFFERPOOL bpool_1 SIZE 5000

Note the above statements are used to clarify the analogy, executing each of these commands as shown will attempt to create a bufferpool using all partitions. It is

not equivalent to using the DATABASE PARTITION GROUP clause of the CREATE BUFFERPOOL statement.

Buffer pools can in addition, be associated to several partition groups. All this means is that the bufferpool definition will be applied to the partitions in the given

associated partition groups.

Table spaces in a DPF environment

A table space can be created in specific partitions by associating it to a given partition group. The CREATE TABLESPACE statement with the IN DATABASE

PARTITION GROUP clause can be used for this purpose. This allows users to have flexibility as to which partitions will actually be storing their tables. For

example, the statement:

CREATE REGULAR TABLESPACE mytbls IN DATABASE PARTITION GROUP pg1

MANAGED BY SYSTEM USING ('/data') BUFFERPOOL bpool_1

creates the table space mytbls which spans partitions 2, 3 and 4 (assuming pg1 is a partition group consisting of these partitions). In addition, the table space is

associated with buffer pool bpool_1 defined in an earlier section. Note that the creation of the table space would fail if you provide conflicting partition information

between the table space and the associated buffer pool. For example, if bpool_1 was created for partitions 5 and 6, and table space mytbls was created for

partitions 2, 3 and 4, then you would get an error creating the table space.

The Coordinator node

In general, for each database connection there will be a corresponding DB2 agent performing the DB2 work. An agent can be thought of as a process (in

Linux/UNIX) or thread (in Windows) which performs DB2 work on behalf of the application. There are different types of agents, one of them being the

coordinator agent. The coordinator agent is the agent that communicates with the application, receiving requests and sending replies. It can either satisfy the request

itself or delegate the work to multiple subagents to work on the request.

The coordinator node of a given application is the partition where the coordinator agent exists. The coordinator node can also be set with the SET CLIENT

CONNECT_NODE command. When an application issues a query, parts of these database requests are sent by the coordinator node to subagents at the other

partitions; and all results from the other partitions are consolidated at the coordinator node before being sent back to the application.

Any partition can potentially be a coordinator node, therefore in Figure 3, we do not specifically point to any partition as the coordinator node. If you would like to

know more about DB2 agents and the DB2 process model, refer to the resources section in this article.

Issuing commands and SQL statements in a DPF environment

Say you have twenty physical machines, each with 2 partitions. Issuing individual commands to each physical machine or partition may be quite a task. Fortunately,

DB2 provides two main commands that will execute other commands in all machines or partitions:

rah:

Use this command when you want to execute another command for each physical machine. For example, if you want to create a directory for the twenty physical

machines, issue the following:

rah ")mkdir /tmp/$USER"

The ')' character indicates that the user's profile and file named in $RAHENV is suppressed. There are several characters that can be used instead. For an

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 8/15

Page 9: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

explanation of these characters issue the command: rah ?

db2_all:

Use this command when you want to execute another command for each database partition. For example, if you want to change the db cfg parameter logfilsiz for

database SAMPLE in each partition, issue the following:

db2_all ";db2 UPDATE DB CFG FOR sample USING LOGFILSIZ 500"

When the semicolon (;) character is placed inside double quotation marks, the request will run concurrently on all partitions. The characters that can be used in the

db2_all command are the same as the ones for the rah command.

For either rah or db2_all, the command executed can be almost anything which you could type at an interactive prompt, including several commands to be run in

sequence. On Linux and UNIX platforms, you separate multiple commands using a semicolon (;). On Windows, you separate multiple commands using an

ampersand (&). Do not use the separator character following the last command.

Command and SQL Statement scope

The scope of a command (such as db2 get db cfg, db2 list tablespaces, etc. ) is limited to the partition the session is attached to.

The scope of a command preceded by db2_all is the complete list of partitions as obtained from db2nodes.cfg

The scope of an SQL statement is the set of all partitions in db2nodes.cfg (unless WHERE DBPARTITIONNUM= predicate is used)

Using database partition expressions

In a partitioned database, if more than one database partition resides on the same physical machine, the same device or path cannot be specified for such database

partitions. You can manually specify a unique container for each database partition, or alternatively you can use database partition expressions in your command or

statement to generate values based on the partition number found in the db2nodes.cfg file. The following example should clarify this concept: Let's say you issue the

command:

CREATE TABLESPACE TS2

MANAGED BY DATABASE USING

(file '/data/TS2/container $N+100' 5000)

On a four database partition system, the following containers would be created:

/data/TS2/container100 - on DATABASE PARTITION 0

/data/TS2/container101 - on DATABASE PARTITION 1

/data/TS2/container102 - on DATABASE PARTITION 2

/data/TS2/container103 - on DATABASE PARTITION 3

A database partition expression is specified by the argument “ $N� (note there is a blank before $N). Table 1 shows other arguments for creating containers.

Operators are evaluated from left to right, and '%' represent the modulus (remainder of a division). Assuming the partition number in question is 3, the 'value'

column in table 1 shows the result of resolving the database partition expression.

Table 1. Database Partition Expressions

Database Partition Expressions Example Value

[blank]$N " $N" 3

[blank]$N+[number] " $N+500" 503

[blank]$N%[number] " $N%2" 1

[blank]$N+[number]%[number] " $N+15%13" 5

[blank]$N%[number]+[number] " $N%2+20" 21

The DB2NODE environment variable

In in part 1 of this article series we talked about the DB2INSTANCE environment variable used to switch between instances in your database system. The

DB2NODE environment variable is used in a similar way, but to switch between partitions in your DPF system. By default, the active partition is the one defined

with the logical port of zero (0) in the db2nodes.cfg file for a given server. In order to switch the active partition use the SET command in Windows, or export in

Linux/UNIX. Make sure to terminate all connections from any partition to your database after changing this variable, otherwise the change will not take effect.

For example, using the db2nodes.cfg file shown in Figure 4, you have four servers, each with 2 logical partitions. If you log on to server myserverb, any commands

you execute will affect partition 2, which is the one with logical port of zero, and thus the default active partition for that server machine. If you would like to switch

to partition 0 to be the active partition make this change in a Linux system:

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 9/15

Page 10: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

DB2NODE=0

export DB2NODE

db2 terminate

(Having no connections from any partitions is not enough, make sure a 'terminate' is issued).

Note that partition 0 is in server myservera. We can make this partition the active one even from myserverb. To determine which is your active node, you can issue

this statement after connecting to a database:

db2 "values (current dbpartitionnum)"

Partitioning Maps and Partitioning keys

By now, you should have a good understanding about how to set up a DPF environment. It is now time to understand how the distribution of the rows of data

across the different partitions is performed. Figure 7 shows an example of this distribution.

Figure 7 - Distributing data rows in a DPF environment

A partitioning map is an internally generated array containing either 4,096 entries for multiple-partition database partition groups, or a single entry for single-partition

database partition groups. The partition numbers of the database partition group are specified in a round-robin fashion.

A partitioning key is a column (or group of columns) that is used to determine the partition in which a particular row of data is stored. A partitioning key is defined

on a table using the CREATE TABLE statement.

When a database partition group is created or modified, a partitioning map is associated with it. A partitioning map, in conjunction with a partitioning key and a

hashing algorithm, is used by DB2 to determine which database partition in the database partition group will store a given row of data.

For the example in Figure 7, let's assume partition group 'pg0123' has been defined on partitions 0, 1, 2 and 3. An associated partitioning map is automatically

created, that is, an array with 4096 entries containing the values 0,1,2,3,0,1,2,3... (partition numbers are stored in round robin fashion by default, though this can be

changed). Let's also assume table 'mytable' has been created with a partitioning key consisting of columns col1, col2 and col3. For each row, the partitioning key

column values are passed to the hashing algorithm which will return an output number from 0 to 4095. This number corresponds to one of the entries in the

partitioning map array which contains the value of the partition number where the row is to be stored. In Figure 5, if the hashing algorithm had returned an output

value of 7, then the row would have been stored in partition p3.

Case Study

Now that you are familiar with DPF, let's review all the concepts in this article with a simple case study.

Your company is expanding, and has recently acquired two other firms. Since the amount of data will be increased twofold approximately, you are wondering if

your current single partition DB2 database server will be able to handle the load, or if DB2 with DPF is required. Since you are not too familiar with DB2 with

DPF, you decide to play around with it using your test machines: Two SMP machines with 4 processors each running the Linux Operating System. The previous

DBA already installed DB2 UDB ESE with DPF on these machines. He has left the company, but fortunately, he left the diagram shown in Figure 8 with his design.

Figure 8 - DB2 UDB ESE with DPF - A case study

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 10/15

Page 11: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

Figure 8 is a combination of physical and logical design. When you verify the correctness of the diagram with what your system has, you note the 'mydb1' database

has been dropped, so you decide to rebuild this database as practice. The instance 'db2inst1' is still there as well as other databases. These are the steps you

follow:

1. You open two telnet sessions, one for each SMP machine. From one of the sessions, you issue the db2stop followed by the db2start command as shown in

Figure 9.

Figure 9 - db2stop and db2start

The first thing you note is that for these two commands, there is no need to issue them from each partition. Issuing them from any partition once will affect all

partitions. You also can tell there are four partitions, as you will receive a message from each of them.

2. Next, you would like to review the db2nodes.cfg file to review the configuration of your partitions. Using operating system commands, you determine the home

directory for instance db2inst1, which for this example is /db2home/db2inst1. Thus, file db2nodes.cfg would be stored in directory /db2home/db2inst1/sqllib. The

contents of the file are shown in Figure 10.

Figure 10 - db2nodes.cfg

Figure 10 shows there are four partitions, two per server. The server host names are 65658572 and 65658161.

3. Next, you would like to create database 'mydb1', but you would like to issue this command from partition 0 because you want this partition to be the catalog

partition. So you issue the statement db2 "values (current dbpartitionnum)" to determine which partition is currently the active one from your session. Next you

change the DB2NODE environment variable to zero (0).

Just before creating the database, you also check the value of dbm cfg parameter DFTDBPATH. This is the path where the database will be created by default.

You want the path specified (/db2database in this example) to be created locally in all the partitions so that the data is spread across them. Using operating system

commands, you confirm this is the case, so you go ahead and create the database. This is shown in Figure 11.

Figure 11 - Determining and switching the active partition. Then creating a database

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 11/15

Page 12: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

4. To confirm partition 0 is indeed the catalog partition, simply issue a list db directory command and look for the 'Catalog database partition number' field under

the entry for the 'mydb1' database. Alternatively issue a list tablespaces command from each partition. The SYSCATSPACE tablespace should be listed only on

the Catalog partition.

5. Next, you need to create partition group pg23 on partitions 2 and 3. Figure 12 shows you how to accomplish this. In addition, the figure shows you how to list

your partition groups. The IBMTEMPGROUP is never listed with this command.

Figure 12 - Creating partition group pg23

6. You now want to create and manage your bufferpools. To create bufferpool BP23 on partition group pg23 issue this command:

db2 “create bufferpool BP23 database partition group pg23 size 500†�

Figure 13 shows this statement. It also shows you how associate this bufferpool to another partition group using the ALTER BUFFERPOOL statement.

To list your bufferpools and associated partition groups you can query the syscat.bufferpools view or sysibm.sysbufferpools catalog table.

Figure 13 - Managing bufferpools

Note that a bufferpool can be associated to any partition group. Its definition will be applied to all the partitions in the partition group, and you can specify different

sizes on the partitions if required.

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 12/15

Page 13: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

7. To create tablespace mytbls1, issue the following:

db2 "create tablespace mytbls1 in database partition group pg23

managed by system using ('/data') bufferpool bp23"

8. To create table table1 in tablespace mytbls1 and with a partitioning key of col1 and col2, issue the following:

db2 "create table table1 (col1 int, col2 int, col3 char(10))

in mytbls1

partitioning key (col1, col2)"

9. Creating the index index1 has nothing specific to a DPF environment:

db2 "create index index1 on table1 (col1, col2)"

The index will be constructed on each partition for its subset of rows.

10. Next, let's say you would like to update parameter INTRA_PARALLEL from NO to YES using the command update dbm cfg using INTRA_PARALLEL

YES as shown in Figure 14. Note as well that the get dbm cfg is also issued, but from the other telnet session. The update version of the value of this parameter is

shown from any partition, this shows that you are updating one shared file.

Figure 14 - Update dbm cfg

11. Next, you would like to test the db2_all command to update the db cfg file for all partitions with one command. Figure 15 shows an example of this. The

second window in the figure shows the get db cfg before and after the execution of the db2_all command on a partition other than the one where the db2_all

command was executed.

Figure 15 - Using db2_all to update the db cfg

12. The last thing you would like to test is the rah command. Figure 16 shows an example of using rah to create a subdirectory

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 13/15

Page 14: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

Figure 16 - Using rah

As you can see, the rah command works per machine, while the db2_all command works per database partition.

And you're done!

Acknowledgements

Special thanks to Bill Wilkins (IBM Content Management Partner Enablement consultant) and Scott Martin (IBM Innovation Center for Business Partners

consultant) for their technical review of this article.

Resources

A Colorful Introduction to DB2 UDB Version 8 for Linux, UNIX and Windows

DB2 Version 8 Connectivity Cheatsheet

Everything You Wanted to Know About DB2 Universal Database Processes

Understanding DB2 - Learning Visually With Examples. This book uses the same writing style of this article to explain DB2 UDB topics, and is an excellent

resource to prepare you for the DB2 UDB certification exams.

About the author

Raul F. Chong is a database consultant from the IBM Toronto Laboratory and works primarily with IBM Business Partners. Raul has worked at IBM for six

years, three years in DB2 Technical Support, and the other three as a consultant specializing in database application development and migrations from other

RDBMS to DB2.

Close [x]

developerWorks: Sign in

If you don't have an IBM ID and password, register here.

IBM ID:

Forgot your IBM ID?

Password:

Forgot your password?

Change your password

After sign in: Stay on the current page

By clicking Submit, you agree to the developerWorks terms of use.

Submit Cancel

The first time you sign into developerWorks, a profile is created for you. This profile includes the first name, last name, and display name you identified when you

registered with developerWorks. Select information in your developerWorks profile is displayed to the public, but you may edit the information at any

time. Your first name, last name (unless you choose to hide them), and display name will accompany the content that you post.

All information submitted is secure.

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 14/15

Page 15: A Colorful Introduction to DB2 UDB for UNIX, Linux, And Windows_ Overview of the Database Partitioning Feature

Print this page Share this page Follow developerWorks

Technical topics

AIX and UNIX

Information

Management

Lotus

Rational

Tivoli

WebSphere

Cloud computing

Industries

Integrated Service

Management

Java technology

Linux

Open source

SOA and web services

Web development

XML

More...

Evaluation software

By IBM product

By evaluation method

By industry

Events

Briefings

Webcasts

Find events

Community

Forums

Groups

Blogs

Wikis

Terms of use

Report abuse

More...

About developerWorks

Site help and feedback

Contacts

Article submissions

Related resources

Students and faculty

Business partners

IBM

Solutions

Software

Software services

Support

Product information

Redbooks

Privacy

Accessibility

Close [x]

Choose your display name

The first time you sign in to developerWorks, a profile is created for you, so you need to choose a display name. Your display name accompanies the content you

post on developerWorks.

Please choose a display name between 3-31 characters. Your display name must be unique in the developerWorks community and should not be your email

address for privacy reasons.

Display name: (Must be between 3 – 31 characters.)

By clicking Submit, you agree to the developerWorks terms of use.

Submit Cancel

All information submitted is secure.

Average rating (29 votes)

1 star 1 star

2 stars 2 stars

3 stars 3 stars

4 stars 4 stars

5 stars 5 stars

Submit

Add comment:

Sign in or register to leave a comment.

Note: HTML elements are not supported within comments.

Notify me when a comment is added1000 characters left

Post

Be the first to add a comment

02/03/2011 A colorful introduction to DB2 UDB, V…

ibm.com/…/dm-0403chong/ 15/15