Teradata SQL: Unleash the Power
by Michael Larkins and Tom Coffing
Coffing Data Warehousing. (c) 2001. Copying Prohibited.
Reprinted with permission as a subscription benefit of Skillport, http://skillport.books24x7.com/
All rights reserved. Reproduction and/or distribution in whole or in part in electronic, paper or other forms without written permission is prohibited.





Chapter 1: Teradata Parallel Architecture

Teradata Introduction

The world's largest data warehouses commonly use the superior technology of NCR's Teradata relational database management system (RDBMS). A data warehouse is normally loaded directly from operational data. The majority, if not all, of this data will be collected on-line as a result of normal business operations. The data warehouse therefore acts as a central repository of the data that reflects the effectiveness of the methodologies used in running a business.

As a result, the data loaded into the warehouse is mostly historic in nature. To get a true representation of the business, normally this data is not changed once it is loaded. Instead, it is interrogated repeatedly to transform data into useful information, to discover trends and the effectiveness of operational procedures. This interrogation is based on business rules to determine such aspects as profitability, return on investment and evaluation of risk.

For example, an airline might load all of its maintenance activity on every aircraft into the database. Subsequent investigation of the data could indicate the frequency at which certain parts tend to fail. Further analysis might show that the parts are failing more often on certain models of aircraft. The first benefit of the newfound knowledge regards the ability to plan for the next failure and maybe even the type of airplane on which the part will fail. Therefore, the part can be on hand when and maybe where it is needed, or the part might be proactively changed prior to its failure.

If the information reveals that the part is failing more frequently on a particular model of aircraft, this could be an indication that the aircraft manufacturer has a problem with the design or production of that aircraft. Another possible cause is that the maintenance crew is doing something incorrectly and contributing to the situation. Either way, you cannot fix a problem if you do not know that a problem exists. There is incredible power and savings in this type of knowledge.

Another business area where the Teradata database excels is in retail. It provides an environment that can store billions of sales. This is a critical capability when you are recording and analyzing the sales of every item in every store around the world. Whether it is used for inventory control, marketing research or credit analysis, the data provides an insight into the business. This type of knowledge is not easily attainable without detailed data that records every aspect of the business. Tracking inventory turns, stock replenishment, or predicting the number of goods needed in a particular store yields a priceless perspective into the operation of a retail outlet. This information is what enables one retailer to thrive while others go out of business.

Teradata is flourishing with the realization that detail data is critical to the survival of a business in a competitive, lower-margin environment. Continually, businesses are forced to do more with less. Therefore, it is vital to maximize the efforts that work well to improve profit and minimize or correct those that do not work.

One computer vendor used these same techniques to determine that it cost more to sell into the desktop environment than was realized in profit. Prior to this realization, the sales effort had attempted to make up the loss by selling more computers. Unfortunately, increased sales meant increased losses. Today, that company is doing much better and has made a huge step into profitability by discontinuing the small computer line.

Teradata Architecture

The Teradata database currently runs, most commonly, on NCR Corporation's WorldMark Systems in the UNIX MP-RAS environment. Some of these systems consist of a single processing node (computer) while others are several hundred nodes working together in a single system. The NCR nodes are based entirely on industry standard CPU processor chips, standard internal and external bus architectures like PCI and SCSI, and standard memory modules with 4-way interleaving for speed.

At the same time, Teradata can run on any hardware server in the single-node environment when the system runs Microsoft NT or Windows 2000. This single node may be any computer from a large server to a laptop.

Whether the system consists of a single node or is a massively parallel system with hundreds of nodes, the Teradata RDBMS uses the exact same components executing on all the nodes in parallel. The only difference between small and large systems is the number of processing components.

When these components exist on different nodes, it is essential that the components communicate with each other at high speed. To facilitate the communications, the multi-node systems use the BYNET interconnect. It is a high-speed, multi-path, dual redundant communications channel. Another amazing capability of the BYNET is that the bandwidth increases with each consecutive node added into the system. There is more detail on the BYNET later in this chapter.


Teradata Components

As previously mentioned, Teradata is the superior product today because of its parallel operations, based on its architectural design. It is the parallel processing by the major components that provides the power to move mountains of data. Teradata works more like the early Egyptians who built the pyramids without heavy equipment, using parallel, coordinated human efforts. It uses smaller nodes running several processing components, all working together on the same user request. Therefore, a monumental task is completed in record time.

Teradata operates with three major components to achieve the parallel operations. These components are called: Parsing Engine Processors, Access Module Processors and the Message Passing Layer. The role of each component is discussed in the next sections to provide a better understanding of Teradata. Once we understand how Teradata works, we will pursue the SQL that allows storage and access of the data.

Parsing Engine Processor (PEP or PE)

The Parsing Engine Processor (PEP) or Parsing Engine (PE), for short, is one of the two primary types of processing tasks used by Teradata. It provides the entry point into the database for users on mainframe and networked computer systems. It is the primary director task within Teradata.

As users "logon" to the database they establish a Teradata session. Each PE can manage 120 concurrent user sessions. Within each of these sessions users submit SQL as a request for the database server to take an action on their behalf. The PE will then parse the SQL statement to establish which database objects are involved. For now, let's assume that the database object is a table. A table is a two-dimensional array that consists of rows and columns. A row represents an entity stored in a table and it is defined using columns. An example of a row might be the sale of an item and its columns include the UPC, a description and the quantity sold.
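As a hypothetical sketch of the sales-row example above (the table name and column types are our assumptions, not from the text), such a table might be defined as:

```sql
-- Hypothetical table: each row is the sale of one item, and the
-- columns (UPC, description, quantity sold) define the row.
CREATE TABLE Sales_Item
    ( UPC          CHAR(12)
    , Description  VARCHAR(50)
    , Qty_Sold     INTEGER )
PRIMARY INDEX ( UPC );
```

The PRIMARY INDEX clause is included because every Teradata table has one; its role in data distribution is covered later in this chapter.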

Any action a user requests must also go through a security check to validate their privileges as defined by the database administrator. Once their authorization at the object level is verified, the PE will verify that the columns requested actually exist within the objects referenced.

Next, the PE optimizes the SQL to create an execution plan that is as efficient as possible based on the amount of data in each table, the indices defined, the type of indices, the selectivity level of the indices, and the number of processing steps needed to retrieve the data. The PE is responsible for passing the optimized execution plan to other components as the best way to gather the data.

An execution plan might use the primary index column assigned to the table, a secondary index or a full table scan. The use of an index is preferable and will be discussed later in this chapter. For now, it is sufficient to say that a full table scan means that all rows in the table must be read and compared to locate the requested data.

Although a full table scan sounds really bad, within the architecture of Teradata it is not necessarily a bad thing, because the data is divided up and distributed to multiple, parallel components throughout the database. We will look next at the AMPs that perform the parallel disk access using their file system logic. The AMPs manage all data storage on disks. The PE has no disks.

Activities of a PE:

- Convert incoming requests from EBCDIC to ASCII (if from an IBM mainframe)
- Parse the SQL to determine type and validity
- Validate user privileges
- Optimize the access path(s) to retrieve the rows
- Build an execution plan with necessary steps for row access
- Send the plan steps to the Access Module Processors (AMP) involved
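Teradata exposes the PE's plan through the EXPLAIN modifier. As a sketch (the table and column names here are invented for illustration), prefixing any request with EXPLAIN returns the optimized plan steps in English rather than executing the request:

```sql
-- Ask the PE to show its optimized execution plan instead of
-- running the query; the output describes each plan step.
EXPLAIN
SELECT Description, Qty_Sold
FROM   Sales_Item
WHERE  UPC = '012345678905';
```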

Access Module Processor (AMP)

The next major component of Teradata's parallel architecture is called an Access Module Processor (AMP). It stores and retrieves the distributed data in parallel. Ideally, the data rows of each table are distributed evenly across all the AMPs.


The AMPs read and write data and are the workhorses of the database. Their job is to receive the optimized plan steps, built by the PE after it completes the optimization, and execute them. The AMPs are designed to work in parallel to complete the request in the shortest possible time.

Optimally, every AMP should contain a subset of all the rows loaded into every table. By dividing up the data, it automatically divides up the work of retrieving the data. Remember, all work comes as a result of a user's SQL request. If the SQL asks for a specific row, that row exists in its entirety (all columns) on a single AMP and other rows exist on the other AMPs.

If the user request asks for all of the rows in a table, every AMP should participate along with all the other AMPs to complete the retrieval of all rows. This type of processing is called an all-AMP operation and an all-rows scan. However, each AMP is only responsible for its rows, not the rows that belong to a different AMP. As far as each AMP is concerned, it owns all of the rows. Within Teradata, the AMP environment is a "shared nothing" configuration. The AMPs cannot access each other's data rows, and there is no need for them to do so.

Once the rows have been selected, the last step is to return them to the client program that initiated the SQL request. Since the rows are scattered across multiple AMPs, they must be consolidated before reaching the client. This consolidation process is accomplished as a part of the transmission to the client so that a final comprehensive sort of all the rows is never performed. Instead, all AMPs sort only their rows (at the same time, in parallel) and the Message Passing Layer is used to merge the rows as they are transmitted from all the AMPs.

Therefore, when a client wishes to sequence the rows of an answer set, this technique causes the sort of all the rows to be done in parallel. Each AMP sorts only its subset of the rows at the same time all the other AMPs sort their rows. Once all of the individual sorts are complete, the BYNET merges the sorted rows. Pretty brilliant!
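A simple ordered retrieval is enough to trigger exactly this behavior (table and column names are assumed for illustration): each AMP sorts its own subset, and the pre-sorted streams are merged on the way back to the client.

```sql
-- Each AMP sorts only its own Sales_Item rows in parallel; the
-- Message Passing Layer merges the sorted streams into one answer set.
SELECT   UPC, Qty_Sold
FROM     Sales_Item
ORDER BY Qty_Sold DESC;
```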

Activities of the AMP:

- Store and retrieve data rows using the file system
- Aggregate data
- Join processing between multiple tables
- Convert ASCII returned data to EBCDIC (IBM mainframes only)
- Sort and format output data

Message Passing Layer (BYNET)

The Message Passing Layer varies depending on the specific hardware on which the Teradata database is executing. In the latter part of the 20th century, most Teradata database systems executed under the UNIX operating system. However, in 1998, Teradata was released on Microsoft's NT operating system. Today it also executes under Windows 2000. The initial release of Teradata on the Microsoft systems is for a single node.

When using the UNIX operating system, Teradata supports up to 512 nodes. This massively parallel system establishes the basis for storing and retrieving data from the largest commercial databases in the world. Today, the largest system in the world consists of 176 nodes. There is much room for growth as databases begin to exceed 40 or 50 terabytes.

For the NCR UNIX systems, the Message Passing Layer is called the BYNET. The amazing thing about the BYNET is its capacity. Instead of a fixed bandwidth that is shared among multiple nodes, the bandwidth of the BYNET increases as the number of nodes increases. This feat is accomplished as a result of using virtual circuits instead of a single fixed cable or a twisted-pair configuration.

To understand the workings of the BYNET, think of a telephone switch used by local and long distance carriers. As more and more people place phone calls, no one needs to speak slower. As one switch becomes saturated, another switch is automatically used. When your phone call is routed through a different switch, you do not need to speak slower. If a natural or other type of disaster occurs and a switch is destroyed, all subsequent calls are routed through other switches. The BYNET is designed to work like a telephone switching network.

An additional aspect of the BYNET is that it is really two connection paths, like having two phone lines for a business. The redundancy allows for two different aspects of its performance. The first aspect is speed. Each path of the BYNET


provides bandwidth of 10 Megabytes (MB) per second with Version 1 and 60 MB per second with Version 2. Therefore, the aggregate speed of the two connections is 20MB/second or 120MB/second. However, as mentioned earlier, the bandwidth grows linearly as more nodes are added.

Using Version 1, any two nodes communicate at 40MB/second (10MB/second * 2 BYNETs * 2 nodes). Therefore, 10 nodes can utilize 200MB/second and 100 nodes have 2000MB/second available between them. When using the Version 2 BYNET, the same 100 nodes communicate at 12,000MB/second (60MB/second * 2 BYNETs * 100 nodes).

The second and equally important aspect of the BYNET uses the two connections for availability. Regardless of the speed associated with each BYNET connection, if one of the connections should fail, the second is completely independent and can continue to function at its individual speed without the other connection. Therefore, communications continue to pass between all nodes.

Although the BYNET is performing at half the capacity during an outage, it is still operational and SQL is able to complete without failing. In reality, when the BYNET is performing at only 10MB/second per node, it is still a lot faster than many normal networks that typically transfer messages at 10MB per second.

All messages going across the BYNET offer guaranteed delivery. So, any messages not successfully delivered because of a failure on one connection automatically route across the other connection. While half of the BYNET is not working, the bandwidth reduces by half. However, when the failed connection is returned to service, its topology is automatically configured back into service and it begins transferring messages along with the other connection. Once this occurs, the capacity returns to normal.

A Teradata Database

Within Teradata, a database is a storage location for database objects (tables, views, macros, and triggers). An administrator can use Data Definition Language (DDL) to establish a database by using a CREATE DATABASE command.

A database may have PERMANENT (PERM) space allocated to it. This PERM space establishes the maximum amount of disk space for storing user data rows in any table located in the database. However, if no tables are stored within a database, it is not required to have PERM space. Although a database without PERM space cannot store tables, it can store views and macros because they are physically stored in the Data Dictionary (DD) PERM space and require no user storage space. The DD is in a "database" called DBC.

Teradata allocates PERM space to tables, up to the maximum, as rows are inserted. The space is not pre-allocated. Instead, it is allocated as rows are stored in blocks on disk. The maximum block size is defined either at a system level in the DBS Control Record, at the database level, or individually for each table. Like PERM, the block size is a maximum size. Yet, it is only a maximum for blocks that contain multiple rows. By nature, the blocks are variable in length. So, disk space is not pre-allocated; instead, it is allocated on an as-needed basis, one sector (512 bytes) at a time. Therefore, the largest possible wasted disk space in a block is 511 bytes.

A database can also have SPOOL space associated with it. All users who run queries need workspace at some point in time. This SPOOL space is workspace used for the temporary storage of rows during the execution of user SQL statements. Like PERM space, SPOOL is defined as a maximum amount that can be used within a database or by a user. Since PERM is not pre-allocated, unused PERM space is automatically available for use as SPOOL. This maximizes the disk space throughout the system.

It is a common practice in Teradata to have some databases with PERM space that contain only tables. Then, other databases contain only views. These view databases require no PERM space and are the only databases that users have privileges to access. The views in these databases control all access to the real tables in other databases. They insulate the actual tables from user access. There will be more on views later in this book.

The newest type of space allocation within Teradata is TEMPORARY (TEMP) space. A database may or may not have TEMP space; however, it is required if Global Temporary Tables are used. The use of temporary tables is also covered in more detail later in the SQL portion of this book.

A database is defined using a series of parameter values at creation time. The majority of the parameters can easily be changed after a database has been created using the MODIFY DATABASE command. However, when attempting to increase PERM or TEMP space maximums, there must be sufficient disk space available even though it is not immediately allocated. There may not be more PERM space defined than actual disk on the system.

 A number of additional database parameters are listed below along with the user parameters in the next section. These


parameters are tools for the database administrator and other experienced users when establishing databases for tables and views.

CREATE / MODIFY DATABASE Parameters

- PERMANENT
- TEMPORARY
- SPOOL
- ACCOUNT
- FALLBACK
- JOURNAL
- DEFAULT JOURNAL
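A sketch of how a few of these parameters might appear in DDL. The database name and sizes are invented, and the exact option syntax should be confirmed against the DDL reference; this is an illustration, not a definitive statement of the command:

```sql
-- Hypothetical database. Each space value is a maximum, not a
-- pre-allocation, so unused PERM can serve as SPOOL elsewhere.
CREATE DATABASE Sales_DB FROM DBC AS
    PERMANENT = 10000000000   -- bytes available for table rows
  , SPOOL     = 20000000000   -- workspace ceiling for queries
  , TEMPORARY =  5000000000   -- needed if global temporary tables are used
  , FALLBACK;

-- Most parameters can be changed after creation:
MODIFY DATABASE Sales_DB AS SPOOL = 30000000000;
```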

Teradata Users

In Teradata, a user is the same as a database with one exception: a user is able to logon to the system and a database cannot. Therefore, to authenticate the user, a password must be established. The password is normally established at the same time that the CREATE USER statement is executed. The password can also be changed using a MODIFY USER command.

Like a database, a user area can contain database objects (tables, views, macros and triggers). A user can have PERM and TEMP space and can also have SPOOL space. On the other hand, a user might not have any of these types of space, exactly the same as a database.

The biggest difference between a database and a user is that a user must have a password. Otherwise, the similarity between the two makes administering the system easier and allows for default values that all databases and users can inherit.

The next two lists regard the creation and modification of databases and users.

{ CREATE | MODIFY } DATABASE or USER (in common)

- PERMANENT
- TEMPORARY
- SPOOL
- ACCOUNT
- FALLBACK
- JOURNAL
- DEFAULT JOURNAL

{ CREATE | MODIFY } USER (only)

- PASSWORD
- STARTUP
- DEFAULT DATABASE
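The user-only parameters can be sketched the same way. Everything here (user name, password, owning database) is a made-up example, and the option syntax should be checked against the DDL reference:

```sql
-- Hypothetical user: defined like a database, plus a password
-- so it can log on, and a default database for its sessions.
CREATE USER AJones FROM Sales_DB AS
    PASSWORD = SecretPW1
  , PERMANENT = 0               -- no tables of its own
  , SPOOL = 5000000000
  , DEFAULT DATABASE = Sales_DB;

-- The password can be changed later:
MODIFY USER AJones AS PASSWORD = NewSecret2;
```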

By no means are these all of the parameters. It is not the intent of this chapter, nor the intent of this book, to teach database administration. There are reference manuals and courses available to use. Teradata administration warrants a book by itself.

Symbols Used in this Book


Since there are no standard symbols for teaching SQL, it is necessary to understand some of the symbols used in our syntax diagrams throughout this book.

Figure 1-1  

DATABASE Command

When users negotiate a successful logon to Teradata, they are automatically positioned in a default database as defined by the database administrator. When an SQL request is executed, by default it looks in the current database for all referenced objects.

There may be times when the object is not in the current database. When this happens, the user has one of two choices to resolve this situation. One solution is to qualify the name of the object along with the name of the database in which it resides. To do this, the user simply associates the database name to the object name by connecting them with a period (.) or dot as shown below:

<database-name>.<table-name>

The second solution is to use the DATABASE command. It repositions the user to the specified database. After the DATABASE command is executed, there is no longer a need to qualify the objects in that database. Of course, if the SQL statement references additional objects in another database, they will have to be qualified in order for the system to locate them. Normally, you will DATABASE to the database that contains most of the objects that you need. Therefore it reduces the number of object names requiring qualification.

The following is the syntax for the DATABASE command.

DATABASE <database-name> ;

If you are not sure what database you are in, either the HELP SESSION or SELECT DATABASE command may be used to make that determination. These commands and other HELP functions are covered in the SQL portion of this book.
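Putting the two solutions, and the session checks, side by side (the database and table names are invented for illustration):

```sql
-- Solution 1: qualify the object with its database name.
SELECT * FROM Sales_DB.Sales_Item;

-- Solution 2: reposition the session, then use unqualified names.
DATABASE Sales_DB;
SELECT * FROM Sales_Item;

-- Not sure which database you are in?
SELECT DATABASE;   -- returns the current default database
HELP SESSION;      -- session details, including the default database
```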

Use of an Index

Although a relational data model uses Primary Keys and Foreign Keys to establish the relationships between tables, that design is a Logical Model. Each vendor uses specialized techniques to implement a Physical Model. Teradata does not use keys in its physical model. Instead, Teradata is implemented using indices, both primary and secondary.

The Primary Index (PI) is the most important index in all of Teradata. The performance of Teradata can be linked directly to the selection of this index. The data value in the PI column(s) is submitted to the hashing function. The resulting row hash value is used to map the row to a specific AMP for data distribution and storage.
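Teradata lets you observe this hashing through its hash functions. The table and column names below are assumptions carried over from earlier examples, and function availability may vary by release, so treat this as a sketch:

```sql
-- HASHROW computes the row hash of the PI value; HASHBUCKET and
-- HASHAMP then map that hash to the AMP that owns the row.
SELECT UPC
     , HASHROW(UPC)                      AS Row_Hash
     , HASHAMP(HASHBUCKET(HASHROW(UPC))) AS Owning_AMP
FROM   Sales_Item;
```

The same PI value always produces the same row hash, which is why it always lands on the same AMP.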

To illustrate this concept, I have on several occasions used two decks of cards. Imagine, if you will, fourteen people in a room. To the largest, most powerful looking man in the room, you give one of the decks of cards. His large hands allow him to hold all fifty-two cards at one time, with some degree of success. The cards are arranged with the ace of spades continuing through the king of spades in ascending order. After the spades are the hearts, then the clubs and last, the diamonds. Each suit is arranged starting with the ace and ascending up to the king. The cards are partitioned by suit.

The other deck of cards is divided among the other thirteen people. Using this procedure, all cards with the same value (i.e. aces) go to the same person. Likewise, all the deuces, treys and subsequent cards each go to one of the thirteen people. Each set of four cards will be in the same order as the suits contained in the single deck that went to the lone man: spades, hearts, clubs and diamonds. Once all the cards have been distributed, each of the thirteen people will be holding four cards of the same value (4*13=52). Now, the game can begin.


The requests in this game come in the form of "give-me," one or more cards.

To make it easy for the lone player, we first request: give-me the ace of spades. The person with four aces finds their ace, as does the lone player with all 52 cards, both on the top of their cards. That was easy!

As the difficulty of the give-me requests increases, the level of difficulty dramatically increases for the lone man. For instance, when the give-me request is for all of the twos, one of the thirteen people holds up all four of their cards and they are done. The lone man must locate the 2 of spades between the ace and trey. Then, go and locate the 2 of hearts, thirteen cards later between the ace and trey. Then, find the 2 of clubs, thirteen cards after that, as well as the 2 of diamonds, thirteen cards after that, to finally complete the request.

Another request might be give-me all of the diamonds. For the thirteen people, each person locates and holds up one of their cards and the request is finished. For the lone person with the single deck, the request means finding and holding up the last thirteen cards in their deck of fifty-two. In each of these give-me requests, the lone man had to negotiate all fifty-two cards while the thirteen other people only needed to determine which of their four cards applied to the request, if any. This is the same procedure used by Teradata. It divides up the data like we divided up the cards.

As illustrated, the thirteen people are faster than the lone man. However, the game is not limited to thirteen players. If there were 26 people who wished to play on the same team, the cards simply need to be divided or distributed differently.

When using the value (ace through king) there are only 13 unique values. In order for 26 people to play, we need a way to come up with 26 unique values for 26 people. To make the cards more unique, we might combine the value of the card (i.e. ace) with the color. Therefore, we have two red aces and two black aces as well as two sets for every other card. Now when we distribute the cards, each of the twenty-six people receives only two cards instead of the original four. The distribution is still based on fifty-two cards (2 times 26).

At the same time, 26 people is not the optimum number for the game. Based on what has been discussed so far, what is the optimum number of people?

If your answer is 52, then you are absolutely correct.

With this many people, each person has one and only one card. Any time a give-me is requested of the participants, their one card either qualifies or it does not. It doesn't get any simpler or faster than this situation.

As easy as this sounds, to accomplish this distribution the value of the card alone is not sufficient to manifest 52 unique values. Neither is using the value and the color. That combination only gives us a distribution of 26 unique values when 52 unique values are desired.

To achieve this distribution we need to establish still more uniqueness. Fortunately, we can use the suit along with the value. Therefore, the ace of spades is different than the ace of hearts, which is different from the ace of clubs and the ace of diamonds. In other words, there are now 52 unique identities to use for distribution.

To relate this distribution to Teradata, one or more columns of a table are chosen to be the Primary Index.

Primary Index

The Primary Index can consist of up to sixteen different columns. These columns, when considered together, provide a comprehensive technique to derive a Unique Primary Index (UPI, pronounced as "you-pea") value, as we discussed previously regarding the card analogy. That is the good news.

To store the data, the value(s) in the PI are hashed via a calculation to determine which AMP will own the data. The same data values always hash to the same row hash and therefore are always associated with the same AMP.

The advantage to using up to sixteen columns is that row distribution is very smooth, or even, when based on unique values. This simply means that each AMP contains the same number of rows. At the same time, there is a downside to using several columns for a PI. The PE needs every data value for each column as input to the hashing calculation to directly access a particular row. If a single column value is missing, a full table scan will result because the row hash cannot be recreated. Otherwise, any row retrieval using the PI column(s) is always an efficient, one-AMP operation.

Although uniqueness is good in most cases, Teradata does not require that a UPI be used. It also allows for a Non-Unique Primary Index (NUPI, pronounced as "new-pea"). The potential downside of a NUPI is that if several duplicate values (NUPI dups) are stored, they all go to the same AMP. This can cause an uneven distribution that places more rows on some of

eradata SQL: Unleash the Power 

Reprinted for OET7P/[email protected], Accenture Coffing Data Warehousing, Coffing Publishing (c) 2001, Copying Prohibited

Page 8 / 10

Page 9: 5558 Chapter 1 Teradata Parallel Architecture

7/31/2019 5558 Chapter 1 Teradata Parallel Architecture

http://slidepdf.com/reader/full/5558-chapter-1-teradata-parallel-architecture 9/10

the AMPs than on others. This means that any time an AMP with a larger number of rows is involved, it has to work harder than the other AMPs. The other AMPs will finish before the slower AMP. The time to process a single user request isalways based on the slowest AMP. Therefore, serious consideration should be used when making the decision to use aNUPI.

Every table must have a PI, and it is established when the table is created. If the CREATE TABLE statement contains UNIQUE PRIMARY INDEX ( <column-list> ), the value in the column(s) will be distributed to an AMP as a UPI. However, if the statement reads PRIMARY INDEX ( <column-list> ), the value in the column(s) will be distributed as a NUPI and duplicate values are allowed. Again, all rows with the same PI value go to the same AMP.
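The two forms of the DDL can be sketched as follows (hypothetical table and column names, assumed for illustration only):

```sql
-- A UPI: employee numbers are unique, so rows spread evenly across the AMPs.
CREATE TABLE employee_tbl
  ( emp_no     INTEGER NOT NULL
  , dept_no    INTEGER
  , last_name  CHAR(20) )
UNIQUE PRIMARY INDEX ( emp_no );

-- A NUPI on department number: duplicates are allowed, and every row for a
-- given department hashes to the same AMP (potential NUPI dups).
CREATE TABLE employee_by_dept
  ( emp_no     INTEGER NOT NULL
  , dept_no    INTEGER
  , last_name  CHAR(20) )
PRIMARY INDEX ( dept_no );
```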

If the DDL statement does not specify a PI but does specify a PRIMARY KEY (PK), the named column(s) are used as the UPI. Although Teradata itself does not use primary keys, the DDL may have been ported from another vendor's database system.

A UPI is used because a primary key must be unique and cannot be null. By default, both UPIs and NUPIs allow a null value to be stored unless the column definition disallows nulls with a NOT NULL constraint.

Now, with that being said, when considering JOIN access on tables it is sometimes advantageous to use a NUPI. This is because the rows being joined between tables must be on the same AMP. If they are not, one of the rows must be moved to the same AMP as its matching row. Teradata uses one of two strategies to move rows temporarily: it can copy all needed rows to all AMPs, or it can redistribute them using the hashing mechanism on the join column that is a PI. However, if neither join column is a PI, it might be necessary to redistribute all participating rows from both tables by hash code to bring matching rows together on a single AMP.

Planning data distribution around access characteristics can reduce the amount of data movement and therefore improve join performance. This works well as long as the number of duplicate values is either consistent or small. The logical data model needs to be extended with usage information in order to know the best way to distribute the data rows; this is done during the physical implementation phase, before the tables are created.
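As a hedged illustration (hypothetical tables, assumed for this sketch), when both tables use the join column as their PI, matching rows already reside on the same AMP and no row movement is needed:

```sql
-- Hypothetical lookup table, also distributed on dept_no.
CREATE TABLE department_tbl
  ( dept_no    INTEGER NOT NULL
  , dept_name  CHAR(30) )
UNIQUE PRIMARY INDEX ( dept_no );

-- Both tables hash dept_no identically, so matching rows are co-located
-- on the same AMP and the join requires no redistribution or duplication:
SELECT e.last_name, d.dept_name
FROM   employee_by_dept e
INNER JOIN department_tbl d
ON     e.dept_no = d.dept_no;
```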

Secondary Index

A Secondary Index (SI) is used in Teradata as a way to access rows in the data (sometimes called the base table) directly, without requiring the use of PI values. Unlike the PI, an SI does not affect the distribution of the data rows. Instead, it is an alternate read path that provides a method to locate the PI value; once the PI is obtained, the row can be accessed directly using it. Like the PI, an SI can consist of up to 16 columns.

In order for an SI to retrieve the data row by way of the PI, it must store and retrieve an index row. To accomplish this, Teradata creates, maintains, and uses a subtable. The PI of the subtable is the value in the column(s) defined as the SI. The "data" stored in the subtable row is the previously hashed value of the real PI for the data row or rows in the base table; the SI is thus a pointer to the real data row desired by the request. An SI can also be unique (USI, pronounced "you-sea") or non-unique (NUSI, pronounced "new-sea").
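The DDL for defining the two kinds of secondary index might be sketched as follows, on the hypothetical employee table used earlier (the `social_security` column is likewise an assumption for illustration):

```sql
-- A USI: builds a subtable whose rows point back to exactly one
-- base-table row per index value.
CREATE UNIQUE INDEX ( social_security ) ON employee_tbl;

-- A NUSI: duplicate dept_no values are allowed, so one subtable row
-- may point to several base-table rows on its AMP.
CREATE INDEX dept_nusi ( dept_no ) ON employee_tbl;
```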

The rows of the subtable contain the row hash value of the SI, the actual data value(s) of the SI, and the row hash value of the PI as the row ID. Once the row ID of the PI is obtained from the subtable row (located using the hashed value of the SI), the last step is to get the actual data row from the AMP where it is stored. The action and hashing for an SI are exactly the same as when starting with a PI. When using a USI, accessing the subtable is a one-AMP operation and accessing the data row from the base table is another one-AMP operation. Therefore, USI access is always a two-AMP operation based on two separate row hash operations.

When using a NUSI, the subtable access is always an all-AMP operation. Since the data is distributed by the PI, rows with duplicate NUSI values may exist, and probably do exist, on multiple AMPs. So, the best plan is to go to all AMPs and check for the requested NUSI value.

To make this more efficient, each AMP scans only its own subtable. These subtable rows contain the row hash of the NUSI, the data value that created the NUSI, and one or more row IDs for all the matching rows on that AMP. This is still a fast operation because subtable rows are quite small and several are stored in a single block. If an AMP determines that it contains no rows for the requested NUSI value, it is finished with its portion of the request. However, if an AMP has one or more rows with the requested NUSI value, it retrieves the corresponding data rows into spool space using the index.

With this said, the SQL optimizer may decide that there are too many base table data rows to make index access efficient.

When this happens, the AMPs do a full base-table scan to locate the data rows and ignore the NUSI. This situation is called a weakly selective NUSI. Even with old-fashioned indexed sequential files, it has always been more efficient to read the entire file and skip the index whenever more than 15% of the records were needed. This effect is compounded with Teradata because the "file" is read in parallel instead of as a single file. So, the threshold is probably closer to 3%: fewer than 3% of all the rows must qualify in order for the NUSI to be used.

If the SQL does not use a NUSI, you should consider dropping it, because the subtable occupies PERM space with no benefit to the users. The Teradata EXPLAIN, covered later in this book, is the easiest way to determine whether your SQL is using a NUSI. Furthermore, the optimizer will never use a NUSI without STATISTICS.
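A typical check, sketched here with the same hypothetical names as above, might proceed as:

```sql
-- The optimizer only considers a NUSI when statistics exist on it:
COLLECT STATISTICS ON employee_tbl INDEX ( dept_no );

-- EXPLAIN shows whether the plan uses the NUSI subtable or falls
-- back to a full base-table scan (a weakly selective NUSI):
EXPLAIN
SELECT * FROM employee_tbl WHERE dept_no = 100;

-- If the plan never uses the index, reclaim its PERM space:
DROP INDEX ( dept_no ) ON employee_tbl;
```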

There has been another evolution in the use of NUSI processing, called NUSI bit mapping. If a table has two different NUSI indices that are individually weakly selective, but together eliminate most of the non-conforming rows, the optimizer can bit map them together because in combination they become highly selective. Therefore, it is often better to use smaller individual NUSI indices instead of one large composite (more than one column) NUSI.

There is another feature related to NUSI processing that can improve access time when a value range comparison is requested. With hash values it is impossible to determine any ordering within a range, because large data values can generate small hash values and small data values can produce large hash values. To overcome this issue, there is a range feature called the Value-Ordered NUSI. At this time, it may only be used on a numeric data column of four bytes or less. Based on this functionality, a Value-Ordered NUSI is perfect for date processing. See the DDL chapter in this book for more details on USI and NUSI usage.
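A minimal sketch of a Value-Ordered NUSI, assuming a hypothetical sales table, might look like:

```sql
-- The subtable rows are stored in sequence by the date value itself,
-- not by its hash, so a range predicate can scan just the qualifying
-- portion of each AMP's subtable.
CREATE INDEX ( sale_date ) ORDER BY VALUES ( sale_date ) ON sales_tbl;

SELECT *
FROM   sales_tbl
WHERE  sale_date BETWEEN DATE '2001-01-01' AND DATE '2001-01-31';
```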
