Skew Stuff


What is the meaning of skewness in Teradata? When do we use the NULLIF function in Teradata?

Skewness:

Skewness is a statistical term that refers to the distribution of rows across AMPs. If the data is highly skewed, some AMPs hold many more rows than others; that is, the data is not evenly distributed.

This hurts performance, because it defeats Teradata's parallelism. Data distribution, and therefore skewness, is controlled by the choice of indexes.

When we choose the wrong primary index, the data is distributed unevenly across the AMPs: some AMPs have many records and some have few. This is called skewness. The percentage of skew is called the skew factor; a skew factor of up to 30% is generally considered acceptable.

We use NULLIF when we want to return NULL if two expressions are equal: NULLIF(a, b) returns NULL when a = b, and a otherwise.
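
For instance, NULLIF and the Teradata shorthand NULLIFZERO are commonly used to avoid divide-by-zero errors; the table and column names below are only illustrative:

-- Returns NULL instead of failing when item_count is 0
SELECT daily_sales / NULLIF(item_count, 0) AS avg_price
FROM sales_summary;

-- Equivalent Teradata shorthand; the same pattern appears in the
-- skew factor queries later in this document
SELECT daily_sales / NULLIFZERO(item_count) AS avg_price
FROM sales_summary;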

In Teradata Administrator you can simply right-click the table and select "Space Summary".

How will you avoid skewness?

Data skew (or AMP skew) occurs in Teradata due to uneven distribution of data across the AMPs. Often this leads to spool space errors, too. To avoid skewness, try to select a Primary Index that has as many unique values as possible.

PI columns like month, day, etc. have very few unique values, so during data distribution only a few AMPs will hold all the data, resulting in skew. If a column (or a combination of columns) that enforces uniqueness on the table is chosen as the PI, the data distribution will be even and the data will not be skewed.
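
As a sketch (the tables and columns here are illustrative, not from the original text), compare a skew-prone PI with a unique one:

-- Skew-prone: sale_month has at most 12 distinct values,
-- so all rows hash to at most 12 AMPs
CREATE TABLE sales_by_month (
  sale_month  BYTEINT NOT NULL,
  sale_amount DECIMAL(12,2)
) PRIMARY INDEX (sale_month);

-- Evenly distributed: a unique key gives each row its own row hash,
-- spreading the data across all AMPs
CREATE TABLE sales (
  sale_id     INTEGER NOT NULL,
  sale_month  BYTEINT,
  sale_amount DECIMAL(12,2)
) UNIQUE PRIMARY INDEX (sale_id);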

Typical reasons of skewness :

(1) Skewed tables: bad choice of PI, skewed data
(2) Bad execution plans (typically skewed redistributions)
(3) Bad data model (normalization, data types, PI, etc.)
(4) Missing or stale statistics
(5) Too many joins (break up the query!)
(6) Hash collisions (a load-time problem)


How to find the skew factor for a Teradata table (or: query to find the SKEW FACTOR of a particular table):

When Teradata tables are not distributed evenly, this can cause significant performance problems, and storage space is not utilized properly.

For that reason identifying and monitoring tables that are not distributed evenly across AMPs is very important when working with Teradata databases.

The following query reveals the skew factor for the SALES and CUSTOMERS tables using the DBC.TableSize system view.

SELECT DatabaseName,
       TableName,
       SUM(CurrentPerm) AS CurrentPerm,
       SUM(PeakPerm) AS PeakPerm,
       (100 - (AVG(CurrentPerm) / MAX(CurrentPerm) * 100)) AS SkewFactor
FROM DBC.TableSize
WHERE DatabaseName = DATABASE
  AND TableName IN ('SALES', 'CUSTOMERS')
GROUP BY 1, 2
ORDER BY SkewFactor DESC;

Query to find the SKEW FACTOR of a particular table:

SELECT TABLENAME,
       SUM(CURRENTPERM) / (1024*1024) AS CURRENTPERM,
       (100 - (AVG(CURRENTPERM) / MAX(CURRENTPERM) * 100)) AS SKEWFACTOR
FROM DBC.TABLESIZE
WHERE DATABASENAME = <DATABASENAME>
  AND TABLENAME = <TABLENAME>
GROUP BY 1;

(or)

SELECT DatabaseName,
       TableName,
       CAST((100 - (AVG(a.CurrentPerm) / MAX(a.CurrentPerm) * 100)) AS INTEGER) AS SkewFactor
FROM DBC.TableSize a
WHERE DatabaseName = 'DatabaseName'
  AND TableName = 'TableName'
GROUP BY 1, 2
ORDER BY 1;

FINDING THE SKEW FACTOR AT TABLE LEVEL :

SELECT DatabaseName,
       TableName,
       100 * (1 - (AVG(CurrentPerm) / MAX(CurrentPerm))) AS skew_factor
FROM DBC.TableSize
WHERE DatabaseName = 'TEMP_DB'
GROUP BY 1, 2
ORDER BY 1, 2;

To find available and used space in Teradata (GB) :

select DB.OwnerName,
       DS.DatabaseName,
       cast(sum(MaxPerm) / 1024/1024/1024 as decimal(7,1)) MaxPerm_g,
       cast(sum(CurrentPerm) / 1024/1024/1024 as decimal(7,1)) CurrPerm_g,
       cast(sum(MaxPerm) / 1024/1024/1024 - sum(CurrentPerm) / 1024/1024/1024 as decimal(7,1)) FreePerm_g,
       cast(sum(CurrentPerm) * 100 / nullifzero(sum(MaxPerm)) as decimal(5,2)) PctUsed,
       sum(MaxPerm) MaxPerm,
       sum(CurrentPerm) CurrPerm,
       sum(MaxPerm) - sum(CurrentPerm) FreePerm,
       db.CommentString
from dbc.DiskSpace DS, DBC.Databases DB
where db.DatabaseName like '%IDR%'
  and DS.DatabaseName = DB.DatabaseName
group by 1, 2, db.CommentString
order by 1, sum(MaxPerm) - sum(CurrentPerm) desc, sum(MaxPerm) desc, ds.DatabaseName;

Note: To fix skew problems, you need to change the table's primary index. For an empty table, you can use ALTER TABLE:

ALTER TABLE orders MODIFY PRIMARY INDEX (ORDER_NUM);

For an already populated table, you might need to recreate it.

You can use the following sequence:

Step 1. Create an empty table

CREATE TABLE orders_tmp AS (SELECT * FROM orders) WITH NO DATA;

Step 2. Alter the primary index for the new empty table

ALTER TABLE orders_tmp MODIFY PRIMARY INDEX (ORDER_NUM);

Step 3. Copy data from the skewed table

INSERT INTO orders_tmp SELECT * FROM orders;

Step 4. Remove the old table and rename the new table

RENAME TABLE orders TO orders_skewed;
RENAME TABLE orders_tmp TO orders;

You might want to keep the old table for a while; after you are sure you are not going to need it, drop it:

DROP TABLE orders_skewed;



The skew factor describes the distribution of rows: with a perfectly uniform distribution the skew is zero. The skew factor is the inverse of parallel efficiency.

Finding Skewed Tables in Teradata

Skewed Tables in Teradata :

Teradata distributes the data of a given table based on its Primary Index, and the Primary Index alone. If this Primary Index is not selected appropriately, it can cause performance bottlenecks.

The general parameters we should consider while creating Primary Indexes are:
1: Access Path
2: Volatility
3: Data Distribution
Volatility is usually not an issue in a well-designed database; that is to say, we do not expect UPDATE statements to modify the primary index itself. This usually leaves us with Data Distribution and Access Path.

Access Path implies that a particular column (or set of columns) is always used in join conditions. The advantage of using these columns as the PI is that it avoids redistribution of data during the join (one of the most expensive operations in Teradata) and can therefore reduce the usage of spool files.
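
To see whether a join avoids redistribution, you can EXPLAIN it; the table names below are hypothetical:

EXPLAIN
SELECT o.order_num, i.item_qty
FROM orders o
JOIN order_items i
  ON o.order_num = i.order_num;

-- If order_num is the PI of both tables, the plan joins the rows
-- AMP-locally; otherwise you will see steps like "redistributed by
-- the hash code of ..." that build (possibly large) spool files.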

Data Distribution implies that data is evenly distributed among all AMPs of the Teradata system. This ensures that all AMPs do the same amount of work. With a skewed distribution, one AMP ends up doing most of the work, which can increase execution time considerably. The following queries can be used to determine table skewness.

SELECT DatabaseName,
       TableName,
       SUM(CurrentPerm) AS CurrentPerm,
       SUM(PeakPerm) AS PeakPerm,
       (100 - (AVG(CurrentPerm) / MAX(CurrentPerm) * 100)) AS SkewFactor
FROM DBC.TableSize
WHERE TableName = :TABLENAME
GROUP BY 1, 2;

However, tables that have very few rows (especially fewer rows than there are AMPs) are inherently skewed, and that cannot be changed. So we should also take the size of the table into account when considering skewness.


The query can be further customized to give better details :

SELECT DATABASENAME, TABLENAME, SKEW_CATEGORY
FROM (
  SELECT DATABASENAME,
         TABLENAME,
         (MAX(CURRENTPERM) - AVG(CURRENTPERM)) / MAX(CURRENTPERM) * 100 AS SKEW_FACTOR,
         SUM(CURRENTPERM) / (1024*1024*1024) AS ACTUAL_SPACE,
         CASE
           WHEN SKEW_FACTOR > 90 THEN 'EXTREMELY SKEWED, TO BE RECTIFIED IMMEDIATELY'
           WHEN SKEW_FACTOR > 70 THEN 'HIGHLY SKEWED'
           WHEN SKEW_FACTOR > 50 THEN 'SKEWED TABLE'
           WHEN SKEW_FACTOR > 40 THEN 'SLIGHTLY SKEWED'
           WHEN SKEW_FACTOR > 30 AND ACTUAL_SPACE > .5 THEN 'SLIGHTLY SKEWED'
           ELSE 'ACCEPTABLE'
         END AS SKEW_CATEGORY
  FROM DBC.TABLESIZE
  WHERE DATABASENAME IN ('DATABASENAME')
  GROUP BY 1, 2
) x
ORDER BY SKEW_FACTOR DESC;

To find the distribution of data among the AMPs, we can use the hash functions as follows:

SELECT HASHAMP(HASHBUCKET(HASHROW(Column))),
       COUNT(*)
FROM DATABASENAME.TABLENAME
GROUP BY 1;

This technique can further be exploited to find the distribution for a candidate primary index as follows:

SEL 'ColumnName', (MAX(CN) - AVG(CN)) / MAX(CN) * 100
FROM (
  SEL VPROC, COUNT(*) CN
  FROM DATABASENAME.TABLENAME
  RIGHT OUTER JOIN (SEL VPROC FROM DBC.TABLESIZE GROUP BY 1) X
    ON VPROC = HASHAMP(HASHBUCKET(HASHROW(ColumnName)))
  GROUP BY 1
) C
GROUP BY 1;

This query can also be used to check which single column is the best fit for the primary index on the sole criterion of data distribution.

The following generic query creates the SQL statements that check the distribution for each column among the AMPs.

SEL 'SEL ' || '27'xc || COLUMNNAME || '27'xc ||
    ' , (MAX(CN)-AVG(CN))/MAX(CN)*100 FROM (SEL VPROC, COUNT(*) CN FROM ' ||
    DATABASENAME || '.' || TABLENAME ||
    ' RIGHT OUTER JOIN (SEL VPROC FROM DBC.TABLESIZE GROUP BY 1) X ON VPROC=HASHAMP(HASHBUCKET(HASHROW(' ||
    COLUMNNAME || '))) GROUP BY 1) C GROUP BY 1 UNION ALL ' Title
FROM DBC.COLUMNS
WHERE TABLENAME = '<TABLENAME>'
  AND DATABASENAME = '<DATABASENAME>';

What does the skew metric mean?

You can see the words "skewness" or "skew factor" in a lot of places regarding Teradata: documents, applications, etc.; skewed table, skewed CPU. It indicates something is wrong, but what does it explicitly mean? How do we interpret it?

Teradata is a massively parallel system in which uniform units (AMPs) perform the same tasks on the data parcels they are responsible for. In an ideal world all AMPs share the work equally: no one must work more than the average. The reality is colder; it is a rare situation when this equality (called "even distribution") exists.

Obviously, uneven distribution degrades the efficiency of the parallel infrastructure.

But how bad is the situation? That is exactly what skewness characterizes.

Let "RESOURCE" mean the amount of resource (CPU, I/O, PERM space) consumed by an AMP.

Let AMPno be the number of AMPs in the Teradata system.

Skew factor := 100 - ( AVG ( "RESOURCE" ) / NULLIFZERO ( MAX ("RESOURCE") ) * 100 )

Total[Resource] := SUM("RESOURCE")

Impact[Resource] := MAX("RESOURCE") * AMPno

Parallel Efficiency := Total[Resource] / Impact[Resource] * 100

or with some transformation:

Parallel Efficiency := 100 - Skew factor
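
A worked example with assumed numbers: on a 10-AMP system, suppose the busiest AMP consumes MAX("RESOURCE") = 100 CPU seconds while the average is AVG("RESOURCE") = 25 CPU seconds. Then:

Skew factor         := 100 - (25 / 100 * 100)  = 75
Total[Resource]     := 25 * 10                 = 250 CPU seconds
Impact[Resource]    := 100 * 10                = 1000 CPU seconds
Parallel Efficiency := 250 / 1000 * 100        = 25 = 100 - 75

Only a quarter of the CPU the parallel system had to dedicate performed useful, evenly spread work; this is the same arithmetic as in the table example below.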

Analysis

Codomain


0 <= "Skew factor" < 100

"Total[Resource]" <= "Impact[Resource]"

0<"Parallel Efficiency"<=100

Meaning

Skew factor : this percentage of the consumed real resources is wasted. E.g., a 1 GB table with a skew factor of 75 will allocate 4 GB.*

Total[Resource] : virtual resource consumption, the single sum of the individual resource consumptions, measured on the AMPs as independent systems.

Impact[Resource] : Real resource consumption impacted on the parallel infrastructure

Parallel Efficiency : as it says. E.g., a skew factor of 80 means 20% parallel efficiency.

Note : theoretically, a complementary resource allocation (one that consumes less resource on exactly those AMPs where my load has its excess) could compensate for the parallel inefficiency from a system point of view, but its probability tends to zero.

(http://teradatatuning.blogspot.in/2013/12/interpreting-skewness.html)


How many types of Skew exist?

If Teradata resources (CPU, AMP, I/O, disk, etc.) are utilized unequally, we say that skew exists.

There are three major types of skew: CPU skew, AMP/data skew, and I/O skew.

- Data skew: when data is not distributed equally on all the AMPs.

- CPU skew: when some AMPs consume more CPU than others.

- I/O skew: when some AMPs perform more I/O operations than others.

Explain about Skew Factor?

The data distribution of a table among the AMPs is called the skew factor. Generally, with a non-unique PI we get duplicate values; the more duplicate values there are, the more rows share the same row hash, so all of that data lands on the same AMP. This makes the data distribution unequal: one AMP stores more data while another stores less. When we access the full table, the AMP holding more data takes longer and keeps the other AMPs waiting, which wastes processing capacity. In this situation (unequal distribution of data) we get a high skew factor, and for such tables we should avoid full table scans.

Ex:

AMP0: 10000 rows (10%)
AMP1: 9000000 rows (90%)

In this situation the skew factor is very high: 90%.

How to "re-parallelize" skewed joins ?

Case description: assume that we have 1M customers and 4M transactions, and our top customer produces 2.5% of all transactions. The others produce the remaining 97.5% of transactions approximately evenly. (The sample table and data generator SQL are at the bottom of the original post.)

Our task is to join the "Customer" and "Transact" tables on Customer_id.


The join:

SELECT Customer_name, count(*)
FROM Customer c
JOIN Transact t ON c.Customer_id = t.Customer_id
GROUP BY 1;

We experience pretty slow execution. In ViewPoint we see that only one AMP is working, while the others are not.

What is the problem? There are two separate subsets of the Transact table from a "joinability" point of view:
- The "peak" part (records of the top customer(s)): very few customers have very many Transact records. A product join would be cost-effective here.
- The "even" part (records of the other customers): many customers have evenly few Transact records each. A merge join would be ideal here.
Unfortunately, the Optimizer has to decide, and only one operation type can be chosen. It will choose the merge join, which consumes far less CPU time.

The execution plan looks like this:

This query is optimized using type 2 profile T2_Linux64, profileid 21.
1) First, we lock a distinct D_DB_TMP."pseudo table" for read on a RowHash to prevent global deadlock for D_DB_TMP.t.
2) Next, we lock a distinct D_DB_TMP."pseudo table" for read on a RowHash to prevent global deadlock for D_DB_TMP.c.
3) We lock D_DB_TMP.t for read, and we lock D_DB_TMP.c for read.
4) We do an all-AMPs RETRIEVE step from D_DB_TMP.t by way of an all-rows scan with a condition of ("NOT (D_DB_TMP.t.Customer_ID IS NULL)") into Spool 4 (all_amps), which is redistributed by the hash code of (D_DB_TMP.t.Customer_ID) to all AMPs. Then we do a SORT to order Spool 4 by row hash. The size of Spool 4 is estimated with low confidence to be 125 rows (2,125 bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of a RowHash match scan, which is joined to D_DB_TMP.c by way of a RowHash match scan. Spool 4 and D_DB_TMP.c are joined using a merge join, with a join condition of ("D_DB_TMP.c.Customer_ID = Customer_ID"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with index join confidence to be 125 rows (10,375 bytes). The estimated time for this step is 0.02 seconds.
6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan, grouping by field1 (D_DB_TMP.c.Customer_name). Aggregate Intermediate Results are computed globally, then placed in Spool 5. The size of Spool 5 is estimated with no confidence to be 94 rows (14,758 bytes). The estimated time for this step is 0.02 seconds.
7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 94 rows (8,742 bytes). The estimated time for this step is 0.02 seconds.
8) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.07 seconds.

How to identify

If you experience extremely asymmetric AMP load, you can suspect this case. Find highly skewed JOIN steps in the DBQL (set all logging options on):

select top 50
  a.MaxAMPCPUTime * (hashamp()+1) / nullifzero(a.CPUTime) Skw,
  a.CPUTime,
  a.MaxAMPCPUTime * (hashamp()+1) CoveringCPUTime,
  b.*
from dbc.dbqlsteptbl a
join dbc.dbqlogtbl b
  on a.procid = b.procid and a.queryid = b.queryid
where StepName = 'JIN'
  and CPUtime > 100
  and Skw > 2
order by CoveringCPUTime desc;

(Note: covering CPU time is <No-of-AMPs> * <Max AMP's CPU time>. Virtually this amount of CPU is consumed because of the asymmetric load on the system.)

Or if you suspect a specific query, check the demography of the join field(s) in the "big" table:

SELECT TOP 100 <Join_field>, count(*) Nbr
FROM <Big_table>
GROUP BY 1
ORDER BY 2 DESC;

If the top occurrences are spectacularly larger than the others (or than the average), the idea likely matches.


Solution :

Break the query into two parts: join the top customer(s) separately, and then all the others. Finally, union the results. (Sometimes additional modification is also required if the embedding operation(s), the GROUP BY here, is/are not decomposable on the same parameter.) First we have to identify the top customer(s):

SELECT TOP 5 Customer_id, count(*) Nbr
FROM Transact
GROUP BY 1
ORDER BY 2 DESC;

Customer_id       Nbr
-----------  --------
        345    100004
     499873         4
     677423         4
     187236         4
      23482         4

Replace the original query with this one:

SELECT Customer_name, count(*)
FROM Customer c
JOIN Transact t ON c.Customer_id = t.Customer_id
WHERE t.Customer_id IN (345)      /* ID of the top customer(s).
                                     If more customers are salient, list them, but max ~5 */
GROUP BY 1
UNION ALL
SELECT Customer_name, count(*)
FROM Customer c
JOIN Transact t ON c.Customer_id = t.Customer_id
WHERE t.Customer_id NOT IN (345)  -- same customer(s)
GROUP BY 1;

Be sure that Customer.Customer_id, Transact.Transact_id and Transact.Customer_id have statistics! This query is more complex and has more steps, and it scans the Transact table twice, but it runs much faster; you can check it.

But why? And how do we determine which "top" customers are worth handling separately?

Explanation and calculation :


Assume that we are on a 125 AMP system.

Customer table contains 1M records with unique ID.

We have ~4.1M records in the Transact table: 100k for the top customer (ID=345) and 4 for each of the other customers. This matches the 2.5% we assumed above.

If the Transact table is redistributed on hash(Customer_id), we will get ~33k records on each AMP, except AMP(hash(345)), which gets ~133k (33k + 100k). That means this AMP has to process ~4x more data than the others and therefore runs 4x longer. In other words, for 75% of this JOIN step's time, the other 124 AMPs will do nothing with the query.

Moreover, the preparation and subsequent steps are problematic as well: the JOIN is prepared by a redistribution that produces a strongly skewed spool, and the JOIN's result, which stays local on the AMPs, is skewed too.

Optimized version :

This query consumes moderately more CPU, but the load is distributed evenly across the AMPs, utilizing Teradata's full parallel capability. It also contains a product join, but that is no problem: it joins 1 record to the selected ~100k Transact records, which is lightning fast.


Look at the execution plan of the broken-up query:

This query is optimized using type 2 profile T2_Linux64, profileid 21.
1) First, we lock a distinct D_DB_TMP."pseudo table" for read on a RowHash to prevent global deadlock for D_DB_TMP.t.
2) Next, we lock a distinct D_DB_TMP."pseudo table" for read on a RowHash to prevent global deadlock for D_DB_TMP.c.
3) We lock D_DB_TMP.t for read, and we lock D_DB_TMP.c for read.
4) We do a single-AMP RETRIEVE step from D_DB_TMP.c by way of the unique primary index "D_DB_TMP.c.Customer_ID = 345" with no residual conditions into Spool 4 (all_amps), which is duplicated on all AMPs. The size of Spool 4 is estimated with high confidence to be 125 rows (10,625 bytes). The estimated time for this step is 0.01 seconds.
5) We do an all-AMPs JOIN step from Spool 4 (Last Use) by way of an all-rows scan, which is joined to D_DB_TMP.t by way of an all-rows scan with a condition of ("D_DB_TMP.t.Customer_ID = 345"). Spool 4 and D_DB_TMP.t are joined using a product join, with a join condition of ("Customer_ID = D_DB_TMP.t.Customer_ID"). The result goes into Spool 3 (all_amps), which is built locally on the AMPs. The size of Spool 3 is estimated with low confidence to be 99,670 rows (8,272,610 bytes). The estimated time for this step is 0.09 seconds.
6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by way of an all-rows scan, grouping by field1 (D_DB_TMP.c.Customer_name). Aggregate Intermediate Results are computed globally, then placed in Spool 5. The size of Spool 5 is estimated with no confidence to be 74,753 rows (11,736,221 bytes). The estimated time for this step is 0.20 seconds.
7) We execute the following steps in parallel.
   1) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 74,753 rows (22,052,135 bytes). The estimated time for this step is 0.02 seconds.
   2) We do an all-AMPs RETRIEVE step from D_DB_TMP.t by way of an all-rows scan with a condition of ("D_DB_TMP.t.Customer_ID <> 3454") into Spool 9 (all_amps), which is redistributed by the hash code of (D_DB_TMP.t.Customer_ID) to all AMPs. The size of Spool 9 is estimated with high confidence to be 4,294,230 rows (73,001,910 bytes). The estimated time for this step is 1.80 seconds.
8) We do an all-AMPs JOIN step from D_DB_TMP.c by way of an all-rows scan with a condition of ("D_DB_TMP.c.Customer_ID <> 3454"), which is joined to Spool 9 (Last Use) by way of an all-rows scan. D_DB_TMP.c and Spool 9 are joined using a single partition hash join, with a join condition of ("D_DB_TMP.c.Customer_ID = Customer_ID"). The result goes into Spool 8 (all_amps), which is built locally on the AMPs. The size of Spool 8 is estimated with low confidence to be 4,294,230 rows (356,421,090 bytes). The estimated time for this step is 0.72 seconds.
9) We do an all-AMPs SUM step to aggregate from Spool 8 (Last Use) by way of an all-rows scan, grouping by field1 (D_DB_TMP.c.Customer_name). Aggregate Intermediate Results are computed globally, then placed in Spool 10. The size of Spool 10 is estimated with no confidence to be 3,220,673 rows (505,645,661 bytes). The estimated time for this step is 8.46 seconds.
10) We do an all-AMPs RETRIEVE step from Spool 10 (Last Use) by way of an all-rows scan into Spool 1 (all_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 3,295,426 rows (972,150,670 bytes). The estimated time for this step is 0.32 seconds.
11) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 11.60 seconds.

Find Skewed CPU and I/O in DBQL

Please note that the solutions in this article work on the DBQL logs, which cover only the activity of those users for whom logging is switched on.

About Skew

If you are not comfortably familiar with skewness or DBQL, please read the corresponding posts first:

Teradata Tuning - Skewness
Teradata Tuning - DBQL I

As you can see, the difference between Impact[Resource] and Total[Resource] is a net technological loss, which should be minimized.

Skewness is a good indicator for highlighting workloads that waste lots of valuable CPU and I/O resources because of inefficient parallelism.

Find bad queries in DBQL

With this SQL you can filter the top impacting queries (replace the date value; you may also have to adjust the date filtering according to local settings):

select top 50
  ProcID,
  QueryID,
  AMPCPUTime,
  MaxAMPCPUTime * (hashamp() + 1) CPUImpact,
  CAST(100 - ((AmpCPUTime / (hashamp() + 1)) * 100 / NULLIFZERO(MaxAMPCPUTime)) AS INTEGER) "CPUSkew%",
  TotalIOCount,
  MaxAMPIO * (hashamp() + 1) IOImpact,
  CAST(100 - ((TotalIOCount / (hashamp() + 1)) * 100 / NULLIFZERO(MaxAMPIO)) AS INTEGER) "IOSkew%",
  AMPCPUTime * 1000 / nullifzero(TotalIOCount) LHR,
  TotalIOCount / nullifzero(AMPCPUTime * 1000) RHL,
  ParserCPUTime,
  Queryband,
  Substr(QueryText, 1, 2000) QueryText
from
/* For archived DBQL:
   dbql_arch.dbqlogtbl_hst where logdate=1131201 and ampcputime>0 */
/* For online DBQL: */
  dbc.dbqlogtbl
where cast(cast(starttime as char(10)) as date) = '2013-12-18' (date)
  and ampcputime > 0
order by CPUImpact desc;

Explanation of extra fields:

ParserCPUTime: the time the Parser spent producing the execution plan. This can be high if the SQL is too complex or too much random-AMP sampling has to be done.

LHR/RHL: Larry Higa (and inverse Larry Higa) index, an empirical index that shows the CPU vs. I/O rate. By experience it should usually be around one (the exact value can differ depending on your system configuration, but it is constant for a given system). If it is far from 1, that indicates CPU or I/O dominance, which means unbalanced resource consumption, but that is a different dimension than skew.

QueryBand: labels that sessions use to identify themselves within the DBQL logs.
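
For example, a session can label itself before issuing its queries; the band names and values below are only illustrative:

SET QUERY_BAND = 'JobName=daily_sales_load;Step=transform;' FOR SESSION;

These name=value pairs then appear in the QueryBand column of dbc.dbqlogtbl for every query the session runs, so the offending job or report can be identified directly.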

QueryText: the first 200 characters of the query (depending on DBQL log settings).

OK, we've listed the terrible top consumers, but what's next?


We have to identify those queries. If your ETL and analytics software is configured to use QueryBand properly (this area deserves a separate post...), you can find which job or report issued that SQL; in any case, you can see the QueryText field.

If you want to get the full SQL text, select it from the DBQLSQLTbl (SQL logging needs to be switched on), replacing the appropriate <procid> and <queryid> values:

select SQLTextInfo
from dbc.dbqlsqltbl
where procid = <procid>
  and queryid = <queryid>
order by SQLRowNo asc;

You will get the SQL in several records, broken up into 30K blocks; simply concatenate them. Unfortunately the SQL will be very ugly to read; you can use PRISE Tuning Assistant to beautify and highlight it for easy reading.
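
If you prefer to concatenate the pieces in SQL, a sketch like the following may work (assuming a Teradata release that supports XMLAGG and a statement whose full text fits into a single VARCHAR; for longer statements, concatenate on the client side):

-- Glue the blocks back together in SQLRowNo order (a sketch, not a tested recipe)
SELECT (XMLAGG(SQLTextInfo ORDER BY SQLRowNo) (VARCHAR(64000))) AS FullSQLText
FROM dbc.dbqlsqltbl
WHERE procid = <procid>
  AND queryid = <queryid>;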

System level Skewness :

We've found those bad queries, nice. But what can we say about the whole system? What is the total parallel efficiency? Can we report how many resources were wasted due to bad parallel efficiency?

The answer is: yes, we can estimate it quite closely. We cannot calculate the exact value, because DBQL does not log per-AMP information for the query execution, only the most important metrics. In particular, we cannot account for the situation when several skewed queries run at the same time but peak on different AMPs. This reduces the system-level resource waste and is hard to calculate with, but its probability and effect are negligible.

select
  sum(AMPCPUTime) AMPCPUTimeSum,
  sum(MaxAMPCPUTime * (hashamp() + 1)) CPUImpactSum,
  sum(TotalIOCount) TotalIOCountSum,
  sum(MaxAMPIO * (hashamp() + 1)) IOImpactSum,
  cast(100 - (AMPCPUTimeSum / CPUImpactSum) * 100 as integer) "CPUSkew%",
  cast(100 - (TotalIOCountSum / IOImpactSum) * 100 as integer) "IOSkew%"
from
/* For archived DBQL:
   dbql_arch.dbqlogtbl_hst
   where logdate = '2013-12-18' (date)
     and (ampcputime > 0 or TotalIOCount > 0) */
/* For online DBQL: */
  dbc.dbqlogtbl
where cast(cast(starttime as char(10)) as date) = '2013-12-18' (date)
  and (ampcputime > 0 or TotalIOCount > 0);

Look at the last two columns: that percentage of your CPU and I/O goes down the sink...

OK, let's check how many queries accumulate 5%, 10%, 25%, 50%, 75%, and 90% of this loss.

Here you are (CPU version; transform it for I/O analogously):

select
  'How many queries?' as "_",
  min(limit5)  "TOP5%Loss",
  min(limit10) "TOP10%Loss",
  min(limit25) "TOP25%Loss",
  min(limit50) "TOP50%Loss",
  min(limit75) "TOP75%Loss",
  min(limit90) "TOP90%Loss",
  max(rnk) TotalQueries,
  sum(ResourceTotal) "TotalResource",
  sum(ResourceImpact) "ImpactResource"
from
(
  select
    case when ResRatio < 5.00  then null else rnk end limit5,
    case when ResRatio < 10.00 then null else rnk end limit10,
    case when ResRatio < 25.00 then null else rnk end limit25,
    case when ResRatio < 50.00 then null else rnk end limit50,
    case when ResRatio < 75.00 then null else rnk end limit75,
    case when ResRatio < 90.00 then null else rnk end limit90,
    rnk, ResourceTotal, ResourceImpact
  from
  (
    select
      sum(ResourceLoss) over (order by ResourceLoss desc) totalRes,
      sum(ResourceLoss) over (order by ResourceLoss desc rows unbounded preceding) subtotalRes,
      subtotalRes * 100.00 / totalRes ResRatio,
      sum(1) over (order by ResourceLoss desc rows unbounded preceding) rnk,
      ResourceTotal, ResourceImpact
    from
    (
      select
        AMPCPUTime ResourceTotal,
        (MaxAMPCPUTime * (hashamp() + 1)) ResourceImpact,
        ResourceImpact - ResourceTotal ResourceLoss
      /* For archived DBQL:
         from dbql_arch.dbqlogtbl_hst where logdate=1131207 and ampcputime>0 */
      /* For online DBQL: */
      from dbc.dbqlogtbl
      where cast(cast(starttime as char(10)) as date) = '2013-12-18' (date)
        and ampcputime > 0
    ) x
  ) y
) z
group by 1;

I expect you are a bit shocked now at how few queries waste how many golden resources. I think we will agree that it is worth tuning that dozen of queries, saving on the order of 100K USD to millions of USD for your company annually. Am I right?