relational efficiencies: part i

Copyright 2007, Information Builders. Slide 1 Relational Efficiencies: Part I Renee Teatro Information Builders

Upload: tevy

Post on 11-Feb-2016



TRANSCRIPT

Page 1: Relational Efficiencies: Part I

Copyright 2007, Information Builders. Slide 1

Relational Efficiencies: Part I

Renee Teatro, Information Builders

Page 2: Relational Efficiencies: Part I

Relational Efficiencies: Agenda

• Optimization Overview
• JOINs
• Sorting
• Aggregation
• Expressions
• Direct SQL Passthru

Page 3: Relational Efficiencies: Part I

Relational Efficiencies: Layers of Processing

User executes a FOCUS request...

FOCUS / WebFOCUS (FOCUS reporting modules: TABLE, DEFINE, TABLEF, ANALYSE, GRAPH, FRL, JOIN, MATCH FILE)
• FOCUS calls the appropriate module (TABLE, GRAPH)
• Reads and parses the MFD
• Parses the request
• Calls the Interface with information on the MFD and the request available in work areas

Data Adapter: SQL generation and execution ('logical layer' – GNTINT)
• Reads and parses the AFD
• Checks the module specifics for optimization
• Analyzes and optimizes the request
• Generates SELECT statement(s)
• Passes SELECT statements to the appropriate 'physical' module

Data Adapter: SQL modules ('physical layer': DB2 SQL/DS – RRSET, Teradata – DBTFOC, Oracle – ORAFOC)
• Prepares, allocates, and opens cursor(s)
• Calls the RDBMS to FETCH rows of data

RDBMS (DB2, SQL/DS, Teradata, Oracle) and its DATA
• Analyzes and optimizes the SQL translation of the FOCUS application request
• Chooses the appropriate access path and retrieval method
• Retrieves data
• Creates the answer set
• Sends back a row and/or an SQL status code to the Interface

Data Adapter
• The Interface receives a row and/or an SQL status code
• Converts non-standard data into FOCUS format, making it available to FOCUS
• Asks RDBMS for the next row (FETCH), until the end of the answer set is reached (SQL status code +100)

FOCUS / WebFOCUS
• FOCUS reads a row from the answer set and processes the remaining actions on that row (IF/WHERE, DEFINEs…)
• Puts the valid row into the Internal Matrix
• Reads the next row and repeats the process until the end of the answer set
• Processes the Internal Matrix and displays the report

Report is displayed
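The fetch cycle on this slide can be mimicked outside FOCUS. What follows is a minimal sketch in Python with the standard sqlite3 module, not FOCUS or a real data adapter. The EMPLOYEE table, its rows, and the function name fetch_and_screen are hypothetical, and a None result from fetchone() stands in for SQL status code +100.

```python
import sqlite3

def fetch_and_screen(conn, sql, keep_row):
    """Fetch one row at a time and keep only rows that pass local screening."""
    cursor = conn.execute(sql)          # prepare and open the cursor
    internal_matrix = []                # stand-in for the FOCUS Internal Matrix
    fetches = 0
    while True:
        row = cursor.fetchone()         # ask the RDBMS for the next row (FETCH)
        fetches += 1
        if row is None:                 # end of answer set (analogous to +100)
            break
        if keep_row(row):               # remaining actions are handled locally
            internal_matrix.append(row)
    return internal_matrix, fetches

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EID TEXT, CSAL REAL)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("E1", 9000.0), ("E2", 12000.0), ("E3", 15000.0)])

rows, fetches = fetch_and_screen(
    conn,
    "SELECT EID, CSAL FROM EMPLOYEE ORDER BY EID",
    keep_row=lambda r: r[1] > 10000)
print(rows, fetches)
```

Every row in the answer set costs one FETCH round trip, plus one more to detect the end, which is why the later slides stress shrinking the answer set inside the RDBMS.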

Page 4: Relational Efficiencies: Part I

Relational Efficiencies: RDBMS Optimization

Interface optimization is the degree to which a TABLE request is translated to SQL.

In other words, it is the process in which the interface translates the projection, selection, JOIN, sort, and aggregation operations of a report request into their SQL equivalents and passes them to the RDBMS for processing.

• TABLE, MODIFY, MAINTAIN
• SQL
• Direct SQL Passthru
• Passing JOINs

Page 5: Relational Efficiencies: Part I

Relational Efficiencies: The Optimization Command

TSO / CMS

    SQL {DB2 | SQLDS | SQLDBC | SQLORA | SQLMSS} SET {OPTIMIZATION | SQLJOIN OUTER} {ON | OFF | SQL | FOCUS}

Components and description:
• Target database engine: DB2 / SQLDS for DB2, SQLDBC for Teradata, SQLORA for Oracle, SQLMSS for SQL Server

Page 6: Relational Efficiencies: Part I

Relational Efficiencies: Optimization Settings

• In the SQL engine, SET OPTIMIZATION ON is the default, so the interface attempts to fully optimize all requests
  • RDBMS is the preferred engine for processing
  • More RDBMS processing is usually beneficial
• Types of optimization
  • ON – default
  • OFF – lets FOCUS handle all JOINs, sorts, and aggregations
  • SQL – pass SQL even if a multiplicative effect is found
  • FOCUS – pass SQL only if results are identical to FOCUS processing

Page 7: Relational Efficiencies: Part I

Relational Efficiencies: Data Adapter TRACE Facility

    SET TRACEON=component//destination

• Component
  • SQLDI – FSTRACE – all Interface-RDBMS activity
  • SQLAGGR – FSTRACE3 – optimization messages
  • STMTRACE – FSTRACE4 – SQL only
  • SQLCALL – commands and data exchange between the physical and the logical layers of the data adapter
• Destination
  • FSTRACE – allocation for the ddname of FSTRACE
  • CLIENT – displays client session to the screen

Page 8: Relational Efficiencies: Part I

Relational Efficiencies: XRETRIEVAL Option

    SET XRETRIEVAL=[ON | OFF]

• ON – the interface sends the request to the RDBMS and it processes the request
• OFF – the interface attempts to optimize the request, but no RDBMS processing is done

Page 9: Relational Efficiencies: Part I

Relational Efficiencies: Fully Optimized Query

    SET TRACEOFF=ALL
    SET TRACEUSER=CLIENT
    SET TRACEON=STMTRACE//FSTRACE
    TABLE FILE EMPLOYEE
    COUNT EMP_ID BY DEPARTMENT
    WHERE CURR_SAL GT 10000
    END

STMTRACE:

    SELECT DEPARTMENT, COUNT(*) FROM EMPLOYEE
    WHERE (CURR_SAL > 10000)
    GROUP BY DEPARTMENT ORDER BY DEPARTMENT;

Page 10: Relational Efficiencies: Part I

Relational Efficiencies: Non-Optimized Query

    DEFINE FILE EMPDB2
    CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH';
    END
    TABLE FILE EMPDB2
    SUM CSAL CATEGORY
    BY EID
    END

STMTRACE:

    SELECT T1.EID,T1.CSAL FROM "PMSSAE"."EMPINFO" T1 ORDER BY T1.EID FOR FETCH ONLY;

SQLAGGR:

    (FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON:
    (FOC2597) USE OF DEFINE FIELD THAT CANNOT BE AGGREGATED: CATEGORY

Page 11: Relational Efficiencies: Part I

Relational Efficiencies: Now Optimized Query

    SET TRACEOFF=ALL
    SET TRACEUSER=CLIENT
    SET TRACEON=SQLAGGR//CLIENT
    SET TRACEON=STMTRACE//CLIENT
    TABLE FILE EMPDB2
    SUM CSAL
    COMPUTE CATEGORY/A4=IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH';
    BY EID
    END

SQLAGGR:

    AGGREGATION DONE ...

STMTRACE:

    SELECT T1.EID, SUM(T1.CSAL) FROM "PMSSAE"."EMPINFO" T1 GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;
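The difference between the two traced SELECT shapes (detail rows returned versus GROUP BY in the RDBMS) can be illustrated with a small simulation. This is a sketch in Python with sqlite3 under stated assumptions, not FOCUS output: the rows in EMPINFO are hypothetical, and the category function only imitates a COMPUTE-style step that runs on the already aggregated value.

```python
import sqlite3
from collections import defaultdict

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPINFO (EID TEXT, CSAL REAL)")
conn.executemany("INSERT INTO EMPINFO VALUES (?, ?)",
                 [("E1", 4000.0), ("E1", 5000.0), ("E2", 7000.0),
                  ("E2", 6000.0), ("E3", 2000.0)])

# Non-optimized shape: detail rows come back and are aggregated locally.
detail = conn.execute("SELECT EID, CSAL FROM EMPINFO ORDER BY EID").fetchall()
local_totals = defaultdict(float)
for eid, csal in detail:
    local_totals[eid] += csal

# Optimized shape: the RDBMS aggregates and returns one row per EID.
pushed = conn.execute(
    "SELECT EID, SUM(CSAL) FROM EMPINFO GROUP BY EID ORDER BY EID").fetchall()

def category(total):
    # COMPUTE-style step, applied after aggregation on the summed value.
    return "LOW" if total < 10000 else "HIGH"

report = [(eid, total, category(total)) for eid, total in pushed]
print(len(detail), len(pushed), report)
```

Both paths produce the same totals, but the pushed-down form returns three rows instead of five. On a real table that gap is typically much larger.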

Page 12: Relational Efficiencies: Part I


Projection and Selection

Page 13: Relational Efficiencies: Part I

Relational Efficiencies: Projection

• Projection is the retrieval of specific columns only
• Projection is always optimized
• Interface retrieves columns referenced in
  • PRINT/SUM/COUNT commands
  • Objects of JOINs and DEFINE statements
• PRINT * and SEG.fieldname
  • Will return all columns in master file only
  • SELECT * never produced

    TABLE FILE EMPLOYEE
    PRINT *
    END

    SELECT EID, LN, FN
    FROM EMPLOYEE;

Page 14: Relational Efficiencies: Part I

Relational Efficiencies: Projection

• A master file can be considered a dynamic RDBMS view.
• A master can contain:
  • One or more columns of a relational table
  • Multiple relational tables – called an embedded MFD
  • Real relational views
• Main advantages of a master file
  • SQL JOIN syntax hidden from user
  • View not stored in RDBMS catalog
  • Activation of only necessary tables (segments)

Note: Not the case with dynamic JOIN

Page 15: Relational Efficiencies: Part I

Relational Efficiencies: Selection: Translatable Screening

Type of expression: Arithmetic-valued expressions (expressions that return a single number)
Expression components:
  • Real fields of datatype I, P, D, or F
  • Numeric constants
  • Arithmetic operators (+, -, *, /)
  • Aggregation operators (SUM., CNT., AVE., MIN., MAX.)
Example:
    WHERE TOTAL (AVE.CSAL * 0.10) + AVE.CSAL GT 55000;

Type of expression: Character string-valued expressions (expressions that return a character string)
Expression components:
  • Real fields of datatype A
  • String constants
  • Concatenation operator (|)
  • EDIT of alphanumeric fields
Example:
    WHERE EDIT(FN, '9.$') | LN EQ 'J.WANG';

Type of expression: Logical expressions (expressions that return a single value, True or False)
Expression components:
  • Real fields with any FOCUS datatype
  • Constants of consistent datatype
  • Relational operators (EQ, NE, GT, LE ..)
  • Logical operators (AND, OR, NOT)
  • Valued expression operands
Example:
    WHERE (CDIV EQ 'CORP' OR 'NE') AND ((CSAL*0.10) + CSAL GT 55000);

NOTE: Screening conditions on DEFINEd fields that calculate the above types of expressions are passed to the RDBMS.

Page 16: Relational Efficiencies: Part I

Relational Efficiencies: Non-Translatable Screening Conditions

Expressions using: User-written subroutines
Example:
    DEFINE... FNL/I3 = ARGLEN(15,LN,FNL);
    TABLE... IF FNL LE 6

Expressions using: IF-THEN-ELSE expressions (*** optimized)
Example:
    DEFINE... DIVISION/A11=IF CDIV EQ 'CORP' THEN 'CORPORATE' ELSE
              IF CDIV EQ 'NE' THEN 'NORTH-EAST' ELSE 'NA';
    TABLE... IF DIVISION EQ 'CORPORATE' OR 'NORTH-EAST'

Expressions using: Self-referential expressions
Example:
    DEFINE... CPT/I2=CPT+1;
    TABLE... IF CPT NE 0

Expressions using: EDIT for field format conversions
Example:
    WHERE EDIT(ID) GT 20

Expressions using: Strong concatenation (||)
Example:
    DEFINE... NAME/A27=FN||(' ' | LN);
    TABLE... IF NAME EQ 'DANIEL VALINO'

Expressions using: DECODE function
Example:
    DEFINE... DIVISION/A11=DECODE CDIV ('CORP' 'CORPORATE' 'NE' 'NORTH-EAST' ELSE 'NA');
    TABLE... IF DIVISION EQ 'CORPORATE' OR 'NORTH-EAST'

Expressions using: Non-SQL relational operators (INCLUDES, EXCLUDES)
Example:
    IF LN INCLUDES 'VALINO'

Expressions using: FOCUS subroutines (ABS, INT, MAX, MIN, LOG, SQRT)
Example:
    WHERE SQRT(CSAL) GT 260

Expressions using: Fields with ACTUAL=DATE
Example:
    DEFINE... HDAT2/YYMD=HDAT+365;
    TABLE... IF HDAT2 GT '1990/03/01'
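To see what a non-translatable screening condition costs, the sketch below contrasts local screening with screening passed to the database. It is a Python/sqlite3 simulation, not FOCUS behavior. The EMPINFO rows are hypothetical, and rewriting SQRT(CSAL) GT 260 as a comparison on the real column (CSAL above 260 squared) is an assumption made here for illustration, not a rewrite the slides prescribe.

```python
import math
import sqlite3

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPINFO (EID TEXT, CSAL REAL)")
conn.executemany("INSERT INTO EMPINFO VALUES (?, ?)",
                 [("E1", 40000.0), ("E2", 67600.0),
                  ("E3", 70000.0), ("E4", 90000.0)])

# Screening not passed: every row is fetched, then tested locally.
all_rows = conn.execute(
    "SELECT EID, CSAL FROM EMPINFO ORDER BY EID").fetchall()
local = [row for row in all_rows if math.sqrt(row[1]) > 260]

# Screening passed: an equivalent test on the real column runs in the RDBMS,
# so only qualifying rows travel back.
passed = conn.execute(
    "SELECT EID, CSAL FROM EMPINFO WHERE CSAL > ? ORDER BY EID",
    (260 * 260,)).fetchall()

print(len(all_rows), local, passed)
```

The two result sets match, but the first approach transfers all four rows before discarding two of them.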

Page 17: Relational Efficiencies: Part I


JOIN Processing

Page 18: Relational Efficiencies: Part I

Relational Efficiencies: JOIN Optimization

• Interface attempts to generate ONE SELECT statement to JOIN all tables
  • Applies to dynamic or embedded JOINs
  • One OPEN cursor operation
  • JOIN optimized more readily by RDBMS
  • An optimized JOIN enables sorts and aggregations to be passed
  • Limits interface <==> RDBMS communications

Page 19: Relational Efficiencies: Part I

Relational Efficiencies: JOIN Optimization

• If JOIN is not passed to RDBMS
  • Termed: FOCUS-managed JOIN
  • One SELECT statement for EACH table
  • FOCUS executes a nested loop JOIN
    • Parent table (HOST) is the outer table
    • One inner table OPEN for each row returned from the outer table
    • Outer (host) table – choose the one with fewest rows returned
  • Sorts and aggregations are not passed
  • SQLAGGR/FSTRACE3 displays reason

Page 20: Relational Efficiencies: Part I

Relational Efficiencies: Optimized JOIN vs. Non-Optimized JOIN

    JOIN F1SSN IN TABLE1 TO F2SSN IN TABLE2
    TABLE FILE TABLE1
    PRINT F2SSN
    END

Optimized JOIN:

    SELECT T1.F1SSN,T2.F2SSN FROM "PMSNJC".TABLE1 T1, PMSNJC.TABLE2 T2
    WHERE (T2.F2SSN = T1.F1SSN) FOR FETCH ONLY;

Non-Optimized JOIN:

    (FOC2510) FOCUS-MANAGED JOIN SELECTED FOR FOLLOWING REASON(S):
    SELECT T1.F1SSN FROM "PMSNJC".TABLE1 T1 FOR FETCH ONLY;
    SELECT T2.F2SSN FROM PMSNJC.TABLE2 T2 WHERE (T2.F2SSN = ?) FOR FETCH ONLY;
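The two JOIN shapes above differ mainly in how many statements are issued. The following Python/sqlite3 sketch imitates both: a single joined SELECT, and a nested loop that opens one parameterized inner SELECT per outer row. It is an illustration under assumptions, not the adapter's actual code. The table contents and the statement counters are hypothetical.

```python
import sqlite3

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (F1SSN TEXT)")
conn.execute("CREATE TABLE TABLE2 (F2SSN TEXT)")
conn.executemany("INSERT INTO TABLE1 VALUES (?)",
                 [("111",), ("222",), ("333",)])
conn.executemany("INSERT INTO TABLE2 VALUES (?)",
                 [("111",), ("333",), ("333",)])

# Optimized shape: one SELECT, and the RDBMS performs the JOIN.
optimized = conn.execute(
    "SELECT T1.F1SSN, T2.F2SSN FROM TABLE1 T1, TABLE2 T2 "
    "WHERE T2.F2SSN = T1.F1SSN ORDER BY T1.F1SSN").fetchall()
statements_optimized = 1

# Managed shape: one outer SELECT, then one inner SELECT per outer row.
managed = []
statements_managed = 1
outer_rows = conn.execute(
    "SELECT F1SSN FROM TABLE1 ORDER BY F1SSN").fetchall()
for (f1ssn,) in outer_rows:
    inner = conn.execute(
        "SELECT F2SSN FROM TABLE2 WHERE F2SSN = ?", (f1ssn,))
    statements_managed += 1
    for (f2ssn,) in inner:
        managed.append((f1ssn, f2ssn))

print(optimized == managed, statements_optimized, statements_managed)
```

The statement count for the nested loop grows with the number of outer rows, which is consistent with the advice to pick the host table that returns the fewest rows.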

Page 21: Relational Efficiencies: Part I

Relational Efficiencies: Special JOINs

• In earlier releases, these types of JOINs disabled optimization:
  • Multiplicative effect encountered for aggregated requests
    • Termed: Interface-managed native JOIN
    • Check results, FOCUS managed may be more efficient (SET OPTIMIZATION=OFF)
  • Outer JOIN (SET ALL=ON)
    • Missing cross-referenced rows are processed
    • RDBMS specific syntax in SQL SELECT statement
    • SQL sqlengine SET SQLJOIN OUTER OFF|ON
    • SET ALL=PASS not supported
      • WHERE field EQ '$*' OR field IS-MISSING
      • Create HOLD files/JOIN/SET ALL=PASS
  • Heterogeneous JOIN
    • Differing file types (e.g., flat file, IMS, etc.)
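The multiplicative effect is easiest to see with numbers. The sketch below assumes the usual meaning of the term: when a parent row joins to several child rows, the parent's value is repeated in the joined answer set, so an aggregate over the join is inflated. It uses Python with sqlite3. The EMP and COURSE tables and their contents are hypothetical.

```python
import sqlite3

# Hypothetical parent/child data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EID TEXT, CSAL REAL)")
conn.execute("CREATE TABLE COURSE (EID TEXT, CNAME TEXT)")
conn.executemany("INSERT INTO EMP VALUES (?, ?)",
                 [("E1", 1000.0), ("E2", 2000.0)])
conn.executemany("INSERT INTO COURSE VALUES (?, ?)",
                 [("E1", "SQL"), ("E1", "FOCUS"), ("E1", "DB2"), ("E2", "SQL")])

# Sum taken from the parent table alone.
true_total = conn.execute("SELECT SUM(CSAL) FROM EMP").fetchone()[0]

# Same sum taken over the join: E1's salary is counted once per course row.
joined_total = conn.execute(
    "SELECT SUM(E.CSAL) FROM EMP E, COURSE C WHERE C.EID = E.EID"
).fetchone()[0]

print(true_total, joined_total)
```

Here E1 has three course rows, so its salary is added three times in the joined sum. This is the kind of discrepancy the "check results" advice on this slide guards against.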

Page 22: Relational Efficiencies: Part I

Relational Efficiencies: JOIN Considerations

• When a JOIN is not passed to RDBMS, make sure:
  • The KEYS= parameter is defined correctly
  • The JOIN command (unique or non-unique) corresponds to the AFD KEYS= parameter
• Some other considerations:
  • (Over) normalized vs. non-normalized data
  • Ensure referenced tables on same retrieval path
  • Consider use of indices
  • If Interface optimization is disabled, consider choice of parent table, use of HOLD files
  • JOIN on same data type and length

Page 23: Relational Efficiencies: Part I


Sort Processing

Page 24: Relational Efficiencies: Part I

Relational Efficiencies: Sort Optimization

• FOCUS BY/ACROSS translated to SQL ORDER BY
• Translating sort phrases (BY/ACROSS) to SQL is important
  • Relational sort is usually more efficient
  • RDBMS uses indices
  • Sort enables RDBMS to perform aggregation
  • FOCUS retrieves the answer set in sorted order
  • Reduced I/O since answer set is aggregated

Page 25: Relational Efficiencies: Part I

Relational Efficiencies: Sort Optimization

• FOCUS sort phrases are NOT translated to SQL and optimization is disabled when:
  • Optimization was set OFF by user
  • JOINs were not passed to RDBMS (and consequently optimization was disabled by the interface)
  • A FOCUS sort phrase uses an FRL command:
      BY field ROWS value1 OVER value2...
      FOR field ROWS value1 OVER value2…
• FOCUS sort phrases are not fully translated to SQL, and aggregation and optimization are automatically disabled, when:
  • FOCUS BY/ACROSS…IN-GROUPS-OF is requested

Page 26: Relational Efficiencies: Part I

Relational Efficiencies: Sort Optimization

• To get FOCUS sort phrases translated to SQL
  • Sort on real fields & use COMPUTEs instead of DEFINEs
  • Sorts on most DEFINEd fields are now optimized
    • Use SQLAGGR/STMTRACE to evaluate if DEFINE fields are being translated. If not, reformulate if possible
  • With FST. and LST. ensure access file KEYS and KEYORDER parameters are correct
• Considerations
  • Consider indexes on sort objects
  • SET OPTIMIZATION OFF/TABLEF/External Sort
  • Consider using TABLEF if sort is passed

Page 27: Relational Efficiencies: Part I

Relational Efficiencies: Using TABLEF

• Use TABLEF when all FOCUS sort phrases are translated to SQL
  • Faster than TABLE
  • Does not generate an internal matrix (FOCSORT)
  • Eliminates FOCUS sorting
• You cannot use TABLEF when FOCUS has to process some of the sorting with
  • ACROSS
  • Direct operators requiring the FOCUS internal matrix (TOT., PCT., or RPCT.)
  • COMPUTE expressions using direct operators
  • Multi-verb requests
  • RETYPE

Note: Locks are held with TABLEF until report is complete (commit issued)

Page 28: Relational Efficiencies: Part I


Aggregation Processing

Page 29: Relational Efficiencies: Part I

Relational Efficiencies: Efficient Aggregation

• Aggregation translation is important
  • RDBMS aggregation is more efficient: indices
  • An aggregated answer set reduces FOCUS-to-RDBMS communication
  • A smaller answer set reduces FOCUS local processing

FOCUS                                 SQL
SUM ..., WRITE ... BY field           SELECT SUM(...) GROUP BY column ORDER BY column
SUM., CNT., MIN., MAX., AVE.          SUM(...), COUNT(*), MIN(...), MAX(...), AVG(...)

Page 30: Relational Efficiencies: Part I

Relational Efficiencies: Translatable Aggregation

• Verbs: SUM, COUNT, WRITE
• Direct operators: MIN., MAX., AVE.
• Aggregating DEFINEd fields:
  • Constant DEFINEd fields translated with CNT.
  • The following defined expressions can be translated

Type of expression: Arithmetic valued (expressions that return a single number)
Expression components:
  • Real fields of datatype I, P, D, or F
  • Numeric constants
  • Arithmetic operators (+, -, *, /)
Example:
    DEFINE FILE ORAEMP
    NEW_SAL/D12.2=(CSAL * 0.10) + CSAL ;
    END

Type of expression: Character string valued (expressions that return a character string)
Expression components:
  • Real fields of datatype A
  • String constants
  • Concatenation operator (|)
  • EDIT of alphanumeric fields
Example:
    DEFINE FILE ORAEMP
    NAME/A18=EDIT(FN,'9.$')|LN;
    END

Page 31: Relational Efficiencies: Part I

Relational Efficiencies: Non-Translatable Aggregation

• Aggregation is not translated to SQL and optimization is automatically disabled when:
  • Optimization was set off by user
  • JOINs were not passed to RDBMS (and consequently optimization was disabled by the interface)
  • FOCUS sort phrase is not translated
  • Some screening conditions not passed to RDBMS
  • Some non-SQL operators are used
  • Multi-verb requests
  • COUNT with MISSING=ON

NOTE: If the verbs PRINT or LIST are used, no aggregation is requested and FSTRACE3 returns the following message:

    (FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON:
    (FOC2594) AGGREGATION IS NOT APPLICABLE TO THE VERB USED

Page 32: Relational Efficiencies: Part I

Relational Efficiencies: Aggregation Considerations

• Possible index-only processing
• Possibly aggregate in RDBMS index
• Explicit or implicit (e.g., in heading/footing) FST. and LST. can be optimized using MIN and MAX
• Aggregate on real fields
• Use COMPUTE in place of DEFINE
• Create aggregated extract files (HOLD files) in cases where aggregation is not optimized

Page 33: Relational Efficiencies: Part I


Virtual Field Processing

Page 34: Relational Efficiencies: Part I

Relational Efficiencies: Virtual Field (DEFINE) Optimization

• DEFINE fields can be optimized as part of aggregation or record selection
• Aggregation or record selection can optimize:
  • Arithmetic-valued expressions
  • Character string-valued expressions
  • Logical expressions (selection only)
• Aggregation cannot be optimized for logical expressions
• Single segment DEFINEs passed when JOIN is not
• IF-THEN-ELSE DEFINEs capable of being passed

Page 35: Relational Efficiencies: Part I

Relational Efficiencies: IF-THEN-ELSE DEFINE Example – Optimized

    DEFINE FILE EMPINFO
    SAL_FLAG = IF (CURRENT_SALARY LT 10000) AND (DEPARTMENT_CD EQ 'MIS') THEN 1 ELSE 0;
    END
    TABLE FILE EMPINFO
    PRINT EMP_ID LAST_NAME FIRST_NAME
    IF SAL_FLAG EQ 1
    END

STMTRACE:

    SELECT T1.EID,T1.LN,T1.FN, T1.DEPARTMENT_CD,T1.CURRENT_SALARY
    FROM "USER1"."EMPINFO" T1
    WHERE ((((T1.CURRENT_SALARY < 10000) AND (T1.DEPARTMENT_CD = 'MIS'))))
    FOR FETCH ONLY;

Page 36: Relational Efficiencies: Part I

Relational Efficiencies: Aggregation DEFINE Example – Optimized

    DEFINE FILE EMPDB2
    CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH';
    CATEGORY1/I4 = IF CSAL LT 10000 THEN 0 ELSE 1 ;
    CATEGORY2/D10 = CSAL * 1.3;
    END
    TABLE FILE EMPDB2
    SUM CSAL CATEGORY2
    BY EID
    END

STMTRACE:

    SELECT T1.EID, SUM(T1.CSAL), SUM((T1.CSAL * 1.3))
    FROM "PMSSAE"."EMPINFO" T1
    GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;

Page 37: Relational Efficiencies: Part I

Relational Efficiencies: Sort Expression Example – Optimized

Aggregation by Expression: allows a named expression to be used in the ORDER BY clause

    DEFINE FILE DB2FILE
    TAX = 0.08 * PRICE;
    END
    TABLE FILE DB2FILE
    SUM PRICE TAX
    BY TAX NOPRINT
    END

    AGGREGATION DONE ...
    SELECT SK001, SUM(VB001), SUM(VB002)
    FROM (SELECT (.08 * T1.PRICE) AS SK001, T1.PRICE AS VB001, (.08 * T1.PRICE) AS VB002
          FROM USER.DB2FILE T1 ) X
    GROUP BY SK001 ORDER BY SK001 FOR FETCH ONLY;

In the past:

    (FOC2597) USE OF DEFINED FIELD THAT CANNOT BE AGGREGATED : TAX
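The generated statement relies on a derived table: the expression is named once in an inner SELECT, then grouped and ordered by that name in the outer SELECT. The sketch below runs the same shape against sqlite3 from Python to show how it behaves. The DB2FILE rows are hypothetical, and the schema qualifier and FOR FETCH ONLY clause are left out because sqlite3 does not use them.

```python
import sqlite3

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DB2FILE (PRICE REAL)")
conn.executemany("INSERT INTO DB2FILE VALUES (?)",
                 [(100.0,), (100.0,), (50.0,)])

# Inner SELECT names the expression; outer SELECT groups and sorts on that name.
rows = conn.execute(
    "SELECT SK001, SUM(VB001), SUM(VB002) "
    "FROM (SELECT (.08 * T1.PRICE) AS SK001, "
    "             T1.PRICE AS VB001, "
    "             (.08 * T1.PRICE) AS VB002 "
    "      FROM DB2FILE T1) X "
    "GROUP BY SK001 ORDER BY SK001").fetchall()

for tax_key, price_sum, tax_sum in rows:
    print(round(tax_key, 2), round(price_sum, 2), round(tax_sum, 2))
```

The two rows priced at 100 share one TAX value and collapse into a single group, so the sort expression, the grouping, and both sums are all evaluated in the database.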

Page 38: Relational Efficiencies: Part I

Relational Efficiencies: Direct SQL Passthru

Best of both worlds

• If the most efficient SQL is not generated, or
• Optimized SQL code already exists

    SQL sqlengine {any valid SQL statement} END

    SET SQLENGINE=SQLORA
    SQL PREPARE result FOR SELECT * FROM DQAORA01;
    TABLE FILE result
    PRINT F1SSN
    ON TABLE HOLD AS HOLD1
    END
    TABLE FILE HOLD1
    PRINT F1SSN
    END

    SQL DB2 SELECT C.CLIENT_ID,J.CLIENT_ID, C.CASE_NO,J.REST
    FROM CLIENT C, CLIENTJ J WHERE C.CLIENT_ID=J.CLIENT_ID;
    TABLE FILE SQLOUT
    PRINT *
    ON TABLE HOLD
    END

Page 39: Relational Efficiencies: Part I
