relational efficiencies: part i
Copyright 2007, Information Builders. Slide 1
Relational Efficiencies: Part I
Renee Teatro
Information Builders

Relational Efficiencies: Agenda
• Optimization Overview
• JOINs
• Sorting
• Aggregation
• Expressions
• Direct SQL Passthru
Relational Efficiencies: Layers of Processing

User executes a FOCUS request...

FOCUS / WebFOCUS
• Calls the appropriate module (TABLE, GRAPH)
• Reads and parses the MFD
• Parses the request
• Calls the Interface with information on the MFD and the request available in work areas

Data Adapter (Interface)
• Reads and parses the AFD
• Checks the module specifics for optimization
• Analyzes and optimizes the request
• Generates SELECT statement(s)
• Passes SELECT statements to the appropriate 'physical' module
• Prepares, allocates, and opens cursor(s)
• Calls the RDBMS to FETCH rows of data

RDBMS
• Analyzes and optimizes the SQL translation of the FOCUS application request
• Chooses the appropriate access path and retrieval method
• Retrieves data
• Creates the answer set
• Sends back a row and/or an SQL status code to the Interface

Data Adapter (Interface), on return
• Receives a row and/or an SQL status code
• Converts non-standard data into FOCUS format, making it available to FOCUS
• Asks RDBMS for the next row (FETCH), until the end of the answer set is reached (SQL status code +100)

FOCUS / WebFOCUS, on return
• Reads a row from the answer set and processes the remaining actions on that row (IF/WHERE, DEFINEs...)
• Puts the valid row into the Internal Matrix
• Reads the next row and repeats the process until the end of the answer set
• Processes the Internal Matrix and displays the report

Report is displayed

Components shown in the diagram:
FOCUS Reporting Modules: TABLE, DEFINE, TABLEF, ANALYSE, GRAPH, FRL, JOIN, MATCH FILE
Data Adapter:
  SQL Generation and Execution ('logical layer': GNTINT)
  SQL modules ('physical layer'): DB2 and SQL/DS (RRSET), Teradata (DBTFOC), Oracle (ORAFOC)
RDBMS: DB2, SQL/DS, Teradata, Oracle
DATA
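The return path described above (FETCH until the end of the answer set, then local screening, then the internal matrix) can be sketched in Python. This is only an illustration under stated assumptions: sqlite3 stands in for the RDBMS, `run_request` and its `residual` parameter are hypothetical names, and a `None` result from `fetchone()` plays the role that SQL status code +100 plays in the slide.

```python
import sqlite3

def run_request(conn, sql, params=(), residual=None):
    """Fetch rows one at a time and apply leftover screening locally."""
    cur = conn.execute(sql, params)
    matrix = []                    # stand-in for the FOCUS internal matrix
    while True:
        row = cur.fetchone()       # one FETCH per iteration
        if row is None:            # end of answer set (role of status code +100)
            break
        # Screening that was not translated to SQL is applied here, per row.
        if residual is None or residual(row):
            matrix.append(row)
    return matrix

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPINFO (EID TEXT, LN TEXT, CSAL REAL)")
conn.executemany(
    "INSERT INTO EMPINFO VALUES (?, ?, ?)",
    [("001", "VALINO", 72000.0), ("002", "WANG", 48000.0), ("003", "SMITH", 91000.0)],
)

# The salary test travels to the database; the name test runs locally.
rows = run_request(
    conn,
    "SELECT EID, LN, CSAL FROM EMPINFO WHERE CSAL > ? ORDER BY EID",
    (50000,),
    residual=lambda r: "VAL" in r[1],
)
```

The more screening the SQL statement carries, the fewer rows reach the local loop, which is the point the rest of the deck develops.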
Relational Efficiencies: RDBMS Optimization
• Interface optimization is the degree to which a TABLE request is translated to SQL
• In other words, the process in which the interface translates projection, selection, JOIN, sort, and aggregation operations of a report request into its SQL equivalent and passes it to the RDBMS for processing
Diagram labels: TABLE, MODIFY, MAINTAIN; SQL; Direct SQL Passthru; Passing JOINs
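Relational Efficiencies: The Optimization Command
Environment labels on the slide: TSO, CMS
  SQL [DB2 | SQLDS | SQLDBC | SQLORA | SQLMSS] SET OPTIMIZATION {ON | OFF | SQL | FOCUS}
  SQL [DB2 | SQLDS | SQLDBC | SQLORA | SQLMSS] SET SQLJOIN OUTER {ON | OFF}
Components and description:
  Target Database Engine: DB2 / SQLDS for DB2, SQLDBC for Teradata, SQLORA for Oracle, SQLMSS for SQL Server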
Relational Efficiencies: Optimization Settings
• In the SQL engine, SET OPTIMIZATION ON is the default, so the interface attempts to fully optimize all requests
  - RDBMS is the preferred engine for processing
  - More RDBMS processing is usually beneficial
• Types of optimization
  - ON: default
  - OFF: lets FOCUS handle all JOINs, sorts, and aggregations
  - SQL: pass SQL even if a multiplicative effect is found
  - FOCUS: pass SQL only if results are identical to FOCUS processing
Relational Efficiencies: Data Adapter TRACE Facility
SET TRACEON=component//destination
• Component
  - SQLDI - FSTRACE - All Interface-RDBMS activity
  - SQLAGGR - FSTRACE3 - Optimization messages
  - STMTRACE - FSTRACE4 - SQL only
  - SQLCALL - commands and data exchange between the physical and the logical layers of the data adapter
• Destination
  - FSTRACE - allocation for the ddname of FSTRACE
  - CLIENT - displays client session to the screen
Relational Efficiencies: XRETRIEVAL Option
SET XRETRIEVAL=[ON | OFF]
• ON: the interface sends the request to the RDBMS and it processes the request
• OFF: the interface attempts to optimize the request, but no RDBMS processing is done
Relational Efficiencies: Fully Optimized Query
SET TRACEOFF=ALL
SET TRACEUSER=CLIENT
SET TRACEON=STMTRACE//FSTRACE
TABLE FILE EMPLOYEE
COUNT EMP_ID BY DEPARTMENT
WHERE CURR_SAL GT 10000
END
STMTRACE:
SELECT DEPARTMENT, COUNT(*) FROM EMPLOYEE
WHERE (CURR_SAL > 10000)
GROUP BY DEPARTMENT ORDER BY DEPARTMENT;
Relational Efficiencies: Non-Optimized Query
DEFINE FILE EMPDB2
CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH';
END
TABLE FILE EMPDB2
SUM CSAL CATEGORY
BY EID
END
STMTRACE:
SELECT T1.EID,T1.CSAL FROM "PMSSAE"."EMPINFO" T1 ORDER BY T1.EID FOR FETCH ONLY;
SQLAGGR:
(FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON:
(FOC2597) USE OF DEFINE FIELD THAT CANNOT BE AGGREGATED: CATEGORY
Relational Efficiencies: Now Optimized Query
SET TRACEOFF=ALL
SET TRACEUSER=CLIENT
SET TRACEON=SQLAGGR//CLIENT
SET TRACEON=STMTRACE//CLIENT
TABLE FILE EMPDB2
SUM CSAL COMPUTE CATEGORY/A4=IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH';
BY EID
END
SQLAGGR:
AGGREGATION DONE ...
STMTRACE:
SELECT T1.EID, SUM(T1.CSAL) FROM "PMSSAE"."EMPINFO" T1
GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;
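The difference between the non-optimized and the optimized request can be sketched outside FOCUS. The Python below is an illustration only: sqlite3 stands in for the RDBMS, the sample rows are invented, and the final list comprehension merely imitates a COMPUTE, which is evaluated on the aggregated value after retrieval. A DEFINE would be evaluated per detail row, so the two forms are not guaranteed to give the same CATEGORY in every case.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPINFO (EID TEXT, CSAL REAL)")
conn.executemany(
    "INSERT INTO EMPINFO VALUES (?, ?)",
    [("A1", 4000.0), ("A1", 5000.0), ("B2", 12000.0), ("B2", 1000.0), ("C3", 7000.0)],
)

# Not optimized: every detail row crosses the interface and is summed locally.
detail = conn.execute("SELECT EID, CSAL FROM EMPINFO ORDER BY EID").fetchall()
local_totals = defaultdict(float)
for eid, csal in detail:
    local_totals[eid] += csal

# Optimized: the database aggregates, so one row per EID comes back.
pushed = conn.execute(
    "SELECT EID, SUM(CSAL) FROM EMPINFO GROUP BY EID ORDER BY EID"
).fetchall()

# COMPUTE-style step: applied to the already aggregated rows.
report = [(eid, total, "LOW" if total < 10000 else "HIGH") for eid, total in pushed]
```

Both paths produce the same totals here; the optimized one simply moves fewer rows (three instead of five in this toy data).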
Projection and Selection
Relational Efficiencies: Projection
• Projection is the retrieval of specific columns only
• Projection is always optimized
• Interface retrieves columns referenced in
  - PRINT/SUM/COUNT commands
  - Objects of JOINs and DEFINE statements
• PRINT * and SEG.fieldname
  - Will return all columns in master file only
  - SELECT * never produced

TABLE FILE EMPLOYEE
PRINT *
END

SELECT EID, LN, FN
FROM EMPLOYEE;
Relational Efficiencies: Projection
• A master file can be considered a dynamic RDBMS view.
• A master can contain:
  - One or more columns of a relational table
  - Multiple relational tables, called an embedded MFD
  - Real relational views
• Main advantages of a master file
  - SQL JOIN syntax hidden from user
  - View not stored in RDBMS catalog
  - Activation of only necessary tables (segments)
Note: Not the case with dynamic JOIN
Relational Efficiencies: Selection: Translatable Screening

Type of expression, expression components, and example:

Arithmetic-valued expressions (expressions that return a single number)
  Components:
  • Real fields of datatype I, P, D, or F
  • Numeric constants
  • Arithmetic operators (+, -, *, /)
  • Aggregation operators (SUM., CNT., AVE., MIN., MAX.)
  Example:
    WHERE TOTAL (AVE.CSAL * 0.10) + AVE.CSAL GT 55000;

Character string-valued expressions (expressions that return a character string)
  Components:
  • Real fields of datatype A
  • String constants
  • Concatenation operator (|)
  • EDIT of alphanumeric fields
  Example:
    WHERE EDIT(FN, '9.$') | LN EQ 'J.WANG';

Logical expressions (expressions that return a single value, True or False)
  Components:
  • Real fields with any FOCUS datatype
  • Constants of consistent datatype
  • Relational operators (EQ, NE, GT, LE ..)
  • Logical operators (AND, OR, NOT)
  • Valued expression operands
  Example:
    WHERE (CDIV EQ 'CORP' OR 'NE') AND ((CSAL * 0.10) + CSAL GT 55000);

NOTE: Screening conditions on DEFINEd fields, which calculate the above types of expressions, are passed to the RDBMS.
Relational Efficiencies: Non-Translatable Screening Conditions

Expressions using, with example:

User-written subroutines
    DEFINE... FNL/I3 = ARGLEN(15,LN,FNL);
    TABLE... IF FNL LE 6

IF-THEN-ELSE expressions (*** optimized)
    DEFINE... DIVISION/A11=IF CDIV EQ 'CORP' THEN 'CORPORATE' ELSE IF CDIV EQ 'NE' THEN 'NORTH-EAST' ELSE 'NA';
    TABLE... IF DIVISION EQ 'CORPORATE' OR 'NORTH-EAST'

Self-referential expressions
    DEFINE... CPT/I2=CPT+1;
    TABLE... IF CPT NE 0

EDIT for field format conversions
    WHERE EDIT(ID) GT 20

Strong concatenation (||)
    DEFINE... NAME/A27=FN||(' ' | LN);
    TABLE... IF NAME EQ 'DANIEL VALINO'

DECODE function
    DEFINE... DIVISION/A11=DECODE CDIV ('CORP' 'CORPORATE' 'NE' 'NORTH-EAST' ELSE 'NA');
    TABLE... IF DIVISION EQ 'CORPORATE' OR 'NORTH-EAST'

Non-SQL relational operators (INCLUDES, EXCLUDES)
    IF LN INCLUDES 'VALINO'

Expressions using fields with ACTUAL=DATE
    DEFINE... HDAT2/YYMD=HDAT+365;
    TABLE... IF HDAT2 GT '1990/03/01'

FOCUS subroutines (ABS, INT, MAX, MIN, LOG, SQRT)
    WHERE SQRT(CSAL) GT 260
JOIN Processing
Relational Efficiencies: JOIN Optimization
• Interface attempts to generate ONE SELECT statement to JOIN all tables
  - Applies to dynamic or embedded JOINs
  - One OPEN cursor operation
  - JOIN optimized more readily by RDBMS
  - An optimized JOIN enables sorts and aggregations to be passed
  - Limits interface <==> RDBMS communications
Relational Efficiencies: JOIN Optimization
• If JOIN is not passed to RDBMS
  - Termed: FOCUS-managed JOIN
  - One SELECT statement for EACH table
  - FOCUS executes a nested loop JOIN
  - Parent table (HOST) is the outer table
  - One inner table OPEN for each row returned from the outer table
  - Outer (host) table: choose the one with fewest rows returned
  - Sorts and aggregations are not passed
  - SQLAGGR/FSTRACE3 displays reason
Relational Efficiencies: Optimized JOIN vs. Non-Optimized JOIN
JOIN F1SSN IN TABLE1 TO F2SSN IN TABLE2
TABLE FILE TABLE1
PRINT F2SSN
END

Optimized JOIN:
SELECT T1.F1SSN,T2.F2SSN FROM "PMSNJC".TABLE1 T1, PMSNJC.TABLE2 T2
WHERE (T2.F2SSN = T1.F1SSN) FOR FETCH ONLY;

Non-Optimized JOIN:
(FOC2510) FOCUS-MANAGED JOIN SELECTED FOR FOLLOWING REASON(S):
SELECT T1.F1SSN FROM "PMSNJC".TABLE1 T1 FOR FETCH ONLY;
SELECT T2.F2SSN FROM PMSNJC.TABLE2 T2 WHERE (T2.F2SSN = ?) FOR FETCH ONLY;
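To make the cost difference concrete, here is a rough Python sketch of the two strategies. It is an analogy, not FOCUS internals: sqlite3 stands in for the RDBMS, `compare_join_strategies` is a hypothetical helper, the sample SSNs are invented, and the schema qualifier and FOR FETCH ONLY clause from the trace are omitted because SQLite does not use them.

```python
import sqlite3

def compare_join_strategies():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TABLE1 (F1SSN TEXT);
        CREATE TABLE TABLE2 (F2SSN TEXT);
        INSERT INTO TABLE1 VALUES ('111'), ('222'), ('333');
        INSERT INTO TABLE2 VALUES ('111'), ('333'), ('333');
    """)
    sent = []  # every SQL statement handed to the database

    def execute(sql, params=()):
        sent.append(sql)
        return conn.execute(sql, params).fetchall()

    # Optimized: one SELECT, the database performs the join.
    optimized = execute(
        "SELECT T1.F1SSN, T2.F2SSN FROM TABLE1 T1, TABLE2 T2 "
        "WHERE T2.F2SSN = T1.F1SSN ORDER BY T1.F1SSN"
    )
    optimized_count = len(sent)

    # Locally managed nested loop: one SELECT for the outer (host) table,
    # then one parameterized SELECT on the inner table per outer row.
    sent.clear()
    managed = []
    for (f1ssn,) in execute("SELECT T1.F1SSN FROM TABLE1 T1 ORDER BY T1.F1SSN"):
        for (f2ssn,) in execute(
            "SELECT T2.F2SSN FROM TABLE2 T2 WHERE T2.F2SSN = ?", (f1ssn,)
        ):
            managed.append((f1ssn, f2ssn))
    return optimized, optimized_count, managed, len(sent)

optimized, optimized_count, managed, managed_count = compare_join_strategies()
```

With three host rows the nested loop issues four statements against one for the passed JOIN, which is why the slide recommends choosing the host table that returns the fewest rows.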
Relational Efficiencies: Special JOINs
In earlier releases, these types of JOINs disabled optimization:
• Multiplicative effect encountered for aggregated requests
  - Termed: Interface-managed native JOIN
  - Check results; FOCUS-managed may be more efficient (SET OPTIMIZATION=OFF)
• Outer JOIN (SET ALL=ON)
  - Missing cross-referenced rows are processed
  - RDBMS-specific syntax in SQL SELECT statement
  - SQL sqlengine SET SQLJOIN OUTER OFF|ON
  - SET ALL=PASS not supported
      WHERE field EQ '$*' OR field IS-MISSING
      Create HOLD files/JOIN/SET ALL=PASS
• Heterogeneous JOIN
  - Differing file types (e.g., flat file, IMS, etc.)
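The "multiplicative effect" is easy to reproduce in plain SQL. The sketch below is a generic illustration with sqlite3 and invented tables (EMP, COURSE), not output from the data adapter: summing a parent-table column across a one-to-many join counts each parent value once per child row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (EID TEXT PRIMARY KEY, CSAL REAL);
    CREATE TABLE COURSE (EID TEXT, CNAME TEXT);
    INSERT INTO EMP VALUES ('E1', 50000), ('E2', 60000);
    INSERT INTO COURSE VALUES ('E1', 'SQL'), ('E1', 'FOCUS'), ('E1', 'DB2'), ('E2', 'SQL');
""")

# Total salary taken from the parent table alone.
true_total = conn.execute("SELECT SUM(CSAL) FROM EMP").fetchone()[0]

# Same SUM after joining to the child table: E1 is counted three times.
joined_total = conn.execute(
    "SELECT SUM(E.CSAL) FROM EMP E, COURSE C WHERE C.EID = E.EID"
).fetchone()[0]
```

Here the joined total is inflated by E1's two extra course rows, which is the kind of discrepancy the "check results" advice on the slide is about.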
Relational Efficiencies: JOIN Considerations
• When a JOIN is not passed to RDBMS, make sure:
  - The KEYS= parameter is defined correctly
  - The JOIN command (unique or non-unique) corresponds to the AFD KEYS= parameter
• Some other considerations:
  - (Over) normalized vs. non-normalized data
  - Ensure referenced tables are on the same retrieval path
  - Consider use of indices
  - If Interface optimization is disabled, consider choice of parent table, use of HOLD files
  - JOIN on same data type and length
Sort Processing
Relational Efficiencies: Sort Optimization
• FOCUS BY/ACROSS translated to SQL ORDER BY
• Translating sort phrases (BY/ACROSS) to SQL is important
  - Relational sort is usually more efficient
  - RDBMS uses indices
  - Sort enables RDBMS to perform aggregation
  - FOCUS retrieves the answer set in sorted order
  - Reduced I/O since answer set is aggregated
Relational Efficiencies: Sort Optimization
• FOCUS sort phrases are NOT translated to SQL and optimization is disabled when:
  - Optimization was set OFF by user
  - JOINs were not passed to RDBMS (and consequently optimization was disabled by the interface)
  - A FOCUS sort phrase uses an FRL command:
      BY field ROWS value1 OVER value2...
      FOR field ROWS value1 OVER value2...
• FOCUS sort phrases are not fully translated to SQL, and aggregation and optimization are automatically disabled, when:
  - FOCUS BY/ACROSS...IN-GROUPS-OF is requested
Relational Efficiencies: Sort Optimization
• To get FOCUS sort phrases translated to SQL
  - Sort on real fields and use COMPUTEs instead of DEFINEs
  - Sorts on most DEFINEd fields are now optimized
  - Use SQLAGGR/STMTRACE to evaluate if DEFINE fields are being translated. If not, reformulate if possible
  - With FST. and LST., ensure access file KEYS and KEYORDER parameters are correct
• Considerations
  - Consider indexes on sort objects
  - SET OPTIMIZATION OFF/TABLEF/External Sort
  - Consider using TABLEF if sort is passed
Relational Efficiencies: Using TABLEF
• Use TABLEF when all FOCUS sort phrases are translated to SQL
  - Faster than TABLE
  - Does not generate an internal matrix (FOCSORT)
  - Eliminates FOCUS sorting
• You cannot use TABLEF when FOCUS has to process some of the sorting with
  - ACROSS
  - Direct operators requiring the FOCUS internal matrix (TOT., PCT., or RPCT.)
  - COMPUTE expressions using direct operators
  - Multi-verb requests
  - RETYPE
Note: Locks are held with TABLEF until report is complete (commit issued)
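The idea behind TABLEF, consuming an answer set that already arrives in sorted order instead of buffering and re-sorting it, can be approximated in Python. This is a loose analogy under stated assumptions: sqlite3 stands in for the RDBMS, `streamed_totals` is a hypothetical helper, and `itertools.groupby` relies on the ORDER BY having been done by the database, just as TABLEF relies on the sort having been passed.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPINFO (DEPT TEXT, CSAL REAL)")
conn.executemany(
    "INSERT INTO EMPINFO VALUES (?, ?)",
    [("MIS", 100.0), ("HR", 50.0), ("MIS", 25.0), ("HR", 10.0)],
)

def streamed_totals(conn):
    """Yield one total per DEPT while reading the cursor once, in order."""
    cur = conn.execute("SELECT DEPT, CSAL FROM EMPINFO ORDER BY DEPT")
    # groupby only merges adjacent rows, so it is correct only because
    # the database delivered the rows already sorted by DEPT.
    for dept, rows in groupby(cur, key=itemgetter(0)):
        yield dept, sum(csal for _, csal in rows)

totals = list(streamed_totals(conn))
```

Nothing larger than the current group is held in memory, which mirrors the "no internal matrix" point above; the same code would give wrong groups if the ORDER BY were removed.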
Aggregation Processing
Relational Efficiencies: Efficient Aggregation
• Aggregation translation is important
  - RDBMS aggregation is more efficient: indices
  - An aggregated answer set reduces FOCUS-to-RDBMS communication
  - A smaller answer set reduces FOCUS local processing

FOCUS                               SQL
SUM ..., WRITE ... BY field         SELECT SUM(...) GROUP BY column ORDER BY column
SUM., CNT., MIN., MAX., AVE.        SUM(...), COUNT(*), MIN(...), MAX(...), AVG(...)
Relational Efficiencies: Translatable Aggregation
• Verbs: SUM, COUNT, WRITE
• Direct operators: MIN., MAX., AVE.
• Aggregating DEFINEd fields:
  - Constant DEFINEd fields translated with CNT.
  - The following DEFINEd expressions can be translated

Type of expression, expression components, and example:

Arithmetic-valued (expressions that return a single number)
  Components:
  • Real fields of datatype I, P, D, or F
  • Numeric constants
  • Arithmetic operators (+, -, *, /)
  Example:
    DEFINE FILE ORAEMP
    NEW_SAL/D12.2=(CSAL * 0.10) + CSAL;
    END

Character string-valued (expressions that return a character string)
  Components:
  • Real fields of datatype A
  • String constants
  • Concatenation operator (|)
  • EDIT of alphanumeric fields
  Example:
    DEFINE FILE ORAEMP
    NAME/A18=EDIT(FN,'9.$')|LN;
    END
Relational Efficiencies: Non-Translatable Aggregation
• Aggregation is not translated to SQL and optimization is automatically disabled when:
  - Optimization was set off by user
  - JOINs were not passed to RDBMS (and consequently optimization was disabled by the interface)
  - FOCUS sort phrase is not translated
  - Some screening conditions not passed to RDBMS
  - Some non-SQL operators are used
  - Multi-verb requests
  - COUNT with MISSING=ON
NOTE: If the verbs PRINT or LIST are used, no aggregation is requested and FSTRACE3 returns the following message:
(FOC2590) AGGREGATION NOT DONE FOR THE FOLLOWING REASON:
(FOC2594) AGGREGATION IS NOT APPLICABLE TO THE VERB USED
Relational Efficiencies: Aggregation Considerations
• Possible index-only processing
• Possibly aggregate in RDBMS index
• Explicit or implicit (e.g., in heading/footing) FST. and LST. can be optimized using MIN and MAX
• Aggregate on real fields
• Use COMPUTE in place of DEFINE
• Create aggregated extract files (HOLD files) in cases where aggregation is not optimized
Virtual Field Processing
Relational Efficiencies: Virtual Field (DEFINE) Optimization
• DEFINE fields can be optimized as part of aggregation or record selection
• Aggregation or record selection can optimize:
  - Arithmetic-valued expressions
  - Character string-valued expressions
  - Logical expressions (selection only)
• Aggregation cannot be optimized for logical expressions
• Single-segment DEFINEs passed when JOIN is not
• IF-THEN-ELSE DEFINEs capable of being passed
Relational Efficiencies: IF-THEN-ELSE DEFINE Example - Optimized
DEFINE FILE EMPINFO
SAL_FLAG = IF (CURRENT_SALARY LT 10000) AND (DEPARTMENT_CD EQ 'MIS') THEN 1 ELSE 0;
END
TABLE FILE EMPINFO
PRINT EMP_ID LAST_NAME FIRST_NAME
IF SAL_FLAG EQ 1
END
STMTRACE:
SELECT T1.EID,T1.LN,T1.FN, T1.DEPARTMENT_CD,T1.CURRENT_SALARY
FROM "USER1"."EMPINFO" T1
WHERE ((((T1.CURRENT_SALARY < 10000) AND (T1.DEPARTMENT_CD = 'MIS'))))
FOR FETCH ONLY;
Relational Efficiencies: Aggregation DEFINE Example - Optimized
DEFINE FILE EMPDB2
CATEGORY/A4 = IF CSAL LT 10000 THEN 'LOW' ELSE 'HIGH';
CATEGORY1/I4 = IF CSAL LT 10000 THEN 0 ELSE 1;
CATEGORY2/D10 = CSAL * 1.3;
END
TABLE FILE EMPDB2
SUM CSAL CATEGORY2
BY EID
END
STMTRACE:
SELECT T1.EID, SUM(T1.CSAL), SUM((T1.CSAL * 1.3))
FROM "PMSSAE"."EMPINFO" T1
GROUP BY T1.EID ORDER BY T1.EID FOR FETCH ONLY;
Relational Efficiencies: Sort Expression Example - Optimized
Aggregation by Expression
• Allows a named expression to be used in the ORDER BY clause

DEFINE FILE DB2FILE
TAX = 0.08 * PRICE;
END
TABLE FILE DB2FILE
SUM PRICE TAX
BY TAX NOPRINT
END

AGGREGATION DONE ...
SELECT SK001, SUM(VB001), SUM(VB002) FROM
(SELECT (.08 * T1.PRICE) AS SK001, T1.PRICE AS VB001, (.08 * T1.PRICE) AS VB002
FROM USER.DB2FILE T1) X
GROUP BY SK001 ORDER BY SK001 FOR FETCH ONLY;

In the past:
(FOC2597) USE OF DEFINED FIELD THAT CANNOT BE AGGREGATED: TAX
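The generated statement uses a derived table so that the expression gets a name (SK001) the outer query can group and order by. The pattern itself is standard SQL and can be tried with sqlite3, as in the sketch below. The sample prices are invented, and the USER schema qualifier and FOR FETCH ONLY clause are dropped because SQLite does not use them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DB2FILE (PRICE REAL)")
conn.executemany("INSERT INTO DB2FILE VALUES (?)", [(100.0,), (100.0,), (50.0,)])

# The inner SELECT names the expression; the outer SELECT aggregates by that name.
sql = """
SELECT SK001, SUM(VB001), SUM(VB002)
FROM (SELECT (0.08 * T1.PRICE) AS SK001,
             T1.PRICE          AS VB001,
             (0.08 * T1.PRICE) AS VB002
      FROM DB2FILE T1) X
GROUP BY SK001
ORDER BY SK001
"""
rows = conn.execute(sql).fetchall()  # one row per distinct TAX value
```

Two rows come back, one per distinct tax value, with the two 100.00 prices collapsed into a single group.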
Relational Efficiencies: Direct SQL Passthru
Best of both worlds
• Use it if the most efficient SQL is not generated, or
• if optimized SQL code already exists

SQL sqlengine {any valid SQL statement} END

SET SQLENGINE=SQLORA
SQL PREPARE result FOR SELECT * FROM DQAORA01;
TABLE FILE result
PRINT F1SSN
ON TABLE HOLD AS HOLD1
END
TABLE FILE HOLD1
PRINT F1SSN
END

SQL DB2 SELECT C.CLIENT_ID,J.CLIENT_ID, C.CASE_NO,J.REST
FROM CLIENT C, CLIENTJ J
WHERE C.CLIENT_ID=J.CLIENT_ID;
TABLE FILE SQLOUT
PRINT *
ON TABLE HOLD
END