28 May 2009

SQL Programming Guideline

· Use DCLGEN structures as host variables.

DCLGEN generates the correct type and length to the variables derived from the columns of the table. DCLGEN also provides a program with the LABEL -definitions of the columns, which enhances the clarity and maintainability of a program.

· Use the "Field Name Prefix" feature of the DCLGEN command.

This causes each Cobol field in the DCLGEN to have the same prefix. This makes coding much easier, as it can eliminate the need to code Cobol "OF"s when referring to a field. How many of your tables contain an account number field? A good many of them, probably. It is easier to code

MOVE WS-TEMP-ACCT-NUM TO VIFCACCT-ACCT-NUM

than

MOVE WS-TEMP-ACCT-NUM TO ACCT-NUM OF ACCT-TABLE.

Using prefixes, the programmer can tell at a glance which ACCT-NUM is being referred to. The prefix name should be related to the table name.

· Check that host variables and literals have the same data type and length as the columns.

The same rule applies to join predicates as well.
Different data types, such as INTEGER and DECIMAL
Different column lengths, such as CHAR(3) and CHAR(4)

Example of join

SELECT EMPNO, MGRNO
FROM DSN8810.EMP, DSN8810.DEPT
WHERE WORKDEPT = DEPTNO


where WORKDEPT is defined as CHAR(4) and DEPTNO is defined as CHAR(3)

If column definition is the one that is longer or more precise, MIS(Matching Index Scan) is possible but data conversion is needed.

Wrong variable definitions can be avoided by always using
DCLGEN structures as host variables.

· Avoid scalar functions in WHERE clauses for those index columns you want to use. E.g. substring can often be replaced with LIKE or BETWEEN

In the following example MIS is not possible:

MOVE 'A' TO HV.

SELECT DEPTNAME FROM DSN8810.DEPT
WHERE SUBSTR(DEPTNO,1,1) = :HV

DB2 Optimizer can be made to use MIS -access path if scalar functions actions are moved outside the SQL -clause or are substituted by a more efficient SQL -clause.

In this example scalar function SUBSTR can be replaced by

BETWEEN or LIKE , which both are MIS.

MOVE 'A' TO START.
MOVE 'A99' TO STOP.

SELECT DEPTNAME FROM DSN8810.DEPT
WHERE DEPTNO BETWEEN :START AND :STOP

or

SELECT DEPTNAME FROM DSN8810.DEPT
WHERE DEPTNO LIKE 'A%'

· Avoid arithmetic expressions in a predicate. Perform
calculations before the SQL statement and use the result in the query.


variable + 3 O.K. (non-column expression is indexable)
column + 3 not O.K. (column expression is not indexable)


When column expression is used in WHERE clause, optimizer is not able to use MIS for those columns.
It’s recommended to count values into host variables before SQL:

ADD 100 TO HV1.

SELECT ACSTAFF FROM DSN8810.PROJACT
WHERE PROJNO = :HV1 AND ACTNO = :HV2
AND ACSTDATE = :HV3


Acceptable way is to count the value with variable in SQL.
In this case with index (PROJNO,ACTNO, ACSTDATE), the access path is MIS (MC=3)

SELECT ACSTAFF FROM DSN8810.PROJACT
WHERE PROJNO = :HV1 AND ACTNO = :HV2 + 100
AND ACSTDATE = :HV3


Foul and prohibited way is to count the value with column in SQL. In this case with index (PROJNO,ACTNO, ACSTDATE), the access path is MIS (MC=1)

· Avoid negations (e.g. NOT BETWEEN, NOT IN) in a predicate for index columns.

If negations (e.g. NOT BETWEEN, NOT IN) are used in WHERE clause for the index columns, MIS is not possible:

Foul and prohibited way :
SELECT LASTNAME, EMPNO FROM DSN8810.EMP
WHERE EMPNO NOT BETWEEN '000100' AND '000350'


Access path will be Table space Scan or Non Matching Index Scan.

NOT BETWEEN can be replaced with two sentences or with UNION when result table is small:

SELECT LASTNAME, EMPNO FROM DSN8810.EMP
WHERE EMPNO < '000100' UNION ALL SELECT LASTNAME, EMPNO FROM DSN8810.EMP WHERE EMPNO > '000350'


Both select clauses use MIS as an access path, if the result table is small compared to the rows in the table.
NOT IN -predicate can be replaced only with IN-predicate.

· Keep in mind that the type of a subquery (correlated /noncorrelated) has a great influence on the performance of the query.

A correlated subquery refers to at least one column of the outer query. The outer query is executed first. The subquery is evaluated for each qualified row of the outer one. When a subquery includes another subquery, then each subquery is executed for every qualifying row of outer queries.

For example:

SELECT EMPNO, <==outer query LASTNAME, WORKDEPT, EDLEVEL FROM DSN8810.EMP CURRENT_ROW WHERE EDLEVEL >
(SELECT AVG(EDLEVEL) <==subquery FROM DSN8810.EMP (executes n times) WHERE WORKDEPT = )
If the subquery has already been evaluated for a given correlation value, then the subquery might not have to be re-evaluated UNCORRELATED SUBQUERY is not dependent on outer query and it executes from bottom to top. First execute the innermost subquery and then the result table does compare to every qualifying row of the next outer query.
For example:
SELECT EMPNO, <== outer query
LASTNAME, WORKDEPT, EDLEVEL
FROM DSN8810.EMP WHERE EDLEVEL >
(SELECT AVG(EDLEVEL) <== inner query
FROM DSN8810.EMP) (executes once)
If the subquery result table has a lot of rows, the executing SQL clause may be heavy. The rows of subquery are stored in the assorted result table, which is accessed via sparse index. · When an OR structure cannot be converted into an IN list (for instance WHERE col < :hv1 OR col > :hv2) the access path cannot be MIS.


The access path cannot be MIS, but is at it’s best Multiple Index Access, or in many cases NON-MIS or Table Space Scan. Keep in mind, that index-only access is not possible with MIA(Multiple Index Access)

For example:

SELECT LASTNAME, EMPNO FROM DSN8810.EMP
WHERE EMPNO < '000100' OR EMPNO > '000350'

OR can be replaced with UNION as follows:

SELECT LASTNAME, EMPNO FROM DSN8810.EMP
WHERE EMPNO < '000100' UNION ALL SELECT LASTNAME, EMPNO FROM DSN8810.EMP WHERE EMPNO > '000350'

Access path will be Matching Index Scan, if the number of rows in the result table is small, compared to the number of rows in table.

· When comparing NOT NULL column to a subquery which may get a null value as result, use COALESCE (VALUE) function in subquery.

The function ensures that the subquery will not return a null value as a result when it is compared to the not null -column.

For example:
CREATE TABLE T1 (C1 INTEGER NOT NULL, C2 INTEGER);
CREATE INDEX X1 ON T1 C1 ASC);
CREATE INDEX X2 ON T1 C2 ASC);

For column C1 DB2 will not use index X1 because the result of the subquery may be null value

SELECT ..... FROM T1
WHERE C1 = (SELECT MIN(C1) FROM T1 WHERE …. );

For column C1 DB2 will use index X1 because the result of the subquery will never be null value

SELECT … FROM T1 WHERE C1 =(SELECT VALUE(MIN(C2), 999999) FROM T1 WHERE …. );

· When using OUTER JOIN, be aware of WHERE clauses. Remember that in many cases the result table is formed first and WHERE clause is only checked afterwards. Use ON-clause instead.

In the example the FROM clause will be executed first and WHERE clause after that.

In the FROM clause there is an outer join between tables E1 and E2. The Final WHERE does not work as one would expect, because after join E1.LASTNAME columns have null values.

SELECT VALUE (E1.LASTNAME,'XXX') AS E1NAME,
E1.EMPNO, VALUE (E2.LASTNAME,'XXX') AS E2NAME,
E2.EMPNO
FROM
( SELECT LASTNAME, EMPNO FROM DSN8810.EMP
WHERE LASTNAME <= 'K99') AS E1 FULL JOIN ( SELECT LASTNAME, EMPNO FROM DSN8810.EMP WHERE LASTNAME > 'K99') AS E2
ON E1.EMPNO = E2.EMPNO
WHERE E1.LASTNAME LIKE '%S%' ORDER BY 1;

Notice the strong influence of the WHERE-clause. Clause E2
mentioned before, produces no rows in result table. The reason is, that the FROM clause is executed first, and because it includes this JOIN predicate, which does not match, E1.LASTNAME gets a null value, and the row is excluded in the WHERE -clause.

Correction 1: Move the WHERE clause to the first table
expression.

SELECT VALUE (E1.LASTNAME,'XXX') AS E1NAME,
VALUE (E2.LASTNAME,'XXX') AS E2NAME,
FROM ( SELECT LASTNAME, EMPNO FROM DSN8810.EMP
WHERE LASTNAME <= 'K99' AND LASTNAME LIKE '%S%') AS E1 FULL JOIN ( SELECT LASTNAME, EMPNO FROM DSN8810.EMP WHERE LASTNAME > 'K99') AS E2 ON E1.EMPNO = E2.EMPNO ORDER BY 1;

Correction 2: Check null value separately

SELECT VALUE (E1.LASTNAME,'XXX') AS E1NAME,
E1.EMPNO, VALUE (E2.LASTNAME,'XXX') AS E2NAME,
E2.EMPNO FROM ( SELECT LASTNAME, EMPNO
FROM DSN8810.EMP WHERE LASTNAME <= 'K99') AS E1 FULL JOIN ( SELECT LASTNAME, EMPNO FROM DSN8810.EMP WHERE LASTNAME > 'K99') AS E2
ON E1.EMPNO = E2.EMPNO WHERE
E1.LASTNAME LIKE '%S%' OR E1.LASTNAME IS NULL ORDER BY 1;

· When joining more than 2 tables with outer join, DB2
always builds up a work table.

· Use the order of columns in an index for the columns in the ORDER BY clause whenever possible.

If DB2 decides to use the index according the ORDER BY list, the rows are in right sequence without sorting.

The result table of the following query will be processed in order of (col1, col2). DB2 can utilize any of such indexes which begins with order by -columns . Good indexes are for instance (col1, col2) or (col1, col2, col3):

SELECT col1, col2, col3
FROM table1
WHERE col1 BETWEEN :hv1 AND :hv2
ORDER BY col1, col2

Notice that with LIST PREFETCH, ORDER BY always causes
sorting. MIA and sometimes fetching rows in other sequence than clustering index make LIST PREFETCH as an access path.

· Do not use DISTINCT if there is no need to exclude duplicates.

If table has UNIQUE-index (col1), the next query will not cause sorting. But when UNIQUE-index is like (col1,col2,col3), the result table is sorted.

SELECT DISTINCT col1, col2
FROM table1

When used with column function , DISTINCT will not lead to sorting if the column is the first column of an index. The following query may use index (col1) or index (col1,col2,col3):

SELECT COUNT (DISTINCT col1)
FROM table1

· Specify a UNION with the ALL option unless eliminating of duplicate rows is necessary.

UNION (without ALL) leads always to a sort for excluding duplicates, even when result tables are exclusive and therefore duplicates don’t exist.

· Adding an extra column, which is not in the index, to ORDER BY clause, causes a sort even if there is an equal predicate on that column

Check the SQL-clauses generated by your code generator!

Example:

Index A, B, C, D and clause:
…..
WHERE A = :hv1
AND B = :hv2
AND C = :hv3
AND D > :hv4
AND E = :hv5
ORDER BY A, B, C, D, E

· If columns in ORDER BY clause are from more than one table or only from the inner table, DB2 may end up with an unnecessary sort.

Indexes:
T1: col1, col2
T2: col3

SELECT A.col1, A.col2, B.col3, B.col4
FROM T1 A, T2 B
WHERE ….
ORDER BY A.col1, A.col2, B.col3

· In mass inserts LOAD utility is more efficient than an
INSERT statement. Remember, database is not available during the LOAD

· Use FOR FETCH ONLY when a SELECT statement is used only for retrieving data.

In a distributed environment BLOCK FETCH reduces the number of messages sent across the network and returns fetched rows efficiently. DB2 triggers BLOCK FETCH when it can detect that the retrieved rows can not be updated or deleted. If a cursor is not used for update or delete, use FOR FETCH ONLY to ensure block fetch. This causes DB2 to send a buffer full of rows to the local DB2:

DECLARE cur CURSOR FOR
SELECT col1, col2
FROM table1
FOR FETCH ONLY

· Wild card ('%', '_') as the first character in a LIKE
predicate prevents DB2 from using MIS access path, but it is still more efficient than fetching all the rows and filtering the unnecessary ones in the application program.

· Check occurrence with cursor whenever the result table may have more than one row

Recommended method
cursor
EXISTS

Not recommended method
simple SELECT
SELECT COUNT(*)

For Example;

SELECT ‘1’ INTO :WS-EXISTS – PIC X(1)
FROM tablename WHERE conditions
FETCH FIRST ROW ONLY WITH UR

don’t care about the :WS-EXISTS value, only the SQLCODE that is returned.

· List only the columns you need in SELECT clause. Colums, on which you have an equal predicate, shouldn’t be listed in SELECT clause.

· Avoid executing extra SQL clauses.

For example:
SELECT CURRENT DATE
FROMSYSIBM.SYSDUMMY1
+ updating table1

Correct way:
UPDATE table1
SET col1 = CURRENT DATE
….

· Update and delete rows with cursor except when unique key is known.

By using a cursor you assure, that updates and deletes affect only those rows desired. Lock management speaks in favor of using cursor.

DECLARE CURSOR cursor1 CURSOR FOR
SELECT col1, col2, col3 FROM table1
WHERE col1 = :hv1
FOR UPDATE OF col3
.....
UPDATE table1 SET col3 = :hv3
WHERE CURRENT OF cursor1

The exception is when you update one row and the unique key of the row is known. In this case you can update directly.

UPDATE table1 SET col3 = :hv3
WHERE pkey = :hvpkey

Using the cursor for updating is slightly heavier than direct
UPDATE-clause. Notice that you cannot update with read-only cursor. Cursors including ORDER BY, UNION, UNION ALL, DISTINCT, column functions and JOIN, are read-only.

· When using WHERE clause in creating a view, use WITH CHECK OPTION when the view is used to update a table

WITH CHECK OPTION it is guaranteed that you cannot insert or update rows which cannot be seen through the view.
For example:
CREATE VIEW DSN8810.VPROJ01
(PROJNO,PROJNAME,PROJDEP,RESPEMP)
AS SELECT PROJNO,PROJNAME,PROJDEP,RESPEMP
WHERE PROJDEP LIKE 'AA%'
WITH CHECK OPTION

In this case WITH CHECK OPTION it ensures that rows inserted or updated via view, are always applied by condition PROJDEP LIKE ‘AA%’.
WITH CHECK OPTION it cannot be defined on read-only view or if subquery is included in the view

· Never use SELECT * clause

When columns needed are listed in SELECT clause, the clause is not so much dependent on table structure. Maintenance work is much easier.

Additionally SELECT -clause becomes self-documentary. The connections between columns and variables are visible in SELECT -clause. Retreive and select only the columns you need.

Listing unused columns use more CPU and in some cases it causes DB2 to retrieve data from the table instead of index only access.

· In an INSERT statement, name the columns for which you provide insert values.

Benefits:

INSERT clause is self-documentary
Correspondence between column and data can be checked
directly from INSERT clause
INSERT clause is independent of table construction

· In FOR UPDATE OF clause of the SELECT statement name only the columns to be updated. If you intend to delete rows, name the columns of the primary key.

When defining updating cursor, list only updated columns in FOR UPDATE OF -list. In SELECT -list updated columns need not to be mentioned. When the primary key of the table is col1 and col3 it is updated:

DECLARE CURSOR cursor1 CURSOR FOR
SELECT col1, col2
FROM table1
WHERE col1 = :w1
FOR UPDATE OF col3

If you intend to delete rows, list in FOR UPDATE OF the columns of the primary key.

DECLARE CURSOR cursor2 CURSOR FOR
SELECT col1, col2
FROM table1
WHERE col11 = :w1
FOR UPDATE OF col1

· Write an IN list instead of many OR predicates. DB2 converts several OR -conditions automatically to an IN-list. IN -list is still more clear.

SELECT col1, col2 FROM table1 WHERE col1 IN
('A1', 'B2', 'C3')

· Name the columns in an ORDER BY clause. Exceptions: A column derived from a function or from an expression and every column in the result table of a UNION statement must be ordered by the column number unless renamed with AS clause.

Naming columns adds to the clarity and safety of a program, especially when it is necessary to make changes to it. There are situations where a column has no name:
- SQL-clause is UNION or UNION ALL. In this case all columns of the resulting table have no name.
- ordering column is a constant or a clause.
- ordering column is a column function.
- Constants.

These ‘nameless columns’ may be referenced by AS-clause. If not the clause has to be referenced by its ordering number. AS –clause is recommended. Using the ordering number is bad programming.

· Rows fetched with read-only cursor must not be updated. Cursor must be defined with FOR UPDATE OF-option. Otherwise there will be a kangaroo-cursor i.e. jumping over rows, rereading rows and eventually looping.

When updating or deleting rows, use SELECT FOR UPDATE -
cursor to avoid locking. Use updating cursor even when reading only one row, which may be updated or deleted. When PLAN or PACKAGE is bound by ISOLATION CS, the updating cursor locks the browsed page by U-lock, which prevents other simultaneous updates. For update and delete the lock is changed to X-lock.

When using SELECT FOR UPDATE -cursor deletions and updates are made a row at a time, and locks may be released in suitable intervals by COMMIT. Doing the same without cursor may update/delete a great number of rows and hereby keep many pages locked for too long.

DECLARE CURSOR cursor1 CURSOR FOR
SELECT col1, col2
FROM table1
WHERE col1 = :hv1
FOR UPDATE OF col3
...
UPDATE table1
SET col3 = :hv3
WHERE CURRENT OF cursor1

Observe! FOR UPDATE OF cannot be defined for read-only cursors. Cursors including ORDER BY, UNION, UNION ALL,
DISTINCT, column functions and JOIN, are read-only.

Issuing INSERT, UPDATE and DELETE statements from the same application process while a cursor is open can cause unpredictable results. Updates change the positions of the row, which may result in 'kangaroo cursor' .

· When batch programs are executed concurrently with online programs consider retrying an operation after a deadlock or timeout.

In those batch programs that run concurrently with online transactions, return code for deadlock/timeout (-911) may be implemented to execute the interrupted LUW again a couple of times.

In Deadlock- and timeout-situations, batch programs receive a SQL -code of -911, and task is automatically rolled back to the preceding COMMIT-point. (In realtime programs SQL -code may be - 913, depending of the parameters of the running system, in which case the task is not rolled back to the commit point.) If execution is
continued all statements executed after the preceding COMMIT –point must be executed again

· Define a cursor with FOR FETCH ONLY when the cursor is read-only.

Use FOR FETCH ONLY -option to define a cursor if your intention is only to read the rows. DB2 does not implement locking when consistency is otherwise guaranteed (lock avoidance). Although lock avoidance is used, warm pages are locked. That is why COMMIT is also useful in reading batch-programs.

Lock avoidance needs:
- Isolation Level (CS)
- Currentdata(no)
- Non-updating cursor
- There are no current updates to the row or to the page (cold row) In an ambiguous case FOR FETCH ONLY -option tells DB2 that cursor is read-only.

Consider using UR-option in read-only cursors. This improves concurrency , but may provide the reading program with a logically inconsistent data.

· Be aware of wrong conclusions with MIN and MAX results.

You have to be careful in interpretation of the column functions (AVG, COUNT, MAX, MIN, SUM) results.
For example

SELECT MIN(DEPTNO)
FROM TABLE1
WHERE DEPTNO > :hv

If condition is not true, SQL code is 0 and the answer is one row with a NULL -value.
Here is an example of a table that has one key column and two data columns. One data column allows a NULL -value. The table was provided by some rows including null values and all possible queries were made. The result, is in the table listed below. WHERE –clause provides NULL-answers, but real NULL-values are just left out of the calculations


* = sum of real values/ quantity of real values
** = real max/min value
*** = sum of real values

Null value fulfills only the IS NULL predicate. All other predicates exclude null values.

· Keep in mind the execution sequence in an OUTER JOIN.

For example:

Make a report by the department about the average salary of employees hired after the year 1998. If no employees are hired for the department after 1998 the average salary is 0.


Example 1 (wrong results)

SELECT DEPTNAME,
VALUE(AVG(SALARY) , 0) AS AVGSALARY
FROM DEPT D LEFT JOIN WORKER W
ON D.DEPTNO = W.DEPTNO
WHERE YEAR(HIREDATE) >= 1999
GROUP BY DEPTNAME
ORDER BY DEPTNAME


Results are wrong:

If department has no employees hired after 1998 the department is left out of the report (LEFT JOIN does not help because it works at the FROM level, first FROM and after that WHERE)

Example 2 (correct results)

SELECT DEPTNAME,
VALUE(AVG(SALARY) , 0) AS AVGSALARY
FROM DEPT D LEFT JOIN
( SELECT DEPTNO, SALARY
FROM WORKER
WHERE YEAR(HIREDATE) >= 1998 ) AS W
ON D.DEPTNO = W.DEPTNO
GROUP BY DEPTNAME
ORDER BY DEPTNAME;

· For stored procedures, if using the precompiler and there are SQL host variables in the LINKAGE SECTION, reset the SQL-INIT-FLAG each invocation.

· Never use data returned by negative sqlcode or never use data *not* returned by DB2.

Two typical examples:
- value in host variable after SQLCODE=+100, and
- value in host variable when its NULL indicator is negative.

· Develop a common error handling routine to be used by ALL programs. Capture all info from the SQLCA or use Get Diagnostics

Every EXECUTEABLE SQL statement should check positively for SQLcode and SQLWARN conditions. E.g., is SQLCode +0 ok, +100, handle not found or end of cursor, handle other expected and treatable SQLCODES –perhaps you have retry logic for -911/913; Never assume that if it’s not 0 then it must be +100 because you think you tested for every other possibility. Check positively and if it isn’t an expected SQLCode or SQLWARN, then it’s an error, fatal or non-fatal.

Along with standard error checking, have a standard error reporting mechanism – common format for DB errors or logic or other environmental errors.

· USE the QUERYNO-Clause for every SQL

The numbering of the SQLs can be in steps if 100 so you have enough gaps to insert furhter SQLs (don't renumber the SQLs). Try to ensure the Queryno values used are unique within any given program. With using QUERNO it is much easier to compare access-pathes when code has changed or finding the SQL in the source when only the statement-no. is available (e.g. some warnings, like lock-escalation in DB2-master-adress-space)

More important is the potential for unexpected results if you use access path hints for statements with duplicated Queryno’s. DB2’s going to have a hard time applying a hint to a statement for which it’s not even close, with the result that the hint will be rejected and the statement the hint was intended for won’t pick it up.

· Build the commit logic into every program.

If it runs quickly enough to not meet the commit interval, no problem. design the application you need to do commits at the logical boundary

· Use rowset-processing, especially for new programs. Make also a good programming model for doing this. In most cases it will be very expensive to implement rowset-processing in existing programs because processing-logic has to change significant.

· to minimize lock timeouts by coding data modification statements (INSERT, UPDATE, DELETE) as close to the COMMIT as possible.

· Consider coding all modifications in the same order regardless of program to minimize deadlocks.

For example, issuing all modification DDL in alphabetical order by table name (e.g. INSERT TABLEA, DELETE TABLEF, UPDATE TABLEX, INSERT TABLEZ). Not important that it is alphabetical order (especially if there is some other logical, business order) just that it is the SAME order for all programs.

· Be sure to use the same CCSID for both the DB2 Precompiler and the COBOL Compiler. Their defaults are not the same.

· For every SQL (not only COBOL); In SQLs that reference more than one table use always correlation-ids. Qualify any column-reference with correlation-ids wherever it occurs (where-clause, select-clause, on-clause, group-by ... and so on).

2 comments:

Martin Hubel said...

Hi Cuneyt,
Great! Thanks for pulling this together. If you would like me to send some edits, I'd be happy to do so.
--Martin

Greg P said...

Information is good - but you have to scroll ~22 screens to see it all.

Perhaps use a wider column, and emphasise/delineate each section (perhaps use multiple html pages for different areas of focus?).

Need to highlight key phrases a little better too. Larger bullet size & indenting could help.