Performance optimization of IS NULL

select distinct A.id from A left outer join B on B.A_id = A.id
where B.A_id is null

As the title says, the SQL above runs very slowly in DB2 against large data volumes (table A has over 7 million rows, table B over 4 million, and the inner join matches over 3 million), averaging more than 1000 seconds per execution...
From searching online, it seems the cause is that when there is a large number of NULL values, the DB2 query optimizer automatically chooses a full table scan instead of an index scan?
How can this be optimized?

PS: Put simply, the problem is finding the best-performing way to compute the difference set between A and B. For now I have had to change the query to:
select distinct A.id from A left outer join B on B.A_id = A.id
where A.id not in (select A.id from A inner join B on B.A_id = A.id)
i.e., using NOT IN instead of IS NULL...
Is there a more optimized approach? Advice from the experts here would be appreciated.
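
For comparison, the difference set can also be written directly with EXCEPT (a sketch only; it assumes A.id is never NULL, and whether the optimizer handles it better than the anti-join or NOT IN form would have to be measured):

-- distinct A.id values that never appear in B.A_id
SELECT id FROM A
EXCEPT
SELECT A_id FROM B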

Started by Helen at February 03, 2016 - 3:12 PM

where COALESCE(B.A_id,0)=0

Try it

Posted by Angelia at February 13, 2016 - 3:41 PM

Hello, I tried this; its performance is about the same as IS NULL. Interestingly, I found an answer similar to yours on the IBM website, but it was about Oracle... It also said that, along with this method, you should build an index, something like the following:
create index test on B(COALESCE(A_id,0)) — but DB2 does not seem to allow this syntax, so the index cannot be created...
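
(For what it's worth: DB2 for LUW 10.5 and later do support expression-based indexes, so on a new enough version a statement along these lines may be accepted; on older releases the error above is expected.)

-- expression-based index, DB2 for LUW 10.5+ only; the index name is illustrative
CREATE INDEX idx_b_coalesce_a_id ON B (COALESCE(A_id, 0));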

Posted by Helen at February 23, 2016 - 4:36 PM

Bumping this up; there is still no adequate solution.

Posted by Helen at March 03, 2016 - 5:05 PM

Add a redundant column that stores COALESCE(A_id, 0), and index that column.
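
A minimal sketch of that idea using a DB2 generated column, so the redundant value is maintained automatically (the column and index names are mine; on an already populated table the SET INTEGRITY steps around the ALTER may be required, depending on version):

SET INTEGRITY FOR B OFF;

-- redundant column that materializes COALESCE(A_id, 0)
ALTER TABLE B ADD COLUMN A_id_nn GENERATED ALWAYS AS (COALESCE(A_id, 0));

SET INTEGRITY FOR B IMMEDIATE CHECKED FORCE GENERATED;

-- index on the generated column; the query can then filter on B.A_id_nn = 0
CREATE INDEX idx_b_a_id_nn ON B (A_id_nn);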

Posted by Teresa at March 17, 2016 - 5:59 PM

select id
from A
where not exists (select 1 from B where A_id = A.id)

And create an index on B(A_id).
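
For instance (the index name is illustrative):

-- plain index on the join column so the NOT EXISTS probe can use it
CREATE INDEX idx_b_a_id ON B (A_id);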

Posted by Kerry at March 21, 2016 - 6:38 PM

Thanks to the two posters above.
Here is an update on my situation:
The SQL I posted is only part of the SQL in my task. Because I believed the IS NULL query was what made it slow, I posted only the IS NULL part. It is indeed a big factor in the slowness of the task's SQL, but after optimizing it as described in the opening post, I confirmed that another part is also responsible for the slowness.
Here is the whole SQL:
SELECT DISTINCT A.id
FROM A
LEFT OUTER JOIN B ON B.A_id = A.id
LEFT OUTER JOIN C ON C.B_id = B.id
LEFT OUTER JOIN D ON D.id = C.D_id
WHERE (B.A_id IS NULL
   OR (NOT (B.some_attribue <> 0 OR B.colOther IS NULL)
       AND D.first_column IS NOT NULL
       AND NOT (D.second_column <> 0 OR D.second_column IS NULL)
       AND (D.third_colum <> 0 OR D.third_colum IS NULL)))

The slow factor mentioned in my earlier posts is the B.A_id IS NULL predicate; its cause and the improvement were given in the opening post. If the WHERE clause contains only that improved part, the run time drops from 20 minutes to 6 seconds.
But the next problem is that as soon as the improved part is combined with the trailing OR (1 && 2 && 3 && 4) clause, performance drops sharply again, so my earlier improvement is almost wasted...
Then I found that if the type of B.A_id is changed from VARCHAR(250) to BIGINT, the whole SQL speeds up dramatically, even without removing the OR clause and without the modification I described above. Just changing one column's type solves the problem...
So my question is: why does changing a single column type solve all of this? Is it because DB2's index mechanism needs the column to be BIGINT?

Posted by Helen at April 03, 2016 - 7:02 PM

It may be because VARCHAR is a variable-length field, so computing with it in an index costs more CPU, while BIGINT is fixed-length; that would explain why efficiency improves as soon as you switch to BIGINT.

If you can, try using CHAR.

Posted by Steven at December 02, 2016 - 10:05 AM

That's probably it...
In any case, though, I've been told it isn't feasible: changing the column type would have too wide an impact, so it was rejected...

Posted by Helen at December 05, 2016 - 10:38 AM

(B.A_id IS NULL OR (NOT (B.some_attribue <> 0 OR B.colOther IS NULL)
AND D.first_column IS NOT NULL AND NOT (D.second_column <> 0 OR D.second_column IS NULL) AND (D.third_colum <> 0 OR D.third_colum IS NULL)))

1 B.A_id IS NULL
2 B.some_attribue <> 0
3 B.colOther IS NULL
4 D.first_column IS NOT NULL
5 D.second_column <> 0
6 D.second_column IS NULL
7 D.third_colum <> 0
8 D.third_colum IS NULL

(1 OR (NOT (2 OR 3) AND 4 AND NOT (5 OR 6) AND (7 OR 8)))


(1 OR (NOT 2 AND NOT 3 AND 4 AND NOT 5 AND NOT 6 AND (7 OR 8)))


1

UNION

NOT 2 AND NOT 3 AND 4 AND NOT 5 AND NOT 6 AND 7

UNION

NOT 2 AND NOT 3 AND 4 AND NOT 5 AND NOT 6 AND 8

B.A_id IS NULL
UNION
B.some_attribue = 0 AND B.colOther IS NOT NULL AND D.first_column IS NOT NULL AND D.second_column = 0 AND D.second_column IS NOT NULL AND D.third_colum <> 0
UNION
B.some_attribue = 0 AND B.colOther IS NOT NULL AND D.first_column IS NOT NULL AND D.second_column = 0 AND D.second_column IS NOT NULL AND D.third_colum IS NULL


Simplify the WHERE conditions; don't write the SQL to mirror the business logic verbatim.

Split it into three statements combined with UNION, optimize each one's WHERE clause separately, and change LEFT joins to INNER joins where the predicates allow it, as in the sketch below.
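
One way the split might look (a sketch only; the table and column names follow the SQL above, branches 2 and 3 can use inner joins because their predicates already require matching B, C and D rows, and UNION also takes care of the DISTINCT):

SELECT A.id
FROM A LEFT OUTER JOIN B ON B.A_id = A.id
WHERE B.A_id IS NULL                         -- branch 1: no matching B row

UNION

SELECT A.id
FROM A
INNER JOIN B ON B.A_id = A.id
INNER JOIN C ON C.B_id = B.id
INNER JOIN D ON D.id = C.D_id
WHERE B.some_attribue = 0 AND B.colOther IS NOT NULL
  AND D.first_column IS NOT NULL
  AND D.second_column = 0
  AND D.third_colum <> 0                     -- branch 2

UNION

SELECT A.id
FROM A
INNER JOIN B ON B.A_id = A.id
INNER JOIN C ON C.B_id = B.id
INNER JOIN D ON D.id = C.D_id
WHERE B.some_attribue = 0 AND B.colOther IS NOT NULL
  AND D.first_column IS NOT NULL
  AND D.second_column = 0
  AND D.third_colum IS NULL                  -- branch 3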

Try it

Posted by Upton at December 18, 2016 - 11:11 AM

You shouldn't use VARCHAR as a primary key; the more records there are, the slower it gets. VARCHAR is meant for things like long remark fields, and the table design should take that into account.

Posted by Steven at December 28, 2016 - 11:40 AM

Actually the VARCHAR column is not the primary key, but it does act as a key: it links this table to the related table. The problem is that the corresponding column in the other table is BIGINT, while in this table it was declared as VARCHAR...
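
For reference, if changing the column type is not an option, one workaround along the same lines is to make the cast explicit on the A side, so the comparison stays on the VARCHAR column and an index on B(A_id) can still be probed. This is a sketch only, and it assumes B.A_id stores the key as plain decimal text (no leading zeros or padding), otherwise the results would change:

SELECT DISTINCT A.id
FROM A
LEFT OUTER JOIN B
  ON B.A_id = CAST(A.id AS VARCHAR(250))   -- explicit cast instead of an implicit numeric conversion
WHERE B.A_id IS NULL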

Posted by Helen at December 31, 2016 - 11:50 AM

This is the right direction: the best scheme my colleagues and I came up with follows exactly this line of thought. I will post the details later.

Posted by Helen at January 03, 2017 - 12:50 PM

Good stuff, thanks to the experts here.

Posted by Mick at January 03, 2017 - 2:17 PM

Aren't you going to post it?

Posted by Claudia at January 07, 2017 - 11:51 AM

The response to 2013-08-02 14:42:58 was removed by the administrator

Posted by Edison at January 15, 2017 - 1:40 PM