Troubleshooting: a confused optimizer

A few days ago I received a report of poor performance in a banking application. Although the volume of transactions was not high, their monetary value was, so completing them almost instantly was a priority. Over time these transactions were taking longer and longer, to the point that some of them were eventually marked as failed and had to be restarted, making everyone involved in the business nervous. It was at this point that they decided to seek my advice.

The first problem I encountered was that the application was not instrumented, so its SQL was indistinguishable from that of the other applications running in the same instance. Fortunately, out of the whole universe of SQL statements, the DBA had already identified a couple that appeared in the AWR reports as the main contributors to buffer gets, so I immediately proceeded to analyze them.
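As a reference, a ranking like the one the DBA produced can be obtained directly from the AWR repository views. This is only a sketch: the snapshot window is left unrestricted and the top-10 cutoff is an arbitrary choice.

```sql
-- Top SQL by buffer gets accumulated in the AWR repository
-- (illustrative; add a snap_id range to focus on the problem window).
select *
  from ( select s.sql_id,
                sum(s.buffer_gets_delta) buffer_gets,
                sum(s.executions_delta)  executions
           from dba_hist_sqlstat s
          group by s.sql_id
          order by sum(s.buffer_gets_delta) desc )
 where rownum <= 10;
```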

select nvl(max(event_seq_no), 0) 
  from cstbs_contract_change_log 
 where contract_ref_no = :b1
---------------------------------------------------------------------
| Id  | Operation                   | Name                          |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                               |
|   1 |  SORT AGGREGATE             |                               |
|   2 |   FIRST ROW                 |                               |
|   3 |    INDEX FULL SCAN (MIN/MAX)| PK01_CSTB_CONTRACT_CHANGE_LOG |
---------------------------------------------------------------------

select nvl(max(event_seq_no), 0) 
  from istbs_contractis 
 where contract_ref_no = :b1
------------------------------------------------------------
| Id  | Operation                   | Name                 |
------------------------------------------------------------
|   0 | SELECT STATEMENT            |                      |
|   1 |  SORT AGGREGATE             |                      |
|   2 |   FIRST ROW                 |                      |
|   3 |    INDEX FULL SCAN (MIN/MAX)| PK01_ISTB_CONTRACTIS |
------------------------------------------------------------
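To confirm that better-suited indexes existed, the data dictionary can be queried for every index covering the contract_ref_no column. A minimal sketch, assuming the FCCPRD owner used elsewhere in this case:

```sql
-- List every index that includes the contract_ref_no column
-- for the two tables involved.
select ic.table_name, ic.index_name, ic.column_name, ic.column_position
  from dba_ind_columns ic
 where ic.index_owner = 'FCCPRD'
   and ic.column_name = 'CONTRACT_REF_NO'
   and ic.table_name in ('CSTB_CONTRACT_CHANGE_LOG', 'ISTB_CONTRACTIS')
 order by ic.table_name, ic.index_name, ic.column_position;
```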

It was found that for both tables there were indexes on the contract_ref_no column which were, of course, much more appropriate than the primary-key indexes, but for some reason the optimizer was not taking them into account. As an initial step we proceeded to gather fresh statistics, hoping this would be enough for the optimizer to make the right decision.

exec dbms_stats.gather_table_stats( 
   ownname => 'FCCPRD', 
   tabname => 'CSTB_CONTRACT_CHANGE_LOG', 
   cascade => TRUE, 
   estimate_percent => 100, 
   method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO' );

exec dbms_stats.gather_table_stats( 
   ownname => 'FCCPRD', 
   tabname => 'ISTB_CONTRACTIS', 
   cascade => TRUE, 
   estimate_percent => 100, 
   method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO' );
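Before letting the application run again, the effect of the new statistics on the plan can be checked with EXPLAIN PLAN. A sketch, with a hypothetical literal standing in for the :b1 bind:

```sql
-- Verify that the optimizer now prefers the contract_ref_no index
-- ('DUMMY0000000001' is a placeholder for a real contract reference).
explain plan for
select nvl(max(event_seq_no), 0)
  from cstbs_contract_change_log
 where contract_ref_no = 'DUMMY0000000001';

select * from table(dbms_xplan.display);
```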

With this, execution times dropped significantly, from an initial average of 34.2″ to only 8.7″, but there was still the perception that the transaction could be even faster, so it became inevitable to instrument the application in order to accurately identify additional SQL statements that could be tuned. To this end, the following lines of code were incorporated into the problematic routine:

create or replace PACKAGE BODY pks_msg_router AS
. . .
  PROCEDURE p_process_req_msg  (is_req_clob    IN   VARCHAR2
  . . .
  BEGIN
    -- begin instrumentation
    dbms_application_info.set_module(
      module_name => 'msg_router',
      action_name => 'process_req_msg');
    . . .
    <application logic>
    . . .
    -- end instrumentation
    dbms_application_info.set_module(
      module_name => NULL,
      action_name => NULL);
  END p_process_req_msg;
. . .
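With the module and action set, the work done by this routine alone can be isolated, for example by enabling SQL trace just for that module/action pair via DBMS_MONITOR. A sketch; the service name 'FCCPRD' is an assumption (check DBA_SERVICES or V$SESSION.SERVICE_NAME for the real one):

```sql
-- Trace only sessions executing the instrumented module/action.
begin
  dbms_monitor.serv_mod_act_trace_enable(
    service_name => 'FCCPRD',          -- placeholder service name
    module_name  => 'msg_router',
    action_name  => 'process_req_msg',
    waits        => TRUE,
    binds        => FALSE );
end;
/
```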

Once the change was implemented, additional inefficient SQL statements could be identified, such as:

select nvl(max(event_seq_no),0) from cstbs_contract_event_log 
where contract_ref_no = :b1
--------------------------------------------------------------------
| Id  | Operation                   | Name                         |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                              |
|   1 |  SORT AGGREGATE             |                              |
|   2 |   FIRST ROW                 |                              |
|   3 |    INDEX FULL SCAN (MIN/MAX)| PK01_CSTB_CONTRACT_EVENT_LOG |
--------------------------------------------------------------------

select nvl(max(change_seq_no),0) from cstbs_contract_change_log 
where contract_ref_no = :b2 and event_seq_no = :b1
----------------------------------------------------------------------
| Id  | Operation                    | Name                          |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                               |
|   1 |  SORT AGGREGATE              |                               |
|   2 |   FIRST ROW                  |                               |
|   3 |    INDEX RANGE SCAN (MIN/MAX)| PK01_CSTB_CONTRACT_CHANGE_LOG |
----------------------------------------------------------------------

The pervasiveness of the problem confirmed that this was not a normal situation but rather the incidence of a bug at the optimizer level, the common denominator being that all of these statements include the MIN and/or MAX operator. It was precisely this criterion that we used to search My Oracle Support, finding the following:

Bug 8467123 - Poor Performance Of Query With Index Full Scan (Min/Max)
Bug 11834402 - Cbo Chooses A Slow Index Full Scan (Min/Max) Over A Much Faster Index Range Scan
Bug 9140303 - Optimizer Choose Wrong Index For Min/Max Sql
Bug 8323407 - Index Range Scan (Min/Max) Cost Change

Apparently Oracle introduced in Oracle Server 10.2.0.4 a fix for bug 5611962 (CBO does not use index full scan (MIN/MAX) with part. range iterator), but this fix ended up generating a new bug. The solution? Upgrading to 10.2.0.5 or higher, which was not feasible in this scenario since it would take some time to implement. Fortunately there is a workaround: disabling the fix!

SQL> alter system set "_fix_control" = '5611962:OFF';
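For preliminary tests the same setting can be applied at session level with ALTER SESSION before committing to the instance-wide change, and the current state of the fix can be confirmed from the fix-control views. A sketch:

```sql
-- Confirm the fix is disabled instance-wide (value = 0 means OFF).
select bugno, value, description
  from v$system_fix_control
 where bugno = 5611962;
```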

After preliminary tests confirmed that the workaround was effective, its application was scheduled at the level of the entire instance. The universe of SQL statements in the problematic transaction was then reanalyzed, finding that the previously identified statements now executed efficiently, with new execution plans.

Original situation                          34.2 seconds
After gathering statistics (2 SQLs)          8.7 seconds
After disabling fix for bug 5611962          0.8 seconds

With these results the tuning of the application was concluded, with execution time reduced by no less than 98%! To date the application is operating normally and the users are happy with the response time, so we consider this one more case solved satisfactorily.

Did you find this article interesting? Do you have any doubts, or want to suggest a topic to cover? Leave me your comments or contact me right now!
