A few days ago I received a report of poor performance in a banking application. Although the transaction volume was not high, the monetary value was, so it was a priority that each transaction complete almost instantly. Over time these transactions were taking longer and longer, to the point that some of them were eventually considered failed and had to be restarted, making everyone involved in the business nervous. It was in this scenario that they came to me for advice.
The first problem I encountered was that the application was not instrumented, so its code was indistinguishable from that of the other applications running in the same instance. Fortunately, out of the whole universe of SQL statements, the DBA had already identified a couple that appeared in the AWR reports as the main contributors to buffer gets, so I immediately proceeded to analyze them.
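As a hedged illustration (the exact query the DBA used is not shown in the report), statements ranking highest by buffer gets can be pulled from the V$SQL fixed view like this:

```sql
-- Top 10 statements currently in the shared pool, ranked by buffer gets.
select *
  from (select sql_id,
               buffer_gets,
               executions,
               substr(sql_text, 1, 60) as sql_text
          from v$sql
         order by buffer_gets desc)
 where rownum <= 10;
```

For a longer window, the same ranking can be taken from the AWR history tables (DBA_HIST_SQLSTAT), which is where the "SQL ordered by Gets" section of the AWR report comes from.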
select nvl(max(event_seq_no), 0)
from cstb_contract_change_log
where contract_ref_no = :b1
---------------------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | FIRST ROW | |
| 3 | INDEX FULL SCAN (MIN/MAX)| PK01_CSTB_CONTRACT_CHANGE_LOG |
---------------------------------------------------------------------
select nvl(max(event_seq_no), 0)
from istb_contractis
where contract_ref_no = :b1
------------------------------------------------------------
| Id | Operation | Name |
------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | FIRST ROW | |
| 3 | INDEX FULL SCAN (MIN/MAX)| PK01_ISTB_CONTRACTIS |
------------------------------------------------------------
It was found that both tables had indexes on the contract_ref_no column, which were of course much more appropriate than the primary-key indexes, but for some reason the optimizer was not considering them. As an initial step we generated fresh statistics, hoping that this would be enough for the optimizer to make the right decision.
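A quick way to confirm which indexes cover contract_ref_no is to query the data dictionary (a sketch; the owner FCCPRD is taken from the statistics calls below):

```sql
-- List the indexed columns on the two tables involved,
-- to verify that contract_ref_no is indeed covered.
select table_name, index_name, column_name, column_position
  from dba_ind_columns
 where table_owner = 'FCCPRD'
   and table_name in ('CSTB_CONTRACT_CHANGE_LOG', 'ISTB_CONTRACTIS')
   and column_name = 'CONTRACT_REF_NO'
 order by table_name, index_name, column_position;
```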
exec dbms_stats.gather_table_stats(
ownname => 'FCCPRD',
tabname => 'CSTB_CONTRACT_CHANGE_LOG',
cascade => TRUE,
estimate_percent => 100,
method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO' );
exec dbms_stats.gather_table_stats(
ownname => 'FCCPRD',
tabname => 'ISTB_CONTRACTIS',
cascade => TRUE,
estimate_percent => 100,
method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO' );
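To confirm that the gather completed and the optimizer now has current figures to work with, LAST_ANALYZED can be checked in the standard dictionary view (a sketch):

```sql
-- Verify that both tables now carry fresh, complete statistics.
select table_name, num_rows, sample_size, last_analyzed
  from dba_tab_statistics
 where owner = 'FCCPRD'
   and table_name in ('CSTB_CONTRACT_CHANGE_LOG', 'ISTB_CONTRACTIS');
```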
With this, execution times were significantly reduced, from an initial average of 34.2 seconds to only 8.7 seconds. There was still the perception that the transaction could be even faster, however, so it was inevitable to instrument the application in order to accurately identify additional SQL statements that could be tuned. To this end, the following lines of code were incorporated into the problematic routine:
create or replace PACKAGE BODY pks_msg_router AS
. . .
PROCEDURE p_process_req_msg (is_req_clob IN VARCHAR2
. . .
BEGIN
-- begin instrumentation
dbms_application_info.set_module(
module_name => 'msg_router',
action_name => 'process_req_msg');
. . .
<application logic>
. . .
-- end instrumentation
dbms_application_info.set_module(
module_name => NULL,
action_name => NULL);
END p_process_req_msg;
. . .
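With the module and action set, the routine's activity can now be isolated from everything else running in the instance, for example in Active Session History (a sketch; v$active_session_history requires the Diagnostics Pack license):

```sql
-- SQL executed by the instrumented routine, ranked by sampled activity.
select sql_id, count(*) as samples
  from v$active_session_history
 where module = 'msg_router'
   and action = 'process_req_msg'
 group by sql_id
 order by samples desc;
```

The same module/action pair can also be used to scope tracing, e.g. with DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE.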
After the change was implemented, more inefficient SQL statements could be identified, such as:
select nvl(max(event_seq_no),0) from cstb_contract_event_log
where contract_ref_no = :b1
--------------------------------------------------------------------
| Id | Operation | Name |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | FIRST ROW | |
| 3 | INDEX FULL SCAN (MIN/MAX)| PK01_CSTB_CONTRACT_EVENT_LOG |
--------------------------------------------------------------------
select nvl(max(change_seq_no),0) from cstb_contract_change_log
where contract_ref_no = :b2 and event_seq_no = :b1
----------------------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | FIRST ROW | |
| 3 | INDEX RANGE SCAN (MIN/MAX)| PK01_CSTB_CONTRACT_CHANGE_LOG |
----------------------------------------------------------------------
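One way to verify that a range scan on the contract_ref_no index would indeed be cheaper is to force it with a hint and compare the plans (a sketch; IX01_CSTB_CONTRACT_EVENT_LOG is a hypothetical index name, since the actual secondary index names are not listed in the text):

```sql
-- Force the contract_ref_no index to compare against the MIN/MAX full scan.
select /*+ index(t IX01_CSTB_CONTRACT_EVENT_LOG) */
       nvl(max(event_seq_no), 0)
  from cstb_contract_event_log t
 where contract_ref_no = :b1;
```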
The prevalence of the problem confirmed that this was not a normal situation but rather the effect of a bug at the optimizer level, the common denominator of all these statements being the use of the MIN and/or MAX operator. It was precisely this criterion that we used to search My Oracle Support, finding the following:
- Bug 8467123: Poor Performance Of Query With Index Full Scan (Min/Max)
- Bug 11834402: CBO Chooses A Slow Index Full Scan (Min/Max) Over A Much Faster Index Range Scan
- Bug 9140303: Optimizer Chooses Wrong Index For Min/Max SQL
- Bug 8323407: Index Range Scan (Min/Max) Cost Change
Apparently Oracle introduced in Oracle Server 10.2.0.4 a fix for bug 5611962 ("CBO does not use index full scan (MIN/MAX) with part. range iterator"), but this fix ended up generating a new bug. The solution? Upgrading to 10.2.0.5 or higher, which was not feasible in this scenario since it would take some time to implement. Fortunately there is a workaround: disable the fix!
SQL> alter system set "_fix_control" = '5611962:OFF';
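For preliminary tests the change can first be tried at session level (alter session set "_fix_control" = '5611962:OFF';), and whether the fix is active can be checked in V$SYSTEM_FIX_CONTROL, where VALUE shows 0 once it is disabled:

```sql
-- Confirm the current state of the fix for bug 5611962.
select bugno, value, description
  from v$system_fix_control
 where bugno = 5611962;
```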
After preliminary tests confirmed that the workaround was effective, its application was scheduled for the entire instance. The universe of SQL statements in the problematic transaction was then reanalyzed, and the previously identified statements were now executing efficiently, with new execution plans.
- Original situation: 34.2 seconds
- After gathering fresh statistics (2 SQLs): 8.7 seconds
- After disabling the fix for bug 5611962: 0.8 seconds
With these results the tuning of the application was concluded, with execution time reduced by no less than 98%. To date the application is operating normally and the users are happy with the response time, so this is considered one more case solved satisfactorily.