Troubleshoot: ora-600[kcrf_resilver_log_1] & ora-600[4193]

April 23, 2014
No Comments

Last Saturday I received a call while I was driving, parked and took it. It was from a good friend and client who had a problem with one of his databases.

It happened that, as part of a maintenance to the server in which it operated, one of the disks had presented errors to be recognized. Eventually they found the cause of the problem and solved it, the disk was recognized again but the database would not open, they got an ORA-600 error that they did not know how to deal with and therefore required my help.

That day was a particularly busy one, I was helping in the preparations to celebrate my little Lucia’s birthday, but apparently it would not take much time to attend this emergency, so we agreed that I would take a few minutes to at least see what could be done to solve the problem; my friend gave me all the necessary data to connect remotely to his server, and a few minutes later I was reviewing the alert.log file, finding the following:

Copiar

Sat Apr 19 13:46:34 2014 
ALTER DATABASE OPEN
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Errors in file d:\app\oracle\diag\rdbms\cts\cts\trace\cts_ora_3696.trc  (incident=85364):
ORA-00600: código de error interno, argumentos: [kcrf_resilver_log_1], [0x7FF62C56708], [2], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\oracle\diag\rdbms\cts\cts\incident\incdir_85364\cts_ora_3696_i85364.trc
Aborting crash recovery due to error 600
Errors in file d:\app\oracle\diag\rdbms\cts\cts\trace\cts_ora_3696.trc:
ORA-00600: código de error interno, argumentos: [kcrf_resilver_log_1], [0x7FF62C56708], [2], [], [], [], [], [], [], [], [], []
Errors in file d:\app\oracle\diag\rdbms\cts\cts\trace\cts_ora_3696.trc:
ORA-00600: código de error interno, argumentos: [kcrf_resilver_log_1], [0x7FF62C56708], [2], [], [], [], [], [], [], [], [], []
ORA-600 signalled during: ALTER DATABASE OPEN...

Checking on My Oracle Support (MOS) I found the note 1227666.1 ORA-00600 [kcrf_resilver_log_1] on restart after system crash, which he points out as the cause of this error: “There has been a lost write to the online redolog as a result of the crash. Instance recovery is not possible.”, and indicates that the solution is: “Restore the database and do point in time recovery to the most recent archivelog.”

It is bad news, Oracle Support tells us that there was an error when saving in the online redologs, it is not possible to do an automatic recovery of the instance and consequently it is necessary to do a restore and then a recover. As some of you may already suspect, this problem was not going to be so easy to solve: this database was not in archive log mode, and there was no physical backup to resort to, the chances of solving the problem were drastically reduced, but the challenge was given and we had to keep trying.

As a first step, a physical copy of all the datafiles, online redologs and controlfiles was made, in order to later try to recover them:

Copiar

SQL> recover database using backup controlfile until cancel;
ORA-00279: change 99446132 generated at 19/04/2014 10:00:45 needed for thread 1
ORA-00289: suggestion : D:\APP\ORACLE\FLASH_RECOVERY_AREA\CTS\ARCHIVELOG\2014_04_19\O1_MF_1_1157_%U_.ARC
ORA-00280: change 99446132 for thread 1 is in sequence #1157

Specify log: {=suggested | filename | AUTO | CANCEL}

Obviously the suggested archivelog did not exist, but to our luck the sequence 1157 corresponded to an undamaged redolog:

Copiar

SQL> select group#, sequence# from v$log;

    GROUP#  SEQUENCE# 
---------- ---------- 
         1       1156 
         2       1157
         3       1155

Therefore, we proceeded to feed this online redolog to the recover command.

Copiar

Specify log: {=suggested | filename | AUTO | CANCEL}
'D:\APP\ORACLE\ORADATA\CTS\REDO02.LOG'

Log applied.
Media recovery complete.

The message “Media recovery complete” brought us back to our souls, apparently the problem was solved, but the final step was missing, opening the database.

Copiar

alter database open resetlogs;

But instead of the database being opened normally, we encountered a new ORA-600, as seen in this excerpt from the alert.log.

Copiar

Sat Apr 19 15:16:01 2014
alter database open resetlogs
RESETLOGS after complete recovery through change 99446132
Resetting resetlogs activation ID 4065219144 (0xf24e5248)
Sat Apr 19 15:16:03 2014
Setting recovery target incarnation to 3
Sat Apr 19 15:16:03 2014
Assigning activation ID 4085838382 (0xf388f22e)
Thread 1 opened at log sequence 1
  Current log# 1 seq# 1 mem# 0: D:\APP\ORACLE\ORADATA\CTS\REDO01.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
Sat Apr 19 15:16:04 2014
SMON: enabling cache recovery
Successfully onlined Undo Tablespace 2.
Dictionary check beginning
Dictionary check complete
Verifying file header compatibility for 11g tablespace encryption..
Verifying 11g file header compatibility for tablespace encryption completed
SMON: enabling tx recovery
Database Characterset is WE8MSWIN1252
Errors in file d:\app\oracle\diag\rdbms\cts\cts\trace\cts_smon_4640.trc  (incident=87710):
ORA-00600: internal error code, arguments: [4193], [], [], [], [], [], [], [], [], [], [], []
Incident details in: d:\app\oracle\diag\rdbms\cts\cts\incident\incdir_87710\cts_smon_4640_i87710.trc
No Resource Manager plan active
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x19698E97] [PC:0x9257D88, kgegpa()+38]
Dump file d:\app\oracle\diag\rdbms\cts\cts\trace\alert_cts.log

After searching MOS, the following note was found: 452662.1 Ora-600 [4193] When Opening Or Shutting Down A Database, which states as a cause: “..some kind of block corruption in either the redo or the undo block”.

The document in question discusses extensively the various possible scenarios, but for this particular case we only applied the solution for when the database fails to be opened, and that is to disable the automatic management of undo and try to open the database, if it succeeds in opening then the corruption did not affect the redo but was limited to the undo segments. That would be the ideal scenario, so we cross our fingers and run:

Copiar

SQL> alter system set undo_management='MANUAL' scope=SPFILE sid='*';
System altered.

SQL> shutdown immediate
ORACLE instance shut down.

SQL> startup
ORACLE instance started.

Total System Global Area  167772160 bytes
Fixed Size                   787968 bytes
Variable Size              61864448 bytes
Database Buffers          104857600 bytes
Redo Buffers                 262144 bytes
Database mounted.
Database opened.

Phenomenal! To our relief the database opened normally, which was verified by the messages in the alert.log file:

Copiar

Sat Apr 19 15:28:08 2014
alter database open
Beginning crash recovery of 1 threads
 parallel recovery started with 3 processes
Started redo scan
Completed redo scan
 read 31 KB redo, 13 data blocks need recovery
Started redo application at
 Thread 1: logseq 1, block 3
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
  Mem# 0: D:\APP\ORACLE\ORADATA\CTS\REDO01.LOG
Completed redo application of 0.02MB
Completed crash recovery at
 Thread 1: logseq 1, block 66, scn 99466367
 13 data blocks read, 13 data blocks written, 31 redo k-bytes read
Sat Apr 19 15:28:09 2014
Thread 1 advanced to log sequence 2 (thread open)
Thread 1 opened at log sequence 2
  Current log# 2 seq# 2 mem# 0: D:\APP\ORACLE\ORADATA\CTS\REDO02.LOG
Successful open of redo thread 1
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set

The note also indicates that since the database manages to open, it should be considered that all transactions have been saved, and therefore we can get rid of the undo tablespace containing the corrupted undo segments and create a new tablespace to replace it, so we continue with the following:

Copiar

SQL> create undo tablespace UNDOTBS2 datafile
    'D:\APP\ORACLE\ORADATA\CTS\UNDOTBS02.DBF'
     size 100M reuse autoextend on;
Tablespace created.

SQL> alter system set undo_tablespace='UNDOTBS2' scope=SPFILE sid='*';
System altered.

SQL> alter system set undo_management='AUTO' scope=SPFILE sid='*';
System altered.

SQL> shutdown immediate
ORACLE instance shut down.

SQL> startup
ORACLE instance started.

Total System Global Area  167772160 bytes
Fixed Size                   787968 bytes
Variable Size              61864448 bytes
Database Buffers          104857600 bytes
Redo Buffers                 262144 bytes
Database mounted.
Database opened.

SQL> alter tablespace undotbs1 offline normal;
Tablespace altered.

SQL> drop tablespace UNDOTBS1 including contents and datafiles;
Tablespace dropped.

With these last steps completed, we could finally consider the problem solved, and we were all happy. What initially seemed like a problem that would take a few minutes to solve ended up taking almost an hour, but it was an excellent opportunity to learn and feel again the excitement and satisfaction of being able to solve problems, which is the main reason why I am in this business in the first place. See you next time folks!!!

Did you find this article interesting, did you have any doubts, do you want to suggest a topic to cover, leave me your comments or contact me me right now!

All Things Oracle

Troubleshoot: ora-600[kcrf_resilver_log_1] & ora-600[4193]

Leave a Reply Cancel reply

Related Posts

Troubleshoot: ORA-24247 when trying to send an email

Huge materialized view log? Fix it in 5 steps

How to refresh materialized views faster in 19c?

All Things Oracle

Troubleshoot: ora-600[kcrf_resilver_log_1] & ora-600[4193]

Leave a Reply Cancel reply

Related Posts

Troubleshoot: ORA-24247 when trying to send an email

Huge materialized view log? Fix it in 5 steps

How to refresh materialized views faster in 19c?

Need Help?