Oracle Home permissions corrupted? Solve it in a jiffy!

I was in the middle of configuring a couple of Oracle 19c Clusters for an upgrade project from Oracle Database Server 11.2 to 19c. When I rebooted one of the computers (node2), I noticed that the services were not available, for no apparent reason: this particular Cluster had been operating normally for several weeks.

The problem

As part of the work to copy Linux users and groups from the 11.2 Clusters to the new 19c Clusters, a task was accidentally executed that started modifying the owner and group of the files in the /u01 filesystem, where all the Oracle Grid and Oracle Database Server binaries reside.

Although the process was terminated almost immediately, the damage was already done: the contents of the Grid 19c Oracle Home, as well as those of the Oracle Database Server 19c and 11.2 Oracle Homes, were altered, and the Oracle software was rendered inoperative.

[oracle@node2 ~]$ ls -la /u01/app/
total 4
drwxr-xr-x.  5 root      oinstall    52 Aug 24 12:46 .
drwxr-xr-x.  3 root      oinstall    17 Aug 24 12:35 ..
drwxr-xr-x.  3 haldaemon haldaemon   20 Aug 24 12:35 grid
drwxr-xr-x. 13 oracle    oinstall  4096 Aug 25 20:07 oracle
drwxrwx---.  5 oracle    oinstall    92 Sep 28 11:25 oraInventory

[oracle@node2 ~]$ ls -la /u01/app/oracle
total 8
drwxr-xr-x. 13 oracle    oinstall  4096 Aug 25 20:07 .
drwxr-xr-x.  5 root      oinstall    52 Aug 24 12:46 ..
drwxr-xr-x.  3 haldaemon haldaemon   18 Aug 24 16:50 11.2.0
drwxr-xr-x.  3 haldaemon haldaemon   18 Aug 24 12:35 19.0.0
drwxr-xr-x.  5 oracle    oinstall    45 Aug 24 19:14 admin
drwxr-x---.  3 oracle    oinstall    21 Aug 24 18:34 audit
drwxrwxr-x.  6 oracle    oinstall    60 Aug 24 18:12 cfgtoollogs

[oracle@node2 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
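
To get a quick picture of how far a rogue chown got, it helps to count the entries under a directory per owner and group. The following is only a minimal sketch, assuming GNU find (for -printf); owner_summary is a name made up for this illustration.

```shell
#!/usr/bin/env bash
# Count filesystem entries per owner:group under a directory tree.
# Assumption: GNU find is available (-printf is not POSIX).
owner_summary() {
    local dir="$1"
    # %u = owner name, %g = group name; one line per file or directory
    find "$dir" -printf '%u:%g\n' 2>/dev/null | sort | uniq -c | sort -rn
}
```

Run as root, for example: owner_summary /u01/app. An unexpected pair such as haldaemon:haldaemon in the output shows how many entries were touched.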

First attempt

Searching My Oracle Support turns up this document:

1931142.1 How to check and fix file permissions on Grid Infrastructure environment

Although it only applies to the Oracle Home of Grid Infrastructure, we have to start somewhere, so we follow the procedure indicated there, starting with stopping the Grid processes and changing the owner and group of the directories to their default values: oracle:oinstall.

[root@node2 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'node2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'node2'
CRS-2677: Stop of 'ora.drivers.acfs' on 'node2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'node2' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@node2 ~]# chown -R oracle:oinstall /u01/app

[root@node2 ~]# cd $ORACLE_HOME/crs/install/

[root@node2 install]# ./rootcrs.sh -init
Smartmatch is deprecated at /u01/app/grid/19.0.0/grid_1/crs/install/crsupgrade.pm line 6512.
Using configuration parameter file: /u01/app/grid/19.0.0/grid_1/crs/install/crsconfig_params
The log of current session can be found at:
  /u01/app/oracle/crsdata/node2/crsconfig/crsinit_node2_2024-09-26_03-45-33PM.log

We then validate the status of the Grid Oracle Home:

[oracle@node2 ~]$ cluvfy comp software -n node2 -verbose
This software is "184" days old. It is a best practice to update the CRS home by downloading and applying the latest release update. Refer to MOS note 756671.1 for more details.

Performing following verification checks ...

  Software home: /u01/app/grid/19.0.0/grid_1 ...
  Node Name     Status                    Comment
  ------------  ------------------------  ------------------------
  node2         failed                    1256 files verified
  Software home: /u01/app/grid/19.0.0/grid_1 ...FAILED (PRVG-2031, PRVG-11960, PRVH-0147)

Verification of software was unsuccessful on all the specified nodes.

Failures were encountered during execution of CVU verification request "software".

Software home: /u01/app/grid/19.0.0/grid_1 ...FAILED
node2: PRVG-2031 : Owner of file
       "/u01/app/grid/19.0.0/grid_1/lib/liboramysql19.so" did not match the
       expected value on node "node2". [Expected = "root(0)" ; Found =
       "oracle(54321)"]
node2: PRVG-2031 : Owner of file "/u01/app/grid/19.0.0/grid_1/bin/expdp" did
       not match the expected value on node "node2". [Expected = "root(0)" ;
       Found = "oracle(54321)"]
node2: PRVG-2031 : Owner of file "/u01/app/grid/19.0.0/grid_1/bin/sqlldr" did
       not match the expected value on node "node2". [Expected = "root(0)" ;
       Found = "oracle(54321)"]
. . .
node2: PRVG-2031 : Owner of file "/u01/app/grid/19.0.0/grid_1/bin/okdstry" did
       not match the expected value on node "node2". [Expected = "root(0)" ;
       Found = "oracle(54321)"]
node2: PRVG-2031 : Owner of file "/u01/app/grid/19.0.0/grid_1/bin/okinit" did
       not match the expected value on node "node2". [Expected = "root(0)" ;
       Found = "oracle(54321)"]
node2: PRVG-2031 : Owner of file "/u01/app/grid/19.0.0/grid_1/bin/oklist" did
       not match the expected value on node "node2". [Expected = "root(0)" ;
       Found = "oracle(54321)"]
node2: PRVG-2031 : Owner of file
       "/u01/app/grid/19.0.0/grid_1/bin/oradaemonagent" did not match the
       expected value on node "node2". [Expected = "root(0)" ; Found =
       "oracle(54321)"]

The same note indicates that if the execution of rootcrs.sh -init is insufficient, a manual change must be made using the information contained in the crsconfig_fileperms and crsconfig_dirs files, but this is time-consuming and error-prone.

Fortunately, My Oracle Support has this document:

1515018.1 Script to capture and restore file permission in a directory (for eg. ORACLE_HOME)

The note indicates that the first step is to download a Perl script called permission.pl.

The script takes a single parameter, the directory we want to process, and generates a file with the Linux commands needed to reconstruct the owner, group and permissions of each and every file in that directory!

It looks like the ultimate solution, but does it deliver? Let’s see.

Second attempt

We start by executing the permission.pl program on node1, since it was not affected and operates without problems.

Note: it must be run as root user.

[root@node1 ~]# ./permission.pl /u01/app
Following log files are generated
logfile      : permission-Thu-Sep-26-16-58-50-2024
Command file : restore-perm-Thu-Sep-26-16-58-50-2024.cmd
Linecount : 147029

It finished in less than 10 seconds, and the “.cmd” file does contain the promised commands.

[root@node1 ~]# more restore-perm-Thu-Sep-26-16-58-50-2024.cmd
chown  root:oinstall "/u01/app"
chmod  755 "/u01/app"
chown  root:oinstall "/u01/app/grid"
chmod  755 "/u01/app/grid"
chown  root:oinstall "/u01/app/grid/19.0.0"
chmod  755 "/u01/app/grid/19.0.0"
chown  root:oinstall "/u01/app/grid/19.0.0/grid_1"
chmod  755 "/u01/app/grid/19.0.0/grid_1"
chown  oracle:oinstall "/u01/app/grid/19.0.0/grid_1/env.ora"
chmod  644 "/u01/app/grid/19.0.0/grid_1/env.ora"
chown  root:oinstall "/u01/app/grid/19.0.0/grid_1/root.sh"
chmod  755 "/u01/app/grid/19.0.0/grid_1/root.sh"
chown  oracle:oinstall "/u01/app/grid/19.0.0/grid_1/welcome.html"
chmod  640 "/u01/app/grid/19.0.0/grid_1/welcome.html"
chown  oracle:oinstall "/u01/app/grid/19.0.0/grid_1/gridSetup.sh"
chmod  750 "/u01/app/grid/19.0.0/grid_1/gridSetup.sh"
chown  oracle:oinstall "/u01/app/grid/19.0.0/grid_1/runcluvfy.sh"
chmod  750 "/u01/app/grid/19.0.0/grid_1/runcluvfy.sh"
chown  oracle:oinstall "/u01/app/grid/19.0.0/grid_1/gridSetup.rsp"
chmod  644 "/u01/app/grid/19.0.0/grid_1/gridSetup.rsp"
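
The idea behind a command file like this one can be sketched in a few lines of shell. This is not the permission.pl script from the MOS note, only an illustration of the same capture technique, assuming GNU find; capture_perms is a hypothetical name. Paths containing double quotes, dollar signs or backticks would need extra escaping before the generated commands could be run safely.

```shell
#!/usr/bin/env bash
# Sketch of the capture idea only; this is NOT the permission.pl script from MOS.
# capture_perms is a hypothetical helper name. Assumption: GNU find (-printf).
capture_perms() {
    local dir="$1"
    # Symbolic links are skipped: chmod follows them and would hit the target.
    # %u = owner name, %g = group name, %m = permission bits in octal, %p = path
    find "$dir" ! -type l -printf 'chown  %u:%g "%p"\nchmod  %m "%p"\n'
}
```

On a healthy node this could be used as: capture_perms /u01/app > restore-perm.cmd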

The file has to be copied to the affected computer, but the host name and the ASM instance name differ between the nodes, so we must adapt them in the script before executing it.

[root@node1 ~]# sed -i 's/ASM1/ASM2/g' restore-perm-Thu-Sep-26-16-58-50-2024.cmd
[root@node1 ~]# sed -i 's/asm1/asm2/g' restore-perm-Thu-Sep-26-16-58-50-2024.cmd
[root@node1 ~]# sed -i 's/node1/node2/g' restore-perm-Thu-Sep-26-16-58-50-2024.cmd
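
Because these substitutions are not anchored and sed -i rewrites the file in place, it can be worth reviewing what they change before running anything. A minimal sketch, using the same three substitutions but writing to standard output instead; adapt_cmd_file and preview_changes are hypothetical helper names.

```shell
#!/usr/bin/env bash
# adapt_cmd_file is a hypothetical helper: it applies the same three
# substitutions used above, leaves the source file untouched and writes
# the adapted commands to standard output.
adapt_cmd_file() {
    local src="$1"
    sed -e 's/ASM1/ASM2/g' -e 's/asm1/asm2/g' -e 's/node1/node2/g' "$src"
}

# Print only the lines that the substitutions would change, for review.
preview_changes() {
    local src="$1"
    adapt_cmd_file "$src" | diff "$src" - | grep '^>' | cut -c3-
}
```

For example: preview_changes restore-perm-Thu-Sep-26-16-58-50-2024.cmd | less, and once the result looks right, redirect the output of adapt_cmd_file to a new file for node2.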

[root@node2 ~]# sh restore-perm-Thu-Sep-26-16-58-50-2024.cmd
chown: cannot access ‘/u01/app/grid/19.0.0/grid_1/gpnp/gpnp_bcp__2024_8_24_175252’: No such file or directory
chmod: cannot access ‘/u01/app/grid/19.0.0/grid_1/gpnp/gpnp_bcp__2024_8_24_175252’: No such file or directory
chown: cannot access ‘/u01/app/grid/19.0.0/grid_1/gpnp/gpnp_bcp__2024_8_24_175252/stg__node2’: No such file or directory
chmod: cannot access ‘/u01/app/grid/19.0.0/grid_1/gpnp/gpnp_bcp__2024_8_24_175252/stg__node2’: No such file or directory
. . .
chmod: cannot access ‘/u01/app/grid/19.0.0/grid_1/cfgtoollogs/oui/GridSetupActions2024-08-24_05-56-29PM/time2024-08-24_05-56-29PM.log’: No such file or directory
chown: cannot access ‘/u01/app/grid/19.0.0/grid_1/crf/admin/run/crfmond/snode2.pid’: No such file or directory
chmod: cannot access ‘/u01/app/grid/19.0.0/grid_1/crf/admin/run/crfmond/snode2.pid’: No such file or directory

After a few minutes the script completes. Several error messages are displayed, but they can be ignored: they refer to log, trace and similar run-time files that do not exist on this node and do not affect the operation of the Oracle software.
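
Rather than scanning the error output by eye, the missing paths can be separated out before the command file is run. The following is a sketch under stated assumptions: every line has the form of the command file shown earlier (a chown or chmod followed by one double-quoted path, with no double quotes inside the path), and filter_existing is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# filter_existing is a hypothetical helper. It assumes every line looks like
#   chown|chmod  <args> "<path>"
# with no double quotes inside the path.
filter_existing() {
    local cmd_file="$1" line path
    while IFS= read -r line; do
        path=${line#*\"}     # drop everything up to the first double quote
        path=${path%\"*}     # drop the closing double quote and what follows
        if [ -e "$path" ] || [ -L "$path" ]; then
            printf '%s\n' "$line"                 # path exists: keep the command
        else
            printf 'skipped: %s\n' "$path" >&2    # path missing: report it
        fi
    done < "$cmd_file"
}
```

For example: filter_existing restore-perm.cmd > restore-existing.cmd 2> skipped.txt. The skipped.txt file can then be reviewed to confirm that only logs, traces and pid files are missing.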

We validate the status of the Grid Oracle Home again:

[oracle@node2 ~]$ cluvfy comp software -n node2 -verbose
This software is "184" days old. It is a best practice to update the CRS home by downloading and applying the latest release update. Refer to MOS note 756671.1 for more details.

Performing following verification checks ...

  Software home: /u01/app/grid/19.0.0/grid_1 ...
  Node Name     Status                    Comment
  ------------  ------------------------  ------------------------
  node2         passed                    1256 files verified
  Software home: /u01/app/grid/19.0.0/grid_1 ...PASSED

Verification of software was successful.

CVU operation performed:      software
Date:                         Sep 26, 2024 17:15:11 PM
CVU version:                  19.23.0.0.0 (032824x8664)
Clusterware version:          19.0.0.0.0
CVU home:                     /u01/app/grid/19.0.0/grid_1
Grid home:                    /u01/app/grid/19.0.0/grid_1
User:                         oracle
Operating system:             Linux5.4.17-2136.330.7.1.el7uek.x86_64

Excellent! Everything seems to be in order, but the litmus test is to see if the Clusterware starts and if the databases dbrac (Oracle Database 11.2) and orcl (Oracle Database 19c) start.

[root@node2 ~]# crsctl start crs
CRS-4123: Oracle High Availability Services has been started.

[root@node2 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[root@node2 ~]# olsnodes -s
node1  Active
node2  Active
node3  Active

[root@node2 ~]# ps -ef | grep pmon
oracle   16125     1  0 17:24 ?        00:00:00 asm_pmon_+ASM2
oracle   16380     1  0 17:24 ?        00:00:00 ora_pmon_dbrac1_2
oracle   16503     1  0 17:24 ?        00:00:00 ora_pmon_orcl1_2

Confirmed: everything operates normally and we can consider the case closed and continue with the day’s tasks.

We did have a scare, but fortunately a solution was found that saved us from having to reinstall the software, which by the way is not that complicated if you are used to using Golden Images.

Don’t know what they are? Then what are you waiting for? I have a whole series of articles you can start reading right now.

Did you find this article interesting? Do you have any questions, or a topic you would like me to cover? Leave me your comments or contact me right now!
