Thursday, November 28, 2013

Recover corrupt OCR without backup

Oracle Clusterware backs up the OCR automatically every four hours, but what if no backup is available and the OCR is corrupted? Reinstalling the whole Clusterware stack is not an attractive option, as it can be a lengthy process.

On 10g R2 and later, the OCR and voting disk can be rebuilt without reinstalling Clusterware, provided you have a backup of root.sh or it has not been overwritten by a subsequent patch. I tested this on my test machines.

First, the current voting disk location:

[root@racnode1 ~]# crsctl query css votedisk
 0. 0 /OCFS/VOT

located 1 votedisk(s).
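Since these locations are exactly what gets lost, it helps to keep a copy outside the shared storage. A minimal sketch, assuming the 10g output format shown above (index, status, path); `votedisk_paths` is a hypothetical helper name, not an Oracle tool:

```shell
# votedisk_paths (hypothetical helper): reads `crsctl query css votedisk`
# output on stdin and prints one voting disk path per line.
# Assumes the 10g layout shown above, e.g. " 0.     0    /OCFS/VOT".
votedisk_paths() {
    # Data lines start with an index and a dot ("0."); the path is the last field.
    awk '$1 ~ /^[0-9]+\.$/ { print $NF }'
}

# Usage, as root on one node:
#   crsctl query css votedisk | votedisk_paths > /root/votedisk_locations.txt
```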

Then the current OCR file locations:

[root@racnode1 ~]# ocrcheck

Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262144
Used space (kbytes) : 4344
Available space (kbytes) : 257800
ID : 601339441

Device/File Name : /OCFS/OCR
Device/File integrity check succeeded

Device/File Name : /OCFS/OCR2
Device/File integrity check succeeded

Cluster registry integrity check succeeded
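The same idea works for the OCR locations. This sketch assumes the `Device/File Name : <path>` lines that ocrcheck prints in 10g R2; `ocr_paths` is a hypothetical helper:

```shell
# ocr_paths (hypothetical helper): reads ocrcheck output on stdin and
# prints each OCR device/file path on its own line.
ocr_paths() {
    awk '/Device\/File Name/ {
        sub(/^[^:]*:[ \t]*/, "")   # drop the label up to and including the colon
        sub(/[ \t]+$/, "")         # drop trailing blanks
        print
    }'
}

# Usage, as root:
#   ocrcheck | ocr_paths > /root/ocr_locations.txt
```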

And the output of crs_stat:

[root@racnode1 ~]# crs_stat -t

Name Type Target State Host
------------------------------------------------------------
ora….SM1.asm application ONLINE ONLINE racnode1
ora….C1.lsnr application ONLINE ONLINE racnode1
ora.racnode1.gsd   application ONLINE ONLINE racnode1
ora.racnode1.ons   application ONLINE ONLINE racnode1
ora.racnode1.vip   application ONLINE ONLINE racnode1
ora….SM2.asm application ONLINE ONLINE racnode2
ora….C2.lsnr application ONLINE ONLINE racnode2
ora.racnode2.gsd   application ONLINE ONLINE racnode2
ora.racnode2.ons   application ONLINE ONLINE racnode2
ora.racnode2.vip   application ONLINE ONLINE racnode2
ora.test.AP.cs application ONLINE ONLINE racnode1
ora….st1.srv application ONLINE ONLINE racnode1
ora.test.db    application ONLINE ONLINE racnode2
ora….t1.inst application ONLINE ONLINE racnode1
ora….t2.inst application ONLINE ONLINE racnode2
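`crs_stat -t` truncates long resource names, so this listing alone is not enough to re-register resources later. Plain `crs_stat` prints full `NAME=` lines; the sketch below (hypothetical helper `resource_names`, assuming that `NAME=...` output format) keeps just the names:

```shell
# resource_names (hypothetical helper): reads plain `crs_stat` output on stdin
# and prints the full resource names, one per line.
resource_names() {
    # Print only NAME= lines, with the "NAME=" prefix removed.
    sed -n 's/^NAME=//p'
}

# Usage:
#   crs_stat | resource_names > /root/crs_resources.txt
```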

I stopped Clusterware on both nodes and removed the OCR and voting disk files.

[root@racnode1 ~]# ls -lrt /OCFS/*

-rw-r--r-- 1 root root 399507456 Jun 29 14:05 /OCFS/OCR2
-rw-r----- 1 root oinstall 10485760 Jun 29 14:05 /OCFS/OCR
-rw-r--r-- 1 oracle oinstall 10240000 Jun 29 14:05 /OCFS/VOT

[root@racnode1 ~]# rm -fr /OCFS/*

I then tried to start the cluster again:

[root@racnode1 ~]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly



Clusterware could not start up:

[root@racnode1 ~]# crsctl check crs

Failure 1 contacting CSS daemon
Cannot communicate with CRS
Cannot communicate with EVM

The OCR errors were written to the /tmp/crsctl.* files:

[root@racnode1 ~]# cat /tmp/crsc*

OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
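To check quickly whether the OCR is the reason the stack does not come up, the crsctl logs can be searched for PROC-26. A sketch, assuming the logs are named /tmp/crsctl.* as on my nodes; `ocr_unreachable` is a hypothetical helper:

```shell
# ocr_unreachable (hypothetical helper): lists crsctl.* log files that contain
# a PROC-26 error. Exit status is 0 only if at least one file matches.
ocr_unreachable() {
    log_dir=${1:-/tmp}   # directory holding the crsctl.* logs
    # -l prints only the names of matching files; errors (e.g. no logs) are discarded.
    grep -l 'PROC-26' "$log_dir"/crsctl.* 2>/dev/null
}

# Usage:
#   ocr_unreachable && echo "OCR device not accessible, see files above"
```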

At this point, I had lost both OCR files and the voting disk.

The following procedure can be used for recovery:

1) Execute rootdelete.sh on all nodes.
2) Execute rootdeinstall.sh on the primary node.
3) Run root.sh on the primary node.
4) Run root.sh on all remaining nodes.
5) Complete the remaining configuration (ONS, netca, and registration of the required resources).
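Because the order matters (rootdelete.sh everywhere, rootdeinstall.sh once, then root.sh on the primary node before the others), it can help to print the plan before running anything. A dry-run sketch; the `recovery_plan` helper and the `ssh root@node` form are assumptions, since I ran each script directly on the node:

```shell
# recovery_plan (hypothetical helper): prints, without executing anything,
# the commands for steps 1-4. Arguments: CRS home, then the node names,
# with the primary node listed first.
recovery_plan() {
    crs_home=$1
    shift
    primary=$1   # the first node listed is treated as the primary node
    # Step 1: rootdelete.sh on every node
    for node in "$@"; do
        echo "ssh root@$node $crs_home/install/rootdelete.sh"
    done
    # Step 2: rootdeinstall.sh on the primary node only
    echo "ssh root@$primary $crs_home/install/rootdeinstall.sh"
    # Steps 3 and 4: root.sh on the primary node first, then on the remaining nodes
    for node in "$@"; do
        echo "ssh root@$node $crs_home/root.sh"
    done
}
```

Usage: `recovery_plan /u01/app/oracle/product/crs racnode1 racnode2`, then run the printed commands one at a time, waiting for each to finish before starting the next.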

1) Execute rootdelete.sh on all nodes. This script is located under $ORA_CRS_HOME/install/.

[root@racnode1 ~]# /u01/app/oracle/product/crs/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down…
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in ‘/etc/oracle/scls_scr’

[root@racnode2 ~]# /u01/app/oracle/product/crs/install/rootdelete.sh
Shutting down Oracle Cluster Ready Services (CRS):
OCR initialization failed accessing OCR device: PROC-26: Error while accessing the physical storage Operating System error [No such file or directory] [2]
Shutdown has begun. The daemons should exit soon.
Checking to see if Oracle CRS stack is down…
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in ‘/etc/oracle/scls_scr’

The OCR initialization error can be safely ignored here, since the OCR files were removed.

2) Execute rootdeinstall.sh on the primary node. This script is also located under $ORA_CRS_HOME/install.

[root@racnode1 ~]# /u01/app/oracle/product/crs/install/rootdeinstall.sh
Removing contents from OCR mirror device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.031627 seconds, 332 MB/s
Removing contents from OCR device
2560+0 records in
2560+0 records out
10485760 bytes (10 MB) copied, 0.029947 seconds, 350 MB/s

3) Run root.sh on the primary node. This initializes the OCR and formats the voting disk.

[root@racnode1 ~]# $ORA_CRS_HOME/root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
“/OCFS/VOT” does not exist. Create it before proceeding.
Make sure that this file is shared across cluster nodes.
1

root.sh expects the voting disk file to exist already, so I created an empty one with touch and ran root.sh again:

[root@racnode1 ~]# touch /OCFS/VOT
[root@racnode1 ~]# $ORA_CRS_HOME/root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
assigning default hostname racnode1 for node 1.
assigning default hostname racnode2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: racnode1 racnode1-priv racnode1
node 2: racnode2 racnode2-priv racnode2
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
Now formatting voting device: /OCFS/VOT
Format of 1 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
racnode1
CSS is inactive on these nodes.
racnode2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.
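When there are several voting disk files on a cluster file system, the missing ones can be created in one go before re-running root.sh. A sketch assuming file-based voting disks such as /OCFS/VOT (this does not apply to raw devices); `ensure_vote_files` is a hypothetical helper:

```shell
# ensure_vote_files (hypothetical helper): creates each listed voting disk
# file if it does not exist yet. Only meant for files on a shared cluster
# file system (e.g. OCFS), not for raw devices.
ensure_vote_files() {
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            touch "$f" || return 1
            echo "created $f"
        fi
    done
}

# Usage, as root on the primary node:
#   ensure_vote_files /OCFS/VOT && $ORA_CRS_HOME/root.sh
```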

4) Run root.sh on all remaining nodes.

[root@racnode2 crs]# ./root.sh
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory ‘/u01/app/oracle/product’ is not owned by root
WARNING: directory ‘/u01/app/oracle’ is not owned by root
WARNING: directory ‘/u01/app’ is not owned by root
WARNING: directory ‘/u01′ is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
assigning default hostname racnode1 for node 1.
assigning default hostname racnode2 for node 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node :
node 1: racnode1 racnode1-priv racnode1
node 2: racnode2 racnode2-priv racnode2
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
racnode1
racnode2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps

Creating VIP application resource on (2) nodes…
Creating GSD application resource on (2) nodes…
Creating ONS application resource on (2) nodes…
Starting VIP application resource on (2) nodes…
Starting GSD application resource on (2) nodes…
Starting ONS application resource on (2) nodes…

Done.

Clusterware is up and running again, but only the nodeapps (GSD, ONS, VIP) are registered:

[root@racnode2 crs]# crs_stat -t

Name Type Target State Host
------------------------------------------------------------
ora.racnode1.gsd application ONLINE ONLINE racnode1
ora.racnode1.ons application ONLINE ONLINE racnode1
ora.racnode1.vip application ONLINE ONLINE racnode2
ora.racnode2.gsd application ONLINE ONLINE racnode2
ora.racnode2.ons application ONLINE ONLINE racnode2
ora.racnode2.vip application ONLINE ONLINE racnode2
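To spot anything that did not come up, the `crs_stat -t` output can be filtered on the State column. A sketch assuming the column layout shown above, where Host may be empty for offline resources; `offline_resources` is a hypothetical helper:

```shell
# offline_resources (hypothetical helper): reads `crs_stat -t` output on stdin
# and prints name and state of every resource whose State is not ONLINE.
offline_resources() {
    # NF >= 4 skips blank lines and the dashed rule; $1 != "Name" skips the header.
    awk 'NF >= 4 && $1 != "Name" && $4 != "ONLINE" { print $1, $4 }'
}

# Usage:
#   crs_stat -t | offline_resources
```

An empty result means every listed resource reports ONLINE.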

5) Remaining Configuration

   a) Configuring server-side ONS

[root@racnode1 crs]# $ORA_CRS_HOME/bin/racgons add_config racnode1:6200 racnode2:6200

   b) Listener configuration using netca

The old listener entries may still exist in listener.ora on both nodes, so move the original listener.ora aside as a backup, then use netca to configure the listener and register it with the OCR. Up to 10g, a listener cannot be registered with srvctl.

Renaming the original listener.ora on both nodes:

[oracle@racnode1 ~]$ mv $ORACLE_HOME/network/admin/listener.ora $ORACLE_HOME/network/admin/listener.ora.orig
[oracle@racnode1 ~]$ ssh racnode2 mv $ORACLE_HOME/network/admin/listener.ora $ORACLE_HOME/network/admin/listener.ora.orig
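If the rename is ever re-run, a plain `mv` silently overwrites an existing listener.ora.orig. A slightly safer sketch that refuses to clobber the backup; `backup_listener_ora` is a hypothetical helper, and the `.orig` suffix follows the commands above:

```shell
# backup_listener_ora (hypothetical helper): renames listener.ora to
# listener.ora.orig in the given directory, refusing to overwrite an
# existing backup.
backup_listener_ora() {
    admin_dir=$1
    f="$admin_dir/listener.ora"
    if [ ! -f "$f" ]; then
        echo "no $f, nothing to back up" >&2
        return 1
    fi
    if [ -e "$f.orig" ]; then
        echo "$f.orig already exists, not overwriting it" >&2
        return 1
    fi
    mv "$f" "$f.orig"
}

# Usage, as oracle on each node:
#   backup_listener_ora "$ORACLE_HOME/network/admin"
```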

   c) Adding ASM, the database and its instances

[oracle@racnode1 ~]$ srvctl add asm -i +ASM1 -n racnode1 -o /u01/app/oracle/product/10.2.0/db_1
[oracle@racnode1 ~]$ srvctl add asm -i +ASM2 -n racnode2 -o /u01/app/oracle/product/10.2.0/db_1
[oracle@racnode1 ~]$ srvctl add database -d test -o /u01/app/oracle/product/10.2.0/db_1
[oracle@racnode1 ~]$ srvctl add instance -d test -i test1 -n racnode1
[oracle@racnode1 ~]$ srvctl add instance -d test -i test2 -n racnode2
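With more nodes, the `srvctl add instance` lines can be generated from instance:node pairs and reviewed before running them. A sketch; the `srvctl_add_instances` helper and the pair format are assumptions:

```shell
# srvctl_add_instances (hypothetical helper): prints one
# `srvctl add instance` command per instance:node pair.
# Arguments: database name, then pairs such as test1:racnode1.
srvctl_add_instances() {
    db=$1
    shift
    for pair in "$@"; do
        inst=${pair%%:*}   # text before the colon
        node=${pair#*:}    # text after the colon
        echo "srvctl add instance -d $db -i $inst -n $node"
    done
}
```

Usage: review the output of `srvctl_add_instances test test1:racnode1 test2:racnode2`, then pipe it to `sh` as the oracle user.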

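After restarting both nodes, I got everything back except the service resources (ora.test.AP.cs and ora….st1.srv in the original listing), which this procedure does not restore. Services can be re-created with srvctl add service.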

[oracle@racnode1 ~]$ crs_stat -t

Name Type Target State Host
------------------------------------------------------------
ora….SM1.asm application ONLINE ONLINE racnode1
ora….C1.lsnr application ONLINE ONLINE racnode1
ora.racnode1.gsd   application ONLINE ONLINE racnode1
ora.racnode1.ons   application ONLINE ONLINE racnode1
ora.racnode1.vip   application ONLINE ONLINE racnode1
ora….SM2.asm application ONLINE ONLINE racnode2
ora….C2.lsnr application ONLINE ONLINE racnode2
ora.racnode2.gsd   application ONLINE ONLINE racnode2
ora.racnode2.ons   application ONLINE ONLINE racnode2
ora.racnode2.vip   application ONLINE ONLINE racnode2
ora.test.db    application ONLINE ONLINE racnode2
ora….t1.inst application ONLINE ONLINE racnode1
ora….t2.inst application ONLINE ONLINE racnode2

Note: this is why it is recommended to back up root.sh after a fresh installation, as subsequent patches can overwrite the root.sh script.
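A minimal way to follow that advice right after installation, assuming a dated copy in a directory outside the CRS home is enough; `backup_root_sh` and the destination path are hypothetical:

```shell
# backup_root_sh (hypothetical helper): copies root.sh from the CRS home
# into a backup directory as root.sh.YYYYMMDD.
# Arguments: CRS home (empty means $ORA_CRS_HOME), backup directory.
backup_root_sh() {
    crs_home=${1:-$ORA_CRS_HOME}
    dest=$2
    [ -n "$dest" ] || { echo "usage: backup_root_sh CRS_HOME DEST_DIR" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # -p keeps mode and timestamps (and ownership when run as root)
    cp -p "$crs_home/root.sh" "$dest/root.sh.$(date +%Y%m%d)"
}

# Usage, as root right after the Clusterware installation:
#   backup_root_sh "$ORA_CRS_HOME" /root/crs_backup
```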

This procedure is described in Metalink Note 399482.1.
