Tuesday, November 06, 2007

OCR Mirroring

In my last post I talked about adding mirrors to your voting disk. Here I document how to add a mirror to your OCR on a clustered file system such as VxFS.

In the existing configuration there is only one OCR. Its location is specified in /var/opt/oracle/ocr.loc on Solaris; the path differs on other platforms (for example /etc/oracle/ocr.loc on Linux).
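As a quick way to see which files are currently configured, here is a minimal sketch that pulls the primary and mirror locations out of ocr.loc. The function name ocr_locations is my own, and it assumes the simple key=value layout shown later in this post.

```shell
#!/bin/sh
# Print the primary and mirror OCR locations recorded in an ocr.loc file.
# Defaults to the Solaris path used in this post; pass another path as $1.
# ocr_locations is a hypothetical helper name, not an Oracle tool.
ocr_locations() {
    loc_file="${1:-/var/opt/oracle/ocr.loc}"
    # Lines starting with '#' never match because the key is compared exactly.
    awk -F= '$1 == "ocrconfig_loc"       { print "primary=" $2 }
             $1 == "ocrmirrorconfig_loc" { print "mirror=" $2 }' "$loc_file"
}
```

Running ocr_locations with no argument reads the Solaris default; a commented-out entry is simply not reported.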

You need to add the OCR mirror while CRS is up, otherwise you get an error like the one below.

root@bart # /u01/app/oracle/product/10.1.0/crs/bin/ocrconfig -replace ocrmirror /u02/oracrs/ocr_file_02
PROT-1: Failed to initialize ocrconfig

For more detail on why the command failed, check the latest ocrconfig log file under $ORA_CRS_HOME/log/<nodename>/client

root@bart # cat ocrconfig_1742.log
Oracle Database 10g CRS Release 10.2.0.1.0 Production Copyright 1996, 2005 Oracle. All rights reserved.
2007-10-27 12:32:19.130: [ OCRCONF][1]ocrconfig starts...
2007-10-27 12:32:20.933: [ OCRCONF][1]Failure in initializing ocr in DEFAULT level. error:[PROC-32: Cluster Ready Services on the local node is not running Messaging error [9]]
2007-10-27 12:32:20.933: [ OCRCONF][1]Exiting [status=failed]...

From the above error message it is clear that CRS needs to be up.
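Since the PROT-1 message on the console says very little, it can help to pull just the PROC error codes out of the client log. The following is a rough sketch; proc_codes is a name I made up, and it only assumes the log contains codes in the PROC-nn form seen above.

```shell
#!/bin/sh
# List the distinct PROC-nn error codes that appear in an ocrconfig client log.
# proc_codes is a hypothetical helper name; $1 is the path to the log file.
proc_codes() {
    # sed prints one code per matching line (the last one on that line);
    # sort -u removes repeats across lines.
    sed -n 's/.*\(PROC-[0-9][0-9]*\).*/\1/p' "$1" | sort -u
}
```

For the log shown above, proc_codes ocrconfig_1742.log would report the PROC-32 code.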

So after starting up CRS

root@bart # /u01/app/oracle/product/10.1.0/crs/bin/ocrconfig -replace ocrmirror /u02/oracrs/ocr_file_02
PROT-16: Internal Error

But even though the command reports an error, it appears to have created the mirror:

root@bart # cat /var/opt/oracle/ocr.loc
#Device/file getting replaced by device /u02/oracrs/ocr_file_02
ocrconfig_loc=/u02/oracrs/ocr.dbf
ocrmirrorconfig_loc=/u02/oracrs/ocr_file_02
local_only=false

root@bart # ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 24932
Available space (kbytes) : 237188
ID : 1909893349
Device/File Name : /u02/oracrs/ocr.dbf
Device/File integrity check succeeded
Device/File Name : /u02/oracrs/ocr_file_02
Device/File integrity check succeeded

Now let us replace the existing OCR with a new file

root@bart # /u01/app/oracle/product/10.1.0/crs/bin/ocrconfig -replace ocr /u02/oracrs/ocr_file_01
PROT-16: Internal Error

root@bart # cat /var/opt/oracle/ocr.loc
#Device/file /u02/oracrs/ocr.dbf getting replaced by device /u02/oracrs/ocr_file_01
ocrconfig_loc=/u02/oracrs/ocr_file_01
ocrmirrorconfig_loc=/u02/oracrs/ocr_file_02
local_only=false

root@bart # ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 24932
Available space (kbytes) : 237188
ID : 1909893349
Device/File Name : /u02/oracrs/ocr_file_01
Device/File integrity check succeeded
Device/File Name : /u02/oracrs/ocr_file_02
Device/File integrity check succeeded

Cluster registry integrity check succeeded
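Given that ocrconfig printed PROT-16 even when the change went through, I would rather trust the ocrcheck output than the exit message. Below is a small sketch that reads ocrcheck output on stdin and succeeds only if every listed device is followed by a successful integrity check. The name ocrcheck_ok is hypothetical, and the matching relies purely on the wording of the output shown above.

```shell
#!/bin/sh
# Succeed (exit 0) only if every "Device/File Name" line in ocrcheck output
# has a matching "Device/File integrity check succeeded" line.
# ocrcheck_ok is a hypothetical helper; it reads the output on stdin.
ocrcheck_ok() {
    awk '/Device\/File Name/                      { names++ }
         /Device\/File integrity check succeeded/ { good++ }
         END { exit !(names > 0 && names == good) }'
}
```

Typical use would be something like: ocrcheck | ocrcheck_ok && echo "OCR and mirror look healthy".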

At this point you can shut down and start up CRS to confirm that all is fine. One point to note: check that the ocr.loc file on every remote node has been updated with the new location of the OCR file. In my testing the CRS stack on the remote node was down, and in that case the ocr.loc on the remote node does not get updated.
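To make the remote-node check less error prone, here is a minimal sketch that compares two copies of ocr.loc while ignoring the comment line that ocrconfig rewrites. It assumes you have already copied the remote node's file to the local node by whatever means you normally use; ocr_loc_same is a made-up name.

```shell
#!/bin/sh
# Compare two ocr.loc files, ignoring comment lines and line order.
# Returns 0 when the effective settings are identical, 1 otherwise.
# ocr_loc_same is a hypothetical helper; $1 and $2 are local file paths.
ocr_loc_same() {
    a=$(grep -v '^#' "$1" | sort)
    b=$(grep -v '^#' "$2" | sort)
    [ "$a" = "$b" ]
}
```

A non-zero return means the remote copy still points at the old OCR location and needs to be fixed before CRS is started on that node.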

While the cluster is running, if the OCR mirror is removed or corrupted the cluster keeps running, and all CRS commands including ocrcheck can still be run once you comment out the following line in /var/opt/oracle/ocr.loc:
ocrmirrorconfig_loc=/u02/oracrs/ocr_file_02

If you lose the primary OCR the cluster keeps running, but ocrcheck fails and CRS cannot be stopped. This is the case even though the mirror is available, and even if you comment out the ocrconfig_loc line as shown here:

root@bart # cat /var/opt/oracle/ocr.loc
#Device/file /u02/oracrs/ocr_file_01 getting replaced by device /u02/oracrs/ocr_file_01
#ocrconfig_loc=/u02/oracrs/ocr_file_01
ocrmirrorconfig_loc=/u02/oracrs/ocr_file_02

Things are fine again once I replace the OCR using
root@bart # /u01/app/oracle/product/10.1.0/crs/bin/ocrconfig -replace ocr /u02/oracrs/ocr_file_01
PROT-16: Internal Error

Saturday, November 03, 2007

Voting disk mirroring

A while ago I blogged about how to add a voting disk to an existing RAC setup.


To be safe it is best to do this with the cluster down.
I did the same recently on a 2 node RAC on 10.2.0.3 running Veritas SFRAC 4.1.

In the setup I am working on there is only one voting disk, located at:
root@bart # /u01/app/oracle/product/10.1.0/crs/bin/crsctl query css votedisk
0. 0 /u02/oravoting/voting

With CRS down on all nodes you have to use the -force option, otherwise you receive an error.

root@bart # /u01/app/oracle/product/10.1.0/crs/bin/crsctl add css votedisk /u02/oravoting/voting_disk_01
Cluster is not in a ready state for online disk addition

root@bart # /u01/app/oracle/product/10.1.0/crs/bin/crsctl add css votedisk /u02/oravoting/voting_disk_01 -force
Now formatting voting disk: /u02/oravoting/voting_disk_01
CLSFMT returned with error [4].
failed 9 to initailize votedisk /u02/oravoting/voting_disk_01

root@bart # cd /u02/oravoting/
root@bart # ls -ltr
total 99992
-rw-r--r-- 1 oracle oinstall 10238976 Oct 30 11:53 voting
-rw-r--r-- 1 root other 10240000 Oct 30 12:05 voting_disk_01

So although it has created a second voting disk, the file is owned by root:other rather than by the CRS software owner. When you do a fresh install of CRS, the rootconfig script called from root.sh sets the owner of the voting disk to the owner of the CRS software (in this case oracle). This looks like a bug: with the ownership left as it is, the startup of CSS fails and you will see errors in the Unix log file.
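A quick way to spot this before starting CSS is to list any file in the voting disk directory that is not owned by the CRS software owner. This is only a sketch: wrong_owner is a hypothetical name, and it takes the owner from the third column of ls -ld output.

```shell
#!/bin/sh
# Print "<file> <owner>" for each regular file in a directory whose owner
# differs from the expected user. wrong_owner is a hypothetical helper.
# $1 = directory (e.g. /u02/oravoting), $2 = expected owner (e.g. oracle)
wrong_owner() {
    dir="$1"
    want="$2"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Third field of "ls -ld" is the owning user.
        owner=$(ls -ld "$f" | awk '{ print $3 }')
        [ "$owner" = "$want" ] || echo "$f $owner"
    done
}
```

In the listing above, wrong_owner /u02/oravoting oracle would flag voting_disk_01 as owned by root; a chown to the CRS owner, run as root, is the obvious fix.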

Let's add a second one.

root@bart # /u01/app/oracle/product/10.1.0/crs/bin/crsctl add css votedisk /u02/oravoting/voting_disk_02 -force
Now formatting voting disk: /u02/oravoting/voting_disk_02
CLSFMT returned with error [4].
failed 9 to initailize votedisk /u02/oravoting/voting_disk_02

If we do not change the ownership of the second mirror, CSS will still start up (since we have 2 voting disks online and the algorithm requires a majority of the disks to be available), but the following messages will be logged in the ocssd.log file

[ CSSD]2007-11-01 11:46:26.400 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//u02/oravoting/voting)
[ CSSD]2007-11-01 11:46:26.401 [6] >TRACE: clssnmvDPT: spawned for disk 0 (/u02/oravoting/voting)
[ CSSD]2007-11-01 11:46:26.402 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (1//u02/oravoting/voting_disk_01)
[ CSSD]2007-11-01 11:46:26.402 [7] >TRACE: clssnmvDPT: spawned for disk 1 (/u02/oravoting/voting_disk_01)
[ CSSD]2007-11-01 11:46:26.403 [1] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (2//u02/oravoting/voting_disk_02)
[ CSSD]2007-11-01 11:46:26.404 [8] >TRACE: clssnmvDPT: spawned for disk 2 (/u02/oravoting/voting_disk_02)
[ CSSD]2007-11-01 11:46:26.405 [8] >ERROR: Internal Error Information:
Category: 1234
Operation: scls_block_open
Location: open
Other: open failed /u02/oravoting/voting_disk_02
Dep: 9

[ CSSD]2007-11-01 11:46:26.405 [8] >TRACE: clssnmvDiskOpen: Unable to open voting device (2: /u02/oravoting/voting_disk_02)

If you change the ownership of the 2nd mirror to oracle:dba you will not see the above.
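Because ocssd.log is noisy, a small filter can show just the voting disks that CSS could not open. The sketch below keys off the "Other: open failed" line seen in the log excerpt above; unopened_votedisks is a name I invented, and other releases may word the message differently.

```shell
#!/bin/sh
# Print each distinct path that appears in an "Other: open failed <path>"
# line of an ocssd.log file. unopened_votedisks is a hypothetical helper.
unopened_votedisks() {
    sed -n 's/^[[:space:]]*Other: open failed \(.*\)$/\1/p' "$1" | sort -u
}
```

An empty result after fixing the ownership and restarting CSS suggests all voting disks were opened.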

Remember that when you install CRS in 10.2 and use normal redundancy for your voting disk configuration it is mandatory to have 2 mirrors. The installer does not allow you to proceed with one mirror.

If you are adding mirrors manually using crsctl it is recommended that you add 2 mirrors. With a 1+1 configuration CSS will not come up if the mirror is unavailable or corrupt, because one disk out of two is not a majority.
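The majority rule is easy to work out by hand, but here is the arithmetic as a short sketch. It assumes "majority" means strictly more than half of the configured voting disks; both function names are my own.

```shell
#!/bin/sh
# Voting disk majority arithmetic, assuming a strict majority (more than
# half) of the configured disks must be available. Hypothetical helpers.

# Smallest number of disks that forms a majority of $1 configured disks.
votedisk_majority() {
    echo $(( $1 / 2 + 1 ))
}

# How many of $1 configured disks can be lost with a majority still online.
votedisk_tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

With 2 disks the majority is 2, so no failure is tolerated and the mirror buys you nothing; with 3 disks the majority is still 2, so one disk can be lost.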

If you are deleting a voting disk you have to update the following files in $ORA_CRS_HOME/install with the new voting disk location

paramfile.crs
params.crs
rootconfig

This is required in case you ever need to re-initialize your OCR by re-running root.sh, assuming your OCR backups are stuffed and you cannot restore your OCR.
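To find out which of those three files still mention a voting disk path you have deleted, something like the following sketch can be used. It makes no assumption about the parameter names inside the files and only searches for the path as a fixed string; files_mentioning is a hypothetical name.

```shell
#!/bin/sh
# Print which of paramfile.crs, params.crs and rootconfig in a directory
# still contain a given path. files_mentioning is a hypothetical helper.
# $1 = install directory (e.g. $ORA_CRS_HOME/install), $2 = path to look for
files_mentioning() {
    dir="$1"
    path="$2"
    for name in paramfile.crs params.crs rootconfig; do
        if [ -f "$dir/$name" ]; then
            # -F treats the path as a literal string, -l prints the file name.
            grep -F -l -- "$path" "$dir/$name" || true
        fi
    done
}
```

For example, files_mentioning "$ORA_CRS_HOME/install" /u02/oravoting/voting lists the files that still need editing; no output means none of them reference the old path.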