Sunday, January 21, 2007

Checkpoint progress record

The checkpoint process record is written to by the CKPT process every 3 seconds.This update is more commonly known as the heartbeat.This can be verified from the following query

SELECT CPHBT from X$KCCCP where rownum < 2
CPHBT
----------
612542768

In versions prior to 10G there was only one record. For a single
instance in 10G there are 8 records in the above structure.

You can also check the same from a controlfile dump using the following event

alter session set events 'immediate trace name controlf level 12';

In versions prior to 10G level 10 would suffice to obtain a logical dump but
in my testing on versions 10.1.0.x and 10.2.0.x on various platforms
I have been able to get a controlfile dump only at level 12.

From the generate trace file in udump

***************************************************************************
CHECKPOINT PROGRESS RECORDS
***************************************************************************
(size = 8180, compat size = 8180, section max = 11, section in-use = 0,
last-recid= 0, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 2, numrecs = 11)
THREAD #1 - status:0x2 flags:0x0 dirty:68
low cache rba:(0x7d.e82d.0) on disk rba:(0x7d.e8d9.0)
on disk scn: 0x0000.0043bea2 01/22/2007 13:10:24
resetlogs scn: 0x0000.0008026d 11/16/2006 16:57:26
heartbeat: 612542768 mount id: 2427095148
......

To verify that the heartbeat happens every 3 secs you can do an strace on the CKPT
process.

eiger-> ps -ef | grep ckp | grep -v grep
oracle 2739 1 0 07:14 ? 00:00:04 ora_ckpt_HRDEV
eiger-> strace -tt -o /tmp/ckpt.out -p 2739
Process 2739 attached - interrupt to quit

From /tmp/ckpt.out

13:31:26.742402 gettimeofday({1169433086, 742475}, NULL) = 0
13:31:26.742531 times(NULL) = 431667132
13:31:26.742602 pwrite(16, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4a1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:26.743399 times(NULL) = 431667132
13:31:26.743529 times(NULL) = 431667132
13:31:26.743600 pwrite(17, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4a1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:26.744262 times(NULL) = 431667132
13:31:26.744338 times(NULL) = 431667132
13:31:26.744409 pwrite(18, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4a1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:26.744938 times(NULL) = 431667132
.........
13:31:29.750559 gettimeofday({1169433089, 750559}, NULL) = 0
13:31:29.750559 times(NULL) = 431667433
13:31:29.750559 pwrite(16, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4`1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:29.752055 times(NULL) = 431667433
13:31:29.752122 times(NULL) = 431667433
13:31:29.752659 pwrite(17, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4`1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:29.753758 times(NULL) = 431667433
13:31:29.753827 times(NULL) = 431667433
13:31:29.753896 pwrite(18, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4`1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:29.754706 times(NULL) = 431667433

As seen from below file descriptors 16,17 and 18 correspond to the 3 controlfiles of this database

eiger-> cd /proc
eiger-> cd 2739
eiger-> cd fd
eiger-> ls -al 1?

eiger-> ls -al 1? | grep contro
lrwx------ 1 oracle oracle 64 Jan 22 13:42 16 -> /usr/opt/oracle/u01/oradata/HRDEV/control01.ctl
lrwx------ 1 oracle oracle 64 Jan 22 13:42 17 -> /usr/opt/oracle/u01/oradata/HRDEV/control02.ctl
lrwx------ 1 oracle oracle 64 Jan 22 13:42 18 -> /usr/opt/oracle/u01/oradata/HRDEV/control03.ctl

The block which holds the checkpoint progress record also holds the CFVRR (controlfile vote result record)which is used to resolve a split brain scenario when nodes in a cluster cannot talk to each other.

4 comments:

Alex Gorbachev said...

in 10G there are 8 records

Why?

Fairlie Rego said...

Good question. Dunno. Will hopefully soon find out..

Thanks
Fairlie

Anonymous said...

The other 7 are there for future use.

Fairlie Rego said...

Thanks Sudhi.