Sunday, January 21, 2007

Checkpoint progress record

The checkpoint process record is written to by the CKPT process every 3 seconds.This update is more commonly known as the heartbeat.This can be verified from the following query

SELECT CPHBT from X$KCCCP where rownum < 2
CPHBT
----------
612542768

In versions prior to 10G there was only one record. For a single
instance in 10G there are 8 records in the above structure.

You can also check the same from a controlfile dump using the following event

alter session set events 'immediate trace name controlf level 12';

In versions prior to 10G level 10 would suffice to obtain a logical dump but
in my testing on versions 10.1.0.x and 10.2.0.x on various platforms
I have been able to get a controlfile dump only at level 12.

From the generate trace file in udump

***************************************************************************
CHECKPOINT PROGRESS RECORDS
***************************************************************************
(size = 8180, compat size = 8180, section max = 11, section in-use = 0,
last-recid= 0, old-recno = 0, last-recno = 0)
(extent = 1, blkno = 2, numrecs = 11)
THREAD #1 - status:0x2 flags:0x0 dirty:68
low cache rba:(0x7d.e82d.0) on disk rba:(0x7d.e8d9.0)
on disk scn: 0x0000.0043bea2 01/22/2007 13:10:24
resetlogs scn: 0x0000.0008026d 11/16/2006 16:57:26
heartbeat: 612542768 mount id: 2427095148
......

To verify that the heartbeat happens every 3 secs you can do an strace on the CKPT
process.

eiger-> ps -ef | grep ckp | grep -v grep
oracle 2739 1 0 07:14 ? 00:00:04 ora_ckpt_HRDEV
eiger-> strace -tt -o /tmp/ckpt.out -p 2739
Process 2739 attached - interrupt to quit

From /tmp/ckpt.out

13:31:26.742402 gettimeofday({1169433086, 742475}, NULL) = 0
13:31:26.742531 times(NULL) = 431667132
13:31:26.742602 pwrite(16, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4a1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:26.743399 times(NULL) = 431667132
13:31:26.743529 times(NULL) = 431667132
13:31:26.743600 pwrite(17, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4a1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:26.744262 times(NULL) = 431667132
13:31:26.744338 times(NULL) = 431667132
13:31:26.744409 pwrite(18, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4a1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:26.744938 times(NULL) = 431667132
.........
13:31:29.750559 gettimeofday({1169433089, 750559}, NULL) = 0
13:31:29.750559 times(NULL) = 431667433
13:31:29.750559 pwrite(16, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4`1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:29.752055 times(NULL) = 431667433
13:31:29.752122 times(NULL) = 431667433
13:31:29.752659 pwrite(17, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4`1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:29.753758 times(NULL) = 431667433
13:31:29.753827 times(NULL) = 431667433
13:31:29.753896 pwrite(18, "\25\302\0\0\3\0\0\0\0\0\0\0\0\0\1\4`1\0\0\2\0\0\0\0\0\0"..., 16384, 49152) = 16384
13:31:29.754706 times(NULL) = 431667433

As seen from below file descriptors 16,17 and 18 correspond to the 3 controlfiles of this database

eiger-> cd /proc
eiger-> cd 2739
eiger-> cd fd
eiger-> ls -al 1?

eiger-> ls -al 1? | grep contro
lrwx------ 1 oracle oracle 64 Jan 22 13:42 16 -> /usr/opt/oracle/u01/oradata/HRDEV/control01.ctl
lrwx------ 1 oracle oracle 64 Jan 22 13:42 17 -> /usr/opt/oracle/u01/oradata/HRDEV/control02.ctl
lrwx------ 1 oracle oracle 64 Jan 22 13:42 18 -> /usr/opt/oracle/u01/oradata/HRDEV/control03.ctl

The block which holds the checkpoint progress record also holds the CFVRR (controlfile vote result record)which is used to resolve a split brain scenario when nodes in a cluster cannot talk to each other.

Thursday, January 18, 2007

RAC, Archivelog mode and cluster_database

In version prior to 10.2 one had to set the parameter cluster_database
to FALSE to switch the archivelog mode of a database

Starting with 10.2 you no longer need to modify the CLUSTER_DATABASE parameter setting to
change the archiving mode in RAC. You can change archivelog mode as long as
the database is mounted in the local instance and not open in any instances.

Lets' test the same...

elephant-> srvctl status database -d F8902PRD
Instance F8902PRD1 is running on node elephant
Instance F8902PRD2 is running on node rhino
Instance F8902PRD3 is running on node hippo

elephant-> sqlplus

SQL*Plus: Release 10.2.0.1.0 - Production on Fri Jan 19 09:12:21 2007

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Enter user-name: / as sysdba

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options

SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u03/backup/F8902PRD/arch/
Oldest online log sequence 1103
Next log sequence to archive 1104
Current log sequence 1104

elephant-> srvctl stop database -d F8902PRD

elephant-> srvctl start instance -d F8902PRD -i F8902PRD1 -o mount
elephant-> srvctl status database -d F8902PRD
Instance F8902PRD1 is running on node elephant
Instance F8902PRD2 is not running on node rhino
Instance F8902PRD3 is not running on node hippo

SQL> show parameter cluster_database

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cluster_database boolean TRUE
cluster_database_instances integer 3
SQL> alter database noarchivelog;

Database altered.
SQL> archive log list
Database log mode No Archive Mode
Automatic archival Disabled
Archive destination /u03/backup/F8902PRD/arch/
Oldest online log sequence 1103
Current log sequence 1104

elephant-> srvctl stop instance -d F8902PRD -i F8902PRD1

Now lets try to change back to archivelog mode by mounting the database from more
than one instance.

elephant-> srvctl start instance -d F8902PRD -i F8902PRD1 -o mount
elephant-> srvctl start instance -d F8902PRD -i F8902PRD2 -o mount
elephant-> srvctl status database -d F8902PRD
Instance F8902PRD1 is running on node elephant
Instance F8902PRD2 is running on node rhino
Instance F8902PRD3 is not running on node hippo

SQL> show parameter cluster_database

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
cluster_database boolean TRUE
cluster_database_instances integer 3
SQL> alter database archivelog;

Database altered.

SQL> archive log list
Database log mode Archive Mode
Automatic archival Enabled
Archive destination /u03/backup/F8902PRD/arch/
Oldest online log sequence 1103
Next log sequence to archive 1104
Current log sequence 1104


elephant-> srvctl stop database -d F8902PRD
elephant-> srvctl start instance -d F8902PRD -i F8902PRD1 -o mount
elephant-> srvctl start instance -d F8902PRD -i F8902PRD2 -o mount
elephant-> srvctl start instance -d F8902PRD -i F8902PRD2 -o open ==> I made a typo here. It should have been F8902PRD3
instead of F8902PRD2 but notice srvctl does not generate any errors. So u don't know if instance 2 has actually opened the database.
elephant-> srvctl start instance -d F8902PRD -i F8902PRD3 -o open

SQL> select inst_id,open_mode from gv$database;

INST_ID OPEN_MODE
---------- ----------
1 MOUNTED
2 MOUNTED
3 READ WRITE

So instance 3 has the db open while instances 1 and 2 have just mounted the database.

SQL> alter database noarchivelog;
alter database noarchivelog
*
ERROR at line 1:
ORA-01126: database must be mounted in this instance and not open in any
instance


*****************************************************************************************
However if you need to recreate the controlfile you need to set cluster_database to false

elephant-> srvctl stop database -d F8902PRD
elephant-> srvctl start instance -d F8902PRD -i F8902PRD1 -o nomount
elephant-> srvctl status database -d F8902PRD
Instance F8902PRD1 is running on node elephant
Instance F8902PRD2 is not running on node rhino
Instance F8902PRD3 is not running on node hippo

SQL> select inst_id,DATABASE_STATUS from gv$instance;

INST_ID DATABASE_STATUS
---------- -----------------
1 ACTIVE

SQL> @controlfile.sql
CREATE CONTROLFILE REUSE DATABASE "F8902PRD" NORESETLOGS ARCHIVELOG
*
ERROR at line 1:
ORA-01503: CREATE CONTROLFILE failed
ORA-12720: operation requires database is in EXCLUSIVE mode

Wednesday, January 10, 2007

The Optimizer and cache related statistics.

Jonathan Lewis in his excellent book on the CBO talks about new functionality within Oracle’s optimizer wherein the optimizer can use cache related statistics of objects to evaluate a more optimal execution plan.

All the following testing has been done on 10.2.0.3 with the hidden parameter _cache_stats_monitor set to TRUE (default is false)

The table cache_stats_1$ which is used to monitor lifetime caching statistics (c0902000.sql) is populated by MMON periodically via a MERGE statement which does not seem to be using bind variables. You can also populate the same directly using

SQL> exec dbms_stats_internal.flush_cache_stats;

PL/SQL procedure successfully completed.

For the purpose of this discussion we will concentrate on a normal heap table SCOTT.T1 with data_object_id = 58417 which is a replica of dba_tables.


SQL> select * from cache_stats_1$ where DATAOBJ#=58417

DATAOBJ# INST_ID CACHED_AVG CACHED_SQR_AVG CACHED_NO CACHED_SEQ_NO
---------- ---------- ---------- -------------- ---------- -------------
CHR_AVG CHR_SQR_AVG CHR_NO CHR_SEQ_NO LGR_SUM LGR_LAST PHR_LAST
---------- ----------- ---------- ---------- ---------- ---------- ----------
SPARE1 SPARE2 SPARE3 SPARE4 SPARE5
---------- ---------- ---------- ---------- ----------
58417 1 118.889076 14192.4521 35 2644
0 0 1 2575 0 768 0

Now if we were to gather cache related statistics for this table

SQL> exec dbms_stats.gather_table_stats('SCOTT','T1',stattype=>'CACHE');

PL/SQL procedure successfully completed.

A trace on the above API call reveals the following statement

********************************************************
insert into tab_stats$ (obj#,cachedblk,cachehit,logicalread) values (:1,:2,:3,:4)

Bind#0
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=07723aa8 bln=22 avl=04 flg=05
value=58417
Bind#1
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=07723a84 bln=24 avl=03 flg=05
value=118
Bind#2
oacdty=02 mxl=22(00) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=07723a60 bln=24 avl=00 flg=05
Bind#3
oacdty=02 mxl=22(00) mxlc=00 mal=00 scl=00 pre=00
oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=07723a3c bln=24 avl=00 flg=05

********************************************************

Only the cachedblk column appears to be populated. (I haven’t had a chance to check this value against x$bh)
SQL> select * from tab_stats$ where obj#=58417;

OBJ# CACHEDBLK CACHEHIT LOGICALREAD ROWCNT BLKCNT EMPCNT
---------- ---------- ---------- ----------- ---------- ---------- ----------
AVGSPC CHNCNT AVGRLN AVGSPC_FLB FLBCNT ANALYZETI SAMPLESIZE
---------- ---------- ---------- ---------- ---------- --------- ----------
FLAGS SPARE1 SPARE2 SPARE3
---------- ---------- ---------- ----------
SPARE4
--------------------------------------------------------------------------------
SPARE5
--------------------------------------------------------------------------------
SPARE6
---------
58417 118



Checking the execution plans of statements with "_optimizer_cache_stats" set to TRUE and FALSE we find the cost of the table is almost half in the former.

SQL> conn scott/tiger
Connected.
SQL> set autotrace traceonly explain
SQL> alter session set "_optimizer_cache_stats" = true;

Session altered.

SQL> select count(*) from t1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 29 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| T1 | 6446 | 29 (0)| 00:00:01 |
-------------------------------------------------------------------


SQL> alter session set "_optimizer_cache_stats" = false;

Session altered.

SQL> select count(*) from t1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

-------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 56 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| T1 | 6446 | 56 (0)| 00:00:01 |
-------------------------------------------------------------------


Also for fixed objects the API dbms_stats.gather_fixed_objects_stats populates tab_stats$

Monday, January 08, 2007

Extracting info from AWR

The Automatic Workload Repository stores a wealth of performance related data in an Oracle 10g Database. Historical data can be viewed from the DBA_HIST* views depending on the retention period.

One can either write a query against any of the dba_hist_* views or use the API
dbms_sqltune.select_workload_repository

For e.g
To find out all sql stmts which took more than 1 hour to execute between 7:00 am and 10:00 am this morning

• Find the snapshots for that period
select snap_id,BEGIN_INTERVAL_TIME from DBA_HIST_SNAPSHOT
where BEGIN_INTERVAL_TIME between to_timestamp('09-JAN-2007 07:00:00','DD-MON-YYYY HH24:MI:SS')and to_timestamp('09-JAN-2007 10:00:00','DD-MON-YYYY HH24:MI:SS')


5064 09-JAN-07 09.00.45.370 AM
5062 09-JAN-07 07.00.45.786 AM
5063 09-JAN-07 08.00.47.623 AM

• Find the sql ids of statements which meet the criteria

SQL> select sql_id,ELAPSED_TIME_TOTAL,ELAPSED_TIME_DELTA from
dba_hist_sqlstat where snap_id between 5062 and 5064 and ELAPSED_TIME_TOTAL > 3600000000


5umujs4y5q05f 252683861535 14085330457
5umujs4y5q05f 266739511867 14055650332
5umujs4y5q05f 280852257550 14112745683

• Find the sql statement corresponding to this sql_id

SQL> select sql_text from dba_hist_sqltext where sql_id='5umujs4y5q05f';

SELECT A.JOURNAL_ID, B.SOURCE, B.ACCOUNTING_PERIOD, A.ACCOUNT, A.DEPTID, A.PROJE
CT_ID, A.CLASS_FLD, A.MONETARY_AMOUNT, A.LINE_DESCR, A.JRNL_LN_REF, TO_CHAR(A.JO
URNAL_DATE,'YYYY-MM-DD'), B.JRNL_HDR_STATUS, TO_CHAR(SYSDATE,'YYYY-MM-DD') FR
OM PS_JRNL_LN A, PS_JRNL_HEADER B, PS_DEPT_TBL C WHERE B.BUSINESS_UNIT = A.BU
SINESS_UNIT AND B.JOURNAL_ID = A.JOURNAL_ID AND B.JOURNAL_DATE = A.JOU
RNAL_DATE AND B.UNPOST_SEQ = A.UNPOST_SEQ AND ( C.EFFDT = (SE
LECT MAX(C_ED.EFFDT) FROM PS_DEPT_TBL C_ED WHERE C.SETID = C_ED.SETID
AND C.DEPTID = C_ED.DEPTID AND C_ED.EFFDT <= SYSDATE)
AND A.BUSINESS_UNIT = 'UNSYD' AND A.LEDGER = 'ACTUALS' AND A.CURREN
CY_CD = 'AUD' AND C.DEPTID = A.DEPTID AND B.FISCAL_YEAR = :1 A
ND B.ACCOUNTING_PERIOD BETWEEN :2 AND :3 AND ( A.ACCOUNT = :4 OR A.A
CCOUNT = :5 OR ( A.ACCOUNT = :6 OR A.ACCOUNT = :7 AND A.JOURNA
L_ID IN (SELECT D.JOURNAL_ID FROM PS_JRNL_LN D WHERE ( D.ACCOUNT = :8
OR D.ACCOUNT = :9)))) AND B.JRNL_HDR_STATUS IN ('P','U') ) ORDER BY 3
, 1, 4


You can obtain the same result with a single statement using the dbms_sqltune.select_workload_repository API as shown below

SELECT VALUE(P).sql_text FROM table(dbms_sqltune.select_workload_repository(5062,5064,'elapsed_time > 3600000000')) P ;

SELECT A.JOURNAL_ID, B.SOURCE, B.ACCOUNTING_PERIOD, A.ACCOUNT, A.DEPTID, A.PROJE
CT_ID, A.CLASS_FLD, A.MONETARY_AMOUNT, A.LINE_DESCR, A.JRNL_LN_REF, TO_CHAR(A.JO
URNAL_DATE,'YYYY-MM-DD'), B.JRNL_HDR_STATUS, TO_CHAR(SYSDATE,'YYYY-MM-DD') FR
OM PS_JRNL_LN A, PS_JRNL_HEADER B, PS_DEPT_TBL C WHERE B.BUSINESS_UNIT = A.BU
SINESS_UNIT AND B.JOURNAL_ID = A.JOURNAL_ID AND B.JOURNAL_DATE = A.JOU
RNAL_DATE AND B.UNPOST_SEQ = A.UNPOST_SEQ AND ( C.EFFDT = (SE
LECT MAX(C_ED.EFFDT) FROM PS_DEPT_TBL C_ED WHERE C.SETID = C_ED.SETID
AND C.DEPTID = C_ED.DEPTID AND C_ED.EFFDT <= SYSDATE)
AND A.BUSINESS_UNIT = 'UNSYD' AND A.LEDGER = 'ACTUALS' AND A.CURREN
CY_CD = 'AUD' AND C.DEPTID = A.DEPTID AND B.FISCAL_YEAR = :1 A
ND B.ACCOUNTING_PERIOD BETWEEN :2 AND :3 AND ( A.ACCOUNT = :4 OR A.A
CCOUNT = :5 OR ( A.ACCOUNT = :6 OR A.ACCOUNT = :7 AND A.JOURNA
L_ID IN (SELECT D.JOURNAL_ID FROM PS_JRNL_LN D WHERE ( D.ACCOUNT = :8
OR D.ACCOUNT = :9)))) AND B.JRNL_HDR_STATUS IN ('P','U') ) ORDER BY 3
, 1, 4


P.S You have to be licensed to use the Diagnostics Pack to use AWR and licensed to use Oracle Tuning Pack to use SQL Tuning Sets. For more information please refer to the Oracle Licensing documentation at
http://download-west.oracle.com/docs/cd/B19306_01/license.102/b14199/options.htm#sthref80