el-caro: 2009

Saturday, December 05, 2009

Oracle Advanced Compression Advisor

I am currently working out how best to use compression in an OLTP environment on a 10g database that is due to be migrated to 11.2 in the next 6 months, and I came across this OTN note:

http://www.oracle.com/technology/products/database/compression/compression-advisor.html

The following statement got me very interested

This package can be used on Oracle Databases running Oracle Database 9i Release 2 through 11g Release 1.
A compression advisor (DBMS_COMPRESSION) comes packaged with Oracle Database 11g Release 2.

So let's give it a try...

I downloaded the package and compiled it..

SQL> @dbmscomp.sql

Package created.

Synonym created.

Grant succeeded.

No errors.

SQL> @prvtcomp.plb

Package body created.

Synonym created.

Grant succeeded.

No errors.

SQL> set serveroutput on

SQL> exec dbms_comp_advisor.getratio('OWNER1','CROSS','OLTP',1);

Invalid Compression option for the current COMPAT setting

Thankfully the package body is not encrypted and a quick look at the code reveals that it does the following

SQL> create table DBMS_COMPRESSION_TEMP_UNCMP as select /*+ full('CROSS') */ * from cross sample block( 1);

Table created.

SQL> create table DBMS_COMPRESSION_TEMP_CMP compress for all operations as select * from DBMS_COMPRESSION_TEMP_UNCMP;
create table DBMS_COMPRESSION_TEMP_CMP compress for all operations as select * from DBMS_COMPRESSION_TEMP_UNCMP
*
ERROR at line 1:
ORA-00922: missing or invalid option

Hmmm... not sure how they expect "compress for all operations" to work in a pre-11g database...

Friday, September 18, 2009

Interrupting Flashback Database

A user called me to say that he had been flashing back a database, was not too happy with how long the operation was taking, and so had pressed Ctrl-C...

When he tried to open the database this is what he saw..

==================================================================================

SQL> shutdown abort
ORACLE instance shut down.
SQL> startup
ORACLE instance started.

Total System Global Area 838860800 bytes
Fixed Size 2033832 bytes
Variable Size 520099672 bytes
Database Buffers 310378496 bytes
Redo Buffers 6348800 bytes
Database mounted.
ORA-38760: This database instance failed to turn on flashback database

SQL> alter database flashback off;

Database altered.

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-38760: This database instance failed to turn on flashback database

===>... Well he just turned off flashback..

SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01139: RESETLOGS option only valid after an incomplete database recovery

==================================================================================

Looking through the alert.log

Fri Sep 18 08:35:05 2009
flashback database to restore point TEST_AFTER
Fri Sep 18 08:35:05 2009
ORA-38757 signalled during: flashback database to restore point TEST_AFTER...
Fri Sep 18 12:00:48 2009

No errors in the RVWR process trace file

So I just dropped the restore point and voila

> drop restore point TEST_AFTER;

Restore point dropped.

> select flashback_on from v$database;

FLASHBACK_ON
------------------
NO

> alter database open;

Database altered.

Sunday, June 28, 2009

Are you sure you will be able to activate your standby??

A couple of weeks ago I faced a scenario where the standby database crashed.

Looking at the alert.log of the standby, I saw the following messages:

***********************************************************
Sat Jun 6 06:48:52 2009
Recovery interrupted!
cannot find needed online log for redo thread 1
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Sat Jun 6 06:48:53 2009
Errors in file /u01/app/oracle/admin/TEST/bdump/test1_mrp0_24533.trc:
ORA-10576: Give up restoring recovered datafiles to consistent state: some error occurred
ORA-16037: user requested cancel of managed recovery operation
Some recovered datafiles maybe left media fuzzy
Media recovery may continue but open resetlogs may fail
Sat Jun 6 06:48:54 2009
Waiting for MRP0 pid 24533 to terminate
************************************************************

Hmmm... this means that if the standby does not have the redo and cannot get it from the primary, you will not be able to bring the media-fuzzy files online using supported methods.

The same issue is explained in Bug 5956646 as an architectural limitation.

This is a very unlikely scenario, but a possibility nonetheless.

Tuesday, January 27, 2009

Outbound_connect_timeout

Outbound_connect_timeout comes into play when nodes in a cluster are down and we cannot afford to wait for the OS timeout, which causes long delays in connect time. For example, on Solaris tcp_ip_abort_interval = 180000 (milliseconds), which is 180 seconds, or 3 minutes.

In this post I will demonstrate how outbound_connect_timeout (OCT) can effectively avoid the long timeouts experienced by clients connecting to RAC nodes.

If we take an example of the following connect string

TEST =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = OFF)
      (ADDRESS = (PROTOCOL = TCP)(HOST = sdby1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = sdby2-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = sdby3-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = sdby4-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = prim1-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = TEST)
    )
  )

In the above alias the first 4 nodes are nonexistent, and the service TEST runs on the node prim1, which is listed last.

I have deliberately set load_balance to OFF so that the client has to traverse through all nodes serially.

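If we set an OUTBOUND_CONNECT_TIMEOUT of 3 seconds in the client's sqlnet.ora (client is 10.2.0.3 Solaris), then the time to establish the connection is around 12 seconds: each of the 4 nonexistent addresses is abandoned after 3 seconds before prim1-vip is finally reached.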
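
The 12-second figure can be reproduced with a little arithmetic. The following is only a rough sketch in Python, not anything Oracle ships: the helper names `hosts_in_order` and `estimated_connect_delay` are made up for illustration, and it assumes that with LOAD_BALANCE = OFF the addresses are tried strictly in order and that every unreachable address costs the full timeout.

```python
import re
from typing import List

# Address list from the TEST alias in this post.
ALIAS = """
(DESCRIPTION =
  (ADDRESS_LIST =
    (LOAD_BALANCE = OFF)
    (ADDRESS = (PROTOCOL = TCP)(HOST = sdby1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = sdby2-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = sdby3-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = sdby4-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = prim1-vip)(PORT = 1521))
  )
  (CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = TEST))
)
"""


def hosts_in_order(descriptor: str) -> List[str]:
    """Return the HOST values in the order they appear in the descriptor."""
    return re.findall(r"\(\s*HOST\s*=\s*([^)\s]+)\s*\)", descriptor, flags=re.IGNORECASE)


def estimated_connect_delay(descriptor: str, live_host: str, timeout_seconds: float) -> float:
    """Estimate the delay before the client reaches live_host.

    Assumption (hypothetical model): every address listed before live_host is
    unreachable and is given up after exactly timeout_seconds. The time taken
    by the final, successful connect is ignored.
    """
    dead_before_live = hosts_in_order(descriptor).index(live_host)
    return dead_before_live * timeout_seconds


if __name__ == "__main__":
    # 4 dead addresses ahead of prim1-vip, 3 seconds each
    print(estimated_connect_delay(ALIAS, "prim1-vip", 3))
```

The same function makes it easy to see why the order of the address list matters: moving prim1-vip to the front would bring the estimated delay down to zero.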

If we run a sqlnet trace on the connection, we see that it starts at

[27-JAN-2009 22:52:33:741] --- TRACE CONFIGURATION INFORMATION FOLLOWS ---
[27-JAN-2009 22:52:33:741] New trace stream is /tmp/cli_262.trc
and the first address which is tried is

[27-JAN-2009 22:52:33:757] nsc2addr: (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=sdb1-vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=TEST)(CID=(PROGRAM=sqlplus@bart)(HOST=bart)(USER=oracle))))

Moving on we can see that OCT is enabled due to the line

[27-JAN-2009 22:52:33:759] nstoSetupTimeout: entry
[27-JAN-2009 22:52:33:759] nstoSetupTimeout: ATO enabled for ctx=0x1001c9280, val=3000(millisecs)

After this the following pattern is seen continuously

[27-JAN-2009 22:52:33:776] nsevwtsg: entry
[27-JAN-2009 22:52:33:776] nsevwtsg: cid=0
[27-JAN-2009 22:52:33:776] nsevwait: entry
[27-JAN-2009 22:52:33:776] nsevwait: 1 registered connection(s)
[27-JAN-2009 22:52:33:776] nsevwait: 0 pre-posted event(s)
[27-JAN-2009 22:52:33:776] nsevwait: waiting for transport event (0 thru 0)...
[27-JAN-2009 22:52:33:776] nsevwait: 0 newly-posted event(s)
[27-JAN-2009 22:52:33:776] nsevwait: 0 posted event(s)
[27-JAN-2009 22:52:33:776] nsevwait: exit (0)
[27-JAN-2009 22:52:33:776] nstoToqCheckSingle: entry
[27-JAN-2009 22:52:33:776] nstoToqCheckSingle: normal exit

and the timeout occurs in 3 seconds

[27-JAN-2009 22:52:36:771] nstoHandleEventTO: ATO occurred for ctx=0x1001c9280

Ergo the OCT seems to be working perfectly

The client then tries to establish a connection with sdby2-vip and experiences
the same timeout

The connection is finally established at

[27-JAN-2009 22:52:45:915] nscon: no connect data
[27-JAN-2009 22:52:45:915] nscon: connect handshake is complete
[27-JAN-2009 22:52:45:915] nscon: nsctxinf[0]=0x41, [1]=0x41
[27-JAN-2009 22:52:45:915] nscon: normal exit

which translates to around 12 seconds.
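
To check the numbers, the trace timestamps can be subtracted directly. This is a small Python sketch, not part of any Oracle tooling: the function names are hypothetical, and it assumes the trace timestamp layout shown above (day-month-year, then hours:minutes:seconds:milliseconds) and an English locale for the month abbreviation.

```python
from datetime import datetime

# Layout of the timestamps in the sqlnet trace lines quoted in this post.
# %f accepts the 3-digit millisecond field and pads it on the right.
TRACE_TIME_FORMAT = "%d-%b-%Y %H:%M:%S:%f"


def parse_trace_time(stamp: str) -> datetime:
    """Parse a trace timestamp, with or without the surrounding brackets."""
    return datetime.strptime(stamp.strip("[]"), TRACE_TIME_FORMAT)


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two trace timestamps."""
    return (parse_trace_time(end) - parse_trace_time(start)).total_seconds()


if __name__ == "__main__":
    # Start of the trace to "connect handshake is complete"
    print(elapsed_seconds("[27-JAN-2009 22:52:33:741]", "[27-JAN-2009 22:52:45:915]"))
    # "ATO enabled" to "ATO occurred" for the first address
    print(elapsed_seconds("[27-JAN-2009 22:52:33:759]", "[27-JAN-2009 22:52:36:771]"))
```

With the timestamps from the trace this gives roughly 12.17 seconds for the whole connection and roughly 3.01 seconds for the first timeout, which matches the 3-second OCT setting.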

From my testing with Windows clients:

OUTBOUND_CONNECT_TIMEOUT does not work with the base 10.2.0.3 Windows client (clients may get errors).
It works with patch 21 on top of 10.2.0.3.
OUTBOUND_CONNECT_TIMEOUT does work with the 11g client.

Without OCT, if we trace the connection again we see that we wait for more than 3 minutes (from 14:18:35 to 14:22:19) before the connect attempt fails

[28-JAN-2009 14:18:35:299] nttcni: entry
[28-JAN-2009 14:18:35:299] nttcni: trying to connect to socket 10.
[28-JAN-2009 14:22:19:915] ntt2err: entry
[28-JAN-2009 14:22:19:915] ntt2err: soc 10 error - operation=1, ntresnt[0]=505, ntresnt[1]=145, ntresnt[2]=0
[28-JAN-2009 14:22:19:915] ntt2err: exit
[28-JAN-2009 14:22:19:915] nttcni: exit
[28-JAN-2009 14:22:19:915] nttcon: exit
[28-JAN-2009 14:22:19:921] nserror: entry
[28-JAN-2009 14:22:19:921] nserror: nsres: id=0, op=65, ns=12535, ns2=12560; nt[0]=505, nt[1]=145, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[28-JAN-2009 14:22:19:921] nsopen: unable to open transport
[28-JAN-2009 14:22:19:921] nsiocancel: entry