Monday, April 23, 2007

KST tracing

In a recent post, Kevin Closson highlights the perils of the ORA_CRS_HOME mountpoint filling up.
If you use the RAC service framework, services would relocate to another node after going offline on the node whose filesystem is full.

In a RAC cluster, if a process encounters an ORA-600 or an ORA-7445 (I haven't checked for other errors), each node in the cluster dumps trace files for each process, with the suffix trw, in directories called cdmp_YYYYMMDDHHMISS under bdump. This instrumentation is referred to as KST tracing. So if you have a batch job that continuously encounters the aforementioned errors, it can generate a plethora of cdmp* directories, which could fill up the filesystem.
These trace files can be formatted by an internal Oracle tool called Trace Loader (trcldr).
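
To keep an eye on how much space these dumps consume, something like the following Python sketch could be used. It is not an Oracle tool. The function name summarize_cdmp_dirs is made up for illustration, and the only assumption taken from the description above is that the directories sit directly under bdump and are named cdmp_ followed by a 14-digit timestamp.

```python
import os
import re

# Directory names such as cdmp_20070423101530 (cdmp_YYYYMMDDHHMISS).
CDMP_PATTERN = re.compile(r"^cdmp_\d{14}$")


def summarize_cdmp_dirs(bdump_dir):
    """Return (name, file_count, total_bytes) per cdmp_* directory, oldest first.

    Hypothetical helper: it only inspects the filesystem and never deletes anything.
    """
    summary = []
    # Sorting by name is chronological because the timestamp is fixed-width.
    for entry in sorted(os.listdir(bdump_dir)):
        path = os.path.join(bdump_dir, entry)
        if not (CDMP_PATTERN.match(entry) and os.path.isdir(path)):
            continue
        file_count = 0
        total_bytes = 0
        # Walk the directory and add up the size of every file in it.
        for root, _dirs, files in os.walk(path):
            for name in files:
                file_count += 1
                total_bytes += os.path.getsize(os.path.join(root, name))
        summary.append((entry, file_count, total_bytes))
    return summary
```

Calling summarize_cdmp_dirs with the path of your bdump directory returns one tuple per cdmp directory, which makes it easy to spot a batch job that is generating them continuously.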

You can disable this functionality by setting trace_enabled=false, which
can be changed dynamically. After making this change you will see the following
message in the diag trace file:

***********************************************
KST tracing is turned off, no data is logged
***********************************************

This functionality is also disabled if you set _diag_daemon=false. This disables the DIAG daemon and hence also disables other functionality, such as the dumping of systemstates by the DIAG process when a self deadlock occurs. DIAG is the only background process which can be killed without instance death. It is the second-to-last process to be terminated because it needs to flush trace data to the file system. By default, the terminating process, usually PMON, gives DIAG 5 seconds for dumping. This is governed by the parameter _ksu_diag_kill_time. So after all diagnostic data has been dumped, PMON will wait for _ksu_diag_kill_time before killing the instance.
This delay has been fixed via bug 5599293.

3 comments:

Unknown said...

so, other than using "trcldr" a regular dba cannot make any sense out of those trw files?

I had always wondered about those files (seems like one for each session).

Raj

Fairlie Rego said...

I think without trcldr there is still a lot of sense that can be made out of the trace files. I might blog about this in the future.

Yong Huang said...

There're a bunch of background processes that can be killed without crashing the instance. Check the process environment (ps eww or cat /proc/<pid>/environ) for SKGP_HIDDEN_ARGS. BG tells you that you can kill it. FATAL means it will crash the instance.

BTW, trcldr is still undocumented. The only thing I found is Jack Cai's presentation. See bottom of http://yong321.freeshell.org/computer/oraclebin.html
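
The environment check described in the comment above can be sketched in Python roughly as follows. The comment does not give the exact format of the SKGP_HIDDEN_ARGS value, so this sketch assumes that a plain substring test for FATAL or BG is enough. The function names are made up for illustration, and /proc/<pid>/environ is Linux-specific.

```python
def read_hidden_args(environ_bytes):
    """Return the SKGP_HIDDEN_ARGS value from a NUL-separated environ blob, or None."""
    for item in environ_bytes.split(b"\0"):
        if item.startswith(b"SKGP_HIDDEN_ARGS="):
            return item.split(b"=", 1)[1].decode("ascii", errors="replace")
    return None


def classify_process(environ_bytes):
    """Crude classification; substring matching is an assumption, not a documented format."""
    value = read_hidden_args(environ_bytes)
    if value is None:
        return "unknown"
    # Check FATAL first so that a value mentioning both is treated as unsafe to kill.
    if "FATAL" in value:
        return "fatal"
    if "BG" in value:
        return "killable"
    return "unknown"


def classify_pid(pid):
    """Read /proc/<pid>/environ (Linux only; needs permission to read that process)."""
    with open(f"/proc/{pid}/environ", "rb") as f:
        return classify_process(f.read())
```

Treat the result as a hint only and verify on a test system before killing any background process.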