Friday, February 16, 2007

LMS and real time priorities

One of the oft-mentioned RAC best practices is to run the LMS processes, which ship blocks across the interconnect, at a higher priority so that they do not compete for CPU cycles with other processes.
Starting with 10.2, LMS is supposed to run in the real-time class. This new functionality is governed by the underscore parameter _os_sched_high_priority.

However, in 10.2.0.1 LMS still runs in the time-sharing TS class (SCHED_OTHER, standard time-sharing) because the oradism executable is effectively absent (the file is zero bytes):
buffalo >ls -al oradism
-r-sr-s--- 1 root dba 0 Jul 1 2005 oradism

buffalo >ps -efc|grep lms
oracle 27263 1 TS 24 2006 ? 00:12:38 ora_lms0_F8900DMO1
oracle 27273 1 TS 24 2006 ? 00:04:30 ora_lms1_F8900DMO1

When you apply the 10.2.0.3 patchset, you notice that the oradism executable seems to be generated and LMS runs in the RR class (SCHED_RR, round robin):

elephant-> ls -al oradism
-r-sr-s--- 1 root dba 14456 Nov 15 12:52 oradism
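
The two listings differ in file size: zero bytes on 10.2.0.1 versus 14456 bytes after 10.2.0.3. A minimal sketch of a check for that, assuming oradism lives at $ORACLE_HOME/bin/oradism (the path and the function name oradism_usable are my assumptions, not something Oracle ships):

```shell
# Return 0 if the given file looks like a usable oradism binary:
# it exists, is non-empty, and has the setuid bit set.
# Root ownership is NOT checked here; confirm it with `ls -l`.
# (oradism_usable is a hypothetical helper name.)
oradism_usable() {
  f=$1
  if [ ! -s "$f" ]; then
    echo "$f: missing or zero bytes"
    return 1
  fi
  if [ ! -u "$f" ]; then
    echo "$f: setuid bit not set"
    return 1
  fi
  echo "$f: non-empty and setuid"
}
```

It would be called as `oradism_usable "$ORACLE_HOME/bin/oradism"`; a non-zero return code suggests LMS will probably stay in TS, as it does on the 10.2.0.1 box.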

elephant-> ps -efc | grep lms

oracle 11554 1 RR 90 15:09 ? 00:00:01 ora_lms0_F8902PRD1
oracle 11558 1 RR 90 15:09 ? 00:00:01 ora_lms1_F8902PRD1

whereas other background processes like PMON still run in TS

elephant-> ps -efc | grep pmon | grep PRD
oracle 11544 1 TS 23 Feb14 ? 00:00:00 ora_pmon_F8902PRD1
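
The ps/grep checks above can be wrapped in a small helper that prints only the LMS processes with their class and the matching policy name. This is a rough sketch, assuming the `ps -efc` column layout shown in the listings (class in the fourth column, command last) and the TS/FF/RR names from the Linux ps man page; lms_classes is just an illustrative name.

```shell
# Read `ps -efc` output on stdin and print "<process> <class> <policy>"
# for each LMS process (database or ASM instance).
# Assumes column 4 is CLS and the last column is the command name.
lms_classes() {
  awk '
    BEGIN {
      policy["TS"] = "SCHED_OTHER"
      policy["FF"] = "SCHED_FIFO"
      policy["RR"] = "SCHED_RR"
    }
    $NF ~ /^(ora|asm)_lms[0-9a-z]+_/ {
      cls = $4
      name = (cls in policy) ? policy[cls] : "unknown"
      print $NF, cls, name
    }'
}
```

Usage would be `ps -efc | lms_classes`. Class codes on other platforms (Solaris, for instance) may differ, in which case the policy column simply reads "unknown".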

I am not sure this is ideal on a box with a small number of CPUs, or where cache fusion traffic is not a major concern. If you want LMS to run in the same class as other processes, you need to set _os_sched_high_priority back to 0 from its default value of 1.

However, doing this does not seem to change the class back to TS: after a restart LMS still runs in RR, even though the parameter is reported as 0.

SQL> alter system set "_os_sched_high_priority"=0 scope=spfile;

System altered.

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Production
With the Partitioning, Real Application Clusters, OLAP and Data Mining options
elephant-> srvctl stop database -d F8902PRD
elephant-> srvctl start database -d F8902PRD
elephant-> ps -efc | grep lms | grep PRD
oracle 17097 1 RR 90 09:54 ? 00:00:00 ora_lms0_F8902PRD1
oracle 17101 1 RR 90 09:54 ? 00:00:00 ora_lms1_F8902PRD1

select a.ksppinm "Parameter",
       b.ksppstvl "Session Value",
       c.ksppstvl "Instance Value"
from x$ksppi a, x$ksppcv b, x$ksppsv c
where a.indx = b.indx and a.indx = c.indx
and a.ksppinm like '%os_sched%'
/

Parameter                 Session Value   Instance Value
------------------------  --------------  --------------
_os_sched_high_priority   0               0

From bug 5635098 it appears that another parameter, _high_priority_processes, also needs to be set to null for this to work.

SQL> alter system set "_high_priority_processes"='' scope=spfile;
System altered.

elephant-> srvctl stop database -d F8902PRD
elephant-> srvctl start database -d F8902PRD

elephant-> ps -efc | grep lms | grep -v grep

oracle 31654 1 TS 24 00:58 ? 00:00:01 ora_lms0_F8902PRD1
oracle 31656 1 TS 24 00:58 ? 00:00:00 ora_lms1_F8902PRD1

As you can see, LMS is now running in the TS class.
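
Since the first restart left LMS in RR while the parameter already read 0, it may be worth scripting the post-restart check rather than eyeballing it. A minimal sketch, again assuming the `ps -efc` layout from the listings above; lms_all_in_class is a made-up helper name.

```shell
# Succeed only if every LMS process in the `ps -efc` output on stdin
# runs in the expected class (first argument, e.g. TS or RR).
# Fails if no LMS process is found at all.
lms_all_in_class() {
  awk -v want="$1" '
    $NF ~ /^(ora|asm)_lms[0-9a-z]+_/ {
      seen++
      if ($4 != want) {
        bad++
        print $NF " is in class " $4 ", expected " want
      }
    }
    END { if (seen == 0 || bad > 0) exit 1 }'
}
```

For example, `ps -efc | lms_all_in_class TS || echo "LMS is not in TS yet"` after the srvctl restart.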
On Solaris SPARC 64-bit, be aware of bug 5258549, which causes boxes with a low number of CPUs to freeze.

16 comments:

Anonymous said...

blimey, I just checked my 10.2.0.2 and I have the following:

oracle 7438 1 FF 41 2006 ? 00:00:04 asm_lms0_+ASM1
oracle 1194 1 FF 41 2006 ? 18:32:03 ora_lms0_NOM1
oracle 1198 1 FF 41 2006 ? 15:18:40 ora_lms1_NOM1

so we have 3 different scheduler classes for 3 different 10gR2 versions!

oracle really could not make up their mind here.

jason.

Fairlie Rego said...

Jason,

Which operating system are you on?

-Fairlie

Anonymous said...

Unfortunately, the bug report for Solaris is not very descriptive. It mentions Solaris 10 + CRS, but gives no clue about other common Solaris/Oracle combinations like Sun Cluster + Solaris 9. It could be a very specific combination of software components or a "general issue".

:(

Odd this did not get into the 10.2.0.3 Known Issues Note.

In any case, we have 3 Sun clusters running 9iR2 that I do not think are going to see 10gR2 for a while. It seems too risky to go down that path right now...

We will need to wait for 10.2.0.4, apparently... And that reminds me of an old saying from an Oracle rep (in the Twentieth Century) about the difference in quality between the even and odd releases (9208 vs 9207 or 715 vs 716): the even release numbers were always better :)

Nilo.

Fairlie Rego said...

Nilo,

It is very strange that you mention the quality of odd- and even-numbered patchsets, given that I had mentioned the same thing on http://www.freelists.org/archives/oracle-l/09-2006/msg00591.html

-Fairlie

Anonymous said...

Hi Fairlie,

I'm on RedHat AS update 2 (2.6.9-22.ELsmp). 10.2.0.2 gives FF, and the same o/s gives RR at 10.2.0.3.

Fairlie Rego said...

Thanks, Jason. If you talk to a salesperson, he might say the "feature" is evolving, but there is a code bug on this in the current 11g beta release.

Anonymous said...

Hi!
We have the same problem with Oracle 10.2.0.3 and Solaris 2.8 on an FSC box with two CPUs.
Is it a good idea to change the _os_sched_high_priority to NULL?

Regards
- Eric

Fairlie Rego said...

If you are on 10.2.0.3 you should set _high_priority_processes to null.
But you should take advice from Oracle Support before using
such undocumented parameters in a production system.

Unknown said...

Setting the LMS priority to "0" changed the class from FF to TS again

Does anybody know what FF is??

Santhosh

Unknown said...

Setting the _os_sched_high_priority to "0" changed the class from FF to TS again, what is FF anyway??

Santhosh

Alex Gorbachev said...

FF means SCHED_FIFO.

From "man ps":
cls    CLS    scheduling class of the process.
              (alias policy, class). Field's possible
              values are:
              -   not reported
              TS  SCHED_OTHER
              FF  SCHED_FIFO
              RR  SCHED_RR
              ?   unknown value

Anonymous said...

Thanks, very useful

Anonymous said...

Thank You, this discussion is very useful.

Anonymous said...

Does anyone have instructions on how to simulate this RR-scheduled LMS CPU race? (Naturally, it would be used only in a test environment, not in production.)

Anonymous said...

TCaJd6 Your blog is great. Articles is interesting!

Anonymous said...

Hi,
How about the ASM instance?
Should we also alter the priority of LMS on the ASM instance?
Thanks for the info,
Dean