[20190418]exclusive latch spin count.txt

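--//Yesterday I tested the "process allocation" latch, mainly because it spins a little differently from other latches, yet by default they all spin 20000 times. How to verify that has puzzled me for a long time.
--//Also, the current behaviour is to spin a fixed number of times and then call semop to sleep until woken, which costs very little CPU at that step. It no longer spins repeatedly with exponential backoff as before.
--//Link: http://andreynikolaev.wordpress.com/2011/01/06/spin-tales-part-1-exclusive-latches-in-oracle-9-2-11g/
--//His servers run solaris. On both solaris x86 and solaris sparc cpus the dtrace tool can be used for this kind of investigation; I don't have that tool and am not familiar with it. The functions he traced are: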
latch_func addr     lname
---------- -------- ------------------------
kslgetl    500063d0 first_spare_latch

kslgetl          –  ksl  get exclusive latch
sskgslgf         –  immediate latch get
kslges           – wait latch get
skgslsgts
sskgslspin       – spin for the latch
sskgslspin
sskgslspin
sskgslspin
sskgslspin
--//There the latch spin is done by calling sskgslspin, but on linux with an intel cpu there is no corresponding oracle internal function.
(gdb) b sskgslspin
function "sskgslspin" not defined.
make breakpoint pending on future shared library load? (y or [n]) y
breakpoint 3 (sskgslspin) pending.

--//Link: https://fritshoogland.wordpress.com/2015/07/17/oracle-12-and-latches/
some searching around revealed that a cpu register reveals this information. add this to the above gdb script:

break *0xc29b51
  commands
    silent
    printf " kslges loop: %d\n", $ecx
    c
  end

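--//He does not say how he obtained this address; the only clue is that the information is in a cpu register. How can that be explored? My test below shows one way.
--//First a caveat: I am not familiar with the gdb debugger. And do not run tests like this on a production system!!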

1.Environment:
scott@book> @ ver1
port_string                    version        banner
------------------------------ -------------- --------------------------------------------------------------------------------
x86_64/linux 2.4.xx            11.2.0.4.0     oracle database 11g enterprise edition release 11.2.0.4.0 - 64bit production

sys@book> @ hide spin_count
name              description                        default_value session_value system_value
----------------- ---------------------------------- ------------- ------------- ------------
_mutex_spin_count mutex spin count                   true          255           255
_spin_count       amount to spin waiting for a latch true          2000          2000

sys@book> select class_ksllt,decode(class_ksllt,2,kslltnam,3,kslltnam) name,count(*) from x$kslltr group by  class_ksllt,decode(class_ksllt,2,kslltnam,3,kslltnam);
class_ksllt name               count(*)
----------- ------------------ --------
          0                         581
          2 process allocation        1
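--//As I said before, only the process allocation latch is special: it uses latch class=2. I personally think this design keeps logins from blocking and makes them respond faster.
--//All other latches use latch class="0".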
sys@book> select * from x$ksllclass ;
addr                   indx    inst_id       spin      yield   waittime     sleep0     sleep1     sleep2     sleep3     sleep4     sleep5     sleep6     sleep7
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
00000000861986c0          0          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
00000000861986ec          1          1      20000          0          1       1000       1000       1000       1000       1000       1000       1000       1000
0000000086198718          2          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
0000000086198744          3          1      20000          0          1       1000       1000       1000       1000       1000       1000       1000       1000
0000000086198770          4          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
000000008619879c          5          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
00000000861987c8          6          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
00000000861987f4          7          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
8 rows selected.
--//Whatever the latch class, the default spin count is 20000. Note that it is not the 2000 that _spin_count shows above.

$ cat exclusive_latch.txt
/* Arguments: @ exclusive_latch.txt latch_name willing why where sleep_num */
--//connect / as sysdba
col laddr new_value laddr
select addr laddr from v$latch_parent where name='&&1';
oradebug setmypid
oradebug call kslgetl 0x&laddr &&2 &&3 &&4
host sleep &&5
oradebug call kslfre 0x&laddr
--//exit
--//Note: the test script I used a few days ago had the connect / as sysdba and exit lines. To make debugging easier I commented those 2 lines out, so the session is not exited and re-entered over and over.
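--//session 1:
sys@book> @ exclusive_latch.txt "test excl. parent l0" 1 1 2 100000
laddr
----------------
00000000600098d8

statement processed.
function returned 1
--//The last argument is the number of seconds to sleep. Use a large value so the session does not exit while tracing. To move on, press ctrl+c to interrupt the sleep.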

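--//session 2:
sys@book> @ spid
       sid    serial# process                  server    spid       pid  p_serial# c50
---------- ---------- ------------------------ --------- ------ ------- ---------- --------------------------------------------------
        44         45 37744                    dedicated 37745       27         21 alter system kill session '44,45' immediate;
--//Note spid=37745. Now open another terminal window and run the following there.
--//Call it window 3 for now: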
$ gdb -p 37745

--//session 2:
sys@book> @ exclusive_latch.txt "test excl. parent l0" 1 3 4 1
--//It hangs!!

--//window 3:
(gdb) c
continuing.

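--//session 2:
sys@book> @ exclusive_latch.txt "test excl. parent l0" 1 3 4 1
laddr
----------------
00000000600098d8

statement processed.

--//It stops at the oradebug call kslgetl call, because session 1 currently holds the latch. Running the latch_free.sql script from my tests a few days ago shows this: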
sys@book> @ latch_free
2019-04-18 11:41:09
process 26
 holding: 00000000600098d8  "test excl. parent l0" lvl=0 whr=2 why=1, sid=30
  process 27, waiting for: 00000000600098d8 whr=4 why=3

--//Back in window 3, press ctrl+c to interrupt:
(gdb) c
continuing.
^c
program received signal sigint, interrupt.
0x00000037990d6407 in semop () from /lib64/libc.so.6
(gdb)
(gdb) bt 6
#0  0x00000037990d6407 in semop () from /lib64/libc.so.6
#1  0x0000000009809c0f in sskgpwwait ()
#2  0x00000000098089ce in skgpwwait ()
#3  0x00000000093f9fe1 in kslges ()
#4  0x00000000093f997a in kslgetl ()
#5  0x0000000007d7402e in skdxcall ()
(more stack frames follow…)
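--//This establishes the call stack, that is, the call order. The process is currently sleeping in semop, and kslges is called right after kslgetl. So my guess is that the spin counting happens inside kslges.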

2.Repeat the previous test with a breakpoint set in gdb:
--//Press ctrl+c in session 1, quit gdb in window 3, and start gdb again.
--//window 3:
$ rlwrap gdb -p 37745
(gdb)
(gdb) break kslges
breakpoint 1 at 0x93f9b74
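--//This sets a breakpoint on the kslges function. Then run the following in session 1 and session 2 (not explained again later):
--//Note: I put rlwrap in front mainly so that commands are remembered in ~/.gdb_history and need not be retyped (mostly because I sometimes have to leave gdb). In fact gdb supports the arrow keys by itself.
--//session 1:
sys@book> @ exclusive_latch.txt "test excl. parent l0" 1 1 2 100000

--//session 2:
sys@book> @ exclusive_latch.txt "test excl. parent l0" 1 3 4 1

--//It hangs again! In window 3, run the following: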
(gdb) c
continuing.
breakpoint 1, 0x00000000093f9b74 in kslges ()
(gdb)

(gdb) info register
rax            0x0      0
rbx            0x0      0
rcx            0x3      3
rdx            0x0      0
rsi            0x0      0
rdi            0x600098d8       1610651864
rbp            0x7fff93de4f40   0x7fff93de4f40
rsp            0x7fff93de4f40   0x7fff93de4f40
r8             0x4      4
r9             0x0      0
r10            0x0      0
r11            0xa      10
r12            0x600098d8       1610651864
r13            0x1      1
r14            0x3      3
r15            0x4      4
rip            0x93f9b74        0x93f9b74 <kslges+4>
--//This should point to the address of the next instruction to execute.
eflags         0x246    [ pf zf if ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
fctrl          0x27f    639
fstat          0x0      0
ftag           0xffff   65535
fiseg          0x0      0
fioff          0x9847daa        159677866
foseg          0x7fff   32767
fooff          0x93de38d0       -1814153008
fop            0x0      0
mxcsr          0x1fa0   [ pe im dm zm om um pm ]

--//So which register holds the spin count? Nothing stands out so far.
(gdb) info register ecx
invalid register `ecx'
--//Ugh, there is no ecx register at all. Is his server not intel-based?
(gdb) set pagination off
(gdb) help alias
aliases of other commands.

list of commands:

ni -- step one instruction
--//ni means "step one instruction".
rc -- continue program being debugged but run it in reverse
rni -- step backward one instruction
rsi -- step backward exactly one instruction
si -- step one instruction exactly
stepping -- specify single-stepping behavior at a tracepoint
tp -- set a tracepoint at specified line or function
tty -- set terminal for future runs of program being debugged
where -- print backtrace of all stack frames
ws -- specify single-stepping behavior at a tracepoint

type "help" followed by command name for full documentation.
type "apropos word" to search for commands related to "word".
command name abbreviations are allowed if unambiguous.

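--//Alternating ni and info register by hand is obviously too slow; the spin runs at least 20000 times.
--//Running ni 1000 should not skip anything important, and looking at the registers afterwards should let me guess which one holds the spin count.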

(gdb) ni 1000
0x00000000093f9dfb in kslges ()

(gdb) info register
rax            0x4dc0   19904
rbx            0x0      0
rcx            0x4dbe   19902
rdx            0x100    256
rsi            0x0      0
rdi            0x1a     26
rbp            0x7fff93de4f40   0x7fff93de4f40
rsp            0x7fff93de4c00   0x7fff93de4c00
r8             0x861ca808       2250024968
r9             0x19c    412
r10            0x0      0
r11            0x1b     27
r12            0x8620f490       2250306704
r13            0x600098d8       1610651864
r14            0x4e20   20000
r15            0x1b     27
rip            0x93f9dfb        0x93f9dfb <kslges+651>
eflags         0x217    [ cf pf af if ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
fctrl          0x27f    639
fstat          0x0      0
ftag           0xffff   65535
fiseg          0x0      0
fioff          0x9847daa        159677866
foseg          0x7fff   32767
fooff          0x93de38d0       -1814153008
fop            0x0      0
mxcsr          0x1fa0   [ pe im dm zm om um pm ]
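--//^_^ Clearly the spin count is kept in the rax and rcx registers. Now the start of the loop has to be found from the rip addresses (in fact this address itself would also work for testing, since every spin
--//iteration passes through it).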

(gdb) ni
0x00000000093f9dfe in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9dfe        0x93f9dfe <kslges+654>
(gdb) ni
0x00000000093f9ddc in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9ddc        0x93f9ddc <kslges+620>

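--//What luck!! Note that rax changed from 19904 to 19903 and that rip went backwards, so the loop starts at address 0x93f9ddc.
--//Time for a break; to be continued in the afternoon.
--//Now let's see how many instructions one spin iteration takes.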
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9dfe        0x93f9dfe <kslges+654>
(gdb) ni
0x00000000093f9ddc in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9ddc        0x93f9ddc <kslges+620>
(gdb) ni
0x00000000093f9dde in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9dde        0x93f9dde <kslges+622>
(gdb) ni
0x00000000093f9de4 in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9de4        0x93f9de4 <kslges+628>
(gdb) ni
0x00000000093f9deb in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9deb        0x93f9deb <kslges+635>
(gdb) ni
0x00000000093f9def in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9def        0x93f9def <kslges+639>
(gdb) ni
0x00000000093f9df2 in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9df2        0x93f9df2 <kslges+642>
(gdb) ni
0x00000000093f9df8 in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbe   19902
rip            0x93f9df8        0x93f9df8 <kslges+648>
(gdb) ni
0x00000000093f9dfb in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbf   19903
rcx            0x4dbd   19901
rip            0x93f9dfb        0x93f9dfb <kslges+651>
(gdb) ni
0x00000000093f9dfe in kslges ()
(gdb) info register rax rcx rip
rax            0x4dbe   19902
rcx            0x4dbd   19901
rip            0x93f9dfe        0x93f9dfe <kslges+654>
--//9 instructions in total. With this information the gdb script can be written.
(gdb) disassemble kslges
--//Disassemble to take a look.

0x00000000093f9ddc <kslges+620>:        xor    %esi,%esi
0x00000000093f9dde <kslges+622>:        mov    %esi,-0xd8(%rbp)
0x00000000093f9de4 <kslges+628>:        mov    %sil,-0xa6(%rbp)
0x00000000093f9deb <kslges+635>:        mov    0x0(%r13),%rdi
0x00000000093f9def <kslges+639>:        test   %rdi,%rdi
0x00000000093f9df2 <kslges+642>:        je     0x93fa6c0 <kslges+2896>
0x00000000093f9df8 <kslges+648>:        add    $0xffffffffffffffff,%ecx
0x00000000093f9dfb <kslges+651>:        add    $0xffffffffffffffff,%eax
0x00000000093f9dfe <kslges+654>:        jne    0x93f9ddc <kslges+620>
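
--//One way to read the nine instructions above: load the word at the address held in r13 (the latch address), leave the loop if it is zero, otherwise decrement ecx and eax and jump back while eax is non-zero.
--//The Python sketch below models that reading. It is an interpretation of the disassembly, not oracle code. The names spin_loop and read_latch_word are made up, and "zero means the latch looks free" is an assumption.

```python
MASK32 = 0xFFFFFFFF  # eax and ecx are 32-bit registers


def spin_loop(read_latch_word, spin_count):
    """Hypothetical model of the loop at kslges+620 .. kslges+654.

    read_latch_word: callable returning the current latch word
                     (assumption: 0 means the latch looks free).
    Returns (outcome, passes); passes counts arrivals at the loop head.
    """
    eax = spin_count        # the gdb trace later in this note shows rax = spin count on the first pass
    ecx = spin_count - 1    # and rcx = spin count - 1
    passes = 0
    while True:
        passes += 1                     # kslges+620: loop head
        if read_latch_word() == 0:      # mov 0x0(%r13),%rdi ; test ; je
            return "latch word is zero", passes
        ecx = (ecx - 1) & MASK32        # add $-1,%ecx
        eax = (eax - 1) & MASK32        # add $-1,%eax
        if eax == 0:                    # jne is not taken once eax reaches 0
            return "spin exhausted", passes


if __name__ == "__main__":
    # A latch word that never becomes zero: the loop gives up after spin_count passes.
    print(spin_loop(lambda: 1, 20000))
```

--//With a latch word that never clears, the model reaches the loop head exactly spin_count times before falling through.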

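--//^_^ I can't really read this. Presumably $0xffffffffffffffff means -1. Now I see that the ecx mentioned earlier corresponds to this instruction, but how do I display it? I don't know.
--//When I put printf " spin count loop: %d\n", $ecx into the script, it failed!!
program received signal sigsegv, segmentation fault.
0x0000000009805bd5 in slaac_int ()
--//The instance crashed.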
$ sqlplus -prelim /nolog
sql*plus: release 11.2.0.4.0 production on thu apr 18 16:23:45 2019
copyright (c) 1982, 2013, oracle.  all rights reserved.

@> connect sys as sysdba
enter password:
prelim connection established
sys@book> shutdown immediate ;
ora-01012: not logged on
sys@book> shutdown abort;
oracle instance shut down.
--//Note: the author of that post does state his hardware: an sgi system.
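
--//On x86-64, ecx is simply the low 32 bits of rcx, so a gdb build that rejects $ecx still exposes the same value through $rcx.
--//The sketch below works through the arithmetic in Python: masking a 64-bit value down to 32 bits, and why adding 0xffffffff to a 32-bit register acts like subtracting 1. The helper names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def low32(reg64):
    """Return the low 32 bits of a 64-bit register value (ecx from rcx)."""
    return reg64 & MASK32


def as_signed(value, bits):
    """Interpret an unsigned bit pattern of the given width as two's complement."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def add32(reg, imm):
    """32-bit add with wraparound, like `add imm,%ecx`."""
    return (reg + imm) & MASK32


if __name__ == "__main__":
    print(as_signed(0xFFFFFFFFFFFFFFFF, 64))  # the immediate shown in the disassembly
    print(low32(0x4DBE))                      # rcx value seen in the register dump
    print(add32(0x4DBE, 0xFFFFFFFF))          # one decrement of ecx
```

--//In a gdb command list the matching expression would presumably be $rcx & 0xffffffff. I have not verified that on this gdb version.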

3.Repeat the previous test, writing the gdb script first:
$ cat spin.gdb
break kslgetl
  commands
    silent
    printf "kslgetl %x, %d, %d, %d\n", $rdi, $rsi, $rdx, $rcx
    c
  end

break kslges
  commands
    silent
    printf "kslges %x, %d, %d, %d\n", $rdi, $rsi, $rdx, $rcx
    c
  end

break skgpwwait
  commands
    silent
    printf "skgpwwait %d, %d, %d, %d\n", $rdi, $rsi, $rdx, $rcx
    c
  end

break sskgpwwait
  commands
    silent
    printf "sskgpwwait %d, %d, %d, %d\n", $rdi, $rsi, $rdx, $rcx
    c
  end

break semop
  commands
    silent
    printf "semop %d, %d, %d, %d\n", $rdi, $rsi, $rdx, $rcx
    c
  end

# loop head found in step 2; rax and rcx carry the remaining spin count
break *0x93f9ddc
  commands
    silent
    printf " spin count loop: %d %d %x\n", $rax,$rcx,$rip
    c
  end

#0  0x00000037990d6407 in semop () from /lib64/libc.so.6
#1  0x0000000009809c0f in sskgpwwait ()
#2  0x00000000098089ce in skgpwwait ()
#3  0x00000000093f9fe1 in kslges ()
#4  0x00000000093f997a in kslgetl ()

--//Repeat the test:
--//window 3:
$ gdb -p 37745 -x spin.gdb

--//session 1:
sys@book> @ exclusive_latch.txt "test excl. parent l0" 1 1 2 100000

--//session 2:
sys@book> @ exclusive_latch.txt "test excl. parent l0" 1 3 4 1

--//window 3:
kslgetl 600098d8, 1, 3, 4
kslges 600098d8, 0, 0, 3
 spin count loop: 20000 19999 93f9ddc
 spin count loop: 19999 19998 93f9ddc
 spin count loop: 19998 19997 93f9ddc
 spin count loop: 19997 19996 93f9ddc
 spin count loop: 19996 19995 93f9ddc
 spin count loop: 19995 19994 93f9ddc
 spin count loop: 19994 19993 93f9ddc
 spin count loop: 19993 19992 93f9ddc
 spin count loop: 19992 19991 93f9ddc
 spin count loop: 19991 19990 93f9ddc
 spin count loop: 19990 19989 93f9ddc
 spin count loop: 19989 19988 93f9ddc
…..
--//Keep pressing return to continue...
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait -1814147480, 202182304, -2044659696, 0
sskgpwwait -1814147480, 202182304, -2044659696, 0
semop 314408960, -1814148224, 1, -1
 spin count loop: 20000 19999 93f9ddc
 spin count loop: 19999 19998 93f9ddc
…..
--//Keep pressing return to continue...
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait -1814147480, 202182304, -2044659696, 0
sskgpwwait -1814147480, 202182304, -2044659696, 0
semop 314408960, -1814148224, 1, -1

--//session 1:
--//Press ctrl+c to interrupt.
sys@book> @ exclusive_latch.txt "test excl. parent l0" 1 1 2 100000
old   1: select addr laddr from v$latch_parent where name='&&1'
new   1: select addr laddr from v$latch_parent where name='test excl. parent l0'
laddr
----------------
00000000600098d8
statement processed.
function returned 1
function returned 0

--//session 2 also finishes after waiting 1 second.
sys@book> @ exclusive_latch.txt "test excl. parent l0" 1 3 4 1
laddr
----------------
00000000600098d8
statement processed.
function returned 1
function returned 0
sys@book>

--//window 3 shows the following:
semop 314408960, -1814148224, 1, -1

 spin count loop: 20000 19999 93f9ddc

--//Done. There were 2 spin cycles of 20000 iterations each, and the latch was obtained on the 3rd attempt. Why 2 cycles?
--//I repeated the test once more:

(gdb) info break 6
num     type           disp enb address            what
6       breakpoint     keep y   0x00000000093f9ddc <kslges+620>
        breakpoint already hit 20001 times
        silent
        printf " spin count loop: %d %d %x\n", $rax,$rcx,$rip
        c
--//Only 20001 hits this time.
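
--//Scrolling through tens of thousands of "spin count loop" lines is tedious. If the gdb output is saved to a file, a small script can count the passes per spin cycle instead.
--//The sketch below assumes the output format produced by spin.gdb above and treats every semop line as the end of a cycle. The function name summarize and the shortened sample (spin count 3) are made up for illustration and are not real output.

```python
import re

# Matches lines such as " spin count loop: 20000 19999 93f9ddc"
SPIN_RE = re.compile(r"^\s*spin count loop: (\d+) (\d+) [0-9a-f]+\s*$")


def summarize(lines):
    """Return the number of loop-head hits in each spin cycle.

    A cycle ends at every line starting with "semop "; trailing hits
    after the last semop are reported as a final, shorter cycle.
    """
    cycles = []
    current = 0
    for line in lines:
        if SPIN_RE.match(line):
            current += 1
        elif line.startswith("semop "):
            cycles.append(current)
            current = 0
    if current:
        cycles.append(current)
    return cycles


if __name__ == "__main__":
    sample = [
        "kslgetl 600098d8, 1, 3, 4",
        "kslges 600098d8, 0, 0, 3",
        " spin count loop: 3 2 93f9ddc",
        " spin count loop: 2 1 93f9ddc",
        " spin count loop: 1 0 93f9ddc",
        "skgpwwait -1814147480, 202182304, -2044659696, 0",
        "sskgpwwait -1814147480, 202182304, -2044659696, 0",
        "semop 314408960, -1814148224, 1, -1",
        " spin count loop: 3 2 93f9ddc",
    ]
    print(summarize(sample))
```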

4.Test it another way:
--//Customize the spin count as follows:
*._spin_count=20
sys@book> startup pfile=/tmp/@.ora
oracle instance started.
total system global area  643084288 bytes
fixed size                  2255872 bytes
variable size             205521920 bytes
database buffers          427819008 bytes
redo buffers                7487488 bytes
database mounted.
database opened.

sys@book> select * from x$ksllclass ;
addr                   indx    inst_id       spin      yield   waittime     sleep0     sleep1     sleep2     sleep3     sleep4     sleep5     sleep6     sleep7
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
00000000861986c0          0          1         20          0          1       8000       8000       8000       8000       8000       8000       8000       8000
00000000861986ec          1          1         20          0          1       1000       1000       1000       1000       1000       1000       1000       1000
0000000086198718          2          1         20          0          1       8000       8000       8000       8000       8000       8000       8000       8000
0000000086198744          3          1         20          0          1       1000       1000       1000       1000       1000       1000       1000       1000
0000000086198770          4          1         20          0          1       8000       8000       8000       8000       8000       8000       8000       8000
000000008619879c          5          1         20          0          1       8000       8000       8000       8000       8000       8000       8000       8000
00000000861987c8          6          1         20          0          1       8000       8000       8000       8000       8000       8000       8000       8000
00000000861987f4          7          1         20          0          1       8000       8000       8000       8000       8000       8000       8000       8000
8 rows selected.
--//Repeat the test. Details are omitted; only the gdb output is recorded.

(gdb) c
continuing.
kslgetl 6010d860, 1, 2087607608, 3991
kslgetl 6010d860, 1, 2087558280, 3991
kslgetl 6010d860, 1, 0, 4039
kslgetl 6010d860, 1, 0, 3980
kslgetl 6010d860, 1, 0, 4039
kslgetl 6010d860, 1, 2087563160, 3991
kslgetl 6010d860, 1, 2087572104, 3991
kslgetl 600098d8, 1, 3, 4
kslges 600098d8, 0, 0, 3
 spin count loop: 20 19 93f9ddc
 spin count loop: 19 18 93f9ddc
 spin count loop: 18 17 93f9ddc
 spin count loop: 17 16 93f9ddc
 spin count loop: 16 15 93f9ddc
 spin count loop: 15 14 93f9ddc
 spin count loop: 14 13 93f9ddc
 spin count loop: 13 12 93f9ddc
 spin count loop: 12 11 93f9ddc
 spin count loop: 11 10 93f9ddc
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait -1031167592, 202182304, -2044672536, 0
sskgpwwait -1031167592, 202182304, -2044672536, 0
semop 314933248, -1031168336, 1, -1
 spin count loop: 20 19 93f9ddc
 spin count loop: 19 18 93f9ddc
 spin count loop: 18 17 93f9ddc
 spin count loop: 17 16 93f9ddc
 spin count loop: 16 15 93f9ddc
 spin count loop: 15 14 93f9ddc
 spin count loop: 14 13 93f9ddc
 spin count loop: 13 12 93f9ddc
 spin count loop: 12 11 93f9ddc
 spin count loop: 11 10 93f9ddc
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait -1031167592, 202182304, -2044672536, 0
sskgpwwait -1031167592, 202182304, -2044672536, 0
semop 314933248, -1031168336, 1, -1
 spin count loop: 20 19 93f9ddc
--//The first run went through 2 spin cycles of 20 iterations each. This confirms from another angle that the spin count comes from x$ksllclass (select * from x$ksllclass;).
(gdb) info break 6
num     type           disp enb address            what
6       breakpoint     keep y   0x00000000093f9ddc <kslges+620>
        breakpoint already hit 41 times
        silent
        printf " spin count loop: %d %d %x\n", $rax,$rcx,$rip
        c
--//breakpoint already hit 41 times. Repeating the test once more:
(gdb) c
continuing.
kslgetl 600098d8, 1, 3, 4
kslges 600098d8, 0, 0, 3
 spin count loop: 20 19 93f9ddc
 spin count loop: 19 18 93f9ddc
 spin count loop: 18 17 93f9ddc
 spin count loop: 17 16 93f9ddc
 spin count loop: 16 15 93f9ddc
 spin count loop: 15 14 93f9ddc
 spin count loop: 14 13 93f9ddc
 spin count loop: 13 12 93f9ddc
 spin count loop: 12 11 93f9ddc
 spin count loop: 11 10 93f9ddc
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait -1031167592, 202182304, -2044672536, 0
sskgpwwait -1031167592, 202182304, -2044672536, 0
semop 314933248, -1031168336, 1, -1
 spin count loop: 20 19 93f9ddc
--//This time there was only 1 spin cycle.
(gdb) info break 6
num     type           disp enb address            what
6       breakpoint     keep y   0x00000000093f9ddc <kslges+620>
        breakpoint already hit 62 times
        silent
        printf " spin count loop: %d %d %x\n", $rax,$rcx,$rip
        c
--//breakpoint already hit 62 times, so the last run hit the breakpoint 21 times.
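
--//The hit counts fit a simple pattern: each full spin cycle reaches the loop head spin-count times, and the final attempt reaches it once more before the latch is obtained.
--//The sketch below only checks that arithmetic against the numbers recorded in this note. The function name spin_cycles is made up.

```python
def spin_cycles(breakpoint_hits, spin_count):
    """Split a loop-head hit count into (full spin cycles, extra passes)."""
    return divmod(breakpoint_hits, spin_count)


if __name__ == "__main__":
    print(spin_cycles(41, 20))        # first run with _spin_count=20
    print(spin_cycles(62 - 41, 20))   # second run with _spin_count=20
    print(spin_cycles(20001, 20000))  # repeated run with the default spin count
```

--//Each case leaves exactly one extra pass, which corresponds to the single "spin count loop" line printed after the last semop.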

5.Customize the spin count further, this time using a different latch class:
select addr,name,level#,latch#,gets,misses,sleeps,immediate_gets,immediate_misses,waiters_woken,waits_holding_latch,spin_gets,wait_time from v$latch_parent   where lower(name) like '%'||lower('test excl. parent l0')||'%'
addr             name                 level#     latch#       gets     misses     sleeps immediate_gets immediate_misses waiters_woken waits_holding_latch  spin_gets  wait_time
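---------------- -------------------- ------ ---------- ---------- ---------- ---------- -------------- ---------------- ------------- ------------------- ---------- ----------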
00000000600098d8 test excl. parent l0      0          4         64         30         39              0                0             0                   0          7 1.4769e+10
--//latch#=4
--//Customize the spin count by adding the following to the parameter file:
#*._spin_count=20
*._latch_classes='4:3'
*._latch_class_3='10 0 1 10000 20000 30000 40000 50000 60000 70000 50000'

sys@book> startup pfile=/tmp/@.ora
oracle instance started.
total system global area  643084288 bytes
fixed size                  2255872 bytes
variable size             205521920 bytes
database buffers          427819008 bytes
redo buffers                7487488 bytes
database mounted.
database opened.

sys@book> select * from x$ksllclass ;
addr                   indx    inst_id       spin      yield   waittime     sleep0     sleep1     sleep2     sleep3     sleep4     sleep5     sleep6     sleep7
---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
00000000861986c0          0          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
00000000861986ec          1          1      20000          0          1       1000       1000       1000       1000       1000       1000       1000       1000
0000000086198718          2          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
0000000086198744          3          1         10          0          1      10000      20000      30000      40000      50000      60000      70000      50000
0000000086198770          4          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
000000008619879c          5          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
00000000861987c8          6          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
00000000861987f4          7          1      20000          0          1       8000       8000       8000       8000       8000       8000       8000       8000
8 rows selected.
--//Class 3 now has spin=10.
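
--//Comparing the parameter with the indx=3 row above, the 11 numbers in _latch_class_3 line up with the x$ksllclass columns spin, yield, waittime and sleep0..sleep7, while _latch_classes='4:3' assigns latch# 4 to class 3.
--//The sketch below spells out that mapping. The field order is inferred only from the query output in this note. The function names are made up, and accepting several space-separated latch:class pairs is an assumption.

```python
FIELDS = ["spin", "yield", "waittime"] + [f"sleep{i}" for i in range(8)]


def parse_latch_class(value):
    """Map a _latch_class_N string onto the x$ksllclass column names."""
    numbers = [int(token) for token in value.split()]
    if len(numbers) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numbers, got {len(numbers)}")
    return dict(zip(FIELDS, numbers))


def parse_latch_classes(value):
    """Map a _latch_classes string such as '4:3' to {latch#: class}."""
    mapping = {}
    for pair in value.split():  # assumption: pairs are space separated
        latch_no, latch_class = pair.split(":")
        mapping[int(latch_no)] = int(latch_class)
    return mapping


if __name__ == "__main__":
    print(parse_latch_classes("4:3"))
    print(parse_latch_class("10 0 1 10000 20000 30000 40000 50000 60000 70000 50000"))
```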
--//Repeat the test. Details are omitted; only the gdb output is recorded.

kslgetl 6010d860, 1, 2087529088, 3991
kslgetl 80641188, 1, 0, 4174
kslgetl 6010d860, 1, 2087479760, 3991
kslgetl 6010d860, 1, 0, 4039
kslgetl 6010d860, 1, 0, 3980
kslgetl 6010d860, 1, 0, 4039
kslgetl 6010d860, 1, 2087484640, 3991
kslgetl 6010d860, 1, 2087493584, 3991
kslgetl 600098d8, 1, 3, 4
kslges 600098d8, 0, 0, 3
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait 1986434424, 202182304, 0, 10000
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait 1986434424, 202182304, 0, 20000
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait 1986434424, 202182304, 0, 30000
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait 1986434424, 202182304, 0, 40000
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait 1986434424, 202182304, 0, 50000
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait 1986434424, 202182304, 0, 60000
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait 1986434424, 202182304, 0, 70000
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait 1986434424, 202182304, 0, 50000
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc
skgpwwait 1986434424, 202182304, 0, 50000
 spin count loop: 10 9 93f9ddc
 spin count loop: 9 8 93f9ddc
 spin count loop: 8 7 93f9ddc
 spin count loop: 7 6 93f9ddc
 spin count loop: 6 5 93f9ddc
 spin count loop: 5 4 93f9ddc
 spin count loop: 4 3 93f9ddc
 spin count loop: 3 2 93f9ddc
 spin count loop: 2 1 93f9ddc
 spin count loop: 1 0 93f9ddc

(gdb) info break 6
num     type           disp enb address            what
6       breakpoint     keep y   0x00000000093f9ddc <kslges+620>
        breakpoint already hit 199 times
        silent
        printf " spin count loop: %d %d %x\n", $rax,$rcx,$rip
        c
(gdb) bt
#0  0x00000037990ce183 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000002d9751c in skgpnap ()
#2  0x0000000009808a7a in skgpwwait ()
#3  0x00000000093fa63b in kslges ()
#4  0x00000000093f997a in kslgetl ()
#5  0x0000000007d7402e in skdxcall ()
#6  0x00000000076c96aa in ksdxcall ()
#7  0x00000000076cdbcb in ksdxen_int ()
#8  0x00000000076d11a0 in ksdxen ()
#9  0x00000000095bbdad in opiodr ()
#10 0x00000000097a629f in ttcpip ()
#11 0x000000000186470e in opitsk ()
#12 0x0000000001869235 in opiino ()
#13 0x00000000095bbdad in opiodr ()
#14 0x00000000018607ac in opidrv ()
#15 0x0000000001e3a48f in sou2o ()
#16 0x0000000000a29265 in opimai_real ()
#17 0x0000000001e407ad in ssthrdmain ()
#18 0x0000000000a291d1 in main ()
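--//In this mode the process keeps going through spin cycles, napping in between (skgpnap/select in the backtrace rather than semop), until it obtains the latch.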
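
--//In the trace above the 4th value printed for skgpwwait runs 10000, 20000, ... 70000, 50000 and then stays at 50000, which matches sleep0..sleep7 of class 3 with the last entry reused.
--//The sketch below encodes only that observation. That the value is the sleep time and that sleep7 repeats indefinitely are inferences from this one trace, and the function name is made up.

```python
CLASS3_SLEEPS = [10000, 20000, 30000, 40000, 50000, 60000, 70000, 50000]


def sleep_for_attempt(sleeps, attempt):
    """Return the sleep value used after the given failed spin cycle (0-based).

    Assumption: once the list is exhausted, the last entry is reused.
    """
    return sleeps[min(attempt, len(sleeps) - 1)]


if __name__ == "__main__":
    print([sleep_for_attempt(CLASS3_SLEEPS, n) for n in range(9)])
```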

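Summary:
1.Do not run tests like this on a production system.
2.The main difficulty was my own unfamiliarity with the gdb debugger.
3.This write-up is long again; I record things in detail so that I can still follow them later ^_^.
4.The tests show how latch acquisition behaves: for exclusive latches the default is 20000 spins.
5.I personally don't recommend solving this kind of latch problem by customizing these settings. Start from the application instead, for example by tuning sql statements and reducing the number of executions.
6.Tomorrow I will analyse shared latches. According to andreynikolaev.wordpress.com they spin 2*_spin_count times; I'll verify that tomorrow.
7.In practice it is enough to remember that the latch mechanism is now different from before: it no longer uses the old exponential-backoff sleep mechanism, but spins only 20000 times, then calls semop and waits to be woken.
--//Why the earlier test showed 2 spin cycles, I still don't really understand...
--//One more example as a supplement: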
$ cat exclusive_latch.txt
/* Arguments: @ exclusive_latch.txt latch_name willing why where sleep_num */
--//connect / as sysdba
col laddr new_value laddr
select addr laddr from v$latch_parent where name='&&1';
oradebug setmypid
oradebug call kslgetl 0x&laddr &&2 &&3 &&4
host sleep &&5
oradebug call kslfre 0x&laddr
exit

$ cat p6.sh
#! /bin/bash
vdate=$(date '+%Y%m%d%H%M%S')
echo $vdate

source peek.sh "$1" 20 | timestamp.pl >| /tmp/peekx_${vdate}.txt &

sqlplus -s -l / as sysdba <<eof  >| /tmp/latch_free_${vdate}.txt &
$(seq 20 | xargs -I {} echo -e '@latch_free \n host sleep 1')
eof

sleep 1
# Arguments: @ exclusive_latch.txt latch_name willing why where  sleep_num
sqlplus / as sysdba @ exclusive_latch.txt "$1" 1 4 5 10 > /dev/null &
p=$!
strace -fttt  -p $p -o /tmp/pp_${vdate}_${p}.txt > /dev/null &
sleep 2
sqlplus / as sysdba @ exclusive_latch.txt "$1" 1 6 7 5 > /dev/null &
p=$!
strace -fttt  -p $p -o /tmp/pp_${vdate}_${p}.txt > /dev/null &
wait

$ . p6.sh "test excl. parent l0"
20190419090713
process 30017 attached - interrupt to quit
process 30020 attached
process 30023 attached
process 30017 suspended
process 30026 attached - interrupt to quit
process 30028 attached
process 30017 resumed
process 30023 detached
process 30047 attached
process 30026 suspended
process 30017 detached
process 30020 detached
process 30026 resumed
process 30047 detached
process 30026 detached
process 30028 detached
[1]   done                    source peek.sh "$1" 20 | timestamp.pl >|/tmp/peekx_${vdate}.txt
[3]   done                    sqlplus / as sysdba @ exclusive_latch.txt "$1" 1 4 5 10 > /dev/null
[4]   done                    strace -fttt -p $p -o /tmp/pp_${vdate}_${p}.txt > /dev/null
[5]-  done                    sqlplus / as sysdba @ exclusive_latch.txt "$1" 1 6 7 5 > /dev/null
[6]+  done                    strace -fttt -p $p -o /tmp/pp_${vdate}_${p}.txt > /dev/null
[2]+  done                    sqlplus -s -l / as sysdba  >|/tmp/latch_free_${vdate}.txt <<eof
$(seq 20 | xargs -I {} echo -e '@latch_free \n host sleep 1')
eof

$ grep sem /tmp/pp_20190419090713*.txt
/tmp/pp_20190419090713_30017.txt:30020 09:07:25.053803 semctl(315195392, 33, setval, 0x1) = 0 <0.000025>
/tmp/pp_20190419090713_30026.txt:30028 09:07:17.040321 semop(315195392, 0x7ffff4363890, 1) = 0 <8.013580>

--//Process 1 calls semctl when it issues kslfre, at 09:07:25.053803. Process 2 entered semop at 09:07:17.040321 and was woken 8.013580 seconds later,
--//so its semop ended at 09:07:25.053901.
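
--//The end time of the semop call is just its start time plus the elapsed time that strace prints in angle brackets. The sketch below redoes that addition with the two strace lines above and also measures the gap to the semctl call. The function names are made up.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S.%f"


def semop_end(start, elapsed_seconds):
    """Add the strace elapsed time to the call's start timestamp."""
    end = datetime.strptime(start, FMT) + timedelta(seconds=elapsed_seconds)
    return end.strftime(FMT)


def gap_microseconds(earlier, later):
    """Whole microseconds between two HH:MM:SS.ffffff timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta // timedelta(microseconds=1)


if __name__ == "__main__":
    end = semop_end("09:07:17.040321", 8.013580)
    print(end)
    print(gap_microseconds("09:07:25.053803", end))
```

--//The waiter therefore returns from semop 98 microseconds after the holder's semctl call started.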

9.The select call can be traced with strace -e desc; -e ipc traces just the ipc system calls such as semop, semctl and semtimedop.
