Kernel space lock contention配置及其使用
概述
本文涉及到的内容如下:
- kernel lock相关debug方式,比如lock耗时,拿不到lock,lock依赖等等
- trace的使用
1.开启lock contenttion涉及到的config配置
config LOCKDEP
bool
depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
select STACKTRACE
select FRAME_POINTER if !MIPS && !PPC && !ARM_UNWIND && !S390 && !MICROBLAZE && !ARC && !SCORE && !X86
select KALLSYMS
select KALLSYMS_ALL
config LOCKDEP_SMALL
bool
config LOCK_STAT
bool "Lock usage statistics"
depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
select LOCKDEP
select DEBUG_SPINLOCK
select DEBUG_MUTEXES
select DEBUG_RT_MUTEXES if RT_MUTEXES
select DEBUG_LOCK_ALLOC
default n
help
This feature enables tracking lock contention points
For more details, see Documentation/locking/lockstat.txt
This also enables lock events required by "perf lock",
subcommand of perf.
If you want to use "perf lock", you also need to turn on
CONFIG_EVENT_TRACING.
CONFIG_LOCK_STAT defines "contended" and "acquired" lock events.
(CONFIG_LOCKDEP defines "acquire" and "release" events.)
上面的config默认是关闭状态。
2 menuconfig开启config配置
kernel hacking —> Lock Debugging (spinlock, mutexs, etc…) —>进入之后勾选
Lock Debugging: detect incorrect freeing of live locks 和Lcok usage statistics,就会把依赖项自动勾选.
保存退出之后,可以查看kernel下面新产生的config diff文件:
# Lock Debugging (spinlocks, mutexes, etc...)
#
-# CONFIG_DEBUG_RT_MUTEXES is not set
-# CONFIG_DEBUG_SPINLOCK is not set
-# CONFIG_DEBUG_MUTEXES is not set
+CONFIG_DEBUG_RT_MUTEXES=y
+CONFIG_DEBUG_SPINLOCK=y
+CONFIG_DEBUG_MUTEXES=y
# CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
-# CONFIG_DEBUG_LOCK_ALLOC is not set
+CONFIG_DEBUG_LOCK_ALLOC=y
# CONFIG_PROVE_LOCKING is not set
-# CONFIG_LOCK_STAT is not set
+CONFIG_LOCKDEP=y
+CONFIG_LOCK_STAT=y
+# CONFIG_DEBUG_LOCKDEP is not set
# CONFIG_DEBUG_ATOMIC_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_LOCK_TORTURE_TEST is not set
这样开启lock trace event了.。编译boot并刷机即可!
3 确定是否开启trace lock content feature
上面步骤二中开启之后,在adb shell里面多了如下几个接口:
- 增加了lock trace event, 有四个lock trace event,具体做什么使用,下面在详细讲解
- 增加了lock的统计信息, 节点信息实现源码kernel/locking/lockdep_proc.c文件中:
● lock_stat: 统计各种类型lock的时延
● lockdep: 表示锁的的深度,即一个lock里面调用了多少个lock,并将这些lock的调用全部显示出来
● lockdep_stats: 即锁深度里面包含了哪些lock以及频次
● locks:锁的状态,这个实现在fs/locks.c文件里面
4 如何使用lock trace event和proc node信息
4.1 lock trace event
从第三部分可以知道,lock trace event包含四个event,分别讲解如下:
四个event定义在include/trace/event/lock.h里面, 使用在kernel/locking/lockdep.c文件,
- lock_acquire: 获取lock,比如mutex lock ,rcu read lock 或者spinlock等等lock
- lock_acquired: 表示已经获取lock了
- lock_release: lock释放的event
- lock_contended: lock被谁hold了
那么如何抓取lock相关的trace event呢?
可以使用下面的脚本抓取(必须添加lock event ),即正常的抓取trace 命令:
● echo 40000 > buffer_size_kb && echo irq sched_switch sched_wakeup sched_waking cpu_frequency cpu_idle lock > set_event && cat set_event && echo > trace && echo 1 > tracing_on && sleep 10 && echo 0 > tracing_on && cat trace > /data/trace.txt
脚本执行完毕之后, pull出data目录下的trace.txt文件,vim打开即可看到下面类似的信息:
┊ ┊ sleep-2875 [004] d.s2 748.861607: sched_waking: comm=rcu_sched pid=9 prio=120 target_cpu=000
┊ ┊ sleep-2875 [004] d.s3 748.861608: lock_acquire: 00000000d4eff177 &rq->lock
┊ ┊ sleep-2875 [004] d.s3 748.861609: lock_contended: 00000000d4eff177 &rq->lock
┊ ┊ sleep-2875 [004] d.s3 748.861662: lock_acquired: 00000000d4eff177 &rq->lock
┊ ┊ sleep-2875 [004] d.s3 748.861663: lock_acquire: 000000000ff04d65 read tk_core.seq
┊ ┊ sleep-2875 [004] d.s3 748.861664: lock_release: 000000000ff04d65 tk_core.seq
┊ ┊ sleep-2875 [004] d.s3 748.861666: lock_acquire: 00000000cc726398 read rcu_read_lock
┊ ┊ sleep-2875 [004] d.s3 748.861667: lock_release: 00000000cc726398 rcu_read_lock
┊ ┊ sleep-2875 [004] d.s3 748.861668: lock_release: 00000000d4eff177 &rq->lock
┊ ┊ sleep-2875 [004] d.s2 748.861669: lock_acquire: 00000000cc726398 read rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861670: lock_acquire: 00000000cc726398 read rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861671: lock_release: 00000000cc726398 rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861672: lock_acquire: 00000000cc726398 read rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861673: lock_release: 00000000cc726398 rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861674: lock_acquire: 00000000cc726398 read rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861675: lock_release: 00000000cc726398 rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861676: lock_acquire: 00000000cc726398 read rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861677: lock_release: 00000000cc726398 rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861678: lock_acquire: 00000000cc726398 read rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861679: lock_release: 00000000cc726398 rcu_read_lock
┊ ┊ sleep-2875 [004] d.s2 748.861683: lock_release: 00000000cc726398 rcu_read_lock
┊ ┊ sleep-2875 [004] d.s3 748.861684: lock_acquire: 00000000d4eff177 &rq->lock
┊ ┊ sleep-2875 [004] d.s3 748.861685: lock_contended: 00000000d4eff177 &rq->lock
┊ ┊ sleep-2875 [004] d.s3 748.861719: lock_acquired: 00000000d4eff177 &rq->lock
┊ ┊ sleep-2875 [004] d.s4 748.861720: lock_acquire: 00000000cc2f375c &rq->lock
┊ ┊ sleep-2875 [004] d.s4 748.861720: lock_acquired: 00000000cc2f375c &rq->lock
┊ ┊ sleep-2875 [004] d.s4 748.861721: lock_acquire: 000000000ff04d65 read tk_core.seq
┊ ┊ sleep-2875 [004] d.s4 748.861722: lock_release: 000000000ff04d65 tk_core.seq
┊ ┊ sleep-2875 [004] d.s4 748.861724: lock_acquire: 00000000cc726398 read rcu_read_lock
┊ ┊ sleep-2875 [004] d.s4 748.861725: lock_release: 00000000cc726398 rcu_read_lock
┊ ┊ sleep-2875 [004] d.s4 748.861726: lock_acquire: 00000000cc726398 read rcu_read_lock
┊ ┊ sleep-2875 [004] d.s4 748.861727: lock_release: 00000000cc726398 rcu_read_lock
┊ ┊ sleep-2875 [004] d.s4 748.861728: lock_release: 00000000d4eff177 &rq->lock
┊ ┊ sleep-2875 [004] d.s3 748.861730: lock_release: 00000000cc2f375c &rq->lock
我们可以看到如下的信息:
- rcu_read_lock 是lock_acquire 之后直接lock_release. 持锁时间非常的短
- &rq->lock, 是一个spinlock类型. 通过lock_acquire→ lock_acquired→ lock_release的过程. 在lock_acquire过程中出现了lock_contended trace event, 表示此时的&rq->lock spinlock这个lock存在contention或者contended,表示此时有lock的竞争.
- 如果出现锁竞争,就会统计当前进程获取lock的等待时间等等统计信息,详细在4.2节讲解.
- 可以明显的看到持锁时间, 或者获取锁的时间.
4.2 lock proc node信息
有三个主要的节点信息,都在shell proc目录下:
cat /proc/lock_stat:
/proc # cat lock_stat | head -n 40
lock_stat version 0.4
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
class name con-bounces contentions waittime-min waittime-max waittime-total waittime-avg acq-bounces acquisitions holdtime-min holdtime-max holdtime-total holdtime-avg
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
&(&n->list_lock)->rlock: 441821 441946 1.81 80.23 1891806.60 4.28 2488861 10314115 0.00 137.08 64789787.05 6.28
-----------------------
&(&n->list_lock)->rlock 114751 [<0000000051f28fcf>] ___slab_alloc+0x194/0x5ec
&(&n->list_lock)->rlock 105155 [<000000000913f330>] deactivate_slab+0x2ec/0x540
&(&n->list_lock)->rlock 169288 [<00000000fb58074e>] free_debug_processing+0x34/0x26c
&(&n->list_lock)->rlock 12316 [<00000000f00be1d8>] kfree+0x30c/0x55c
-----------------------
&(&n->list_lock)->rlock 111558 [<0000000051f28fcf>] ___slab_alloc+0x194/0x5ec
&(&n->list_lock)->rlock 89847 [<000000000913f330>] deactivate_slab+0x2ec/0x540
&(&n->list_lock)->rlock 192366 [<00000000fb58074e>] free_debug_processing+0x34/0x26c
&(&n->list_lock)->rlock 12305 [<00000000f00be1d8>] kfree+0x30c/0x55c
.............................................................................................................................................................................................................................
&rq->lock: 351895 351959 1.88 564.00 9605797.10 27.29 1569868 5892437 0.00 1061.00 172708414.01 29.31
---------
&rq->lock 5739 [<00000000807dcaa6>] task_rq_lock+0x74/0xc4
&rq->lock 7268 [<00000000f033900c>] pick_next_task_fair+0x378/0x7e4
&rq->lock 85000 [<0000000043e94103>] try_to_wake_up+0x1bc/0x554
&rq->lock 46881 [<0000000078183a31>] try_to_wake_up+0x3b8/0x554
---------
&rq->lock 24883 [<00000000f57857d0>] update_blocked_averages+0x50/0xc44
&rq->lock 3577 [<00000000807dcaa6>] task_rq_lock+0x74/0xc4
&rq->lock 14525 [<00000000f033900c>] pick_next_task_fair+0x378/0x7e4
&rq->lock 5 [<0000000007411a3e>] walt_set_window_start+0xd4/0x134
.............................................................................................................................................................................................................................
&sg_policy->update_lock: 259980 259986 1.88 212.12 1681170.67 6.47 1729740 3292457 1.88 270.81 28368055.35 8.62
-----------------------
&sg_policy->update_lock 259986 [<00000000077def88>] sugov_update_shared+0x5c/0x1b0
-----------------------
&sg_policy->update_lock 259986 [<00000000077def88>] sugov_update_shared+0x5c/0x1b0
.............................................................................................................................................................................................................................
上面表示从开机到现在cat这个节点的时间内,lock的所有状态信息,包括
- lock的名字
- contention次数
- lock等待的时间,最大时间,最小时间,平均时间以及等待的总时间
- lock持有时间,最大,最小,平均以及持有的总时间
- 获取这个lock的次数
等等…
cat /proc/lockdep:
目的是将class_lock上面所有相关联的lock stack全部输出
可能的样式如下:
/proc # cat lockdep | head -n 100
all lock classes:
0000000044beef8b ....: logbuf_lock
00000000c80448bf ....: (console_sem).lock
00000000700ad619 ....: console_lock
00000000ef505732 ....: cgroup_mutex
0000000042291e92 ....: console_owner_lock
000000002e29cf8c ....: console_owner
000000004e6f50fd ....: devtree_lock
000000003c9defa6 ....: resource_lock
00000000947b85f2 ....: pm_mutex
0000000010004418 ....: primary_crng.lock
0000000020d31ff4 ....: input_pool.lock
00000000588a9d5b ....: "warn_unseeded_randomness".lock
00000000e13be140 ....: lock
0000000005822fb5 ....: cpu_hotplug_lock.rw_sem
00000000b0cf55a3 ....: cpuhp_state_mutex
cat /proc/lockdep_stats:
信息如下,主要是一些上下文里面的信息:
/proc # cat lockdep_stats
lock-classes: 1851 [max: 8191]
direct dependencies: 0 [max: 32768]
indirect dependencies: 0
all direct dependencies: 0
in-hardirq chains: 0
in-softirq chains: 0
in-process chains: 0
stack-trace entries: 20581 [max: 524288]
combined max dependencies: 1
hardirq-safe locks: 0
hardirq-unsafe locks: 0
softirq-safe locks: 0
softirq-unsafe locks: 0
irq-safe locks: 0
irq-unsafe locks: 0
hardirq-read-safe locks: 0
hardirq-read-unsafe locks: 0
softirq-read-safe locks: 0
softirq-read-unsafe locks: 0
irq-read-safe locks: 0
irq-read-unsafe locks: 0
uncategorized locks: 1851
unused locks: 0
max locking depth: 19
debug_locks: 1
lock event信息还是对于lock debug还是非常有帮助的。
实战在另一篇博文: pr_emerg耗时,影响性能原理排查
路先生的杂货铺: 有几个链接丢失了
weixin_43655947: 重新关联module,清理干净重新编译 是指 clean了重新编译?重新关联是什么意思啊
悟空明镜: 不做调度了,哈哈。好多调度的文章我已经设置私密了。你们看不到了
静待梅花开: 点赞,继续加油,为后人多留下点宝贵经验
悟空明镜: 需要重新编译kernel,在替换修改后的kernel,最后reboot