Linux死鎖檢測-Lockdep
專題:Linux內存管理專題
關鍵詞:LockDep、spinlock、mutex。
lockdep是內核提供協助發現死鎖問題的功能。
本文首先介紹何為lockdep,然後如何在內核使能lockdep,並簡單分析內核lockdep相關代碼。
最後構造不同死鎖用例,並分析如何根據lockdep輸出發現問題根源。
1. Lockdep介紹
死鎖是指兩個或多個進程因爭奪資源而造成的互相等待的現象。
常見的死鎖有如下兩種:
遞歸死鎖:中斷等延遲操作中使用了鎖,和外面的鎖構成了遞歸死鎖。
AB-BA死鎖:多個鎖因處理不當而引發死鎖,多個內核路徑上的所處理順序不一致也會導致死鎖。
Linux內核提供死鎖調試模塊Lockdep,跟蹤每個鎖的自身狀態和各個鎖之間的依賴關系,經過一系列的驗證規則來確保鎖之間依賴關系是正確的。
2. 內核死鎖檢測Lockdep
2.1 使能Lockdep
Lockdep檢測的鎖包括spinlock、rwlock、mutex、rwsem的死鎖,鎖的錯誤釋放,原子操作中睡眠等錯誤行為。
在內核中配置路徑為:Kernel hacking->Lock Debugging (spinlocks, mutexes, etc...)。
下面是lockcep內核選項及其解釋:
CONFIG_DEBUG_RT_MUTEXES=y
檢測rt mutex的死鎖,並自動報告死鎖現場信息。
CONFIG_DEBUG_SPINLOCK=y
檢測spinlock的未初始化使用等問題。配合NMI watchdog使用,能發現spinlock死鎖。
CONFIG_DEBUG_MUTEXES=y檢測並報告mutex錯誤
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
檢測wait/wound類型mutex的slowpath測試。
CONFIG_DEBUG_LOCK_ALLOC=y
檢測使用中的鎖(spinlock/rwlock/mutex/rwsem)被釋放,或者使用中的鎖被重新初始化,或者在進程退出時持有鎖。
CONFIG_PROVE_LOCKING=y
使內核能在死鎖發生前報告死鎖詳細信息。參見/proc/lockdep_chains。
CONFIG_LOCKDEP=y
整個Lockdep的總開關。參見/proc/lockdep、/proc/lockdep_stats。
CONFIG_LOCK_STAT=y記鎖持有競爭區域的信息,包括等待時間、持有時間等等信息。參見/proc/lock_stat。
CONFIG_DEBUG_LOCKDEP=y會對Lockdep的使用過程中進行更多的自我檢測,會增加很多額外開銷。
CONFIG_DEBUG_ATOMIC_SLEEP=y
在atomic section中睡眠可能造成很多不可預測的問題,這些atomic section包括spinlock持鎖、rcu讀操作、禁止內核搶占部分、中斷處理中等等。
2.2 Lock相關內核節點
/proc/sys/kernel/lock_stat------------------------置位則可以查看/proc/lock_stat統計信息,清楚則關閉lockdep統計信息。
/proc/sys/kernel/max_lock_depth--------------
/proc/sys/kernel/prove_locking
/proc/locks
/proc/lock_stat-------------------------------------關於鎖的使用統計信息
/proc/lockdep---------------------------------------存在依賴關系的鎖
/proc/lockdep_stats------------------------------存在依賴關系鎖的統計信息
/proc/lockdep_chains----------------------------依賴關系鎖鏈表
內核還提供了了Tracepoint協助發現鎖的使用問題:/sys/kernel/debug/tracing/events/lock。
2.3 lockdep代碼簡單分析
3. Lockdep測試
3.1 測試spin_lock死鎖
構造測試用例代碼如下:
void hack_spinAB(void) { printk("hack_lockdep:A->B\n"); spin_lock(&hack_spinA); spin_lock(&hack_spinB); } void hack_spinBA(void) { printk("hack_lockdep:B->A\n"); spin_lock(&hack_spinB); } static int __init lockdep_test_init(void) { printk("al: lockdep error test init\n"); hack_spinAB(); hack_spinBA(); return 0; }
執行insmod data/lock.ko 後,控制臺顯示如下。
首先從死鎖描述大概可以知道死鎖類型。
然後詳細介紹了產生死鎖的點,這時就可以大概知道是哪個鎖,有哪些地方調用導致了死鎖。
接著是詳細的發生死鎖的backtrace,有助於分析死鎖產生時的棧回溯。
al: lockdep error test init
hack_lockdep:A->B
hack_lockdep:B->A=============================================
[ INFO: possible recursive locking detected ]---------------------------------------------------------------檢測到的死鎖描述:遞歸死鎖類型
4.0.0+ #87 Tainted: G O
---------------------------------------------
insmod/658 is trying to acquire lock:---------------------------------------------------------------------------死鎖細節描述:欲持鎖點和已持鎖點
(hack_spinB){+.+...}, at: [<bf002030>] lockdep_test_init+0x30/0x3c [lock]--------------------------lockdep_test_init中調用hack_spinBA再次持有hack_spinB鎖but task is already holding lock:
(hack_spinB){+.+...}, at: [<bf000038>] hack_spinAB+0x38/0x3c [lock]--------------------------------hack_spinB已經在hack_spinAB函數中被持有other info that might help us debug this:-----------------------------------------------------------------------鎖的其它補充信息
Possible unsafe locking scenario:CPU0
----
lock(hack_spinB);
lock(hack_spinB);*** DEADLOCK ***
May be due to missing lock nesting notation
2 locks held by insmod/658:----------------------------------------------------------------------------------------進程共持有兩個鎖
#0: (hack_spinA){+.+...}, at: [<bf000030>] hack_spinAB+0x30/0x3c [lock]
#1: (hack_spinB){+.+...}, at: [<bf000038>] hack_spinAB+0x38/0x3c [lock]stack backtrace:--------------------------------------------------------------------------------------------------------棧回溯信息:可以看出從lockdep_test_init->_raw_spin_lock->lock_acquire的調用路徑。
CPU: 0 PID: 658 Comm: insmod Tainted: G O 4.0.0+ #87
Hardware name: ARM-Versatile Express
[<c00171b4>] (unwind_backtrace) from [<c0012e7c>] (show_stack+0x20/0x24)
[<c0012e7c>] (show_stack) from [<c05ade10>] (dump_stack+0x8c/0xb4)
[<c05ade10>] (dump_stack) from [<c006b988>] (__lock_acquire+0x1aa4/0x1f64)
[<c006b988>] (__lock_acquire) from [<c006c55c>] (lock_acquire+0xf4/0x190)
[<c006c55c>] (lock_acquire) from [<c05b4ec8>] (_raw_spin_lock+0x60/0x98)
[<c05b4ec8>] (_raw_spin_lock) from [<bf002030>] (lockdep_test_init+0x30/0x3c [lock])
[<bf002030>] (lockdep_test_init [lock]) from [<c0008a28>] (do_one_initcall+0x9c/0x1e8)
[<c0008a28>] (do_one_initcall) from [<c05abf30>] (do_init_module+0x70/0x1c0)
[<c05abf30>] (do_init_module) from [<c00a4ddc>] (load_module+0x18b0/0x1f90)
[<c00a4ddc>] (load_module) from [<c00a55fc>] (SyS_init_module+0x140/0x150)
[<c00a55fc>] (SyS_init_module) from [<c000ec80>] (ret_fast_syscall+0x0/0x4c)
INFO: rcu_sched self-detected stall on CPU
0: (2099 ticks this GP) idle=5ed/140000000000001/0 softirq=13024/13024 fqs=1783
(t=2100 jiffies g=-51 c=-52 q=22)
Task dump for CPU 0:
insmod R running 0 658 657 0x00000002
[<c00171b4>] (unwind_backtrace) from [<c0012e7c>] (show_stack+0x20/0x24)
[<c0012e7c>] (show_stack) from [<c0052874>] (sched_show_task+0x128/0x184)
[<c0052874>] (sched_show_task) from [<c0055dd0>] (dump_cpu_task+0x48/0x4c)
[<c0055dd0>] (dump_cpu_task) from [<c0082878>] (rcu_dump_cpu_stacks+0x9c/0xd4)
[<c0082878>] (rcu_dump_cpu_stacks) from [<c008665c>] (rcu_check_callbacks+0x640/0x968)
[<c008665c>] (rcu_check_callbacks) from [<c008b628>] (update_process_times+0x4c/0x74)
[<c008b628>] (update_process_times) from [<c009a1d4>] (tick_periodic+0x54/0xf8)
[<c009a1d4>] (tick_periodic) from [<c009a3d8>] (tick_handle_periodic+0x38/0x98)
[<c009a3d8>] (tick_handle_periodic) from [<c00164a4>] (twd_handler+0x40/0x50)
[<c00164a4>] (twd_handler) from [<c007dfc4>] (handle_percpu_devid_irq+0xd8/0x1dc)
[<c007dfc4>] (handle_percpu_devid_irq) from [<c0079a7c>] (generic_handle_irq+0x3c/0x4c)
[<c0079a7c>] (generic_handle_irq) from [<c0079dc4>] (__handle_domain_irq+0x6c/0xc4)
[<c0079dc4>] (__handle_domain_irq) from [<c0008740>] (gic_handle_irq+0x34/0x6c)
[<c0008740>] (gic_handle_irq) from [<c0013a44>] (__irq_svc+0x44/0x5c)
Exception stack(0xed5c9d18 to 0xed5c9d60)
9d00: 00000000 00010000
9d20: 0000ffff c02f3898 bf0001b0 c0b1d248 123cc000 00000000 0c99b2c5 00000000
9d40: 00000000 ed5c9d84 ed5c9d60 ed5c9d60 c0070cb4 c0070cb4 60000013 ffffffff
[<c0013a44>] (__irq_svc) from [<c0070cb4>] (do_raw_spin_lock+0xf0/0x1e0)
[<c0070cb4>] (do_raw_spin_lock) from [<c05b4eec>] (_raw_spin_lock+0x84/0x98)
[<c05b4eec>] (_raw_spin_lock) from [<bf002030>] (lockdep_test_init+0x30/0x3c [lock])
[<bf002030>] (lockdep_test_init [lock]) from [<c0008a28>] (do_one_initcall+0x9c/0x1e8)
[<c0008a28>] (do_one_initcall) from [<c05abf30>] (do_init_module+0x70/0x1c0)
[<c05abf30>] (do_init_module) from [<c00a4ddc>] (load_module+0x18b0/0x1f90)
[<c00a4ddc>] (load_module) from [<c00a55fc>] (SyS_init_module+0x140/0x150)
[<c00a55fc>] (SyS_init_module) from [<c000ec80>] (ret_fast_syscall+0x0/0x4c)
BUG: spinlock lockup suspected on CPU#0, insmod/658------------------------------------------------------------錯誤類型是spinlock,下面的backtrace和上面基本一致。
lock: hack_spinB+0x0/0xfffffedc [lock], .magic: dead4ead, .owner: insmod/658, .owner_cpu: 0-----------發生死鎖的是hack_spinB
CPU: 0 PID: 658 Comm: insmod Tainted: G O 4.0.0+ #87
Hardware name: ARM-Versatile Express
[<c00171b4>] (unwind_backtrace) from [<c0012e7c>] (show_stack+0x20/0x24)
[<c0012e7c>] (show_stack) from [<c05ade10>] (dump_stack+0x8c/0xb4)
[<c05ade10>] (dump_stack) from [<c0070b2c>] (spin_dump+0x8c/0xd0)
[<c0070b2c>] (spin_dump) from [<c0070cd0>] (do_raw_spin_lock+0x10c/0x1e0)
[<c0070cd0>] (do_raw_spin_lock) from [<c05b4eec>] (_raw_spin_lock+0x84/0x98)
[<c05b4eec>] (_raw_spin_lock) from [<bf002030>] (lockdep_test_init+0x30/0x3c [lock])
[<bf002030>] (lockdep_test_init [lock]) from [<c0008a28>] (do_one_initcall+0x9c/0x1e8)
[<c0008a28>] (do_one_initcall) from [<c05abf30>] (do_init_module+0x70/0x1c0)
[<c05abf30>] (do_init_module) from [<c00a4ddc>] (load_module+0x18b0/0x1f90)
[<c00a4ddc>] (load_module) from [<c00a55fc>] (SyS_init_module+0x140/0x150)
[<c00a55fc>] (SyS_init_module) from [<c000ec80>] (ret_fast_syscall+0x0/0x4c)
3.2 mutex測試
執行insmod /data/mutexlock.ko,稍後結果如下。
首先是死鎖類型介紹。
然後是產生死鎖的兩個點的調用者,再詳細給出了兩個點的棧回溯。
最後是死鎖點的詳細棧回溯。
====================================================== [ INFO: possible circular locking dependency detected ] 4.0.0+ #92 Tainted: G O ------------------------------------------------------- kworker/1:1/343 is trying to acquire lock: (mutex_a){+.+...}, at: [<bf000080>] lockdep_test_worker+0x24/0x58 [mutexlock] but task is already holding lock: ((&(&delay_task)->work)){+.+...}, at: [<c0041078>] process_one_work+0x130/0x60c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((&(&delay_task)->work)){+.+...}: [<c00406f4>] flush_work+0x4c/0x2bc [<c0041cc4>] __cancel_work_timer+0xa8/0x1d0 [<c0041e28>] cancel_delayed_work_sync+0x1c/0x20 [<bf000138>] lockdep_thread+0x84/0xa4 [mutexlock] [<c0046ee0>] kthread+0x100/0x118 [<c000ed50>] ret_from_fork+0x14/0x24 -> #0 (mutex_a){+.+...}: [<c006c55c>] lock_acquire+0xf4/0x190 [<c05b09e4>] mutex_lock_nested+0x90/0x480 [<bf000080>] lockdep_test_worker+0x24/0x58 [mutexlock] [<c0041138>] process_one_work+0x1f0/0x60c [<c0041fd0>] worker_thread+0x54/0x530 [<c0046ee0>] kthread+0x100/0x118 [<c000ed50>] ret_from_fork+0x14/0x24 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((&(&delay_task)->work)); lock(mutex_a); lock((&(&delay_task)->work)); lock(mutex_a); *** DEADLOCK *** 2 locks held by kworker/1:1/343: #0: ("events"){.+.+.+}, at: [<c0041078>] process_one_work+0x130/0x60c #1: ((&(&delay_task)->work)){+.+...}, at: [<c0041078>] process_one_work+0x130/0x60c stack backtrace: CPU: 1 PID: 343 Comm: kworker/1:1 Tainted: G O 4.0.0+ #92 Hardware name: ARM-Versatile Express Workqueue: events lockdep_test_worker [mutexlock] [<c00171b4>] (unwind_backtrace) from [<c0012e7c>] (show_stack+0x20/0x24) [<c0012e7c>] (show_stack) from [<c05ade10>] (dump_stack+0x8c/0xb4) [<c05ade10>] (dump_stack) from [<c0065e80>] (print_circular_bug+0x21c/0x344) [<c0065e80>] (print_circular_bug) from [<c006be44>] (__lock_acquire+0x1f60/0x1f64) [<c006be44>] (__lock_acquire) from [<c006c55c>] (lock_acquire+0xf4/0x190) [<c006c55c>] (lock_acquire) from [<c05b09e4>] (mutex_lock_nested+0x90/0x480) [<c05b09e4>] (mutex_lock_nested) from [<bf000080>] (lockdep_test_worker+0x24/0x58 [mutexlock]) [<bf000080>] (lockdep_test_worker [mutexlock]) from [<c0041138>] (process_one_work+0x1f0/0x60c) [<c0041138>] (process_one_work) from [<c0041fd0>] (worker_thread+0x54/0x530) [<c0041fd0>] (worker_thread) from [<c0046ee0>] (kthread+0x100/0x118) [<c0046ee0>] (kthread) from [<c000ed50>] (ret_from_fork+0x14/0x24)
上面的backtrace,和下面的代碼流程對照,只有在打開CONFIG_PROVE_LOCKING才會打印相關信息。
lockdep_test_worker ->mutex_lock(&mutex_a) ->mutex_lock_nested ->__mutex_lock_common ->mutex_acquire_nest ->lock_acquire_exclusive ->lock_acquire ->__lock_acquire-----------------------------------------下面的validate_chain在打開CONFIG_PROVE_LOCKING才會進行檢查。 ->validate_chain->...->print_circular_bug
參考文檔
- 《Linux 死鎖檢測模塊 Lockdep 簡介》
- 內核幫助文檔:Documentation/locking/
Linux死鎖檢測-Lockdep