call_rcu() can cause a kernel panic when a kernel module is unloaded

阿新 • Published: 2017-08-29


http://paulmck.livejournal.com/7314.html

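Paul McKenney, the author of RCU, discusses this problem on his blog. He states explicitly that a module's exit function must call rcu_barrier() to wait until every callback queued with call_rcu() has finished running, and only then let the module be unloaded. Otherwise, after a fast unload, a late callback can hit an invalid (null) pointer and trigger a kernel panic.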

RCU and unloadable modules

Jun. 8th, 2009 at 1:38 PM

The rcu_barrier() function was described some time back in an article on Linux Weekly News. This rcu_barrier() function solves the problem where a given module invokes call_rcu() using a function in that module, but the module is removed before the corresponding grace period elapses, or at least before the callback can be invoked. This results in an attempt to call a function whose code has been removed from the Linux kernel. Oops!!!

Since the above article was written, rcu_barrier_bh() and rcu_barrier_sched() have been accepted into the Linux kernel, for use with call_rcu_bh() and call_rcu_sched(), respectively. These functions have seen relatively little use, which is no surprise, given that they are quite specialized. However, Jesper Dangaard recently discovered that they need to be used a bit more heavily. This led to the question of exactly when they needed to be used, to which I responded as follows:

Unless there is some other mechanism to ensure that all the RCU callbacks have been invoked before the module exit, there needs to be code in the module-exit function that does the following:

Prevents any new RCU callbacks from being posted. In other words, make sure that no future call_rcu() invocations happen from this module unless those call_rcu() invocations touch only functions and data that outlive this module.

Invokes rcu_barrier().

Of course, if the module uses call_rcu_sched() instead of call_rcu(), then it should invoke rcu_barrier_sched() instead of rcu_barrier(). Similarly, if it uses call_rcu_bh() instead of call_rcu(), then it should invoke rcu_barrier_bh() instead of rcu_barrier(). If the module uses more than one of call_rcu(), call_rcu_sched(), and call_rcu_bh(), then it must invoke more than one of rcu_barrier(), rcu_barrier_sched(), and rcu_barrier_bh().
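The ordering of the two steps above can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions and is not kernel code: every `toy_*` name is hypothetical, `toy_call_rcu()` merely stands in for call_rcu(), and `toy_run_pending()` stands in both for the RCU core invoking callbacks and for rcu_barrier() (the real rcu_barrier() waits for callbacks to finish rather than running them itself).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code. All toy_* names are hypothetical. */

typedef void (*toy_cb_t)(void *arg);

struct toy_cb {
    toy_cb_t fn;
    void *arg;
};

#define TOY_MAX_PENDING 16

static struct toy_cb toy_pending[TOY_MAX_PENDING];
static size_t toy_npending;
static bool toy_accepting;      /* may the module still post callbacks? */
static bool toy_module_loaded;  /* is the module's code still present? */
static int toy_late_calls;      /* callbacks that ran after unload */

/* A callback whose code "lives in the module". */
static void toy_module_cb(void *arg)
{
    (void)arg;
    if (!toy_module_loaded)
        toy_late_calls++;       /* in a real kernel this would be an oops */
}

/* Stand-in for call_rcu(): queue a callback to be run later. */
static bool toy_call_rcu(toy_cb_t fn, void *arg)
{
    if (!toy_accepting || toy_npending == TOY_MAX_PENDING)
        return false;
    toy_pending[toy_npending].fn = fn;
    toy_pending[toy_npending].arg = arg;
    toy_npending++;
    return true;
}

/* Runs every queued callback; models both the RCU core and the barrier. */
static void toy_run_pending(void)
{
    for (size_t i = 0; i < toy_npending; i++)
        toy_pending[i].fn(toy_pending[i].arg);
    toy_npending = 0;
}

/* Module exit: step 1 stops new callbacks, step 2 is the barrier. */
static void toy_module_exit(bool use_barrier)
{
    toy_accepting = false;      /* step 1: no new callbacks are posted */
    if (use_barrier)
        toy_run_pending();      /* step 2: models rcu_barrier() */
    toy_module_loaded = false;  /* only now does the module code go away */
}

/* Returns how many callbacks ran after the module was unloaded. */
static int toy_scenario(bool use_barrier)
{
    toy_npending = 0;
    toy_late_calls = 0;
    toy_accepting = true;
    toy_module_loaded = true;

    bool queued = toy_call_rcu(toy_module_cb, NULL) &&
                  toy_call_rcu(toy_module_cb, NULL);
    assert(queued);

    toy_module_exit(use_barrier);
    assert(!toy_call_rcu(toy_module_cb, NULL)); /* posting is now refused */
    toy_run_pending();          /* the RCU core gets to any leftovers */
    return toy_late_calls;
}
```

In this model, `toy_scenario(true)` returns 0 because the barrier drains the queue before the unload, while `toy_scenario(false)` returns 2, one for each callback that would have run in code that no longer exists.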

What other mechanism could be used? I cannot think of one that is safe. For example, a module that tried to count the number of RCU callbacks in flight would be vulnerable to races as follows:

CPU 0: RCU callback decrements the counter.

CPU 1: module-exit function notices that the counter is zero, so removes the module.

CPU 0: attempts to execute the code returning from the RCU callback, and dies horribly due to that code no longer being in memory.
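The three steps of this race can be replayed deterministically in a single-threaded user-space sketch. It is a toy model under stated assumptions, not kernel code: the struct and function names are hypothetical, and the callback is split into two halves only to make the dangerous window visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy replay of the counter race. All names are hypothetical. */

struct race_state {
    int in_flight;               /* the module's own count of callbacks */
    bool module_loaded;          /* is the module's code still present? */
    bool touched_unloaded_code;  /* did any callback code run after unload? */
};

/* First half of the callback: decrement the counter. */
static void cb_decrement(struct race_state *s)
{
    s->in_flight--;
}

/* Second half: the instructions that return from the callback.
 * They still belong to the module, so they need it to be loaded. */
static void cb_return(struct race_state *s)
{
    if (!s->module_loaded)
        s->touched_unloaded_code = true;
}

/* Exit path that trusts the counter instead of using a barrier. */
static bool exit_if_counter_zero(struct race_state *s)
{
    if (s->in_flight != 0)
        return false;
    s->module_loaded = false;
    return true;
}

/* Replays one interleaving; returns true if unloaded code was executed. */
static bool replay_counter_race(bool exit_runs_mid_callback)
{
    struct race_state s = {
        .in_flight = 1,
        .module_loaded = true,
        .touched_unloaded_code = false,
    };

    cb_decrement(&s);                /* CPU 0: counter drops to zero */
    if (exit_runs_mid_callback)
        exit_if_counter_zero(&s);    /* CPU 1: sees zero, unloads */
    cb_return(&s);                   /* CPU 0: returns from the callback */
    if (!exit_runs_mid_callback)
        exit_if_counter_zero(&s);    /* exit only after the callback is done */
    return s.touched_unloaded_code;
}
```

Here `replay_counter_race(true)` returns `true`, matching the sequence described above, and `replay_counter_race(false)` returns `false`. The counter alone cannot tell the exit path which of the two interleavings it is in, which is why the text argues for a barrier.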

If there was an easy solution (or even a hard solution) to this problem, then I do not believe that Nikita Danilov would have asked Dipankar Sarma for rcu_barrier(). Therefore, I do not expect anyone to be able to come up with an alternative to rcu_barrier() and friends. Always happy to learn something by being proven wrong, of course!!!

So unless someone can show me some other safe mechanism, every unloadable module that uses call_rcu(), call_rcu_sched(), or call_rcu_bh() must use rcu_barrier(), rcu_barrier_sched(), and/or rcu_barrier_bh() in its module-exit function.

So if you have a module that uses one of the call_rcu() functions, please use the corresponding rcu_barrier() function in the module-exit code!

Update: Peter Zijlstra rightly points out that the issue is not whether your module invokes call_rcu(), but rather whether the corresponding RCU callback invokes a function that is in a module. So, if there is a call_rcu(), call_rcu_sched(), or call_rcu_bh() anywhere in the kernel whose RCU callback either directly or indirectly invokes a function in your module, then your module's exit function needs to invoke rcu_barrier(), rcu_barrier_sched(), and/or rcu_barrier_bh(). Thanks to Peter for pointing this out!
