Analysis of the Causes of the epoll Thundering Herd
阿新 • Published: 2017-05-05
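In level-triggered mode, a file descriptor that has been taken off the ready list is added back to the ready list after its events are copied to user space, provided it still has events ready (readable, writable, and so on).

Consider the following scenario (not something you would normally do in practice; it is only an example):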
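1. In the main thread, create a socket, bind it to a local port, and listen on it.
2. In the main thread, create an epoll instance (epoll_create(2)).
3. Add the listening socket to the epoll instance (epoll_ctl(2)).
4. Create several child threads. Each one shares the epoll file descriptor created in step 2, calls epoll_wait(2) to wait for events, and calls accept(2) when an event arrives.
5. A request arrives and a new connection is established.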
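The question is: at step 5, how many threads are woken up and return from epoll_wait()? The answer is that it depends. It may be just one, it may be some of them, or it may be all of them. When several threads are woken, only one thread's accept() call succeeds.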
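The scenario can be tried out with a small program. The following is only a sketch under stated assumptions, not the author's test code: it uses fork()ed processes instead of threads so that it needs no thread library (the children inherit the same epoll instance), it binds to an ephemeral loopback port, and the names `run_herd_demo`, `waiter`, `herd_result`, and `NUM_WAITERS` are made up for illustration. It is Linux-only.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

enum { NUM_WAITERS = 4 };   /* hypothetical number of waiters */

/* Exit codes a waiter uses to report what happened to it. */
enum { NOT_WOKEN = 0, WOKEN_ONLY = 1, WOKEN_AND_ACCEPTED = 2 };

struct herd_result { int woken; int accepted; };

/* One waiter: block in epoll_wait(), then try to accept(). */
static int waiter(int epfd, int listen_fd)
{
    struct epoll_event ev;
    /* 1 s timeout so a waiter that is never woken still terminates. */
    if (epoll_wait(epfd, &ev, 1, 1000) != 1)
        return NOT_WOKEN;
    int conn = accept(listen_fd, NULL, NULL);  /* socket is non-blocking */
    if (conn < 0)
        return WOKEN_ONLY;                     /* woken, but lost the race */
    close(conn);
    return WOKEN_AND_ACCEPTED;
}

/* Runs steps 1-5 once. woken is -1 if the setup failed. */
struct herd_result run_herd_demo(void)
{
    struct herd_result res = { -1, 0 };
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    /* Steps 1-3: listening socket, epoll instance, registration. */
    int listen_fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: kernel picks */
    if (listen_fd < 0 ||
        bind(listen_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listen_fd, 16) < 0 ||
        getsockname(listen_fd, (struct sockaddr *)&addr, &len) < 0)
        return res;
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev) < 0)
        return res;

    /* Step 4: every waiter shares the same epoll instance. */
    pid_t pids[NUM_WAITERS];
    for (int i = 0; i < NUM_WAITERS; i++) {
        pids[i] = fork();
        if (pids[i] == 0)
            _exit(waiter(epfd, listen_fd));
    }
    poll(NULL, 0, 100);   /* give the waiters time to block */

    /* Step 5: a single incoming connection. */
    int client = socket(AF_INET, SOCK_STREAM, 0);
    if (client >= 0)
        connect(client, (struct sockaddr *)&addr, sizeof(addr));

    /* Collect what each waiter observed. */
    res.woken = 0;
    for (int i = 0; i < NUM_WAITERS; i++) {
        int status;
        if (pids[i] > 0 && waitpid(pids[i], &status, 0) > 0 &&
            WIFEXITED(status)) {
            res.woken += WEXITSTATUS(status) != NOT_WOKEN;
            res.accepted += WEXITSTATUS(status) == WOKEN_AND_ACCEPTED;
        }
    }
    if (client >= 0)
        close(client);
    close(epfd);
    close(listen_fd);
    assert(res.accepted <= 1);   /* one connection, at most one winner */
    return res;
}
```

Calling `run_herd_demo()` several times and comparing `woken` across runs shows the variability described above, while `accepted` stays at 1.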
Why is that? Looking at the kernel code, the reasons are as follows:
When epoll_wait(2) is called, the wake callback set on the epoll wait queue entry is default_wake_function, and the entry is added to the queue with __add_wait_queue_exclusive(). The wake-up in ep_poll_callback() is done with wake_up_locked(&ep->wq), which ends up in __wake_up_common(). That function checks the exclusive flag:

    static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                                 int nr_exclusive, int wake_flags, void *key)
    {
        wait_queue_t *curr, *next;

        list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
            unsigned flags = curr->flags;

            if (curr->func(curr, mode, wake_flags, key) &&
                (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                break;
        }
    }
Because the call to __wake_up_common() starts from wake_up_locked(), its parameters have the following values:
- q: struct eventpoll.wq
- mode: TASK_NORMAL
- nr_exclusive: 1
- wake_flags: 0
- key: NULL
- curr->flags: WQ_FLAG_EXCLUSIVE
- curr->func: default_wake_function
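To see what these values mean for the loop, here is a minimal user-space model of it. This is a sketch, not kernel code: `struct waiter`, `model_wake_function`, and `model_wake_up_common` are hypothetical stand-ins, the queue is a plain array instead of a linked list, and the `can_wake` field simply plays the role of the return value of curr->func.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01   /* same name as the kernel flag */

/* Simplified stand-in for a wait queue entry. */
struct waiter {
    unsigned flags;
    int can_wake;   /* what the wake function would return */
    int woken;      /* set to 1 once this waiter has been woken */
};

/* Stand-in for curr->func(): returns 1 if the task was really woken. */
static int model_wake_function(struct waiter *w)
{
    if (w->can_wake)
        w->woken = 1;
    return w->can_wake;
}

/*
 * Same control flow as the loop in __wake_up_common(): walk the queue
 * and stop once nr_exclusive exclusive waiters have been woken.
 * Returns the number of waiters woken.
 */
int model_wake_up_common(struct waiter *q, size_t n, int nr_exclusive)
{
    int count = 0;

    assert(q != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned flags = q[i].flags;
        int ok = model_wake_function(&q[i]);

        count += ok;
        if (ok && (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
    return count;
}
```

With every entry flagged WQ_FLAG_EXCLUSIVE and nr_exclusive set to 1, as in the epoll case, the model stops after the first waiter that is successfully woken.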
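With these values, the loop stops at the first exclusive waiter it wakes successfully, so a newly arrived event directly wakes only one thread. A woken thread, however, performs the following check in ep_scan_ready_list() after it has scanned the ready list:

    if (!list_empty(&ep->rdllist)) {
        /*
         * Wake up (if active) both the eventpoll wait list and
         * the ->poll() wait list (delayed after we release the lock).
         */
        if (waitqueue_active(&ep->wq))
            wake_up_locked(&ep->wq);
        if (waitqueue_active(&ep->poll_wait))
            pwake++;
    }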
The re-insertion of level-triggered items happens while events are being copied to user space:

    if (epi->event.events & EPOLLONESHOT)
        epi->event.events &= EP_PRIVATE_BITS;
    else if (!(epi->event.events & EPOLLET)) {
        /*
         * If this file has been added with Level
         * Trigger mode, we need to insert back inside
         * the ready list, so that the next call to
         * epoll_wait() will check again the events
         * availability. At this point, no one can insert
         * into ep->rdllist besides us. The epoll_ctl()
         * callers are locked out by
         * ep_scan_ready_list() holding "mtx" and the
         * poll callback will queue them in ep->ovflist.
         */
        list_add_tail(&epi->rdllink, &ep->rdllist);
        ep_pm_stay_awake(epi);
    }

Therefore, in level-triggered mode, a woken process goes on to wake other processes, until either the current event has been fully handled or every process has been woken (a woken process is removed from the epoll wait queue).
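The chain of wake-ups can be illustrated with a toy model. This is a deliberately simplified sketch, not kernel code: the function `model_chain_wakeups` and its parameter `pending_scans` are hypothetical, a single event is assumed, and timing is reduced to one number, namely how many consecutive ready-list scans still find the event ready before someone consumes it (for example by calling accept()).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the wake-up chain for one ready event.
 *   waiters         tasks sleeping on the epoll wait queue
 *   level_triggered false models an item registered with EPOLLET
 *   pending_scans   number of scans that still see the event as ready
 * Returns how many tasks end up being woken.
 */
int model_chain_wakeups(int waiters, bool level_triggered, int pending_scans)
{
    assert(waiters >= 0 && pending_scans >= 0);

    int woken = 0;
    bool on_ready_list = true;     /* the poll callback queued the item */
    bool wake_next = waiters > 0;  /* ...and woke one exclusive waiter */

    while (wake_next && woken < waiters) {
        woken++;                   /* this task leaves the wait queue */
        on_ready_list = false;     /* its scan takes the item off the list */

        bool still_ready = woken <= pending_scans;
        if (still_ready && level_triggered)
            on_ready_list = true;  /* item is added back to the ready list */

        /* Non-empty ready list: the scanning task wakes one more waiter. */
        wake_next = on_ready_list;
    }
    return woken;
}
```

In this model, `model_chain_wakeups(4, true, 1)` wakes two tasks: the first finds the event still ready and re-queues it, and the second finds it already consumed. If the event stays ready long enough, all waiters are woken, which corresponds to the two stopping conditions stated above.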