
[nginx] async_mode_nginx CPU 100% deadlock analysis

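Unfortunately, I only narrowed the problem down to a fairly small scope and worked out the root cause; I found neither the boundary conditions needed to reproduce it nor a solution.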

Hi all, I have much the same problem with the latest software versions:
async_nginx: 0.4.5
openssl: 1.1.1k
qatengine: 0.6.4
qatdriver: 1.7.l.4.13.0.9

To reproduce, use these config values in nginx.conf:
default_algorithms CIPHERS
qat_poll_mode heuristic

I debugged async_nginx and found an infinite loop, which I think is the cause.

1. In ngx_http_do_read_client_request_body(), nginx enters the for(;;) loop [line:288] and never breaks out, because recv() [line:343] always returns NGX_AGAIN and c->read->ready is always 1. Going deeper into recv(), the NGX_AGAIN is returned by ngx_ssl_handle_recv() [line:2546] because the async job is paused.
2. When the async context is swapped, another infinite loop occurs in qat_chained_ciphers_do_cipher() [line:1554], because the read() in qat_pause_job() [line:279] always returns EAGAIN.
3. As far as I know, qat_crypto_callbackFn() is called by qat_engine_poll(). I think the callback never gets any CPU time to be called, so the paused async job is never woken up. I then checked the polling logic in async_nginx and found point 4 below.
4. In ngx_ssl_engine_qat_heuristic_poll(), the values of the six num_* variables never increase, so qat_engine_poll() never gets a chance to execute.

When I change my engine config in nginx.conf, the issue disappears, so I have a workaround. The config is:
qat_heuristic_poll_asym_threshold = 0
qat_heuristic_poll_sym_threshold = 0

Is this a logical deadlock? nginx waits for QAT to update the counters, but updating the counters requires nginx to release some CPU time. Or maybe the following code does not account for SSL connections that stay idle for a long time?
if (*num_asym_requests_in_flight + *num_kdf_requests_in_flight + *num_cipher_requests_in_flight + *num_asym_mb_items_in_queue + *num_kdf_mb_items_in_queue + *num_sym_mb_items_in_queue >= (int) *ngx_ssl_active) {

Does anyone have any idea about this?
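
To make the suspected deadlock easier to reason about, here is a minimal C sketch that models only the polling condition quoted above. It is not the async_nginx source: the struct qat_counters and the helpers total_pending(), should_poll() and iterations_until_poll() are hypothetical names invented for illustration, and the model assumes (as the report describes) that nothing changes the counters until a poll happens.

```c
/* Hypothetical stand-in for the six shared num_* counters named in the report. */
typedef struct {
    int asym_in_flight;
    int kdf_in_flight;
    int cipher_in_flight;
    int asym_mb_queued;
    int kdf_mb_queued;
    int sym_mb_queued;
} qat_counters;

/* Sum of all pending requests, as on the left-hand side of the quoted condition. */
static int total_pending(const qat_counters *c)
{
    return c->asym_in_flight + c->kdf_in_flight + c->cipher_in_flight
         + c->asym_mb_queued + c->kdf_mb_queued + c->sym_mb_queued;
}

/* Mirrors only the quoted check: poll when pending >= active SSL connections. */
static int should_poll(const qat_counters *c, int ssl_active)
{
    return total_pending(c) >= ssl_active;
}

/*
 * Simulates up to max_iters event-loop iterations.
 * Returns the 1-based iteration at which a poll would first fire,
 * or -1 if no poll fires within max_iters.
 * Assumption: the counters do not change between iterations, because the
 * paused job cannot make progress until a poll runs its callback.
 */
static int iterations_until_poll(const qat_counters *c, int ssl_active,
                                 int max_iters)
{
    for (int i = 1; i <= max_iters; i++) {
        if (should_poll(c, ssl_active)) {
            return i;
        }
    }
    return -1;
}
```

With all six counters at zero and one active SSL connection, iterations_until_poll() returns -1 for any iteration budget, which matches the busy loop described in the report. As soon as the sum reaches the number of active connections, the poll fires on the first iteration. How the two threshold settings interact with this check in the real module is not modelled here.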

See: https://github.com/intel/QAT_Engine/issues/181