
nginx foreground startup: debugging segmentation faults

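In nginx.conf, two directives control this. Setting both to off runs nginx as a single process in the foreground, which makes it easy to start directly under gdb: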

daemon on|off

master_process on|off

While debugging an Nginx feature, I ran into the following problem:

2017/02/27 16:23:50 [notice] 13604#0: signal 17 (SIGCHLD) received
2017/02/27 16:23:50 [alert] 13604#0: worker process 13605 exited on signal 11 (core dumped)
2017/02/27 16:23:50 [notice] 13604#0: start worker process 13816 0
2017/02/27 16:23:50 [notice] 13604#0: signal 29 (SIGIO) received

A worker (child) process had crashed and dumped core.

But even after running

ulimit -c unlimited 

in the nginx/logs directory, no core file was produced.

The fix is to add the following to nginx.conf:

worker_rlimit_core 10000m;
working_directory /usr/local/nginx/logs;

Restart Nginx and reproduce the problem.

The corresponding core file now appears in the logs directory.

When nginx is combined with Lua, the segfault may not produce a core dump. In that case, run the binary directly under gdb:
[... sbin]# gdb smartl7
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/WiseGrid/smartl7/sbin/smartl7_normal...(no debugging symbols found)...done.
(gdb) run
Starting program: /opt/WiseGrid/smartl7/sbin/smartl7 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x0000000000543aeb in ngx_http_lua_ngx_error_page ()
Missing separate debuginfos, use: debuginfo-install glibc-2.17-196.el7.x86_64 lua-5.1.4-15.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64

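Install yum-utils (which provides debuginfo-install) and then the missing debug info packages:

[... sbin]# yum install yum-utils
[... sbin]# debuginfo-install glibc-2.17-196.el7.x86_64 lua-5.1.4-15.el7.x86_64 nss-softokn-freebl-3.28.3-8.el7_4.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-17.el7.x86_64

Running the program under gdb again stops at the same place, and bt prints the full call chain: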
Program received signal SIGSEGV, Segmentation fault.
0x0000000000543aeb in ngx_http_lua_ngx_error_page ()
(gdb) bt
#0  0x0000000000543aeb in ngx_http_lua_ngx_error_page ()
#1  0x00007ffff7766324 in luaD_precall (L=L@entry=0xbe5500, func=func@entry=0xc03c70, nresults=nresults@entry=0) at ldo.c:319
#2  0x00007ffff7770e57 in luaV_execute (L=L@entry=0xbe5500, nexeccalls=nexeccalls@entry=1) at lvm.c:590
#3  0x00007ffff776674d in luaD_call (L=0xbe5500, func=0xc03c60, nResults=<optimized out>) at ldo.c:377
#4  0x00007ffff7765a6e in luaD_rawrunprotected (L=L@entry=0xbe5500, f=f@entry=0x7ffff7761050 <f_call>, ud=ud@entry=0x7fffffffdb80) at ldo.c:116
#5  0x00007ffff77668da in luaD_pcall (L=L@entry=0xbe5500, func=func@entry=0x7ffff7761050 <f_call>, u=u@entry=0x7fffffffdb80, old_top=32, ef=<optimized out>) at ldo.c:463
#6  0x00007ffff776244d in lua_pcall (L=0xbe5500, nargs=0, nresults=1, errfunc=<optimized out>) at lapi.c:821
#7  0x0000000000567324 in ngx_http_lua_header_filter_by_chunk ()
#8  0x000000000056751f in ngx_http_lua_header_filter_inline ()
#9  0x000000000056779b in ngx_http_lua_header_filter ()
#10 0x0000000000586648 in ngx_http_subs_header_filter ()
#11 0x00000000004dd510 in ngx_http_not_modified_header_filter ()
#12 0x000000000049c587 in ngx_http_send_header ()
#13 0x00000000004a5dcf in ngx_http_send_special_response ()
#14 0x00000000004a4ecb in ngx_http_special_response_handler ()
#15 0x00000000004ab450 in ngx_http_finalize_request ()
#16 0x0000000000499eb6 in ngx_http_core_rewrite_phase ()
#17 0x0000000000499cc4 in ngx_http_core_run_phases ()
#18 0x0000000000499c32 in ngx_http_handler ()
#19 0x00000000004aa7ac in ngx_http_process_request ()
#20 0x00000000004a90bd in ngx_http_process_request_headers ()
#21 0x00000000004a8497 in ngx_http_process_request_line ()
#22 0x00000000004a707d in ngx_http_wait_request_handler ()
#23 0x00000000004898d3 in ngx_epoll_process_events ()
#24 0x0000000000479d24 in ngx_process_events_and_timers ()
#25 0x0000000000485ddc in ngx_single_process_cycle ()
#26 0x000000000044eb28 in main ()

nginx: segfault at 0 ip 000000000043f750

2015-08-17. The symptom was that nginx's error.log contained:

2015/08/17 19:30:54 [alert] 21773#0: worker process 21780 exited on signal 11

and the system log /var/log/messages contained:

Aug 17 19:24:09 BJ-ZW-9-123 kernel: nginx[34285]: segfault at 0 ip 000000000043f750 sp 00007fff4f309970 error 4 in nginx[400000+299000]

Clients also reported errors. The exact message differed from client to client, but it was roughly "ssl connect fail".

Investigating the problem

From the log, this is a classic C segmentation fault, usually caused by dereferencing a NULL pointer or something similar.

1. Download the nginx source and build it with debugging enabled: CFLAGS="-g -O0" ./configure --with-debug

BTW, the official documentation uses worker_rlimit_core 500M; in practice 500M was not enough for me, since my core file was over 540M.

2. Enable core dumps at the system level

Following the official documentation is enough to get core files; here is another way:

$> ulimit -c unlimited

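Besides lifting the core file size limit, you can also choose where core files are written. Otherwise, nginx's core files will probably end up in the directory the process was started from, or in the working_directory set in the configuration: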

$> mkdir /corefile     # create the directory first, and make sure the nginx worker user can write to it
$> echo "/corefile/core-%e-%p-%h-%t" > /proc/sys/kernel/core_pattern

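With this configuration (run as root, since /proc/sys/kernel/core_pattern is a system-wide setting), core dump files are written to the /corefile directory.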

  3. Wait for a core dump to be written, then analyze it with gdb:

    /corefile$ gdb /usr/local/nginx/sbin/nginx core-nginx-40162-1439808176
    GNU gdb (GDB) Red Hat Enterprise Linux (7.2-83.el6)
    Copyright (C) 2010 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-redhat-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>...
    Reading symbols from /usr/local/nginx/sbin/nginx...done.
    
    warning: exec file is newer than core file.
    [New Thread 40162]
    Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
    [Thread debugging using libthread_db enabled]
    Loaded symbols for /lib64/libpthread.so.0
    Core was generated by `nginx: worker process  '.
    Program terminated with signal 11, Segmentation fault.
    #0  0x000000000043f750 in ngx_ssl_new_session (ssl_conn=0x2464a20, sess=0x25689c0) at src/event/ngx_event_openssl.c:2309
    warning: Source file is more recent than executable.
    2309    cache = shm_zone->data;
    Missing separate debuginfos, use: debuginfo-install nginx-1.4.4-1.el6.x86_64
    (gdb)
    
  The important part is the last few lines of that output:

    src/event/ngx_event_openssl.c:2309
    warning: Source file is more recent than executable.
    2309    cache = shm_zone->data;
    

So the crash is at line 2309 of ngx_event_openssl.c, most likely because shm_zone is NULL. Opening the source file:

2306: ssl_ctx = SSL_get_SSL_CTX(ssl_conn);
2307: shm_zone = SSL_CTX_get_ex_data(ssl_ctx, ngx_ssl_session_cache_index);
2308:
2309: cache = shm_zone->data;

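Clearly, the return value of SSL_CTX_get_ex_data is never checked, although it can be NULL, and shm_zone (possibly NULL) is then dereferenced directly. That is the segfault that brings nginx down. I then made a small change to the source and recompiled, and sure enough, the new error message showed up in error.log. The change: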

if (shm_zone == NULL) {
    /* shpool would have to come from shm_zone, so it cannot be
       used here; log without shpool->log_ctx */
    ngx_log_error(NGX_LOG_ERR, c->log, 0, "shm_zone was NULL");
    return 0;
}

4. The configuration behind it:

The cause of this segfault: our entire site runs on HTTPS, and our SSL session configuration consists of just these two directives:

ssl_session_cache   shared:SSL:500m;
ssl_session_timeout 10m;

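My hypothesis: with a very large number of connections, 500M of shared memory is not enough, and the code does not handle a NULL return. I then searched the nginx source for other calls to this function: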

~/nginx-1.8.0/src$ grep SSL_CTX_get_ex_data * -Rn
event/ngx_event_openssl.c:2153:    cert = SSL_CTX_get_ex_data(ssl->ctx, ngx_ssl_certificate_index);
event/ngx_event_openssl.c:2307:    shm_zone = SSL_CTX_get_ex_data(ssl_ctx, ngx_ssl_session_cache_index);
event/ngx_event_openssl.c:2458:    shm_zone = SSL_CTX_get_ex_data(SSL_get_SSL_CTX(ssl_conn),
event/ngx_event_openssl.c:2551:    shm_zone = SSL_CTX_get_ex_data(ssl, ngx_ssl_session_cache_index);
event/ngx_event_openssl.c:2845:    keys = SSL_CTX_get_ex_data(ssl_ctx, ngx_ssl_session_ticket_keys_index);
event/ngx_event_openssl_stapling.c:198:    staple = SSL_CTX_get_ex_data(ssl->ctx, ngx_ssl_stapling_index);
event/ngx_event_openssl_stapling.c:267:    staple = SSL_CTX_get_ex_data(ssl->ctx, ngx_ssl_stapling_index);
event/ngx_event_openssl_stapling.c:268:    cert = SSL_CTX_get_ex_data(ssl->ctx, ngx_ssl_certificate_index);
event/ngx_event_openssl_stapling.c:353:    staple = SSL_CTX_get_ex_data(ssl->ctx, ngx_ssl_stapling_index);
event/ngx_event_openssl_stapling.c:440:    staple = SSL_CTX_get_ex_data(ssl->ctx, ngx_ssl_stapling_index);

5. The latest version at the time of writing, nginx-1.9.3, still has this problem.