Mysql5.7 的錯誤日誌中最常見的note級別日誌解釋
阿新 • • 發佈:2018-11-19
在使用mysql5.7的時候,發現了不少在mysql5.6上不曾見過的日誌,級別為note, 最常見的note日誌以下三種,下面我們來逐個解釋。
第一種,Aborted connection
. 如上圖,資訊如下:
2016-03-17T14:44:24.102542Z 59 [Note] Aborted connection 59 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx’ (Got an error reading communication packets)
2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx(Got an error reading communication packets)
2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx(Got an error reading communication packets)
2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: ‘unconnected’ user: ‘mha’ host: ‘197.xx.xx.xx(Got timeout reading communication packets)
上面兩個是連線中斷的錯誤資訊,原因不一樣,
Got an error reading communication packets 的原因是因為網路等原因導致 。
Got timeout reading communication packets 這個錯誤的原因是會話的idle時間達到了資料庫指定的timeout時間。
第二種:SLAVE多執行緒同步的資訊
資訊如下: 2016-03-14T15:48:26.432150Z 73 [Note]Multi-threaded slave statistics for channel ”: seconds elapsed = 121; eventsassigned = 100374529; worker queues filled over
overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1451875661700 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000 2016-03-14T15:50:28.398745Z 73 [Note]Multi-threaded slave statistics for channel ”: seconds elapsed = 122; eventsassigned = 100500481; worker queues filled over
overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1452001865500 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000 我們通過原始碼,找到下面一段,該段實現了上述日誌的輸出。 if ((my_now
– rli->mts_last_online_stat)>= mts_online_stat_period) { sql_print_information(“Multi-threadedslave
statistics%s: “ “seconds
elapsed = %lu; “ “events
assigned = %llu; “ “worker
queues filled over overrun level = %lu;” “waited
due a Worker queue full = %lu; “ “waited
due the total size = %lu; “ “waited
at clock conflicts = %llu “ “waited(count)
when Workers occupied = %lu “ “waited
when Workers occupied = %llu”, rli->get_for_channel_str(), static_cast<unsignedlong> (my_now – rli->mts_last_online_stat), rli->mts_events_assigned, rli->mts_wq_overrun_cnt, rli->mts_wq_overfill_cnt, rli->wq_size_waits_cnt, rli->mts_total_wait_overlap, rli->mts_wq_no_underrun_cnt, rli->mts_total_wait_worker_avail); rli->mts_last_online_stat=my_now; 通過這一句(my_now
– rli->mts_last_online_stat), 以及最後一句rli->mts_last_online_stat=my_now; 可以得知,
seconds elapsed 就是上一次統計跟這一次統計的時間間隔。 而mts_online_stat_period =120秒, 硬程式碼,因此就是幾乎每隔120秒,就有上述日誌的輸出。
通過進一步檢視原始碼,初步瞭解上述日誌資訊的含義,如下: events assigned:總共有多少個event被分配執行,計的是總數。 worker queues filled over overrun level:多執行緒同步中,worker 的私有佇列長度超長的次數,計的是總數。 waited due a Worker queue full :因為worker的佇列超長而產生等待的次數,計的是總數。 waited due the total size :超過最大size的次數,這個由引數slave_pending_jobs_size_max 指定。
waited at clock conflicts :因為邏輯時間產生衝突的等待時間,單位是納秒。
waited (count) when Workers occupied :因為workder被佔用而出現等待的次數。(總計值)。
waited when Workersoccupied :因為workder被佔用而出現等待的總時間,總計值,單位是納秒。
第三種:page_cleaner執行緒的輸出日誌
如圖,資訊如下: 2016-03-24T17:45:28.005117Z 0 [Note] InnoDB:page_cleaner: 1000ms intended loop took 4750ms.The
settings might not be optimal. (flushed=1519 and evicted=0, during the time.) 查詢原始碼,發現是上面的日誌由下面一段程式碼輸出: if (ret_sleep
== OS_SYNC_TIME_EXCEEDED) { ulint
curr_time = ut_time_ms(); if (curr_time
>next_loop_time + 3000) { if (warn_count
== 0) { ib::info()
<< “page_cleaner: 1000ms” ” intended
loop took “ <<1000 + curr_time – next_loop_time <<“ms. The
settings might not” ” be optimal.
(flushed=” <<n_flushed_last <<” and evicted=” <<n_evicted <<“, during
the time.)”; if (warn_interval
>300) { warn_interval= 600; }else { warn_interval*= 2; } warn_count= warn_interval; } else { –warn_count; } } else { /* reset counter */ warn_interval= 1; warn_count= 0; } next_loop_time= curr_time + 1000; n_flushed_last= n_evicted = 0; } 通過分析原始碼, 輸出上述日誌的條件是curr_time> next_loop_time + 3000 ,即比原定的迴圈時間next_loop_time多3000毫秒,而next_loop_time的標準時間是1000毫秒,即1秒鐘做一次重新整理頁的操作。 loop took 4750ms ,即是這次重新整理迴圈的實際經歷時間。 後面還有一個 (flushed=1519 and evicted=0,during the time.)這樣的日誌, 即對應n_flushed_last與n_evicted 變數,而這兩個變數又由n_flushed_list與n_flushed_lru賦值。 ./storage/innobase/buf/buf0flu.cc:3322: n_flushed_last +=n_flushed_list; ./storage/innobase/buf/buf0flu.cc:3321: n_evicted += n_flushed_lru; 而n_flushed_list與n_flushed_lru賦值函式為pc_wait_finished,如下,我們來看看該函式的註釋。 pc_wait_finished(&n_flushed_lru,&n_flushed_list); /** Wait until all flush requests are finished. @param n_flushed_lru numberof pages flushed from the end of the LRU list. @param n_flushed_list numberof pages flushed from the end of the flush_list. @return trueif all flush_list flushing batch were success. */ static bool pc_wait_finished( ulint* n_flushed_lru, ulint* n_flushed_list) { 。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。 } 通過原始碼的註釋,我們獲知如下資訊: n_flushed_lru number of pages flushed from the end of the LRU list. n_flushed_lru 這個值表示從lru 列表尾部重新整理的頁數,也就是日誌中如evicted=0 指標的所表示的值,如果該值不為零,則存在innodb buffer不夠的嫌疑。 n_flushed_list 這個是從重新整理列表中重新整理的頁數,也就是髒頁數,也就是日誌中flushed=1519 的值。
資訊如下: 2016-03-14T15:48:26.432150Z 73 [Note]Multi-threaded slave statistics for channel ”: seconds elapsed = 121; eventsassigned = 100374529; worker queues filled over
overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1451875661700 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000 2016-03-14T15:50:28.398745Z 73 [Note]Multi-threaded slave statistics for channel ”: seconds elapsed = 122; eventsassigned = 100500481; worker queues filled over
overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1452001865500 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000 我們通過原始碼,找到下面一段,該段實現了上述日誌的輸出。 if ((my_now
– rli->mts_last_online_stat)>= mts_online_stat_period) { sql_print_information(“Multi-threadedslave
statistics%s: “ “seconds
elapsed = %lu; “ “events
assigned = %llu; “ “worker
queues filled over overrun level = %lu;” “waited
due a Worker queue full = %lu; “ “waited
due the total size = %lu; “ “waited
at clock conflicts = %llu “ “waited(count)
when Workers occupied = %lu “ “waited
when Workers occupied = %llu”, rli->get_for_channel_str(), static_cast<unsignedlong> (my_now – rli->mts_last_online_stat), rli->mts_events_assigned, rli->mts_wq_overrun_cnt, rli->mts_wq_overfill_cnt, rli->wq_size_waits_cnt, rli->mts_total_wait_overlap, rli->mts_wq_no_underrun_cnt, rli->mts_total_wait_worker_avail); rli->mts_last_online_stat=my_now; 通過這一句(my_now
– rli->mts_last_online_stat), 以及最後一句rli->mts_last_online_stat=my_now; 可以得知,
seconds elapsed 就是上一次統計跟這一次統計的時間間隔。 而mts_online_stat_period =120秒, 硬程式碼,因此就是幾乎每隔120秒,就有上述日誌的輸出。
通過進一步檢視原始碼,初步瞭解上述日誌資訊的含義,如下: events assigned:總共有多少個event被分配執行,計的是總數。 worker queues filled over overrun level:多執行緒同步中,worker 的私有佇列長度超長的次數,計的是總數。 waited due a Worker queue full :因為worker的佇列超長而產生等待的次數,計的是總數。 waited due the total size :超過最大size的次數,這個由引數slave_pending_jobs_size_max 指定。
如圖,資訊如下: 2016-03-24T17:45:28.005117Z 0 [Note] InnoDB:page_cleaner: 1000ms intended loop took 4750ms.The
settings might not be optimal. (flushed=1519 and evicted=0, during the time.) 查詢原始碼,發現是上面的日誌由下面一段程式碼輸出: if (ret_sleep
== OS_SYNC_TIME_EXCEEDED) { ulint
curr_time = ut_time_ms(); if (curr_time
>next_loop_time + 3000) { if (warn_count
== 0) { ib::info()
<< “page_cleaner: 1000ms” ” intended
loop took “ <<1000 + curr_time – next_loop_time <<“ms. The
settings might not” ” be optimal.
(flushed=” <<n_flushed_last <<” and evicted=” <<n_evicted <<“, during
the time.)”; if (warn_interval
>300) { warn_interval= 600; }else { warn_interval*= 2; } warn_count= warn_interval; } else { –warn_count; } } else { /* reset counter */ warn_interval= 1; warn_count= 0; } next_loop_time= curr_time + 1000; n_flushed_last= n_evicted = 0; } 通過分析原始碼, 輸出上述日誌的條件是curr_time> next_loop_time + 3000 ,即比原定的迴圈時間next_loop_time多3000毫秒,而next_loop_time的標準時間是1000毫秒,即1秒鐘做一次重新整理頁的操作。 loop took 4750ms ,即是這次重新整理迴圈的實際經歷時間。 後面還有一個 (flushed=1519 and evicted=0,during the time.)這樣的日誌, 即對應n_flushed_last與n_evicted 變數,而這兩個變數又由n_flushed_list與n_flushed_lru賦值。 ./storage/innobase/buf/buf0flu.cc:3322: n_flushed_last +=n_flushed_list; ./storage/innobase/buf/buf0flu.cc:3321: n_evicted += n_flushed_lru; 而n_flushed_list與n_flushed_lru賦值函式為pc_wait_finished,如下,我們來看看該函式的註釋。 pc_wait_finished(&n_flushed_lru,&n_flushed_list); /** Wait until all flush requests are finished. @param n_flushed_lru numberof pages flushed from the end of the LRU list. @param n_flushed_list numberof pages flushed from the end of the flush_list. @return trueif all flush_list flushing batch were success. */ static bool pc_wait_finished( ulint* n_flushed_lru, ulint* n_flushed_list) { 。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。 } 通過原始碼的註釋,我們獲知如下資訊: n_flushed_lru number of pages flushed from the end of the LRU list. n_flushed_lru 這個值表示從lru 列表尾部重新整理的頁數,也就是日誌中如evicted=0 指標的所表示的值,如果該值不為零,則存在innodb buffer不夠的嫌疑。 n_flushed_list 這個是從重新整理列表中重新整理的頁數,也就是髒頁數,也就是日誌中flushed=1519 的值。