[Repost] Who wrote the extra-long entries in the MySQL error log?
Reposted from: https://www.cnblogs.com/DataArt/p/10260994.html
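[Problem]
While checking MySQL error log files recently, I found large numbers of entries like the one below on many servers. Each entry is very long (about 200 KB), yet its content shows no obvious error.
A test server had the same problem. Why were these entries written, and who wrote them? Tracking that down took a few detours.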
Status information:
Current dir:
Running threads: 2452 Stack size: 262144
Current locks:
lock: 0x7f783f5233f0:
Key caches:
default
Buffer_size: 8388608
Block_size: 1024
Division_limit: 100
Age_limit: 300
blocks used: 10
not flushed: 0
w_requests: 6619
writes: 1
r_requests: 275574
reads: 1235
handler status:
read_key: 32241480828
read_next: 451035381896
read_rnd 149361175
read_first: 1090473
write: 4838429521
delete 12155820
update: 3331297842
[Analysis]
1. The official documentation says that when the mysqld process receives a SIGHUP signal, it writes output like this to the error log:
On Unix, signals can be sent to processes. mysqld responds to signals sent to it as follows:
SIGHUP
causes the server to reload the grant tables and to flush tables, logs, the thread cache, and the host cache. These actions are like various forms of the FLUSH statement. The server also writes a status report to the error log that has this format:
https://dev.mysql.com/doc/refman/5.6/en/server-signal-response.html
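Running kill -SIGHUP <pid> boils down to the kill(2) system call, which is also what the nd_syscall.kill probe in step 2 hooks. The C sketch below shows that path end to end without touching mysqld: it forks a child that, as a hypothetical stand-in for mysqld, installs a SIGHUP handler, and the parent then sends SIGHUP with kill(2). The function names are made up for illustration; the real mysqld handler does far more (reload grants, flush, print the status report).

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_hup = 0;

/* Hypothetical stand-in for mysqld's SIGHUP handling: it only records the signal. */
static void on_sighup(int sig) {
    (void)sig;
    got_hup = 1;
}

/* Forks a child that waits for SIGHUP, then sends it with kill(2), as
   `kill -SIGHUP <pid>` does. Returns 0 if the child saw SIGHUP, -1 otherwise. */
int send_sighup_demo(void) {
    int ready[2];
    if (pipe(ready) != 0) return -1;

    pid_t child = fork();
    if (child < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    if (child == 0) {
        sigset_t block, wait_mask;
        struct sigaction sa;
        sigemptyset(&block);
        sigaddset(&block, SIGHUP);
        sigprocmask(SIG_BLOCK, &block, NULL);  /* hold SIGHUP until we wait for it */

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_sighup;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGHUP, &sa, NULL);

        close(ready[0]);
        if (write(ready[1], "r", 1) != 1) _exit(2);  /* tell the parent we are ready */
        close(ready[1]);

        sigemptyset(&wait_mask);
        while (!got_hup)
            sigsuspend(&wait_mask);             /* atomically unblock SIGHUP and wait */
        _exit(0);
    }

    close(ready[1]);
    char c;
    ssize_t n = read(ready[0], &c, 1);          /* wait until the handler is installed */
    close(ready[0]);
    if (n != 1 || kill(child, SIGHUP) != 0) {
        kill(child, SIGKILL);
        waitpid(child, NULL, 0);
        return -1;
    }

    int status = 0;
    if (waitpid(child, &status, 0) != child) return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

On Linux SIGHUP is signal number 1, which is the value that shows up in the SIG column of the SystemTap output in step 2.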
2. Was some other program sending signals to mysqld? I used a SystemTap script to monitor the kill system call:
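# Arrays shared between probes must be declared global in SystemTap
global kill_target, kill_signal

probe begin
{
  printf("%-6s %-12s %-5s %-6s %6s\n", "FROM", "COMMAND", "SIG", "TO", "RESULT")
}

probe nd_syscall.kill
{
  kill_target[tid()] = int_arg(1)   # pid argument of kill(2)
  kill_signal[tid()] = int_arg(2)   # signal number
}

probe nd_syscall.kill.return
{
  if (tid() in kill_target) {
    # returnval() is the syscall's return value; argument registers are not valid here
    printf("%-6d %-12s %-5d %-6d %6d\n", pid(), execname(),
           kill_signal[tid()], kill_target[tid()], returnval())
    delete kill_target[tid()]
    delete kill_signal[tid()]
  }
}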
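Testing with the following command did write the status report to the error log:
kill -SIGHUP 12455
The SystemTap output showed that 12455 is the mysqld process and that it received signal 1, which is SIGHUP:
FROM   COMMAND      SIG   TO     RESULT
17010  who          0     12153  1340429600
36681  bash         1     12455  642
However, when the problem later recurred in the test environment, no SIGHUP was captured.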
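3. So kill was not the cause. Next I attached gdb to the mysqld process and set breakpoints on the three entry points for writing the error log: sql_print_error, sql_print_warning and sql_print_information.
But when the problem recurred, the process did not stop at any of these breakpoints.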
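4. Is there another code path that writes to the error log? The source code gave the answer: mysql_print_status writes directly to the error log, without going through those functions.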
void mysql_print_status()
{
  char current_dir[FN_REFLEN];
  STATUS_VAR current_global_status_var;
  /* Plain printf() to stdout, not sql_print_*(): when the error log is a file,
     mysqld has redirected stdout to it, so this text lands there directly. */
  printf("\nStatus information:\n\n");
  (void) my_getwd(current_dir, sizeof(current_dir), MYF(0));
  printf("Current dir: %s\n", current_dir);
  printf("Running threads: %u Stack size: %ld\n",
         Global_THD_manager::get_instance()->get_thd_count(),
         (long) my_thread_stack_size);
  /* … */
  puts("");
  fflush(stdout);
}
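Why did the breakpoints in step 3 never fire? mysql_print_status() only calls printf(), and the text ends up in the error log because stdout itself points at the log file. The C sketch below reproduces that mechanism under stated assumptions: fake_print_status() is a hypothetical stand-in for mysql_print_status(), the log path is a throwaway file, and the redirection uses dup2() on file descriptor 1 (rather than whatever call mysqld uses) so that it can be undone afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for mysql_print_status(): plain printf() to stdout,
   no logging helper such as sql_print_information() involved. */
static void fake_print_status(void) {
    printf("\nStatus information:\n\n");
    printf("Running threads: %u Stack size: %ld\n", 7u, 262144L);
    puts("");
    fflush(stdout);
}

/* Points fd 1 (stdout) at log_path, runs the printer, then restores stdout.
   Returns 0 on success, -1 on error. */
int print_status_into(const char *log_path) {
    fflush(stdout);
    int saved = dup(STDOUT_FILENO);
    if (saved < 0) return -1;
    int fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }
    dup2(fd, STDOUT_FILENO);   /* from here on, printf() writes into the "error log" */
    close(fd);
    fake_print_status();
    dup2(saved, STDOUT_FILENO);
    close(saved);
    return 0;
}

/* Returns 1 if some line of the file contains needle (lines up to 511 chars), else 0. */
int file_contains(const char *path, const char *needle) {
    FILE *f = fopen(path, "r");
    if (f == NULL) return 0;
    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, needle) != NULL) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A breakpoint on a logging helper never fires in this setup, because nothing calls one; only a breakpoint on the printing function itself (or on write(2)) would catch it, which is what step 5 does.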
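5. I attached gdb to mysqld again and set a breakpoint on mysql_print_status. When the problem recurred, the thread stopped at the breakpoint. Comparing ps output several times showed that mysql_print_status was being called whenever the pt-stalk tool ran.
6. The stack trace showed that dispatch_command called mysql_print_status. The relevant logic is below: when command == COM_DEBUG, execution reaches mysql_print_status.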
case COM_DEBUG:
  thd->status_var.com_other++;
  if (check_global_access(thd, SUPER_ACL))
    break;                    /* purecov: inspected */
  mysql_print_status();
  query_logger.general_log_print(thd, command, NullS);
  my_eof(thd);
  break;
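COM_DEBUG is an ordinary command in the MySQL client/server protocol (command byte 0x0D), not a signal, which is why the kill probe in step 2 saw nothing when the problem recurred. To my knowledge, client programs normally send it through libmysqlclient's mysql_dump_debug_info(), which is what mysqladmin debug calls. As a sketch of how small the request is on the wire, the hypothetical helper below only builds the packet bytes (3-byte little-endian payload length, sequence id, then the command byte); it does not open a connection or authenticate.

```c
#include <stddef.h>
#include <stdint.h>

#define COM_DEBUG_BYTE 0x0D  /* COM_DEBUG in the MySQL client/server protocol */

/* Hypothetical helper: writes a COM_DEBUG packet into buf (needs at least 5 bytes).
   Returns the packet length, or 0 if buf is too small. */
size_t build_com_debug_packet(uint8_t *buf, size_t cap) {
    const uint32_t payload_len = 1;             /* payload is only the command byte */
    if (cap < 5) return 0;
    buf[0] = (uint8_t)(payload_len & 0xFF);     /* 3-byte little-endian length */
    buf[1] = (uint8_t)((payload_len >> 8) & 0xFF);
    buf[2] = (uint8_t)((payload_len >> 16) & 0xFF);
    buf[3] = 0;                                 /* sequence id: a new command starts at 0 */
    buf[4] = COM_DEBUG_BYTE;
    return 5;
}
```

The server still checks SUPER before acting on it (the check_global_access call in the excerpt), so only an account with that privilege, such as the one pt-stalk was running under, produces the status dump.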
7. Looking at the pt-stalk code:
if [ "$mysql_error_log" -a ! "$OPT_MYSQL_ONLY" ]; then
   log "The MySQL error log seems to be $mysql_error_log"
   tail -f "$mysql_error_log" >"$d/$p-log_error" &
   tail_error_log_pid=$!
   # "debug" makes mysqladmin send COM_DEBUG, so the server dumps its status into the error log
   $CMD_MYSQLADMIN $EXT_ARGV debug
else
   log "Could not find the MySQL error log"
fi
pt-stalk invokes mysqladmin with the debug command, whose help text reads:
debug    Instruct server to write debug information to log
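8. I found the corresponding bug report on Percona's site. At the time of writing it had not been fixed yet; the fix was planned for the next release, 3.0.13.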
https://jira.percona.com/browse/PT-1340
[Solution]
Once the cause was found, the fix was simple: remove debug from the line $CMD_MYSQLADMIN $EXT_ARGV debug in the pt-stalk script. Testing confirmed that this works.
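Summary:
(1) The status block is written straight to the error log by mysql_print_status(), not through sql_print_error/sql_print_warning/sql_print_information.
(2) Running mysqladmin debug triggers it.
(3) So can resource pressure, killing sessions, and similar events (see also: https://dev.mysql.com/doc/refman/5.7/en/server-signal-response.html).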
A complete example of such a status dump:
Status information:
Current dir: /data/mysql/mysql3306/data/
Running threads: 7 Stack size: 262144
Current locks:
lock: 0x7fdcb0a44780:
lock: 0x7fdcaf0ea980:
lock: 0x1edb5a0:
..........
..........
Key caches:
default
Buffer_size: 8388608
Block_size: 1024
Division_limit: 100
Age_limit: 300
blocks used: 9
not flushed: 0
w_requests: 0
writes: 0
r_requests: 82
reads: 13
handler status:
read_key: 16981474
read_next: 33963080
read_rnd 6
read_first: 192
write: 21270
delete 0
update: 16981221
Table status:
Opened tables: 956
Open tables: 206
Open files: 13
Open streams: 0
Memory status:
<malloc version="1">
<heap nr="0">
<sizes>
<unsorted from="140586808432240" to="140585778669336" total="0" count="140585778669312"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<system type="current" size="0"/>
<system type="max" size="0"/>
<aspace type="total" size="0"/>
<aspace type="mprotect" size="0"/>
</heap>
<total type="fast" count="0" size="0"/>
<total type="rest" count="0" size="0"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="0"/>
<system type="max" size="0"/>
<aspace type="total" size="0"/>
<aspace type="mprotect" size="0"/>
</malloc>
Events status:
LLA = Last Locked At LUA = Last Unlocked At
WOC = Waiting On Condition DL = Data Locked
Event scheduler status:
State : INITIALIZED
Thread id : 0
LLA : n/a:0
LUA : n/a:0
WOC : NO
Workers : 0
Executed : 0
Data locked: NO
Event queue status:
Element count : 0
Data locked : NO
Attempting lock : NO
LLA : init_queue:96
LUA : init_queue:104
WOC : NO
Next activation : never
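To gauge how many of these dumps an error log contains, one can count the lines that begin with the Status information: header seen in the samples. A minimal C sketch, assuming the header always starts a line exactly as shown; the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Counts lines in the file at path that start with "Status information:".
   Returns -1 if the file cannot be opened. */
long count_status_dumps(const char *path) {
    static const char header[] = "Status information:";
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    char line[4096];
    long count = 0;
    int at_line_start = 1;  /* fgets may split long lines; only test real line starts */
    while (fgets(line, sizeof line, f) != NULL) {
        if (at_line_start && strncmp(line, header, sizeof header - 1) == 0)
            count++;
        size_t len = strlen(line);
        at_line_start = (len > 0 && line[len - 1] == '\n');
    }
    fclose(f);
    return count;
}
```

Tracking whether the previous chunk ended in a newline keeps a very long line that fgets() splits in two from being miscounted.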