
Troubleshooting Database Access Failures Caused by the Linux OOM Killer

Access to the database on the server was failing. Checking /var/log/messages showed the following:

Sep 22 16:08:21 safeserver kernel: java invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Sep 22 16:08:21 safeserver kernel: java cpuset=/ mems_allowed=0
Sep 22 16:08:21 safeserver kernel: Pid: 14859, comm: java Not tainted 2.6.32-754.30.2.el6.x86_64 #1

How does the OOM killer work? How can the system be configured so this does not happen? And how do you investigate memory usage on Linux?

First, look at memory:
$ free
             total       used       free     shared    buffers     cached
Mem:       4040360    4012200      28160          0     176628    3571348
-/+ buffers/cache:     264224    3776136
Swap:            0          0          0

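Pay attention to the -/+ buffers/cache line. The 28160 in the free column of the first line is not the memory that is really available, as the following explains: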
In this example the total amount of available memory is 4040360 KB. 264224 KB are used by processes and 3776136 KB are free for other applications. Do not get confused by the first line which shows that 28160KB are free! If you look at the usage figures you can see that most of the memory use is for buffers and cache. Linux always tries to use RAM to speed up disk operations by using available memory for buffers (file system metadata) and cache (pages with actual contents of files or block devices). This helps the system to run faster because disk information is already in memory which saves I/O operations. If space is needed by programs or applications like Oracle, then Linux will free up the buffers and cache to yield memory for the applications. If your system runs for a while you will usually see a small number under the field "free" on the first line.


-- from Red Hat
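
The -/+ buffers/cache figure can be checked by hand: 28160 + 176628 + 3571348 = 3776136 KB. As a minimal sketch, the same sum can be taken from /proc/meminfo. The function name avail_kb is made up for this illustration, and it assumes that reclaimable memory is simply MemFree + Buffers + Cached, which matches the older free output shown above but ignores other reclaimable pools.

```shell
# avail_kb: read /proc/meminfo-style text on stdin and print
# MemFree + Buffers + Cached in kB (hypothetical helper, simplified model).
avail_kb() {
    awk '$1 == "MemFree:" || $1 == "Buffers:" || $1 == "Cached:" { sum += $2 }
         END { print sum + 0 }'
}

# Report the value for the running system when /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    avail_kb < /proc/meminfo
fi
```

Fed the numbers from the free output above, the function prints 3776136, the same value that free reports on the -/+ buffers/cache line.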

It turned out that the server had no swap configured, which caused the OOM killer to fire frequently.

So how do you check the swap configuration?

Any of the following commands shows whether swap is enabled:
cat /proc/swaps
grep Swap /proc/meminfo
swapon -s
free -m
vmstat
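
For use in a script, the check can be reduced to a single number. The sketch below sums the Size column of /proc/swaps. The function name swap_total_kb is hypothetical, and the parsing assumes the usual layout (one header line, size in the third column, no spaces in the swap file name).

```shell
# swap_total_kb: read /proc/swaps-style text on stdin and print the total
# size in kB of all active swap areas; prints 0 when only the header exists.
swap_total_kb() {
    awk 'NR > 1 { sum += $3 } END { print sum + 0 }'
}

if [ -r /proc/swaps ]; then
    total=$(swap_total_kb < /proc/swaps)
    if [ "$total" -eq 0 ]; then
        echo "swap is not enabled"
    else
        echo "swap enabled: ${total} kB"
    fi
fi
```

On the server described here, this would have reported that swap is not enabled, consistent with the Swap: 0 0 0 line in the free output.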

How large should swap be?

https://access.redhat.com/solutions/15244

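For Red Hat 6 and 7, the general recommendation is swap equal to the amount of RAM (for 4 to 8 GB of memory); see the link above for details.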

Enabling swap:

Swap can be placed on a logical volume or in a file. The file approach is used below.

[root@safedemo bin]# dd if=/dev/zero of=/swapfile bs=1G count=4
4+0 records in
4+0 records out
4294967296 bytes (4.3 GB) copied, 37.4051 s, 115 MB/s
[root@safedemo bin]# chmod 600 /swapfile
[root@safedemo bin]# mkswap /swapfile
mkswap: /swapfile: warning: don't erase bootbits sectors
on whole disk. Use -f to force.
Setting up swapspace version 1, size = 4194300 KiB
no label, UUID=96e8b638-b36c-4660-8667-5654a92dc520
[root@safedemo bin]# swapon /swapfile
[root@safedemo bin]# vi /etc/fstab
# Add this line so the swap file is enabled again after a reboot:
/swapfile swap swap defaults 0 0
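
If the fstab change has to be scripted rather than made in vi, one possible approach is an append that does nothing when the entry is already there. This is only a sketch: add_swap_entry is a made-up helper, it takes the fstab path as an argument so it can be tried on a copy first, and it assumes the file ends with a newline.

```shell
# add_swap_entry FSTAB SWAPFILE: append a swap entry for SWAPFILE to FSTAB
# unless a swap entry for that path is already present (hypothetical helper).
add_swap_entry() {
    fstab=$1
    swapfile=$2
    # Already listed as a swap entry? Then leave the file untouched.
    if awk -v f="$swapfile" '$1 == f && $3 == "swap" { found = 1 }
                             END { exit !found }' "$fstab"; then
        return 0
    fi
    printf '%s swap swap defaults 0 0\n' "$swapfile" >> "$fstab"
}

# Example (as root, after testing on a copy of the file):
# add_swap_entry /etc/fstab /swapfile
```

Running the function twice leaves a single entry, so it is safe to include in a provisioning script that may be re-run.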

Here is a small example that reproduces the OOM killer:

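import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class OOMTest {

    private static final Scanner scanner = new Scanner(System.in);

    public static void main(String[] args) {
        // Keep a reference to every array so the GC can never reclaim them.
        List<int[]> l = new ArrayList<>();

        try {
            for (int i = 0; i < 1000; i++) {
                System.out.println("Please press any text to allocate ~100M memory:");
                scanner.nextLine(); // wait for Enter; the input itself is ignored
                System.out.println("new memory(~100M)");
                // 26,107,200 ints * 4 bytes each is roughly 100 MB per allocation.
                l.add(new int[26107200]);
            }
        } catch (Throwable t) {
            // Reached if the JVM heap limit is hit first (OutOfMemoryError).
            t.printStackTrace();
        }
    }

}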

Run it:
[root@safedemo bin]# java -Xmx2g OOMTest
Picked up JAVA_TOOL_OPTIONS: -Dhttps.protocols=TLSv1.2
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Please press any text to allocate ~100M memory:

new memory(~100M)
Killed <- it triggered the system OOM killer itself and ended up being the process that was killed.


//check /var/log/messages.
Sep 22 16:08:21 safeserver kernel: java invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Sep 22 16:08:21 safeserver kernel: java cpuset=/ mems_allowed=0
Sep 22 16:08:21 safeserver kernel: Pid: 14859, comm: java Not tainted 2.6.32-754.30.2.el6.x86_64 #1
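// 14859 is the task that invoked the OOM killer (the OOMTest above); it is presumably a thread of the java process 14857 that is killed below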
....
Sep 22 16:08:21 safeserver kernel: Out of memory: Kill process 14857 (java) score 142 or sacrifice child
Sep 22 16:08:21 safeserver kernel: Killed process 14857, UID 0, (java) total-vm:3191104kB, anon-rss:676096kB, file-rss:68kB
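
To see at a glance which processes the OOM killer has taken out, the "Killed process" lines can be pulled out of the log. The following is a sketch only: oom_kills is a made-up function name, and the pattern assumes the exact message layout shown above (this 2.6.32 kernel); other kernel versions word the message differently and would need a different pattern.

```shell
# oom_kills: read syslog text on stdin and print "PID NAME ANON_RSS_KB"
# for every "Killed process" line in the format shown above.
oom_kills() {
    sed -n 's/.*kernel: Killed process \([0-9][0-9]*\),.*(\([^)]*\)).*anon-rss:\([0-9]*\)kB.*/\1 \2 \3/p'
}

# Example (as root):
# oom_kills < /var/log/messages
```

For the log line above, this prints 14857 java 676096.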

Can the OOM killer be disabled?
//Disable OOM killer in redhat
Red Hat Enterprise Linux 5, 6 and 7 do not have the ability to completely disable OOM-KILLER. Please see the following section for tuning OOM-KILLER operation within RHEL 5, RHEL 6 and RHEL 7.

The answer is that it cannot be disabled completely.


You can keep the OOM killer away from a particular process by adjusting that process's score:
There is also a special value of -17, which disables oom_killer for that process. In the example below, oom_score returns a value of 0, indicating that this process would not be killed.

# cat /proc/12465/oom_score
78
# echo -17 > /proc/12465/oom_adj
# cat /proc/12465/oom_score
0
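
Newer kernels deprecate oom_adj in favour of oom_score_adj, where -1000 plays the role that -17 plays for oom_adj. The sketch below picks whichever file exists. The function name protect_from_oom and its optional second argument (an alternative proc root, included only so the function can be tried on a scratch directory) are assumptions for this illustration. Lowering the score of a real process normally requires root.

```shell
# protect_from_oom PID [PROC_ROOT]: exempt PID from the OOM killer.
# Prefers oom_score_adj (-1000) and falls back to the older oom_adj (-17).
protect_from_oom() {
    pid=$1
    proc=${2:-/proc}
    if [ -w "$proc/$pid/oom_score_adj" ]; then
        echo -1000 > "$proc/$pid/oom_score_adj"
    elif [ -w "$proc/$pid/oom_adj" ]; then
        echo -17 > "$proc/$pid/oom_adj"
    else
        echo "cannot adjust OOM score for pid $pid" >&2
        return 1
    fi
}

# Example (as root), using the PID from the session above:
# protect_from_oom 12465
```

The setting belongs to the running process and is lost when that process exits, so it has to be applied again after every restart of the service.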


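Another option is to tune vm.overcommit_memory. When it is set to 2, an allocation that would exceed the commit limit fails with an error instead of being granted, which indirectly keeps the OOM killer in check (the official documentation notes that the OOM killer can still be triggered in some situations).
The relevant settings in /etc/sysctl.conf: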
vm.overcommit_memory = 2
vm.overcommit_ratio = 100
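
With vm.overcommit_memory = 2, the kernel refuses allocations beyond a commit limit of roughly swap + RAM * overcommit_ratio / 100 (huge pages, if any, are excluded from the RAM term). As a worked example under that simplified formula, the sketch below plugs in this article's figures; commit_limit_kb is a made-up helper name.

```shell
# commit_limit_kb RAM_KB SWAP_KB RATIO: approximate commit limit in kB
# under vm.overcommit_memory = 2, ignoring huge pages.
commit_limit_kb() {
    echo $(( $2 + $1 * $3 / 100 ))
}

# Figures from this article: 4040360 kB RAM, the 4194300 kB swap file, ratio 100.
commit_limit_kb 4040360 4194300 100   # prints 8234660
```

After editing /etc/sysctl.conf, sysctl -p loads the new values. The kernel's own figure then appears as CommitLimit in /proc/meminfo, next to Committed_AS, which shows how much has already been promised to processes.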





