su: cannot set user id: Resource temporarily unavailable

su - xxxx
  su: cannot set user id: Resource temporarily unavailable 
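After some research, the usual cause appears to be that the user has run out of some resource: open files, processes, stack and so on.
So I switched to root with su - to investigate.
I checked the user's process count and open-file count with ps -U XXXX and lsof |grep XXXX|wc -l; both numbers were very low.
I checked the user's resource limits with cat /etc/security/limits.conf; the hard nproc and hard nofile values should have been ample, which left me baffled.
After much searching on Baidu and Google, I found a suggestion to raise soft nproc, so I raised soft nproc in limits.conf and the problem was solved.
Below is another person's approach to the same error. In that case the likely cause was the oracle user having too many files open on Linux:
Hello everyone! Today I'd like to share a case. On CentOS 6, running su - oracle failed with: su: cannot set user id: Resource temporarily unavailable, yet logging in with PL/SQL Developer worked fine. The problem was eventually traced to a Grid Control 12c agent installed on the machine, which had exhausted system resources. The analysis follows: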

1.1.  Symptom

su to the oracle user fails, reporting insufficient resources

[[email protected] bin]# su - oracle

su: cannot set user id: Resource temporarily unavailable

 

1.2.  Analysis

The following settings were checked; none of them showed a problem.

1.2.1 Checking the resource limits file

[[email protected] ~]# cat /etc/security/limits.conf

 

oracle soft nproc 2047

oracle hard nproc 16384

oracle soft nofile 1024

oracle hard nofile 65536

oracle soft memlock 33554432

oracle hard memlock 33554432
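
The listing gives both a soft and a hard value for nproc; the soft value (2047 here) is the one a new session starts with. As a minimal sketch, the helper below pulls a single value out of a limits.conf-style file. The function name limit_for is hypothetical, and the sketch deliberately ignores wildcard (*) and @group entries as well as any files under /etc/security/limits.d, so it is only a quick look, not a full evaluation of what pam_limits would apply.

```shell
# limit_for USER TYPE ITEM [FILE]
# Print the value of one limits.conf entry, e.g. the soft nproc for oracle.
# Reads FILE if given, otherwise standard input. Comment lines are skipped.
limit_for() {
    awk -v user="$1" -v type="$2" -v item="$3" '
        /^[[:space:]]*#/ { next }                                # skip comments
        $1 == user && $2 == type && $3 == item { value = $4 }    # remember last match
        END { if (value != "") print value }
    ' "${4:--}"
}
```

For example, limit_for oracle soft nproc /etc/security/limits.conf would print 2047 for the file shown above.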

1.2.2 Checking the profile file

if [ $USER = "oracle" ]; then

  if [ $SHELL = "/bin/ksh" ]; then

        ulimit -p 16384

        ulimit -n 65536

  else

        ulimit -u 16384 -n 65536

  fi

fi

1.2.3 Checking the ulimit limits

[[email protected] ~]# ulimit -a

core file size          (blocks, -c) 0

data seg size           (kbytes, -d) unlimited

scheduling priority             (-e) 0

file size               (blocks, -f) unlimited

pending signals                 (-i) 62749

max locked memory       (kbytes, -l) 64

max memory size         (kbytes, -m) unlimited

open files                      (-n) 1024

pipe size            (512 bytes, -p) 8

POSIX message queues     (bytes, -q) 819200

real-time priority              (-r) 0

stack size              (kbytes, -s) 10240

cpu time               (seconds, -t) unlimited

max user processes              (-u) 1024

virtual memory          (kbytes, -v) unlimited

file locks                      (-x) unlimited
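
Note that the ulimit -a output above was taken in root's shell, so it describes root's session rather than the processes already running as oracle. On Linux, the limits in force for a running process can be read from /proc/<pid>/limits. The sketch below extracts the soft value of one row from that table; the function name proc_soft_limit is hypothetical.

```shell
# proc_soft_limit NAME [FILE]
# Print the soft value of one row of a /proc/<pid>/limits table,
# e.g. NAME="Max processes". Reads FILE if given, otherwise standard input.
proc_soft_limit() {
    awk -v name="$1" '
        index($0, name) == 1 {
            rest = substr($0, length(name) + 1)   # text after the row label
            split(rest, col, " ")                 # col[1] = soft, col[2] = hard
            print col[1]
            exit
        }
    ' "${2:--}"
}
```

Usage might look like proc_soft_limit 'Max processes' /proc/<pid>/limits, with <pid> being any process owned by oracle.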

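1.3  Searching Google for similar problems

Running su - oracle

fails with: su: cannot set user id: Resource temporarily unavailable

 

Run the following commands to check:

ps -U oracle | wc -l

 

lsof | grep oracle | wc -l

In most cases one of these numbers exceeds the limit.

 

Solutions

1. Kill processes that are no longer needed

2. Edit /etc/security/limits.conf

and adjust the settings for oracle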
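
One caveat about the process count: on Linux the nproc limit counts threads, not just processes, so ps -U oracle | wc -l can understate how close the user is to the limit. A small sketch for counting both ways follows; the helper name count_for_user is hypothetical, and it assumes the procps version of ps.

```shell
# count_for_user USER
# Count the lines on standard input whose first field equals USER.
count_for_user() {
    awk -v user="$1" '$1 == user { n++ } END { print n + 0 }'
}

# One line per thread (-L), owner column only, no header:
#   ps -eLo user= | count_for_user oracle
# One line per process, for comparison:
#   ps -eo user= | count_for_user oracle
```

If the thread count is near the soft nproc value while the process count is low, a heavily threaded process is the likely culprit.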

 

1.4 Running the same commands on my own server

[[email protected] ~]# ps -ef | grep oracle | wc -l

89

[[email protected] ~]# lsof | grep oracle | wc -l

49998

That is an alarming number of open files.

1.4.1 Looking at the open files in detail

The resulting file is very large:

[[email protected] ~]# lsof | grep oracle > oracle.txt

[[email protected] ~]# more oracle.txt

oracle      927    oracle  mem       REG               0,16   16777216   28080570 /dev/shm/ora_lottery_393218_52
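
Paging through a file of roughly 50,000 lines is slow, so it can help to summarise it first. The sketch below groups lsof output by command and PID and prints the biggest holders; the function name top_openers is hypothetical. Keep in mind that rows whose FD column is mem, like the one above, are memory mappings rather than file descriptors, so a raw line count overstates descriptor usage.

```shell
# top_openers [N]
# Read lsof output on standard input and print the N (default 10)
# COMMAND/PID pairs with the most lines, highest count first.
top_openers() {
    awk '$2 ~ /^[0-9]+$/ { count[$1 " " $2]++ }      # skips the header row
         END { for (k in count) print count[k], k }' |
        sort -k1,1nr | head -n "${1:-10}"
}
```

For instance, lsof | grep oracle | top_openers 5 would list the five command/PID pairs with the most entries.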

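1.4.2 Why does Oracle have so many files open?

Clients connect to the Oracle database through sessions. Could the session count have been exceeded? Logging in to Oracle with PL/SQL Developer succeeded.

select count(*) from v$session

The query returned 69 current sessions.

Conclusion: the problem is unrelated to sessions.

1.4.3 At a loss, then a flash of inspiration: check the network connections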

[[email protected] log]# netstat -anp | wc -l

12178

Looking at the details:

[[email protected] log]# netstat -anp

tcp        1      0 192.168.3.21:3938           192.168.3.21:34272          CLOSE_WAIT  18490/emagent

There were a large number of similar connections, so I immediately counted those belonging to emagent:

[[email protected] log]# netstat -anp|grep emagent| wc -l

11608

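12178 and 11608 are very close, so something is clearly wrong: almost all of these connections are waiting to be closed, and nobody is closing them.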
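
CLOSE_WAIT means the remote side has closed the connection but the local program has not yet closed its own socket, so a pile of them points at the owning process. Instead of comparing two wc -l totals, the netstat output can be grouped by state and program in one pass. This is a sketch only; the function name tcp_states_by_program is hypothetical, and it assumes the column layout of netstat -anp shown above.

```shell
# tcp_states_by_program
# Read `netstat -anp` output on standard input and print, for each
# TCP state and PID/program pair, the number of connections,
# highest count first.
tcp_states_by_program() {
    awk '$1 ~ /^tcp/ { count[$6 " " $7]++ }          # $6 = state, $7 = PID/program
         END { for (k in count) print count[k], k }' |
        sort -k1,1nr
}
```

Run as netstat -anp | tcp_states_by_program | head, the top line would name the state and process responsible for most connections.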

 

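1.5 Solving the problem

This machine has a Grid Control 12c agent installed, while the OMS (Oracle Management Service) had already been stopped. From this, the cause could be roughly identified.

1.5.1 Stopping the agent first

root is not permitted to stop the agent; emctl refuses to run under a different user: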

[[email protected] bin]# ./emctl status agent

Cannot execute /home/oracle/agent12c//core/12.1.0.1.0/bin/emctl.pl since its userid does not match yours.

1.5.2 Killing the agent forcibly

[[email protected] bin]# ps -ef|grep java

.........

oracle    5541  2512  0 Oct24 ?        00:23:38 /home/oracle/agent12c//core/12.1.0.1.0/jdk/bin/java -Xmx128M -server -Djava.security.egd=file:///dev/./urandom -Dsun.lang.ClassLoader.allowArraySyntax=true -XX:+UseLinuxPosixThreadCPUClocks -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+UseCompressedOops -Dwatchdog.pid=2512 -cp /home/oracle/agent12c//core/12.1.0.1.0/jdbc/lib/ojdbc5.jar:/home/oracle/agent12c//core/12.1.0.1.0/ucp/lib/ucp.jar:/home/oracle/agent12c//core/12.1.0.1.0/modules/oracle.http_client_11.1.1.jar:/home/oracle/agent12c//core/12.1.0.1.0/lib/xmlparserv2.jar:/home/oracle/agent12c//core/12.1.0.1.0/lib/jsch.jar:/home/oracle/agent12c//core/12.1.0.1.0/lib/optic.jar:/home/oracle/agent12c//core/12.1.0.1.0/modules/oracle.dms_11.1.1/dms.jar:/home/oracle/agent12c//core/12.1.0.1.0/modules/oracle.odl_11.1.1/ojdl.jar:/home/oracle/agent12c//core/12.1.0.1.0/modules/oracle.odl_11.1.1/ojdl2.jar:/home/oracle/agent12c//core/12.1.0.1.0/sysman/jlib/log4j-core.jar:/home/oracle/agent12c//core/12.1.0.1.0/jlib/gcagent_core.jar:/home/oracle/agent12c//core/12.1.0.1.0/sysman/jlib/emagentSDK-intg.jar:/home/oracle/agent12c//core/12.1.0.1.0/sysman/jlib/emagentSDK.jar oracle.sysman.gcagent.tmmain.TMMain

..........

[[email protected] bin]# kill -9 5541

[[email protected] bin]# kill -9 5541

-bash: kill: (5541) - No such process

1.5.3 Testing whether su works

[[email protected] bin]# su - oracle

su: cannot set user id: Resource temporarily unavailable

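1.5.4 The problem is not solved. Why?

Remember the lsof output captured earlier? Let's look at which files are actually open.

more oracle.txt

A large number of mem entries

Besides these, there are also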

emagent   18490    oracle 5713u     IPv4           78422180        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:26944 (CLOSE_WAIT)

emagent   18490    oracle 5714u     IPv4           78424155        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:27136 (CLOSE_WAIT)

emagent   18490    oracle 5715u     IPv4           78425077        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:27272 (CLOSE_WAIT)

emagent   18490    oracle 5716u     IPv4           78427040        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:27464 (CLOSE_WAIT)

emagent   18490    oracle 5717u     IPv4           78427952        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:27600 (CLOSE_WAIT)

emagent   18490    oracle 5718u     IPv4           78429917        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:27792 (CLOSE_WAIT)

emagent   18490    oracle 5719u     IPv4           78430804        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:27930 (CLOSE_WAIT)

emagent   18490    oracle 5720u     IPv4           78432780        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:28122 (CLOSE_WAIT)

emagent   18490    oracle 5721u     IPv4           78433663        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:28258 (CLOSE_WAIT)

emagent   18490    oracle 5722u     IPv4           78435657        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:28446 (CLOSE_WAIT)

emagent   18490    oracle 5723u     IPv4           78436596        0t0        TCP snaqi-test3:dbcontrol_agent->snaqi-test3:28577 (CLOSE_WAIT)

There are a large number of emagent entries, matching the netstat results.

Kill it without hesitation:

[[email protected] log]# ps -ef|grep 18490

root      9214  7039  0 00:52 pts/2    00:00:00 grep 18490

oracle   18490  8679  0 Sep19 ?        00:04:12 /home/oracle/product/11.2.0.3/db_1/bin/emagent

[[email protected] log]# kill -9 18490

[[email protected] log]# ps -ef|grep 18490

root      9216  7039  0 00:52 pts/2    00:00:00 grep 18490

oracle   18490  8679  0 Sep19 ?        00:11:08 /home/oracle/product/11.2.0.3/db_1/bin/emagent

[[email protected] log]# ps -ef|grep 18490

root      9224  7039  0 00:52 pts/2    00:00:00 grep 18490

oracle   18490  8679  0 Sep19 ?        00:11:09 [emagent] <defunct>

[[email protected] log]# ps -A -o stat,ppid,pid,cmd | grep -e '^[Zz]'

[[email protected] log]# ps -A -o stat,ppid,pid,cmd | grep -e '^[Zz]'

[[email protected] log]# ps -ef|grep 18490

root      9259  7039  0 00:53 pts/2    00:00:00 grep 18490
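
The <defunct> entry seen briefly above is a zombie: the process has exited, but its parent (PID 8679 here) has not yet collected its exit status, and the entry disappears once the parent does so. The grep for '^[Zz]' only shows whether any zombies remain. As a small sketch, the filter below also prints the parent PID of each one, which is the process to look at if a zombie lingers. The function name zombies is hypothetical.

```shell
# zombies
# Read `ps -A -o stat,ppid,pid,cmd` output on standard input and print
# "PPID PID" for every process whose state starts with Z (zombie).
zombies() {
    awk '$1 ~ /^Z/ { print $2, $3 }'
}
```

It would be used as ps -A -o stat,ppid,pid,cmd | zombies; no output means no zombies are left.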

1.5.5 Checking the number of open files on the system

[[email protected] ~]# lsof | grep oracle | wc -l

31451

 

1.5.6 Checking the number of network connections

[[email protected] log]# netstat -anp | wc -l

532

 

1.5.7 Trying su to the oracle user

This time it works normally:

[[email protected] bin]# su - oracle

[[email protected] ~]$

[[email protected] ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Sat Nov 10 01:54:48 2012

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:

Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

With the Partitioning and Data Mining options

SQL>