1. 程式人生 > >redis cluster 的ERR max number of clients reached 問題排查

redis cluster 的ERR max number of clients reached 問題排查

ets sof 修改 使用 utils 問題排查 could rec rect

早上發現微服務連不上redis cluster了,看來下日誌如下

[root@win-jrh378d7scu 7005]# bin/redis-cli -c -h 15.31.213.183 -p 7005
15.31.213.183:7005> cluster info
ERR max number of clients reached
15.31.213.183:7005>


2019-03-26 22:00:30.011 http-nio-9090-exec-4 ERROR org.apache.juli.logging.DirectJDKLog.log(DirectJDKLog.java:182) - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool] with root cause

java.util.NoSuchElementException: Unable to validate object
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:494) ~[commons-pool2-2.4.3.jar!/:2.4.3]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361) ~[commons-pool2-2.4.3.jar!/:2.4.3]
at redis.clients.util.Pool.getResource(Pool.java:49) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisPool.getResource(JedisPool.java:226) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisSlotBasedConnectionHandler.getConnectionFromSlot(JedisSlotBasedConnectionHandler.java:66) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:116) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:31) ~[jedis-2.9.0.jar!/:?]
at redis.clients.jedis.JedisCluster.get(JedisCluster.java:124) ~[jedis-2.9.0.jar!/:?]
at com.hp.nova.utils.RedisClusterUtil.mget(RedisClusterUtil.java:152) ~[classes!/:0.0.1-SNAPSHOT]
at com.hp.nova.service.impl.SectionServiceImpl.getsectionlistBymutipulCode(SectionServiceImpl.java:286) ~[classes!/:0.0.1-SNAPSHOT]
at com.hp.nova.service.impl.SectionServiceImpl$$FastClassBySpringCGLIB$$73b5d4bc.invoke(<generated>) ~[classes!/:0.0.1-SNAPSHOT]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) ~[spring-core-4.3.20.RELEASE.jar!/:4.3.20.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:667) ~[spring-aop-4.3.20.RELEASE.jar!/:4.3.20.RELEASE]
at com.hp.nova.service.impl.SectionServiceImpl$$EnhancerBySpringCGLIB$$c52c4ab.getsectionlistBymutipulCode(<generated>) ~[classes!/:0.0.1-SNAPSHOT]
at com.hp.nova.service.impl.PlanServiceImpl.getChildListDetailByPlan(PlanServiceImpl.java:1051) ~[classes!/:0.0.1-SNAPSHOT]
at com.hp.nova.controller.PlanController.getMultiPlan(PlanController.java:107) ~[classes!/:0.0.1-SNAPSHOT]
at sun.reflect.GeneratedMethodAccessor335.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_131]


首先查看redis.conf中的maxclients大小為默認值,默認為10000。

通過lsof -p 17242 |wc -l 查看redis的連接數,發現連接數量超過10300. 所以出錯。

依次重啟6個節點的redis進程,再用lsof -p pid |wc -l 命令查看redis進程發現連接數變為60

但是最開始沒有重啟java微服務的進程,所以java裏面還會報錯,重啟java微服務進程後就好了

2019-03-26 17:13:42.126 http-nio-9090-exec-1 ERROR org.apache.juli.logging.DirectJDKLog.log(DirectJDKLog.java:182) - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is redis.clients.jedis.exceptions.JedisException: Could not get a resource from the pool] with root cause java.util.NoSuchElementException: Pool exhausted
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:452) ~[commons-pool2-2.4.3.jar!/:2.4.3]
at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361) ~[commons-pool2-2.4.3.jar!/:2.4.3]

但是用了半天系統後lsof -p pid |wc -l發現有幾個redis進程的連接數又到了5000多,我看了一下我們的yml配置如下,暫時不知道是因為每個微服務 maxTotal: 5000 #最大連接數 的原因導致真的redis連接數不夠還是因為java或者.net core的程序問題導致沒有釋放redis cluster的連接數。先把每個redis的maxclients 設為50000觀察幾天再說

redis:
nodes: 15.31.213.3:7001,15.31.213.3:7002,15.31.213.239:7003,15.31.213.239:7004,15.31.213.183:7005,15.31.213.183:7006
commandTimeout: 10000 #redis操作的超時時間
maxTotal: 5000 #最大連接數
maxIdle: 30 #最大空閑連接數
minIdle: 5 #最小空閑連接數
maxWait: 3000 #獲取連接最大等待時間 ms #default -1
pwd:

首先查看redis.conf中的maxclients大小為默認值,默認為10000。

通過lsof -p pid |wc -l ,發現連接數量超過10500. 出錯。

解決方法1:

1. 增加redis的最大連接數:修改redis.conf文件的maxclient ,修改到50000.

2. 一般redis的連接使用完畢之後會釋放,如果要用lsof命令發現鏈接始終沒有減少,則檢查代碼,看下使用redis的代碼部分是否執行類似close()的函數。將資源進行釋放。

通過上述兩個方法基本能解決這個問題。

redis cluster 的ERR max number of clients reached 問題排查