MysqlIO.readFully問題排查分析過程
問題描述
支付清結算系統"外部對賬"部分任務在早上6:00~7:00時間段內對賬速度緩慢,且經常造成對賬統計結果不準確問題,導致結轉記賬不準確。
初步分析及處理過程
由於外部對賬是經過效能優化的,且在壓測環境表現良好,所以最開始初步認為是由於資料庫效能瓶頸導致,初步處理方式為將線上6:00~7:00之前的慢SQL進行了一番排查,並對涉及外部系統的慢SQL進行了優化,這個步驟理論上是可以解決此類問題,但RDS(阿里雲Mysql可能存在慢SQL統計不準確問題)。
特點時間段內執行緒棧分析
特定時間段執行緒棧分析,由於SQL優化部分並沒有解決此類問題,為了檢視在早上6:00~7:00這個時間段內程式JVM內部到底發生了什麼,所以採取了定時抓取執行緒棧日誌的手段(具體方式為在Linux伺服器新增cron任務,並累計輸出日誌的方式,指令碼如?)。
#clearjstack.sh
#!/bin/sh
DATE='/bin/date'
timestamp() {
$DATE +'%Y-%m-%d %H:%M:%S'
}
ts=`timestamp`
echo "start log $ts" >> /tmp/clearjstack.log
/usr/java/jdk1.7.0_67/bin/jstack -l `/usr/java/jdk1.7.0_67/bin/jps -lvm | grep tomcat-clear | awk '{print $1}'` >> /tmp/clearjstack.log
ts=`timestamp`
echo "end log $ts " >> /tmp/clearjstack.log
複製程式碼
Linux下新增定時任務的方式:
//新增計劃任務
[[email protected]/10.172.233.8 tmp]cd /var/spool/cron/
[[email protected]/10.172.233.8 cron]pwd
/var/spool/cron/具體使用者名稱(如為root使用者)
[[email protected]/10.172.233.8 cron]vim root
*/5 * * * * /bin/sh /tmp/clearjstack.sh > /dev/null 2>&1
複製程式碼
//檢視計劃任務
[[email protected]/10.172.233.8 cron]crontab -l
0 0 * * * /usr/local/script/Cutlog.sh >/dev/null 2>&1
*/5 * * * * /bin/sh /tmp/clearjstack.sh > /dev/null 2>&1
複製程式碼
對執行緒棧日誌分析(1)
#clearjstack.log
start log 2016-11-24 06:20:01
2016-11-24 06:20:02
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode):
"Thread-70" prio=10 tid=0x00007f21f4022800 nid=0x12da runnable [0x00007f216cace000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
- locked <0x0000000786facbc8> (a com.mysql.jdbc.util.ReadAheadInputStream)
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2629)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2719)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1379)
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:493)
at sun.reflect.GeneratedMethodAccessor169.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:55)
at com.sun.proxy.$Proxy84.execute(Unknown Source)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:56)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:70)
at org.apache.ibatis.executor.ReuseExecutor.doQuery(ReuseExecutor.java:54)
at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:259)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:132)
at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:115)
at sun.reflect.GeneratedMethodAccessor165.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:46)
at com.github.miemiedev.mybatis.paginator.OffsetLimitInterceptor.intercept(OffsetLimitInterceptor.java:66)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:57)
at com.sun.proxy.$Proxy82.query(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:104)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:98)
at org.apache.ibatis.session.defaults.DefaultSqlSession.selectOne(DefaultSqlSession.java:62)
at sun.reflect.GeneratedMethodAccessor203.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:358)
at com.sun.proxy.$Proxy29.selectOne(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.selectOne(SqlSessionTemplate.java:163)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:63)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:43)
at com.sun.proxy.$Proxy87.countOrgSettleFlow(Unknown Source)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl.isExistOrgSettleFlow(OrgSettleFlowServiceImpl.java:45)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$FastClassBySpringCGLIB$$c921c9ca.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:700)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:96)
at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:260)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:633)
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$EnhancerBySpringCGLIB$$6b2e2fc1.isExistOrgSettleFlow(<generated>)
at com.ninefbank.smallpay.clear.util.FeeCountUtil.createOrgSettleFlow(FeeCountUtil.java:235)
- locked <0x000000078c795ed8> (a java.lang.Class for com.ninefbank.smallpay.clear.util.FeeCountUtil)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveTaskErrorInfo(OuterReconWriter.java:126)
- locked <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:70)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x000000078deebbf0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"Thread-69" prio=10 tid=0x00007f21f401c000 nid=0x12d9 waiting for monitor entry [0x00007f216ccd1000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
- waiting to lock <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:68)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x000000078defb298> (a java.util.concurrent.ThreadPoolExecutor$Worker)
"Thread-68" prio=10 tid=0x00007f21f401b000 nid=0x12d8 waiting for monitor entry [0x00007f216c7cc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
- waiting to lock <0x0000000788061c30> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:68)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:35)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked ownable synchronizers:
- <0x000000078deec880> (a java.util.concurrent.ThreadPoolExecutor$Worker)
複製程式碼
從上述日誌分析執行緒Thread-68、Thread-69都為阻塞狀態,且阻塞狀態
smallpay.clear.writer.OuterReconWriter.saveReconData(OuterReconWriter.java:169)
複製程式碼
所對應的程式碼主要體現在對賬結果寫入部分,另外從日誌
- locked <0x0000000786f9a650> (a com.mysql.jdbc.JDBC4Connection)
複製程式碼
所以此時很容易覺得問題出現在連線池與MySQL資料庫本身的問題上。按照這個思路(JVM日誌很豐富,去偽存真,往往需要嘗試與不斷試錯),開始了對連線池本身及MySQL本身問題的排查(Google),以下為幾種Google結果連結及原因概述。
1)、網路原因&MySQL設定&換連線池
在以上鍊接所述的原因中,重點是說了MySQL及網路方面的原因。針對這種敘述,與運維大哥溝通後調取了阿里雲RDS MySQL例項的引數設定,如下:
show global variables like '%timeout%'
複製程式碼
最後一個引數wait_timeout表明Mysql伺服器與客戶端連線空閒自動斷開的超時時間為24小時,也就是說只有連線在與MySQL伺服器連線後24小時內無任何操作,MySQL伺服器才會主動斷開與客戶端(這裡為應用連線池中與MySQL資料庫建立的連線),然而對於應用來說,這種可能性基本不存在,況且大多數連線池配置都有對連線是否可用以及自動探測連線的配置。
檢視應用連線池配置:
<property name="timeBetweenEvictionRunsMillis" value="3000" />
<property name="minEvictableIdleTimeMillis" value="300000" />
<property name="validationQuery" value="SELECT 'x'" />
<property name="testWhileIdle" value="true" />
<property name="testOnBorrow" value="true" />
<property name="testOnReturn" value="true" />
<property name="poolPreparedStatements" value="false" />
<property name="maxPoolPreparedStatementPerConnectionSize" value="-1" />
複製程式碼
按照上述配置基本上不會出現連線長時間與MySQL伺服器無任何互動的可能性,除非連線池本身的處理存在問題(druid連線池為阿里高效能資料庫連線池),但是出於懷疑的態度,按照連結中的說法druid在處理長時間連線等待方面可能不如c3p0好,也不排除druid本身存在問題。但出於謹慎角度,並沒有立刻採用對連線池進行替換的措施,而是將焦點轉移到應用伺服器與MySQL伺服器網路連線上(也確實有人是因為網路防火牆設定導致出現此類問題的)。
2)、網路連線引數調整
由於阿里雲底層網路結構並不對使用者透明,且遇到問題的應用存在批量大資料插入的使用場景,確實是存在TCP緩衝區不夠的可能性,通過對MySQL官方檔案相關章節的閱讀,MySQL連線串引數中確實提供了tcpRcvBuf這樣的引數,其對該引數的說明是:“connecting using TCP/IP,should the driver set SO_RCV_BUF to the given value? The default value of '0',means use the platform default value for this property)”。 於是覺得這種可能性會相對大些,所以線上上調整了連線引數而並沒有更換連線池(如若不行,可再嘗試替換連線池),引數調整如下:
jdbc:mysql://xxx.xxx.xx.xx:3306/xx?useUnicode=true&characterEncoding=UTF-8&tcpRcvBuf=1024000&autoReconnect=true&failOverReadOnly=false&connectTimeout=0
複製程式碼
連線TCP緩衝區的大小被調整為1M。
對執行緒棧日誌分析(2)
在4中通過對網路連線引數的調整後,並沒有起到作用,於是對執行緒棧日誌再次進行了調取,並進行了地毯式閱讀,終於發現了問題的癥結:
at com.ninefbank.smallpay.clear.service.impl.OrgSettleFlowServiceImpl$$EnhancerBySpringCGLIB$$5775a83.isExistOrgSettleFlow(<generated>)
at com.ninefbank.smallpay.clear.util.FeeCountUtil.createOrgSettleFlow(FeeCountUtil.java:235)
- locked <0x000000078b4d8e98> (a java.lang.Class for com.ninefbank.smallpay.clear.util.FeeCountUtil)
at com.ninefbank.smallpay.clear.writer.OuterReconWriter.saveTaskErrorInfo(OuterReconWriter.java:126)
- locked <0x00000007880399f0> (a com.ninefbank.smallpay.clear.writer.OuterReconWriter)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterParallTask.java:70)
at com.ninefbank.smallpay.clear.tasklet.task.OuterReconWriterParallTask.call(OuterReconWriterPara
複製程式碼
通過對比多日問題時段的執行緒棧日誌,發現都存在兩處程式碼都指向同一個方法,即isExistOrgSettleFlow,於是找到程式碼所對應的方法,發現是由於該方法高頻訪問了一個未加相關索引的大資料表,導致方法本身的執行存在阻塞及鎖住的情況,佔用了大量連線資源,從而導致4中所看到的其他方法拿不到資料庫連接出現長時間等待,甚至超時的情況。加上索引後,隔日觀察問題消失。
總結
-
對陌生問題的分析,需要通過各類分析手段從而找到問題所在並進行解決(這其中需要運用各類分析工具Linux命令、JDK命令、網路命令等等);
-
要心存思辨之心,對傳統的權威的東西在敬畏的同時也要大膽的懷疑並進行分析、嘗試,切不可畏懼不前;
-
在編碼方面,對程式碼類、方法的命名一定要合理、規範,不然日後出現疑難問題在排查時,會給人造成錯覺從而延長解決時間;