ES客戶端異常關閉原因
一、問題描述
新測試環境的icedoc有個2分鐘觸發一次的定時任務,任務裡有查詢ES邏輯,在啟動後不久(10分鐘以上)開始出現ES客戶端異常關閉的錯誤:
1.1、先出現客戶端連線異常關閉的錯誤
2021-07-02 10:34:00.600 ERROR c.s.a.s.i.SearchServiceImpl org.apache.http.ConnectionClosedException: Connection closed unexpectedly at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813) at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484) at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454) at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)
1.2、緊接著I/O dispatch
執行緒異常退出
2021-07-02 10:34:00.709 ERROR o.a.h.i.n.c.CloseableHttpAsyncClientBase$1 [run:66] [pool-1-thread-1] [] - I/O reactor terminated abnormally org.apache.http.nio.reactor.IOReactorException: I/O dispatch worker terminated abnormally at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:359) at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoSuchMethodError: org.apache.http.ConnectionClosedException: method <init>()V not found at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.endOfInput(HttpAsyncRequestExecutor.java:356) at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:261) at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
1.3、後續每2分鐘的任務執行都是I/O reactor
已停止的錯誤
2021-07-02 10:36:00.633 ERROR c.s.c.f.ServiceExceptionFiler java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831) at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484) at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454) at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)
調整定時觸發頻率為1小時,沒有再發現連線異常關閉的錯誤了。由於其他環境都是1小時的頻率,所以也沒有發現這種錯誤
二、復現問題
2.1、在開發環境按1分鐘執行一次,重複執行了1天時間,只出現一次連線中止,但並未影響後續請求
2021-07-22 00:25:44.662 ERROR c.s.a.s.i.SearchServiceImpl [search:235] [DubboServerHandler-172.16.20.14:21027-thread-200] []
java.io.IOException: 您的主機中的軟體中止了一個已建立的連線。
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:828)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)
2.2、通過-Xmx64m
調整JVM堆記憶體後出現GC檢查的異常,並復現了後續連線異常關閉的錯誤
2021-07-22 10:39:50.324 WARN o.a.z.ClientCnxn$SendThread [run:1162] [main-SendThread(172.16.2.55:2181)] []
- Session 0x104d4b207c30562 for server 172.16.2.55/172.16.2.55:2181, unexpected error, closing socket connection and attempting reconnect
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.zookeeper.ClientCnxn$SendThread.sendPing(ClientCnxn.java:973)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1118)
2021-07-22 10:39:50.526 ERROR c.s.c.f.ServiceExceptionFiler [afterThrowing:64] [DubboServerHandler-172.16.20.14:21027-thread-22] []
- SearchResponse com.worley.api.service.impl.SearchServiceImpl.search(SearchRequest)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:254)
at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1322)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:827)
at org.elasticsearch.common.xcontent.json.JsonXContent.createParser(JsonXContent.java:91)
at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:142)
at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:114)
at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:90)
at org.elasticsearch.search.lookup.SourceLookup.sourceAsMapAndType(SourceLookup.java:86)
at org.elasticsearch.search.lookup.SourceLookup.sourceAsMap(SourceLookup.java:90)
at org.elasticsearch.search.SearchHit.getSourceAsMap(SearchHit.java:412)
at com.worley.api.service.impl.SearchServiceImpl.search(SearchServiceImpl.java:250)
2021-07-22 10:39:51.869 INFO o.a.z.ClientCnxn$SendThread [logStartConnect:1032] [main-SendThread(172.16.2.55:2181)] []
- Opening socket connection to server 172.16.2.55/172.16.2.55:2181. Will not attempt to authenticate using SASL (unknown error)
2021-07-22 10:39:51.991 INFO o.a.z.ClientCnxn$SendThread [primeConnection:876] [main-SendThread(172.16.2.55:2181)] []
- Socket connection established to 172.16.2.55/172.16.2.55:2181, initiating session
2021-07-22 10:39:52.115 INFO o.a.z.ClientCnxn$SendThread [onConnected:1299] [main-SendThread(172.16.2.55:2181)] []
- Session establishment complete on server 172.16.2.55/172.16.2.55:2181, sessionid = 0x104d4b207c30562, negotiated timeout = 30000
2021-07-22 10:41:47.954 ERROR c.s.a.s.i.SearchServiceImpl [search:235] [DubboServerHandler-172.16.20.14:21027-thread-23] []
org.apache.http.ConnectionClosedException: Connection closed unexpectedly
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)
at com.worley.api.service.impl.SearchServiceImpl.search(SearchServiceImpl.java:228)
2.3、在控制檯看到了幾處系統ERR
的資訊並沒有列印在日誌檔案裡
通過檢視新測試環境啟動命令,發現堆記憶體配置是
-Xms256m -Xmx512m
,在加上每2分鐘觸發一次任務,有可能是記憶體不足導致連線未正常退出。
但是一般這種錯誤會丟擲業務層並列印在日誌檔案裡,在新測試環境未發現GC
的日誌
2021-07-22 12:46:56.597 INFO
Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.HashMap$KeySet.iterator(HashMap.java:916)
at java.util.HashSet.iterator(HashSet.java:172)
at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1039)
at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1038)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processTimeouts(DefaultConnectingIOReactor.java:204)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:158)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.lang.Thread.run(Thread.java:748)
# 這裡在控制檯的ERR列印的,所以日誌檔案裡找不到
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "I/O dispatcher 2" Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
# 日誌檔案裡找到了關於GC的錯誤日誌,說明新測試的iecdoc不是GC問題引起的
2021-07-22 14:40:15.802 ERROR c.s.c.f.ServiceExceptionFiler [afterThrowing:64] [DubboServerHandler-172.16.20.14:21027-thread-39] []
- SearchResult com.worley.ecdoc.service.index.ecdoc.EcdocContentIndexSearchService.cmsPublishListSearch(EcdocContentSearchVO)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2.4、使用與新測試一樣的頻率(2分鐘)重複呼叫,在40分鐘後出現一樣的錯誤,問題完整復現
由於nginx長連線超時配置為120s,客戶端1.0.3版本未設定長連線保持時間,全部依賴nginx的配置。
我懷疑是客戶端恰好在即將斷開長連線的時候拿到連線,但在請求時突然關閉連線導致這個錯誤。
這裡有個錯誤需要明確指出:
java.lang.NoSuchMethodError: org.apache.http.ConnectionClosedException: method <init>()V not found
這是由於在httpcore-nio-4.4.12
的org.apache.http.nio.protocol.HttpAsyncRequestExecutor.endOfInput
方法裡使用無參構造器例項化ConnectionClosedException
時報錯導致的,
因為httpcore-4.4.6
裡沒有這個無參構造器,高版本里才有
@Override
public void endOfInput(final NHttpClientConnection conn) throws IOException {
final State state = getState(conn);
final HttpContext context = conn.getContext();
synchronized (context) {
if (state != null) {
if (state.getRequestState().compareTo(MessageState.READY) != 0) {
state.invalidate();
}
final HttpAsyncClientExchangeHandler handler = getHandler(conn);
if (handler != null) {
if (state.isValid()) {
handler.inputTerminated();
} else {
handler.failed(new ConnectionClosedException()); // 這裡例項化物件失敗,從錯誤日誌裡可以得到確認
}
}
}
if (conn.getSocketTimeout() <= 0) {
conn.setSocketTimeout(1000);
}
conn.close();
}
}
最終出現I/O reactor terminated abnormally
錯誤,導致後續請求一直報Request cannot be executed; I/O reactor status: STOPPED
的錯誤:
public CloseableHttpAsyncClientBase(
final NHttpClientConnectionManager connmgr,
final ThreadFactory threadFactory,
final NHttpClientEventHandler handler) {
super();
this.connmgr = connmgr;
if (threadFactory != null && handler != null) {
this.reactorThread = threadFactory.newThread(new Runnable() {
@Override
public void run() {
try {
final IOEventDispatch ioEventDispatch = new InternalIODispatch(handler);
connmgr.execute(ioEventDispatch);
} catch (final Exception ex) {
log.error("I/O reactor terminated abnormally", ex);
} finally {
status.set(Status.STOPPED); // 異常後設置了狀態為:STOPPED
}
}
});
} else {
this.reactorThread = null;
}
this.status = new AtomicReference<Status>(Status.INACTIVE);
}
2.5、使用與新測試一樣的頻率(2分鐘)重複呼叫,調整httpcore-core-4.4.12
的依賴
在iecdoc裡調整版本
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
<version>4.4.12</version>
</dependency>
使用debug啟動,並在endOfInput
裡設定斷點。等待20分鐘後端點被觸發,通過單步執行走到handler.failed(new ConnectionClosedException())
這行程式碼,這次沒有類構造失敗的錯誤了,
只是報了下列Connection is closed
的錯誤,後續請求正常執行,說明只要I/O dispatch
執行緒正常就可以繼續獲取連線發起請求:
2021-07-23 14:44:26.804 ERROR c.s.a.s.i.SearchServiceImpl [search:235] [DubboServerHandler-172.16.20.14:21027-thread-88] []
- search error:
org.apache.http.ConnectionClosedException: Connection is closed
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)
at com.worley.api.service.impl.SearchServiceImpl.search(SearchServiceImpl.java:228)
這個異常只是影響當前請求,可以使用sekorm-es-sdk
的重試機制解決。
三、總結
導致Request cannot be executed; I/O reactor status: STOPPED
的錯誤是由於I/O dispatch
執行緒異常退出導致
- 唯一原因是
ConnectionClosedException
的構造錯誤引起(httpcore-4.4.6
版本過低)
導致程式在endOfInput
方法裡判斷連線狀態是無效的原因,目前發現有兩種:
- 堆記憶體不足
GC overhead limit exceeded
導致連線關閉 - 由於客戶端長連線未設定,完全依賴服務端主動關閉長連線。在獲取請求後,此時服務端有可能主動(達到設定的過期時間)關閉長連線。
解決方式:
- 客戶端設定長連線時間小於服務端,有效避免客戶端發起請求前服務端突然斷開長連線