1. 程式人生 > 其它 >ES客戶端異常關閉原因

ES客戶端異常關閉原因

一、問題描述

新測試環境的icedoc有個2分鐘觸發一次的定時任務,任務裡有查詢ES邏輯,在啟動後不久(10分鐘以上)開始出現ES客戶端異常關閉的錯誤:

1.1、先出現客戶端連線異常關閉的錯誤

2021-07-02 10:34:00.600 ERROR c.s.a.s.i.SearchServiceImpl
org.apache.http.ConnectionClosedException: Connection closed unexpectedly
	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)

1.2、緊接著I/O dispatch執行緒異常退出

2021-07-02 10:34:00.709 ERROR o.a.h.i.n.c.CloseableHttpAsyncClientBase$1 [run:66] [pool-1-thread-1] []
				- I/O reactor terminated abnormally
org.apache.http.nio.reactor.IOReactorException: I/O dispatch worker terminated abnormally
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:359)
	at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
	at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.http.ConnectionClosedException: method <init>()V not found
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.endOfInput(HttpAsyncRequestExecutor.java:356)
	at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:261)
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
	at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)

1.3、後續每2分鐘的任務執行都是I/O reactor已停止的錯誤

2021-07-02 10:36:00.633 ERROR c.s.c.f.ServiceExceptionFiler
java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)

調整定時觸發頻率為1小時,沒有再發現連線異常關閉的錯誤了。由於其他環境都是1小時的頻率,所以也沒有發現這種錯誤

二、復現問題

2.1、在開發環境按1分鐘執行一次,重複執行了1天時間,只出現一次連線中止,但並未影響後續請求

2021-07-22 00:25:44.662 ERROR c.s.a.s.i.SearchServiceImpl [search:235] [DubboServerHandler-172.16.20.14:21027-thread-200] []
java.io.IOException: 您的主機中的軟體中止了一個已建立的連線。
	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:828)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)

2.2、通過-Xmx64m調整JVM堆記憶體後出現GC檢查的異常,並復現了後續連線異常關閉的錯誤

2021-07-22 10:39:50.324 WARN  o.a.z.ClientCnxn$SendThread [run:1162] [main-SendThread(172.16.2.55:2181)] []
				- Session 0x104d4b207c30562 for server 172.16.2.55/172.16.2.55:2181, unexpected error, closing socket connection and attempting reconnect
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.zookeeper.ClientCnxn$SendThread.sendPing(ClientCnxn.java:973)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1118)
2021-07-22 10:39:50.526 ERROR c.s.c.f.ServiceExceptionFiler [afterThrowing:64] [DubboServerHandler-172.16.20.14:21027-thread-22] []
				- SearchResponse com.worley.api.service.impl.SearchServiceImpl.search(SearchRequest)
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:254)
	at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1322)
	at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:827)
	at org.elasticsearch.common.xcontent.json.JsonXContent.createParser(JsonXContent.java:91)
	at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:142)
	at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:114)
	at org.elasticsearch.common.xcontent.XContentHelper.convertToMap(XContentHelper.java:90)
	at org.elasticsearch.search.lookup.SourceLookup.sourceAsMapAndType(SourceLookup.java:86)
	at org.elasticsearch.search.lookup.SourceLookup.sourceAsMap(SourceLookup.java:90)
	at org.elasticsearch.search.SearchHit.getSourceAsMap(SearchHit.java:412)
	at com.worley.api.service.impl.SearchServiceImpl.search(SearchServiceImpl.java:250)
2021-07-22 10:39:51.869 INFO  o.a.z.ClientCnxn$SendThread [logStartConnect:1032] [main-SendThread(172.16.2.55:2181)] []
				- Opening socket connection to server 172.16.2.55/172.16.2.55:2181. Will not attempt to authenticate using SASL (unknown error)
2021-07-22 10:39:51.991 INFO  o.a.z.ClientCnxn$SendThread [primeConnection:876] [main-SendThread(172.16.2.55:2181)] []
				- Socket connection established to 172.16.2.55/172.16.2.55:2181, initiating session
2021-07-22 10:39:52.115 INFO  o.a.z.ClientCnxn$SendThread [onConnected:1299] [main-SendThread(172.16.2.55:2181)] []
				- Session establishment complete on server 172.16.2.55/172.16.2.55:2181, sessionid = 0x104d4b207c30562, negotiated timeout = 30000

2021-07-22 10:41:47.954 ERROR c.s.a.s.i.SearchServiceImpl [search:235] [DubboServerHandler-172.16.20.14:21027-thread-23] []
org.apache.http.ConnectionClosedException: Connection closed unexpectedly
	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)
	at com.worley.api.service.impl.SearchServiceImpl.search(SearchServiceImpl.java:228)

2.3、在控制檯看到了幾處系統ERR的資訊並沒有列印在日誌檔案裡

通過檢視新測試環境啟動命令,發現堆記憶體配置是-Xms256m -Xmx512m,在加上每2分鐘觸發一次任務,有可能是記憶體不足導致連線未正常退出。
但是一般這種錯誤會丟擲業務層並列印在日誌檔案裡,在新測試環境未發現GC的日誌

2021-07-22 12:46:56.597 INFO
Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.HashMap$KeySet.iterator(HashMap.java:916)
	at java.util.HashSet.iterator(HashSet.java:172)
	at java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1039)
	at java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1038)
	at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processTimeouts(DefaultConnectingIOReactor.java:204)
	at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:158)
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
	at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
	at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
	at java.lang.Thread.run(Thread.java:748)

# 這裡在控制檯的ERR列印的,所以日誌檔案裡找不到
Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "I/O dispatcher 2" Exception in thread "RMI TCP Connection(idle)" java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded

# 日誌檔案裡找到了關於GC的錯誤日誌,說明新測試的iecdoc不是GC問題引起的
2021-07-22 14:40:15.802 ERROR c.s.c.f.ServiceExceptionFiler [afterThrowing:64] [DubboServerHandler-172.16.20.14:21027-thread-39] []
				- SearchResult com.worley.ecdoc.service.index.ecdoc.EcdocContentIndexSearchService.cmsPublishListSearch(EcdocContentSearchVO)
java.lang.OutOfMemoryError: GC overhead limit exceeded

2.4、使用與新測試一樣的頻率(2分鐘)重複呼叫,在40分鐘後出現一樣的錯誤,問題完整復現

由於nginx長連線超時配置為120s,客戶端1.0.3版本未設定長連線保持時間,全部依賴nginx的配置。
我懷疑是客戶端恰好在即將斷開長連線的時候拿到連線,但在請求時突然關閉連線導致這個錯誤。

這裡有個錯誤需要明確指出:

java.lang.NoSuchMethodError: org.apache.http.ConnectionClosedException: method <init>()V not found

這是由於在httpcore-nio-4.4.12org.apache.http.nio.protocol.HttpAsyncRequestExecutor.endOfInput方法裡使用無參構造器例項化ConnectionClosedException時報錯導致的,
因為httpcore-4.4.6裡沒有這個無參構造器,高版本里才有

    @Override
    public void endOfInput(final NHttpClientConnection conn) throws IOException {
        final State state = getState(conn);
        final HttpContext context = conn.getContext();
        synchronized (context) {
            if (state != null) {
                if (state.getRequestState().compareTo(MessageState.READY) != 0) {
                    state.invalidate();
                }
                final HttpAsyncClientExchangeHandler handler = getHandler(conn);
                if (handler != null) {
                    if (state.isValid()) {
                        handler.inputTerminated();
                    } else {
                        handler.failed(new ConnectionClosedException()); // 這裡例項化物件失敗,從錯誤日誌裡可以得到確認
                    }
                }
            }
            if (conn.getSocketTimeout() <= 0) {
                conn.setSocketTimeout(1000);
            }
            conn.close();
        }
    }

最終出現I/O reactor terminated abnormally錯誤,導致後續請求一直報Request cannot be executed; I/O reactor status: STOPPED的錯誤:

    public CloseableHttpAsyncClientBase(
            final NHttpClientConnectionManager connmgr,
            final ThreadFactory threadFactory,
            final NHttpClientEventHandler handler) {
        super();
        this.connmgr = connmgr;
        if (threadFactory != null && handler != null) {
            this.reactorThread = threadFactory.newThread(new Runnable() {
                @Override
                public void run() {
                    try {
                        final IOEventDispatch ioEventDispatch = new InternalIODispatch(handler);
                        connmgr.execute(ioEventDispatch);
                    } catch (final Exception ex) {
                        log.error("I/O reactor terminated abnormally", ex);
                    } finally {
                        status.set(Status.STOPPED); // 異常後設置了狀態為:STOPPED
                    }
                }
            });
        } else {
            this.reactorThread = null;
        }
        this.status = new AtomicReference<Status>(Status.INACTIVE);
    }

2.5、使用與新測試一樣的頻率(2分鐘)重複呼叫,調整httpcore-core-4.4.12的依賴

在iecdoc裡調整版本

<dependency>
	<groupId>org.apache.httpcomponents</groupId>
	<artifactId>httpcore</artifactId>
	<version>4.4.12</version>
</dependency>

使用debug啟動,並在endOfInput裡設定斷點。等待20分鐘後端點被觸發,通過單步執行走到handler.failed(new ConnectionClosedException())這行程式碼,這次沒有類構造失敗的錯誤了,
只是報了下列Connection is closed的錯誤,後續請求正常執行,說明只要I/O dispatch執行緒正常就可以繼續獲取連線發起請求:

2021-07-23 14:44:26.804 ERROR c.s.a.s.i.SearchServiceImpl [search:235] [DubboServerHandler-172.16.20.14:21027-thread-88] []
				- search error:
org.apache.http.ConnectionClosedException: Connection is closed
	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)
	at com.worley.api.service.impl.SearchServiceImpl.search(SearchServiceImpl.java:228)

這個異常只是影響當前請求,可以使用sekorm-es-sdk的重試機制解決。

三、總結

導致Request cannot be executed; I/O reactor status: STOPPED的錯誤是由於I/O dispatch執行緒異常退出導致

  • 唯一原因是ConnectionClosedException的構造錯誤引起(httpcore-4.4.6版本過低)

導致程式在endOfInput方法裡判斷連線狀態是無效的原因,目前發現有兩種:

  • 堆記憶體不足GC overhead limit exceeded導致連線關閉
  • 由於客戶端長連線未設定,完全依賴服務端主動關閉長連線。在獲取請求後,此時服務端有可能主動(達到設定的過期時間)關閉長連線。

解決方式:

  • 客戶端設定長連線時間小於服務端,有效避免客戶端發起請求前服務端突然斷開長連線