Thrift: OOM caused by java.io.IOException: Connection reset by peer
阿新 • Published: 2019-01-14
Exception log output just before the server exited:
[2015-06-20 10:45:56,713 WARN] Got an IOException in internalRead!
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
    at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:142)
    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:539)
    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
    at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
    at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.select(TThreadedSelectorServer.java:590)
    at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:545)
[2015-06-20 10:46:08,110 ERROR] run() exiting due to uncaught error
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
    at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:371)
    at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
    at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.select(TThreadedSelectorServer.java:590)
    at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:545)
A Thrift RPC request is received as follows (see the annotations added in the code comments):
/**
 * Give this FrameBuffer a chance to read. The selector loop should have
 * received a read event for this FrameBuffer.
 *
 * @return true if the connection should live on, false if it should be
 *         closed
 */
public boolean read() {
  if (state_ == FrameBufferState.READING_FRAME_SIZE) {
    // try to read the frame size completely
    // NOTE: read the frame header, i.e. the frame size. Per the Thrift
    // framed protocol this is the first 4 bytes of the request.
    if (!internalRead()) {
      return false;
    }

    // if the frame size has been read completely, then prepare to read the
    // actual frame.
    if (buffer_.remaining() == 0) {
      // pull out the frame size as an integer.
      int frameSize = buffer_.getInt(0);
      if (frameSize <= 0) {
        LOGGER.error("Read an invalid frame size of " + frameSize
            + ". Are you using TFramedTransport on the client side?");
        return false;
      }

      // if this frame will always be too large for this server, log the
      // error and close the connection.
      // NOTE: sanity check on the frame length
      if (frameSize > MAX_READ_BUFFER_BYTES) {
        LOGGER.error("Read a frame size of " + frameSize
            + ", which is bigger than the maximum allowable buffer size for ALL connections.");
        return false;
      }

      // if this frame will push us over the memory limit, then return.
      // with luck, more memory will free up the next time around.
      // NOTE: sanity check on the frame length against the total already allocated
      if (readBufferBytesAllocated.get() + frameSize > MAX_READ_BUFFER_BYTES) {
        return true;
      }

      // increment the amount of memory allocated to read buffers
      readBufferBytesAllocated.addAndGet(frameSize + 4);

      // reallocate the readbuffer as a frame-sized buffer
      // NOTE: allocate enough memory for the whole frame. According to the
      // log output, this is the call site where the server ran out of memory.
      buffer_ = ByteBuffer.allocate(frameSize + 4);
      buffer_.putInt(frameSize);

      state_ = FrameBufferState.READING_FRAME;
    } else {
      // this skips the check of READING_FRAME state below, since we can't
      // possibly go on to that state if there's data left to be read at
      // this one.
      return true;
    }
  }

  // it is possible to fall through from the READING_FRAME_SIZE section
  // to READING_FRAME if there's already some frame data available once
  // READING_FRAME_SIZE is complete.
  if (state_ == FrameBufferState.READING_FRAME) {
    // NOTE: read the frame body
    if (!internalRead()) {
      return false;
    }

    // since we're already in the select loop here for sure, we can just
    // modify our selection key directly.
    if (buffer_.remaining() == 0) {
      // get rid of the read select interests
      selectionKey_.interestOps(0);
      state_ = FrameBufferState.READ_FRAME_COMPLETE;
    }

    return true;
  }

  // if we fall through to this point, then the state must be invalid.
  LOGGER.error("Read was called but state is invalid (" + state_ + ")");
  return false;
}

/**
 * Perform a read into buffer.
 *
 * @return true if the read succeeded, false if there was an error or the
 *         connection closed.
 */
// NOTE: reads from the SocketChannel
private boolean internalRead() {
  try {
    if (trans_.read(buffer_) < 0) {
      return false;
    }
    return true;
  } catch (IOException e) {
    LOGGER.warn("Got an IOException in internalRead!", e);
    return false;
  }
}
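At this point the code analysis pins the problem down: the server exited abnormally because it tried to allocate a buffer for an oversized frame. The only upper bound on the frame size is MAX_READ_BUFFER_BYTES, and that limit is itself far too large.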
public static abstract class AbstractNonblockingServerArgs<T extends AbstractNonblockingServerArgs<T>>
    extends AbstractServerArgs<T> {
  // NOTE: the default is Long.MAX_VALUE, so an OOM is guaranteed
  public long maxReadBufferBytes = Long.MAX_VALUE;

  public AbstractNonblockingServerArgs(TNonblockingServerTransport transport) {
    super(transport);
    transportFactory(new TFramedTransport.Factory());
  }
}

/**
 * The maximum amount of memory we will allocate to client IO buffers at a
 * time. Without this limit, the server will gladly allocate client buffers
 * right into an out of memory exception, rather than waiting.
 */
// NOTE: the comment above points straight at the problem
final long MAX_READ_BUFFER_BYTES;

// NOTE: MAX_READ_BUFFER_BYTES can be set through the constructor arguments
public AbstractNonblockingServer(AbstractNonblockingServerArgs args) {
  super(args);
  MAX_READ_BUFFER_BYTES = args.maxReadBufferBytes;
}
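To make the effect of the default concrete, the sketch below mirrors the three size checks from read() in plain Java. The class and method names (ReadBufferGuardSketch, wouldAllocate) are hypothetical and are not part of Thrift; only the comparisons are taken from the code above, and the 16 MiB limit is an arbitrary value chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-alone model of the size checks in FrameBuffer.read(); not Thrift code. */
final class ReadBufferGuardSketch {
  private final long maxReadBufferBytes;
  // Models readBufferBytesAllocated; stays at 0 here because nothing is really allocated.
  private final AtomicLong readBufferBytesAllocated = new AtomicLong(0);

  ReadBufferGuardSketch(long maxReadBufferBytes) {
    this.maxReadBufferBytes = maxReadBufferBytes;
  }

  /** Returns true if a frame of this size would get past all checks and reach the allocation. */
  boolean wouldAllocate(int frameSize) {
    if (frameSize <= 0) {
      return false; // invalid frame size: the real code closes the connection
    }
    if (frameSize > maxReadBufferBytes) {
      return false; // larger than the limit: the real code closes the connection
    }
    if (readBufferBytesAllocated.get() + frameSize > maxReadBufferBytes) {
      return false; // over budget right now: the real code waits and retries later
    }
    return true;
  }

  public static void main(String[] args) {
    ReadBufferGuardSketch unbounded = new ReadBufferGuardSketch(Long.MAX_VALUE);
    ReadBufferGuardSketch bounded = new ReadBufferGuardSketch(16L * 1024 * 1024);

    // No int can exceed Long.MAX_VALUE, so every positive frame size is accepted.
    System.out.println("unbounded accepts Integer.MAX_VALUE: "
        + unbounded.wouldAllocate(Integer.MAX_VALUE));
    System.out.println("bounded accepts Integer.MAX_VALUE: "
        + bounded.wouldAllocate(Integer.MAX_VALUE));
  }
}
```

Because frameSize is an int and the limit is a long, a limit of Long.MAX_VALUE turns both comparisons into checks that can never fail.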
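Reproducing the problem (verification)
- echo "something" | nc to the Thrift port
- telnet to the Thrift port and type a few arbitrary characters
The Thrift server prints a few error lines to its log and then exits.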
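Cause:
A misbehaving client connects to the Thrift server port and sends arbitrary data, so the Thrift protocol layer parses the first bytes as an abnormally large frameSize. (In effect, an attack from inside the internal network.)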
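As an illustration of how arbitrary text turns into a huge frame size, the sketch below decodes the first four bytes of the string used in the reproduction step the same way buffer_.getInt(0) does, as a big-endian 32-bit integer. The class and method names (FrameSizeDecodeSketch, frameSizeOf) are hypothetical helpers, not Thrift APIs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing how the first 4 bytes of arbitrary input read as a frame size. */
final class FrameSizeDecodeSketch {

  /** Interprets the first four bytes as a big-endian int, like ByteBuffer.getInt(0). */
  static int frameSizeOf(byte[] firstBytes) {
    return ByteBuffer.wrap(firstBytes, 0, 4).getInt();
  }

  public static void main(String[] args) {
    // What `echo "something"` puts on the wire: the text plus a trailing newline.
    byte[] sent = "something\n".getBytes(StandardCharsets.US_ASCII);
    int frameSize = frameSizeOf(sent);
    // 's' 'o' 'm' 'e' = 0x73 0x6F 0x6D 0x65
    System.out.println("frameSize = " + frameSize); // prints "frameSize = 1936682341"
  }
}
```

The bytes "some" decode to 1936682341, roughly 1.8 GiB. The value is positive, so it passes the frameSize <= 0 check, and with an unbounded limit the server then tries to allocate a heap buffer of that size.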
Solution:
Use TFramedTransport and set maxLength in its constructor; the default is DEFAULT_MAX_LENGTH = 16384000.
In addition, be sure to set AbstractNonblockingServerArgs.maxReadBufferBytes; its default is Long.MAX_VALUE.
For example:
Server-side change
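A minimal sketch of what such a server-side change could look like, assuming the libthrift Java library from the same release line as the stack trace is on the classpath (package names may differ in newer releases). The class name BoundedThriftServerSketch, the two limit constants, and their values are assumptions chosen for illustration; the processor is whatever your generated service provides.

```java
import org.apache.thrift.TProcessor;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.server.TServer;
import org.apache.thrift.server.TThreadedSelectorServer;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TNonblockingServerSocket;
import org.apache.thrift.transport.TTransportException;

/** Hypothetical helper; the limits below are example values, not recommendations. */
final class BoundedThriftServerSketch {
  // Assumed upper bound for a single frame (16 MiB).
  static final int MAX_FRAME_BYTES = 16 * 1024 * 1024;
  // Assumed upper bound for all read buffers together (64 MiB); keep it well below the heap size.
  static final long MAX_READ_BUFFER_BYTES = 64L * 1024 * 1024;

  static TServer build(TProcessor processor, int port) throws TTransportException {
    TNonblockingServerSocket socket = new TNonblockingServerSocket(port);
    TThreadedSelectorServer.Args args = new TThreadedSelectorServer.Args(socket);
    args.processor(processor);
    args.protocolFactory(new TBinaryProtocol.Factory());
    // Bound the frame length accepted by the framed transport.
    args.transportFactory(new TFramedTransport.Factory(MAX_FRAME_BYTES));
    // Replace the Long.MAX_VALUE default so the checks in read() can actually reject a frame.
    args.maxReadBufferBytes = MAX_READ_BUFFER_BYTES;
    return new TThreadedSelectorServer(args);
  }
}
```

With a finite maxReadBufferBytes, a garbage frame size such as the ones produced by nc or telnet is rejected and the connection is closed before any allocation takes place.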
Client side
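On the client, the matching change is to wrap the socket in a TFramedTransport with an explicit maxLength. This is a sketch under the same assumptions as the server side (libthrift on the classpath, package names from the release line in the stack trace); FramedThriftClientSketch and MAX_FRAME_BYTES are hypothetical names, and the returned protocol would be passed to your generated client class.

```java
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

/** Hypothetical helper; the limit is an example value. */
final class FramedThriftClientSketch {
  // Assumed frame limit; it should not exceed what the server is configured to accept.
  static final int MAX_FRAME_BYTES = 16 * 1024 * 1024;

  static TProtocol open(String host, int port) throws TTransportException {
    // Frame every request and cap the frame length the client will read.
    TTransport transport = new TFramedTransport(new TSocket(host, port), MAX_FRAME_BYTES);
    transport.open();
    return new TBinaryProtocol(transport);
  }
}
```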
Reference:
http://www.concurrent.work/java/thrift/case/one-java-thrift-case-caused-by-default-parameters/