Common Errors When Using IOCP

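When using IOCP, the most important APIs are GetQueuedCompletionStatus, WSARecv, and WSASend. Data I/O and its completion status are obtained through these interfaces and then processed further.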

GetQueuedCompletionStatus attempts to dequeue an I/O completion packet from the specified I/O completion port. If there is no completion packet queued, the function waits for a pending I/O operation associated with the completion port to complete.

BOOL WINAPI GetQueuedCompletionStatus(
  __in   HANDLE CompletionPort,
  __out  LPDWORD lpNumberOfBytes,
  __out  PULONG_PTR lpCompletionKey,
  __out  LPOVERLAPPED *lpOverlapped,
  __in   DWORD dwMilliseconds
);

If the function dequeues a completion packet for a successful I/O operation from the completion port, the return value is nonzero. The function stores information in the variables pointed to by the lpNumberOfBytes, lpCompletionKey, and lpOverlapped parameters.

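Beyond the in and out parameters of this API (the first few lines of MSDN already cover those), we care more about what the different combinations of return value and out parameters mean, because for all sorts of known and unknown reasons our programs do not always get the expected return value and outputs.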

If *lpOverlapped is NULL and the function does not dequeue a completion packet from the completion port, the return value is zero. The function does not store information in the variables pointed to by the lpNumberOfBytes and lpCompletionKey parameters. To get extended error information, call GetLastError. If the function did not dequeue a completion packet because the wait timed out, GetLastError returns WAIT_TIMEOUT.
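The return-value rules quoted above can be folded into one small classifier. The following is only a minimal sketch: the enum, the function name classify_gqcs, and SKETCH_WAIT_TIMEOUT are hypothetical names introduced for illustration, and the code takes the three observed values as plain arguments so that it compiles without <windows.h>. The GQCS_IO_FAILED case (zero return with a non-NULL *lpOverlapped) comes from the same MSDN page, not from the passages quoted here.

```c
#include <assert.h>
#include <stddef.h>

/* Value of WAIT_TIMEOUT in the Windows headers, repeated here so the
   sketch compiles without <windows.h>. */
#define SKETCH_WAIT_TIMEOUT 258UL

/* Hypothetical outcome names; they are not part of the Windows API. */
typedef enum {
    GQCS_IO_SUCCEEDED, /* nonzero return: packet for a successful I/O */
    GQCS_IO_FAILED,    /* zero return, *lpOverlapped != NULL: packet for a failed I/O */
    GQCS_TIMED_OUT,    /* zero return, *lpOverlapped == NULL, WAIT_TIMEOUT */
    GQCS_PORT_ERROR    /* zero return, *lpOverlapped == NULL, any other error */
} gqcs_outcome;

/*
 * returned_nonzero: the BOOL result of GetQueuedCompletionStatus
 * overlapped:       the pointer the call stored in *lpOverlapped
 * last_error:       GetLastError(), read right after a zero return
 */
gqcs_outcome classify_gqcs(int returned_nonzero, const void *overlapped,
                           unsigned long last_error)
{
    if (returned_nonzero)
        return GQCS_IO_SUCCEEDED;
    if (overlapped != NULL)
        return GQCS_IO_FAILED;   /* last_error describes the failed operation */
    if (last_error == SKETCH_WAIT_TIMEOUT)
        return GQCS_TIMED_OUT;
    return GQCS_PORT_ERROR;      /* nothing was dequeued */
}

/* Usage examples; the error numbers are the ones discussed in this article. */
void classify_gqcs_examples(void)
{
    int op = 0; /* stands in for an OVERLAPPED structure */
    assert(classify_gqcs(1, &op, 0UL) == GQCS_IO_SUCCEEDED);
    assert(classify_gqcs(0, &op, 10054UL) == GQCS_IO_FAILED);
    assert(classify_gqcs(0, NULL, SKETCH_WAIT_TIMEOUT) == GQCS_TIMED_OUT);
}
```

In a real worker loop the three arguments would come straight from the GetQueuedCompletionStatus call and the GetLastError call that follows a zero return.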

Assume we pass INFINITE for dwMilliseconds, so the wait never times out.

The common errors in this situation are:

WSA_OPERATION_ABORTED (995): Overlapped operation aborted.

The I/O operation has been aborted because of either a thread exit or an application request.

MSDN: An overlapped operation was canceled due to the closure of the socket, or the execution of the SIO_FLUSH command in WSAIoctl. Note that this error is returned by the operating system, so the error number may change in future releases of Windows.

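Cause: this error usually appears after the socket for a peer has been closed by closesocket or WSACleanup; the overlapped I/O operations still pending on that socket are aborted.

Solution: for a socket, generally call shutdown first to disable further I/O, then call closesocket to close it.

Severity: minor, easy to handle.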

WSAENOTSOCK (10038): Socket operation on nonsocket.

MSDN: An operation was attempted on something that is not a socket. Either the socket handle parameter did not reference a valid socket, or for select, a member of an fd_set was not valid.

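Cause: an operation was attempted on something that is not a socket.

Once a socket has been closed with closesocket, any operation on that now-invalid socket gets this error.

Solution: if multiple threads operate on the same socket, keep the I/O operations on it in a consistent logical order and perform a graceful disconnect of the socket.

Severity: minor, easy to handle.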
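One way to keep operations from several threads in a consistent order is to gate every post on a shared per-connection state word. The sketch below is an assumption of this rewrite, not something taken from MSDN or the original analysis: conn_gate and its functions are hypothetical names, it uses C11 atomics rather than any Winsock call, and it only decides who may post and who performs the final cleanup (for example freeing the per-connection context). How pending operations are made to complete after a close request is left to the application.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define CONN_CLOSING 0x80000000u /* bit 31: close requested */

/* Hypothetical per-connection gate.
   Bit 31 = closing flag, bits 0..30 = number of pending operations. */
typedef struct {
    atomic_uint state;
} conn_gate;

void conn_gate_init(conn_gate *g)
{
    atomic_init(&g->state, 0u);
}

/* Call before posting an operation. Returns false once a close has been
   requested, so no thread touches the socket after that point. */
bool conn_begin_io(conn_gate *g)
{
    unsigned s = atomic_load(&g->state);
    do {
        if (s & CONN_CLOSING)
            return false;
    } while (!atomic_compare_exchange_weak(&g->state, &s, s + 1u));
    return true;
}

/* Call once for every operation that began, when it completes or when the
   posting call fails immediately. Returns true if this was the last pending
   operation of a closing connection: the caller performs the final cleanup. */
bool conn_end_io(conn_gate *g)
{
    unsigned prev = atomic_fetch_sub(&g->state, 1u);
    assert((prev & ~CONN_CLOSING) != 0u); /* must pair with conn_begin_io */
    return prev == (CONN_CLOSING | 1u);
}

/* Request a close. Returns true if nothing was pending at that moment,
   in which case the caller performs the final cleanup right away. */
bool conn_request_close(conn_gate *g)
{
    unsigned prev = atomic_fetch_or(&g->state, CONN_CLOSING);
    return prev == 0u;
}
```

Exactly one call returns true for a given connection, either conn_request_close or the last conn_end_io, so the cleanup runs once no matter which thread gets there first.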

WSAECONNRESET (10054): Connection reset by peer.

The remote host forcibly closed an existing connection.

MSDN: An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, the host or remote network interface is disabled, or the remote host uses a hard close (see setsockopt for more information on the SO_LINGER option on the remote socket). This error may also result if a connection was broken due to keep-alive activity detecting a failure while one or more operations are in progress. Operations that were in progress fail with WSAENETRESET. Subsequent operations fail with WSAECONNRESET.

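Cause: when using WSAAccept, WSARecv, WSASend, and similar interfaces, if the peer application stops abruptly (for the reasons listed above), the operations posted to its socket will fail.

Solution: if the remote host or program died unexpectedly, all you can do is accept fate. But if you wrote that program and it simply does a hard close, you have no one else to blame. At the very least, once you know this error has occurred, stop wasting effort posting or waiting for further operations on that connection.

Severity: minor, easy to handle.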

WSAECONNREFUSED (10061): Connection refused.

No connection could be made because the target machine actively refused it.

MSDN: No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host—that is, one with no server application running.

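Cause: with connect or WSAConnect, the server is not running or its listen queue is full; with WSAAccept, the client's connection request was rejected by the condition function.

Solution: Call connect or WSAConnect again for the same socket. Wait until the server is up and its listen queue has room, or find out why you were rejected. Were you too ugly, did you not pay enough, or did the server turn down a sky-high salary to go start its own business?

Severity: minor, easy to handle.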
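The four minor errors covered so far each call for a different reaction, which can be written down as one dispatch function. This is only a sketch of the advice given in the entries above: the SK_ constants repeat the numeric codes listed in this article, and the action names and action_for_error are hypothetical, introduced here for illustration.

```c
#include <assert.h>

/* Numeric codes as listed in this article, repeated so the sketch
   compiles without the Winsock headers. */
enum {
    SK_WSA_OPERATION_ABORTED = 995,
    SK_WSAENOTSOCK           = 10038,
    SK_WSAECONNRESET         = 10054,
    SK_WSAECONNREFUSED       = 10061
};

/* Hypothetical reactions, one per entry above. */
typedef enum {
    ACT_RELEASE_OPERATION, /* operation was aborted by a close: just release it */
    ACT_STOP_USING_HANDLE, /* handle is no longer a socket: post nothing more */
    ACT_CLOSE_CONNECTION,  /* peer is gone: stop posting and waiting, clean up */
    ACT_RETRY_CONNECT,     /* server not ready: try connect again later */
    ACT_LOG_AND_INSPECT    /* anything else: log it and look at it by hand */
} iocp_action;

iocp_action action_for_error(unsigned long err)
{
    switch (err) {
    case SK_WSA_OPERATION_ABORTED: return ACT_RELEASE_OPERATION;
    case SK_WSAENOTSOCK:           return ACT_STOP_USING_HANDLE;
    case SK_WSAECONNRESET:         return ACT_CLOSE_CONNECTION;
    case SK_WSAECONNREFUSED:       return ACT_RETRY_CONNECT;
    default:                       return ACT_LOG_AND_INSPECT;
    }
}

/* Usage examples. */
void action_for_error_examples(void)
{
    assert(action_for_error(10054UL) == ACT_CLOSE_CONNECTION);
    assert(action_for_error(10061UL) == ACT_RETRY_CONNECT);
}
```

Any code not in the table falls through to ACT_LOG_AND_INSPECT, so an unexpected error is at least recorded instead of being silently ignored.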

WSAENOBUFS (10055): No buffer space available.

An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

MSDN: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

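Cause: of everything I found in the error logs, this is the error that worried me most. The server enforces explicit limits on sending and receiving messages, so a buffer shortage should have been caught much earlier instead of surfacing as a failed send/recv. This error also almost never appeared in earlier versions. It is the main subject of this article. Running out of buffer space in connect or accept is understandable and not very dangerous, but if send/recv causes congestion that feeds a vicious cycle, the trouble is serious; at the very least it shows that the earlier validation logic has a gap.

The reason MSDN gives for WSASend failing with this error is: The Windows Sockets provider reports a buffer deadlock. The wording is buffer deadlock, which points to improper posting of I/O from multiple threads.

Solution: before sending or receiving messages, check and cap both the total number and the total size of pending messages.

Severity: severe.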
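The check described in the solution can be sketched as a small budget that every post must reserve from and every completion must release to. This is an illustration under stated assumptions rather than the server's actual code: send_budget and its functions are hypothetical names, the limits are whatever the application chooses, and the structure is not thread-safe by itself, so callers on multiple threads would need to hold a lock around each call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on pending (posted but not yet completed) messages. */
typedef struct {
    size_t max_ops;   /* maximum number of pending messages */
    size_t max_bytes; /* maximum total size of pending messages */
    size_t cur_ops;
    size_t cur_bytes;
} send_budget;

void send_budget_init(send_budget *b, size_t max_ops, size_t max_bytes)
{
    b->max_ops = max_ops;
    b->max_bytes = max_bytes;
    b->cur_ops = 0;
    b->cur_bytes = 0;
}

/* Call before posting a message of nbytes. Returns false when either limit
   would be exceeded; the caller should then queue or drop the message
   instead of posting it. */
bool send_budget_try_reserve(send_budget *b, size_t nbytes)
{
    if (b->cur_ops >= b->max_ops)
        return false;
    if (nbytes > b->max_bytes - b->cur_bytes) /* cur_bytes <= max_bytes always holds */
        return false;
    b->cur_ops += 1;
    b->cur_bytes += nbytes;
    return true;
}

/* Call when a reserved message completes, successfully or not. */
void send_budget_release(send_budget *b, size_t nbytes)
{
    assert(b->cur_ops > 0 && b->cur_bytes >= nbytes);
    b->cur_ops -= 1;
    b->cur_bytes -= nbytes;
}
```

A reservation that fails is the signal to apply back-pressure before the system itself runs out of buffer space.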

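This article draws mainly on MSDN.

************* Note *************

Fox only analyzes, with reference to MSDN, the few errors and APIs he cares about, and does not provide additional help.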