Server Performance Testing in Practice: A Summary
Table of Contents
Problem 2: Not enough Tomcat JDBC connections
Problem 3: PostgreSQL "too many clients"
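Server Background
The overall service architecture is as follows:
Database: PostgreSQL
Backend framework: Spring Boot + embedded Tomcat + Java
Deployment environment: a well-known cloud platform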
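Performance Testing Method
Test tool: JMeter
Approach: use JMeter and create a separate test plan for each API to be tested. There are two variables:
- number of threads
- data volume
Each test plan is run with 10, 20, 50, 100, and 200 concurrent threads, and each of those runs is repeated with roughly 10,000, 50,000, 100,000, 200,000, and 500,000 rows in the database. That yields 25 sets of results (5 thread counts × 5 data volumes). The table below lists the values used for each variable; every run lasts 5 minutes: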
Data volume (x 10,000 rows) | Threads | Concurrent run time (minutes) |
1 | 10 | 5 |
5 | 20 | 5 |
10 | 50 | 5 |
20 | 100 | 5 |
50 | 200 | 5 |
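The 25 runs are simply the cross product of the two columns above. As a minimal sketch (the class and method names are made up for illustration and are not part of the original test setup), the full matrix can be enumerated like this:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerates every (data volume, thread count) pair from the table.
class TestMatrix {
    static final int[] THREAD_COUNTS = {10, 20, 50, 100, 200};
    static final int[] DATA_VOLUMES = {10_000, 50_000, 100_000, 200_000, 500_000};
    static final int DURATION_MINUTES = 5;

    /** One run: a data volume paired with a thread count. */
    static final class Run {
        final int rows;
        final int threads;

        Run(int rows, int threads) {
            this.rows = rows;
            this.threads = threads;
        }

        @Override
        public String toString() {
            return rows + " rows, " + threads + " threads, " + DURATION_MINUTES + " min";
        }
    }

    /** Builds the cross product: 5 data volumes x 5 thread counts. */
    static List<Run> combinations() {
        List<Run> runs = new ArrayList<>();
        for (int rows : DATA_VOLUMES) {
            for (int threads : THREAD_COUNTS) {
                runs.add(new Run(rows, threads));
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        combinations().forEach(System.out::println);
    }
}
```

Iterating the data volume in the outer loop mirrors how the tests would likely be run in practice: load the database to one size, then sweep through all thread counts before loading more data.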
In JMeter you can set the number of threads to start, the run duration, the test data, and so on. The interface looks like this:
Below is an example test plan:
<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2" properties="3.2" jmeter="3.3 r1808647">
  <hashTree>
    <TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Get by customerNumber with markets defaultAddress" enabled="true">
      <stringProp name="TestPlan.comments"></stringProp>
      <boolProp name="TestPlan.functional_mode">false</boolProp>
      <boolProp name="TestPlan.serialize_threadgroups">true</boolProp>
      <elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
        <collectionProp name="Arguments.arguments"/>
      </elementProp>
      <stringProp name="TestPlan.user_define_classpath"></stringProp>
    </TestPlan>
    <hashTree>
      <ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Get by customerNumber with markets defaultAddress" enabled="true">
        <stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
        <elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
          <boolProp name="LoopController.continue_forever">false</boolProp>
          <intProp name="LoopController.loops">-1</intProp>
        </elementProp>
        <stringProp name="ThreadGroup.num_threads">100</stringProp>
        <stringProp name="ThreadGroup.ramp_time">1</stringProp>
        <longProp name="ThreadGroup.start_time">1515376034000</longProp>
        <longProp name="ThreadGroup.end_time">1515376034000</longProp>
        <boolProp name="ThreadGroup.scheduler">true</boolProp>
        <stringProp name="ThreadGroup.duration">300</stringProp>
        <stringProp name="ThreadGroup.delay"></stringProp>
      </ThreadGroup>
      <hashTree>
        <HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
          <collectionProp name="HeaderManager.headers">
            <elementProp name="" elementType="Header">
              <stringProp name="Header.name">hybris-tenant</stringProp>
              <stringProp name="Header.value">tenantIT</stringProp>
            </elementProp>
            <elementProp name="" elementType="Header">
              <stringProp name="Header.name">hybris-user</stringProp>
              <stringProp name="Header.value">jmeter</stringProp>
            </elementProp>
            <elementProp name="" elementType="Header">
              <stringProp name="Header.name">Content-Type</stringProp>
              <stringProp name="Header.value">application/json</stringProp>
            </elementProp>
          </collectionProp>
        </HeaderManager>
        <hashTree/>
        <HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Get by customerNumber with markets defaultAddress" enabled="true">
          <elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
            <collectionProp name="Arguments.arguments"/>
          </elementProp>
          <stringProp name="HTTPSampler.domain"></stringProp>
          <stringProp name="HTTPSampler.port"></stringProp>
          <stringProp name="HTTPSampler.protocol"></stringProp>
          <stringProp name="HTTPSampler.contentEncoding"></stringProp>
          <stringProp name="HTTPSampler.path">requesturl</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
          <boolProp name="HTTPSampler.follow_redirects">false</boolProp>
          <boolProp name="HTTPSampler.auto_redirects">false</boolProp>
          <boolProp name="HTTPSampler.use_keepalive">false</boolProp>
          <boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
          <stringProp name="HTTPSampler.embedded_url_re"></stringProp>
          <stringProp name="HTTPSampler.connect_timeout"></stringProp>
          <stringProp name="HTTPSampler.response_timeout"></stringProp>
        </HTTPSamplerProxy>
        <hashTree>
          <ResultCollector guiclass="ObsoleteGui" testclass="ResultCollector" testname="Monitor Results" enabled="true">
            <boolProp name="ResultCollector.error_logging">false</boolProp>
            <objProp>
              <name>saveConfig</name>
              <value class="SampleSaveConfiguration">
                <time>true</time> <latency>true</latency> <timestamp>true</timestamp> <success>true</success> <label>true</label>
                <code>true</code> <message>true</message> <threadName>true</threadName> <dataType>true</dataType> <encoding>false</encoding>
                <assertions>true</assertions> <subresults>true</subresults> <responseData>false</responseData> <samplerData>false</samplerData>
                <xml>false</xml> <fieldNames>false</fieldNames> <responseHeaders>false</responseHeaders> <requestHeaders>false</requestHeaders>
                <responseDataOnError>false</responseDataOnError> <saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
                <assertionsResultsToSave>0</assertionsResultsToSave> <bytes>true</bytes> <threadCounts>true</threadCounts>
              </value>
            </objProp>
            <stringProp name="filename"></stringProp>
          </ResultCollector>
          <hashTree/>
        </hashTree>
        <ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree" enabled="true">
          <boolProp name="ResultCollector.error_logging">true</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time> <latency>true</latency> <timestamp>true</timestamp> <success>true</success> <label>true</label>
              <code>true</code> <message>true</message> <threadName>true</threadName> <dataType>true</dataType> <encoding>false</encoding>
              <assertions>true</assertions> <subresults>true</subresults> <responseData>false</responseData> <samplerData>false</samplerData>
              <xml>false</xml> <fieldNames>false</fieldNames> <responseHeaders>false</responseHeaders> <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError> <saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave> <bytes>true</bytes> <threadCounts>true</threadCounts>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
        <ResultCollector guiclass="StatVisualizer" testclass="ResultCollector" testname="Aggregate Report" enabled="true">
          <boolProp name="ResultCollector.error_logging">false</boolProp>
          <objProp>
            <name>saveConfig</name>
            <value class="SampleSaveConfiguration">
              <time>true</time> <latency>true</latency> <timestamp>true</timestamp> <success>true</success> <label>true</label>
              <code>true</code> <message>true</message> <threadName>true</threadName> <dataType>true</dataType> <encoding>false</encoding>
              <assertions>true</assertions> <subresults>true</subresults> <responseData>false</responseData> <samplerData>false</samplerData>
              <xml>false</xml> <fieldNames>false</fieldNames> <responseHeaders>false</responseHeaders> <requestHeaders>false</requestHeaders>
              <responseDataOnError>false</responseDataOnError> <saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
              <assertionsResultsToSave>0</assertionsResultsToSave> <bytes>true</bytes> <threadCounts>true</threadCounts>
            </value>
          </objProp>
          <stringProp name="filename"></stringProp>
        </ResultCollector>
        <hashTree/>
      </hashTree>
      <ResultCollector guiclass="ObsoleteGui" testclass="ResultCollector" testname="Monitor Results" enabled="true">
        <boolProp name="ResultCollector.error_logging">false</boolProp>
        <objProp>
          <name>saveConfig</name>
          <value class="SampleSaveConfiguration">
            <time>true</time> <latency>true</latency> <timestamp>true</timestamp> <success>true</success> <label>true</label>
            <code>true</code> <message>true</message> <threadName>true</threadName> <dataType>true</dataType> <encoding>false</encoding>
            <assertions>true</assertions> <subresults>true</subresults> <responseData>false</responseData> <samplerData>false</samplerData>
            <xml>false</xml> <fieldNames>false</fieldNames> <responseHeaders>false</responseHeaders> <requestHeaders>false</requestHeaders>
            <responseDataOnError>false</responseDataOnError> <saveAssertionResultsFailureMessage>false</saveAssertionResultsFailureMessage>
            <assertionsResultsToSave>0</assertionsResultsToSave> <bytes>true</bytes> <threadCounts>true</threadCounts>
          </value>
        </objProp>
        <stringProp name="filename"></stringProp>
      </ResultCollector>
      <hashTree/>
    </hashTree>
    <WorkBench guiclass="WorkBenchGui" testclass="WorkBench" testname="WorkBench" enabled="true">
      <boolProp name="WorkBench.save">true</boolProp>
    </WorkBench>
    <hashTree/>
  </hashTree>
</jmeterTestPlan>
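The test results are generated as an Excel document. Below is an example:
Test Results and Analysis
The original intent was to uncover latent problems in the code and implementation logic, but in practice we got stuck on performance bottlenecks and issues in the framework and environment before ever reaching that point. Three main problems surfaced: the first two in the cloud deployment, the third in the local environment. The diagram illustrates them:
The parts marked in red in the diagram above are where the problems occurred.
Problem 1: Service crash and restart
The error message is as follows: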
Failed to make HTTP request to '/admin/health' on port 8080
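The cause is closely tied to the cloud environment the service is deployed in. By default the platform monitors server status through the monitor process shown in the diagram above, using a feature called a health check: the platform opens a connection to your server (configurable as TCP or HTTP) to determine whether it is available or has gone down.
This builds on Spring Boot Actuator: when a Spring Boot service starts, it exposes a health endpoint (at /admin/health in this service) that reports whether the service is available. The platform's health check calls this endpoint to decide whether the service is healthy, as shown below:
If the platform receives no response within a certain time, it concludes that the server has a problem and triggers its crash-restart mechanism: it forcibly stops the service and starts it again. Under the load test, the health request presumably could not be answered in time, which is what produced the restarts.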
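To make the mechanism concrete, here is a minimal sketch of what an HTTP health probe with a timeout might look like. It is not the cloud platform's actual implementation: the class name, the 5-second timeout, and the "HTTP 200 means healthy" rule are all assumptions for illustration. Only the /admin/health path and port 8080 come from the error message above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical probe, loosely imitating a platform health check.
class HealthProbe {
    /**
     * Returns true only if the endpoint answers with HTTP 200 within the timeout.
     * Connection failures and timeouts both count as "unhealthy".
     */
    static boolean isHealthy(String url, int timeoutMillis) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(timeoutMillis); // time allowed to establish the connection
            connection.setReadTimeout(timeoutMillis);    // time allowed to wait for the response
            connection.setRequestMethod("GET");
            return connection.getResponseCode() == 200;
        } catch (IOException e) {
            // Connection refused, connect timeout, read timeout, malformed URL, and so on.
            return false;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // The timeout value is an assumption; the real platform's threshold is not given in this post.
        boolean healthy = isHealthy("http://localhost:8080/admin/health", 5_000);
        System.out.println(healthy
                ? "service is up"
                : "no timely response: a platform like this would restart the service");
    }
}
```

The point of the sketch is that a slow answer and no answer look the same to the probe. A service that is merely saturated by a load test can therefore be restarted as if it had crashed.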
Problem 2: Not enough Tomcat JDBC connections
The error message is as follows:
[EL Info]: query: 2018-01-22 10:20:52.064--UnitOfWork(269647005)--Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.4.v20160829-44060b6): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.apache.tomcat.jdbc.pool.PoolExhaustedException: [http-nio-8080-exec-1505] Timeout: Pool empty. Unable to fetch a connection in 30 seconds, none available[size:4; busy:4; idle:0; lastwait:30000].
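This error comes from the Tomcat side, specifically its JDBC connection pool. Spring Boot embeds the Tomcat container and also ships a default database connection pool configuration. When the service is deployed on the cloud platform, that default sets both the maximum and the minimum number of connections to 4, as the error message shows: Unable to fetch a connection in 30 seconds, none available[size:4; busy:4; idle:0; lastwait:30000].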
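The numbers in that message (size:4, busy:4, idle:0) describe a pool whose every connection is checked out while another request is still waiting. The following is a toy simulation of that situation using a plain semaphore. It is not the Tomcat JDBC pool's code, and the TinyPool class is invented for illustration; the wait is shortened from the real 30000 ms so the example finishes quickly.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy stand-in for a fixed-size connection pool (not the real Tomcat JDBC pool).
class TinyPool {
    private final int size;
    private final Semaphore permits;

    TinyPool(int size) {
        this.size = size;
        this.permits = new Semaphore(size);
    }

    /** Tries to borrow a connection slot, waiting at most maxWaitMillis. */
    boolean borrow(long maxWaitMillis) {
        try {
            return permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns a previously borrowed slot to the pool. */
    void giveBack() {
        permits.release();
    }

    /** Number of slots currently checked out. */
    int busy() {
        return size - permits.availablePermits();
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool(4);          // same size as in the error message
        for (int i = 0; i < 4; i++) {
            pool.borrow(100);                     // four requests each hold a connection
        }
        boolean fifth = pool.borrow(100);         // a fifth request waits, then gives up
        System.out.println("fifth borrow succeeded: " + fifth + ", busy: " + pool.busy());
    }
}
```

With 100 or 200 JMeter threads sharing four connections, most requests end up in the position of that fifth caller.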
The Spring source code contains the following:
public class DataSourceConfigurer extends PooledServiceConnectorConfigurer<DataSource, DataSourceConfig> {
    private MapServiceConnectionConfigurer<DataSource, MapServiceConnectorConfig> mapServiceConnectionConfigurer = new MapServiceConnectionConfigurer();

    public DataSourceConfigurer() {
    }

    public DataSource configure(DataSource dataSource, DataSourceConfig config) {
        if (config == null) {
            // No explicit config supplied: fall back to PoolConfig(4, 30000),
            // which matches the size:4 and lastwait:30000 in the error above.
            config = new DataSourceConfig(new PoolConfig(4, 30000), (ConnectionConfig)null);
        }
        this.configureConnection(dataSource, config);
        this.configureConnectionProperties(dataSource, config);
        return (DataSource)super.configure(dataSource, config);
    }
    ...
}
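The code speaks for itself: when no configuration is passed in, the pool is created with the values 4 and 30000, the same numbers that appear in the error message.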
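Problem 3: PostgreSQL "too many clients"
This error was found locally, because the local web container's database connection pool is no longer limited to 4. The error message is as follows: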
[EL Info]: query: 2018-01-22 16:30:41.721--UnitOfWork(222405369)--Communication failure detected when attempting to perform read query outside of a transaction. Attempting to retry query. Error was: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.4.v20160829-44060b6): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: org.postgresql.util.PSQLException: FATAL: sorry, too many clients already
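This error is returned by the PostgreSQL database, and its meaning is plain: the database has run out of connections and has no free client slots.
PostgreSQL's default maximum number of connections is 100. The official documentation explains:
Tomcat, by default, handles up to 200 concurrent requests. So when the concurrency is below Tomcat's limit but above the database connection limit, this error occurs. The lesson is that when configuring a server, the maximum concurrency supported by each component has to be coordinated. Here the setting should satisfy:
connection pool size * number of server instances <= database connection limit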
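The inequality is easy to check mechanically before deploying. The helper below is a minimal sketch with made-up names. It uses the default limit of 100 mentioned above and ignores any connections the database reserves for administrators or other clients.

```java
// Hypothetical helper for sanity-checking pool sizes against the database limit.
class ConnectionBudget {
    /** True if all instances together can never exceed the database connection limit. */
    static boolean fits(int poolSizePerInstance, int instances, int dbMaxConnections) {
        return (long) poolSizePerInstance * instances <= dbMaxConnections;
    }

    /** Largest per-instance pool size that still satisfies the inequality. */
    static int maxPoolSizePerInstance(int instances, int dbMaxConnections) {
        if (instances <= 0) {
            throw new IllegalArgumentException("instances must be positive");
        }
        return dbMaxConnections / instances;
    }

    public static void main(String[] args) {
        int dbMaxConnections = 100; // PostgreSQL default mentioned in the text

        // One instance whose pool could grow to Tomcat's 200 request threads: over budget.
        System.out.println(fits(200, 1, dbMaxConnections));

        // Three instances sharing the same database: at most 33 connections each.
        System.out.println(maxPoolSizePerInstance(3, dbMaxConnections));
    }
}
```

In practice it is worth leaving some headroom below the computed maximum, since migration tools, monitoring, and manual sessions also consume connections.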