Monitoring service health with Eureka Server and Prometheus
阿新 • Published: 2019-05-31
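Background
Process-level monitoring of services is usually already handled by dedicated components. In an early incident, however, one service's DB resource usage exceeded its allotted quota, its health checks failed, and its instances dropped out of Eureka one after another. Business-level monitoring raises no alert while requests are not routed to an affected node, or are routed there without hitting an alert threshold, so the business looked healthy for a while even as instances kept going offline. As the registry, Eureka Server can detect changes in registration state earlier: an instance node has died (fewer instances are registered), or a node's status is no longer UP.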
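Monitoring approach
- Eureka Server periodically collects registration data: the number of instances and the status of each instance
- Prometheus periodically scrapes the data collected by Eureka Server
- Grafana queries the data and alerts on it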
Collecting Eureka registration data
Metric data structure definitions
- Instance status
type: Gauge
eureka_instance_status{client="{client}",status="{status}"}
client: Eureka client application name
status enumeration
Status | Value |
---|---|
UP | 1 |
DOWN | 5 |
STARTING | 2 |
OUT_OF_SERVICE | 3 |
UNKNOWN | 4 |
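If the average value over the most recent period n is greater than 1, at least one sample was not UP; treat this as abnormal and raise an alert.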
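- Instance count
type: Gauge
eureka_instance_count{client="{client}",count="{count}"}
client: Eureka client application name
count: number of registered instances of the client
Note: the Java code below registers these metrics as mall_eureka_instance_status and mall_eureka_instance_count with client as the only label; the status value or the count is carried as the sample value.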
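To make the status rule concrete (UP is the lowest value, so an average above 1 means at least one non-UP sample), here is a minimal sketch in plain Java. The class name `StatusAlertSketch`, its method names, and the sample lists are hypothetical; only the status values come from the table above. It illustrates the arithmetic and is not the alerting engine itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, values come from the status table.
class StatusAlertSketch {
    private static final Map<String, Integer> STATUS_VALUES = new HashMap<>();
    static {
        STATUS_VALUES.put("UP", 1);
        STATUS_VALUES.put("STARTING", 2);
        STATUS_VALUES.put("OUT_OF_SERVICE", 3);
        STATUS_VALUES.put("UNKNOWN", 4);
        STATUS_VALUES.put("DOWN", 5);
    }

    // Unrecognised names fall back to the UNKNOWN value (an assumption of this sketch).
    static int statusValue(String status) {
        return STATUS_VALUES.getOrDefault(status, 4);
    }

    // True when the average of the sampled status values exceeds 1.
    // An empty window is treated as "no alert" (also an assumption).
    static boolean shouldAlert(List<String> samples) {
        double avg = samples.stream()
                .mapToInt(StatusAlertSketch::statusValue)
                .average()
                .orElse(1.0);
        return avg > 1.0;
    }

    public static void main(String[] args) {
        System.out.println(shouldAlert(Arrays.asList("UP", "UP", "UP")));   // prints false
        System.out.println(shouldAlert(Arrays.asList("UP", "DOWN", "UP"))); // prints true
    }
}
```

Because every non-UP status maps to a value above 1, a single bad sample in the window is enough to push the average over the threshold.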
Java POM dependencies
<!-- Compatible with Spring Boot 2.x -->
<!-- The client -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Hotspot JVM metrics -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_hotspot</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Exposition HTTPServer -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_httpserver</artifactId>
    <version>0.6.0</version>
</dependency>
<!-- Pushgateway exposition -->
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_pushgateway</artifactId>
    <version>0.6.0</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.1.4</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>1.1.4</version>
</dependency>
Java code
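import com.netflix.appinfo.InstanceInfo;
import com.netflix.discovery.shared.Applications;
import com.netflix.eureka.registry.PeerAwareInstanceRegistry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
@Component
public class InstanceStateCollector {
    private static final Logger log = LoggerFactory.getLogger(InstanceStateCollector.class);
    @Autowired PeerAwareInstanceRegistry registry;
    @Autowired PrometheusMetricsService metricsService;
    // Runs every 5 seconds; scheduling must be enabled with @EnableScheduling.
    @Scheduled(cron = "*/5 * * * * ?")
    public void collect() {
        Applications applications = registry.getApplications();
        applications.getRegisteredApplications().forEach((registeredApplication) -> {
            Integer count = registeredApplication.size();
            String client = registeredApplication.getName();
            log.debug("client :{}, count :{}", client, count);
            metricsService.metricInstanceCount(client, count);
            // The status gauge is labelled by client only, so report the worst (highest) value among its instances.
            int worst = 0;
            for (InstanceInfo instance : registeredApplication.getInstances()) {
                log.debug("client :{}, instance :{}, status :{}", client, instance.getInstanceId(), instance.getStatus());
                worst = Math.max(worst, statusValue(instance.getStatus()));
            }
            if (worst > 0) { metricsService.metricInstanceStatus(client, worst); }
        });
    }
    // Maps a status to the enumeration value from the status table.
    private static int statusValue(InstanceInfo.InstanceStatus status) {
        switch (status) {
            case UP: return 1;
            case STARTING: return 2;
            case OUT_OF_SERVICE: return 3;
            case DOWN: return 5;
            default: return 4; // UNKNOWN
        }
    }
}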
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import org.springframework.stereotype.Service;

@Service
public class PrometheusMetricsService {
    /**
     * Instance status metric:
     * mall_eureka_instance_status{client="{client}"}, sample value = status enumeration value
     */
    private static final String EUREKA_INSTANCE_STATUS = "mall_eureka_instance_status";
    /**
     * Instance count metric:
     * mall_eureka_instance_count{client="{client}"}, sample value = number of instances
     */
    private static final String EUREKA_INSTANCE_COUNT = "mall_eureka_instance_count";
    private static final String LABEL_CLIENT = "client";

    private final Gauge instanceStatusGauge;
    private final Gauge instanceCountGauge;

    public PrometheusMetricsService(CollectorRegistry registry) {
        instanceStatusGauge = Gauge
                .build(EUREKA_INSTANCE_STATUS, "instance status")
                .labelNames(LABEL_CLIENT)
                .register(registry);
        instanceCountGauge = Gauge
                .build(EUREKA_INSTANCE_COUNT, "instance count")
                .labelNames(LABEL_CLIENT)
                .register(registry);
    }

    /**
     * Records the instance status.
     *
     * @param client      client name (application name)
     * @param statusValue status enumeration value
     */
    void metricInstanceStatus(String client, Integer statusValue) {
        instanceStatusGauge.labels(client).set(statusValue);
    }

    /**
     * Records the instance count.
     *
     * @param client client name (application name)
     * @param count  number of registered instances
     */
    void metricInstanceCount(String client, Integer count) {
        instanceCountGauge.labels(client).set(count);
    }
}
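The interaction between the scheduled collector and the two gauges can be tried without a running Eureka server. The following is a rough sketch that uses no Eureka or Prometheus classes: `CollectorSketch`, `ClientGauge`, and the registry contents are hypothetical stand-ins, and publishing the highest status value per application is an assumption of this sketch. Only the metric names and the `MGMALL-CONFIG` client name are taken from this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the collector and its gauges.
class CollectorSketch {
    // A gauge with a single "client" label: one value per client, last write wins.
    static final class ClientGauge {
        private final String name;
        private final Map<String, Double> values = new TreeMap<>();

        ClientGauge(String name) {
            this.name = name;
        }

        void set(String client, double value) {
            values.put(client, value);
        }

        // One line per client in the Prometheus text exposition format.
        List<String> render() {
            List<String> lines = new ArrayList<>();
            values.forEach((client, v) -> lines.add(name + "{client=\"" + client + "\"} " + v));
            return lines;
        }
    }

    // One collection pass; the registry is modelled as application name -> instance status values.
    static void collect(Map<String, List<Integer>> registry, ClientGauge count, ClientGauge status) {
        registry.forEach((client, statuses) -> {
            count.set(client, statuses.size());
            // Assumption: publish the highest (worst) status value of the application.
            statuses.stream().mapToInt(Integer::intValue).max()
                    .ifPresent(worst -> status.set(client, worst));
        });
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> registry = new LinkedHashMap<>();
        registry.put("MGMALL-CONFIG", Arrays.asList(1, 5)); // one UP, one DOWN (made-up data)
        ClientGauge count = new ClientGauge("mall_eureka_instance_count");
        ClientGauge status = new ClientGauge("mall_eureka_instance_status");
        collect(registry, count, status);
        count.render().forEach(System.out::println);  // mall_eureka_instance_count{client="MGMALL-CONFIG"} 2.0
        status.render().forEach(System.out::println); // mall_eureka_instance_status{client="MGMALL-CONFIG"} 5.0
    }
}
```

Since a gauge child is keyed by its label values, each pass simply overwrites the previous value for a client. The time series that alerts are evaluated against is built up by Prometheus from successive scrapes.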
Scraping Eureka Server data with Prometheus
prometheus.yml
  - job_name: 'mgmall-eureka'
    scrape_interval: 10s
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['10.124.129.42:19110']
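Each scrape of the `metrics_path` returns plain text with one sample per line, so it can help to check by hand what Prometheus will read for a given client. Here is a small sketch that parses such a response body. The class `ScrapeSketch`, the method `sampleValue`, and the sample body are hypothetical, and the sketch assumes `client` is the first label and that label values contain no `}`.

```java
import java.util.OptionalDouble;

// Hypothetical helper for reading one sample out of a scraped text body.
class ScrapeSketch {
    // Returns the value of metric{client="..."} if such a sample line exists.
    static OptionalDouble sampleValue(String body, String metric, String client) {
        String prefix = metric + "{client=\"" + client + "\"";
        for (String line : body.split("\n")) {
            // Skip "# HELP" / "# TYPE" comment lines and samples of other series.
            if (line.startsWith("#") || !line.startsWith(prefix)) {
                continue;
            }
            // After the closing brace comes the value, optionally followed by a timestamp.
            String afterLabels = line.substring(line.indexOf('}') + 1).trim();
            return OptionalDouble.of(Double.parseDouble(afterLabels.split("\\s+")[0]));
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        // Made-up response body in the text exposition format.
        String body = "# HELP mall_eureka_instance_count instance count\n"
                + "# TYPE mall_eureka_instance_count gauge\n"
                + "mall_eureka_instance_count{client=\"MGMALL-CONFIG\",} 2.0\n";
        OptionalDouble value = sampleValue(body, "mall_eureka_instance_count", "MGMALL-CONFIG");
        System.out.println(value.getAsDouble()); // prints 2.0
    }
}
```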
Maintaining Grafana dashboards
Dashboard
mall_eureka_instance_count{client="MGMALL-CONFIG"}
.....
![image-20190531140258528](/Users/yugj/Library/Application Support/typora-user-images/image-20190531140258528.png)
Alerting
WHEN avg() OF query(A, 10s, now) IS BELOW 1
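Read literally, the condition above averages the points that query A returned over the last 10 seconds and fires when that average is below 1. The sketch below shows that evaluation in plain Java. `WindowAlertSketch`, `Sample`, and the timestamps are hypothetical, and treating an empty window as "not firing" is an assumption of this sketch, since Grafana handles the no-data case separately.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical sketch of a "window average is below threshold" check.
class WindowAlertSketch {
    static final class Sample {
        final long epochSeconds;
        final double value;

        Sample(long epochSeconds, double value) {
            this.epochSeconds = epochSeconds;
            this.value = value;
        }
    }

    // Average of the samples whose timestamp lies in [now - windowSeconds, now].
    static OptionalDouble windowAverage(List<Sample> samples, long now, long windowSeconds) {
        return samples.stream()
                .filter(s -> s.epochSeconds >= now - windowSeconds && s.epochSeconds <= now)
                .mapToDouble(s -> s.value)
                .average();
    }

    // Fires only when the window holds data and its average is below the threshold.
    static boolean isBelow(List<Sample> samples, long now, long windowSeconds, double threshold) {
        OptionalDouble avg = windowAverage(samples, now, windowSeconds);
        return avg.isPresent() && avg.getAsDouble() < threshold;
    }

    public static void main(String[] args) {
        // Made-up instance-count samples: 2 instances at t=95, 0 instances at t=105.
        List<Sample> samples = Arrays.asList(new Sample(95, 2), new Sample(105, 0));
        System.out.println(isBelow(samples, 110, 10, 1.0)); // prints true
    }
}
```

With a 10s scrape interval and a 10s window, the window usually holds only one or two points, so the average reacts almost immediately to a single scrape.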
![image-20190531140319350](/Users/yugj/Library/Application Support/typora-user-images/image-2