Implementing Greenplum Distributed Database Monitoring with Prometheus and Grafana

阿新 • Published: 2020-12-12

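1. Preface

Greenplum is a distributed relational MPP database designed for data warehouse workloads. It is built on PostgreSQL and is highly compatible with it, so most PostgreSQL client tools and applications run on the Greenplum platform. GPCC (Greenplum Command Center) is the official monitoring software for the commercial edition of Greenplum; users who are limited to the open-source edition have to look for other monitoring options. This article describes one way to monitor a Greenplum distributed database with Prometheus and Grafana.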

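2. Overview of Prometheus and Grafana

2.1 Introduction to Prometheus

Prometheus is an open-source monitoring and alerting system with a built-in time series database (TSDB). It was originally developed at SoundCloud and is written in Go. Its open-source community is currently very active, and its performance is sufficient for clusters of more than ten thousand machines. Its main architectural components are as follows: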

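Prometheus Server: pulls monitoring data from Exporters, stores it, and provides a flexible query language (PromQL) for users.
Exporter: collects performance data from target objects (hosts, containers, and so on) and exposes it over an HTTP endpoint for the Prometheus Server to scrape.
Visualization component: presenting monitoring data visually is essential to any monitoring solution. Prometheus once developed its own tool for this but later abandoned it, because the open-source community produced a better product, Grafana. Grafana integrates seamlessly with Prometheus and provides excellent data presentation.
Alertmanager: users define alerting rules based on monitoring data, and those rules trigger alerts. Once Alertmanager receives an alert, it sends a notification through a predefined channel. Supported channels include Email, PagerDuty, and Webhook.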
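
To make the Exporter role above more concrete, here is a minimal sketch that renders gauge samples in the Prometheus text exposition format and serves them at `/metrics` using only the Python standard library. The names `render_gauge`, `collect`, `MetricsHandler`, and `serve`, the port 8000, the host name `mdw`, and the sample value are all assumptions made for illustration; they are not taken from the exporter discussed in this article, and a production exporter would normally rely on an official client library instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value):
    """Escape backslash, double quote and newline inside a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge family; samples is a list of (labels_dict, value)."""
    lines = ["# HELP " + name + " " + help_text, "# TYPE " + name + " gauge"]
    for labels, value in samples:
        if labels:
            body = ",".join(
                key + '="' + escape_label_value(val) + '"'
                for key, val in sorted(labels.items())
            )
            lines.append(name + "{" + body + "} " + str(value))
        else:
            lines.append(name + " " + str(value))
    return "\n".join(lines) + "\n"


def collect():
    """Hypothetical collection step; a real exporter would query the database here."""
    return render_gauge(
        "greenplum_cluster_state",
        "Whether the cluster is reachable: 1 = available, 0 = unavailable",
        [({"master": "mdw"}, 1)],  # illustrative label and value only
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes a single HTTP endpoint, conventionally /metrics.
        if self.path != "/metrics":
            self.send_error(404)
            return
        payload = collect().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Block forever, answering scrapes on the given (assumed) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the endpoint; a Prometheus Server configured to scrape that host and port would then pull the samples on each scrape interval.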

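2.2 Introduction to Grafana

Grafana is a cross-platform, open-source metrics analytics and visualization tool. It queries collected data, presents it visually, and sends timely notifications. It has six main features: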

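1. Visualization: fast, flexible client-side charts. Panel plugins offer many ways to visualize metrics and logs, and the official library has a rich set of dashboard plugins such as heatmaps, line charts, and graphs.
2. Data sources: Graphite, InfluxDB, OpenTSDB, Prometheus, Elasticsearch, CloudWatch, KairosDB, and others.
3. Alerting: define alert rules for your most important metrics visually. Grafana evaluates them continuously and sends notifications through Slack, PagerDuty, and similar channels when data reaches a threshold.
4. Mixed display: combine different data sources in the same chart. A data source can be specified per query, and custom data sources are supported.
5. Annotations: annotate charts with rich events from different data sources. Hovering over an event shows its full metadata and tags.
6. Filters: ad-hoc filters let you create new key/value filters on the fly, which are applied automatically to all queries that use that data source.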

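3. Implementing Greenplum Monitoring

Greenplum monitoring can be implemented in much the same way as PostgreSQL monitoring, but there are differences. They lie in the following tasks:

Implement a Greenplum Exporter (metrics collector);
Build a visual status dashboard in Grafana;
Configure alerting rules in Prometheus (omitted in this article).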

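3.1 The Greenplum Exporter (Metrics Collector)

Following the approach used by the PostgreSQL exporter, a Greenplum exporter was implemented. The project is located at:

greenplum_exporter mainly adds metrics for client connections, account connections, segment storage, cluster node synchronization status, and database lock monitoring. The metrics are listed below: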

No.	Metric name	Type	Labels	Description	Data source query
1	greenplum_cluster_state	Gauge	version; master (master hostname); standby (standby hostname)	Whether the cluster is reachable: 1 → available; 0 → unavailable	SELECT count(*) from gp_dist_random('gp_id'); select version(); SELECT hostname from gp_segment_configuration where content=-1 and role='p';
2	greenplum_cluster_uptime	Gauge	-	Time elapsed since startup (seconds)	select extract(epoch from now() - pg_postmaster_start_time());
3	greenplum_cluster_sync	Gauge	-	Master-to-standby synchronization status: 1 → normal; 0 → abnormal	SELECT count(*) from pg_stat_replication where state='streaming'
4	greenplum_cluster_max_connections	Gauge	-	Maximum number of connections	show max_connections; show superuser_reserved_connections;
5	greenplum_cluster_total_connections	Gauge	-	Current number of connections	select count(*) total, count(*) filter(where current_query='<IDLE>') idle, count(*) filter(where current_query<>'<IDLE>') active, count(*) filter(where current_query<>'<IDLE>' and not waiting) running, count(*) filter(where current_query<>'<IDLE>' and waiting) waiting from pg_stat_activity where procpid <> pg_backend_pid();
6	greenplum_cluster_idle_connections	Gauge	-	Number of idle connections	Same as No. 5
7	greenplum_cluster_active_connections	Gauge	-	Number of connections with an active query	Same as No. 5
8	greenplum_cluster_running_connections	Gauge	-	Number of connections executing a query	Same as No. 5
9	greenplum_cluster_waiting_connections	Gauge	-	Number of connections with a query waiting to execute	Same as No. 5
10	greenplum_node_segment_status	Gauge	hostname; address; dbid; content; preferred_role; port; replication_port	Segment status: 1 (U) → up; 0 (D) → down	select * from gp_segment_configuration;
11	greenplum_node_segment_role	Gauge	hostname; address; dbid; content; preferred_role; port; replication_port	Segment role: 1 (P) → primary; 2 (M) → mirror	Same as No. 10
12	greenplum_node_segment_mode	Gauge	hostname; address; dbid; content; preferred_role; port; replication_port	Segment mode: 1 (S) → Synced; 2 (R) → Resyncing; 3 (C) → Change Tracking; 4 (N) → Not Syncing	Same as No. 10
13	greenplum_node_segment_disk_free_mb_size	Gauge	hostname	Free disk space on the segment host (MB)	SELECT dfhostname as segment_hostname,sum(dfspace)/count(dfspace)/(1024*1024) as segment_disk_free_gb from gp_toolkit.gp_disk_free GROUP BY dfhostname
14	greenplum_cluster_total_connections_per_client	Gauge	client	Total connections per client	select client_addr, count(*) total, count(*) filter(where current_query='<IDLE>') idle, count(*) filter(where current_query<>'<IDLE>') active from pg_stat_activity group by 1;
15	greenplum_cluster_idle_connections_per_client	Gauge	client	Idle connections per client	Same as No. 14
16	greenplum_cluster_active_connections_per_client	Gauge	client	Active connections per client	Same as No. 14
17	greenplum_cluster_total_online_user_count	Gauge	-	Number of online accounts	Derived from the per-account query (No. 19)
18	greenplum_cluster_total_client_count	Gauge	-	Number of currently connected clients	Derived from the per-client query (No. 14)
19	greenplum_cluster_total_connections_per_user	Gauge	usename	Total connections per account	select usename, count(*) total, count(*) filter(where current_query='<IDLE>') idle, count(*) filter(where current_query<>'<IDLE>') active from pg_stat_activity group by 1;
20	greenplum_cluster_idle_connections_per_user	Gauge	usename	Idle connections per account	Same as No. 19
21	greenplum_cluster_active_connections_per_user	Gauge	usename	Active connections per account	Same as No. 19
22	greenplum_cluster_config_last_load_time_seconds	Gauge	-	Time the system configuration was last loaded	SELECT pg_conf_load_time()
23	greenplum_node_database_name_mb_size	Gauge	dbname	Storage space used by each database	SELECT dfhostname as segment_hostname,sum(dfspace)/count(dfspace)/(1024*1024) as segment_disk_free_gb from gp_toolkit.gp_disk_free GROUP BY dfhostname
24	greenplum_node_database_table_total_count	Gauge	dbname	Total number of tables in each database	SELECT count(*) as total from information_schema.tables where table_schema not in ('gp_toolkit','information_schema','pg_catalog');
25	greenplum_exporter_total_scraped	Counter	-	-	-
26	greenplum_exporter_total_error	Counter	-	-	-
27	greenplum_exporter_scrape_duration_second	Gauge	-	-	-
28	greenplum_server_users_name_list	Gauge	-	User details (list of user names)	SELECT usename from pg_catalog.pg_user;
29	greenplum_server_users_total_count	Gauge	-	Total number of users	Same as No. 28
30	greenplum_server_locks_table_detail	Gauge	pid; datname; usename; locktype; mode; application_name; state; lock_satus; query	Lock information	SELECT * from pg_locks
31	greenplum_server_database_hit_cache_percent_rate	Gauge	-	Cache hit rate (%)	select sum(blks_hit)/(sum(blks_read)+sum(blks_hit))*100 from pg_stat_database;
32	greenplum_server_database_transition_commit_percent_rate	Gauge	-	Transaction commit rate
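
The segment metrics in rows 10 to 12 encode single-letter catalog codes as numbers. The sketch below shows one way such a mapping could look, assuming each row of `gp_segment_configuration` has already been fetched as a dictionary keyed by column name. The function names `segment_gauges` and `segment_labels` and the sample row are hypothetical and are not taken from the exporter's source code.

```python
# Numeric encodings as listed in the metrics table (rows 10-12).
STATUS_VALUES = {"u": 1, "d": 0}                  # up / down
ROLE_VALUES = {"p": 1, "m": 2}                    # primary / mirror
MODE_VALUES = {"s": 1, "r": 2, "c": 3, "n": 4}    # synced / resyncing / change tracking / not syncing

# Label names taken from the "Labels" column of the table.
LABEL_COLUMNS = (
    "hostname", "address", "dbid", "content",
    "preferred_role", "port", "replication_port",
)


def segment_gauges(row):
    """Map one gp_segment_configuration row to its three gauge values.

    Raises KeyError if a code is not in the tables above.
    """
    return {
        "greenplum_node_segment_status": STATUS_VALUES[row["status"].lower()],
        "greenplum_node_segment_role": ROLE_VALUES[row["role"].lower()],
        "greenplum_node_segment_mode": MODE_VALUES[row["mode"].lower()],
    }


def segment_labels(row):
    """Build the label set shared by the three segment metrics."""
    return {name: str(row[name]) for name in LABEL_COLUMNS if name in row}


# Illustrative row only; the host name and port are made up.
example_row = {
    "dbid": 2, "content": 0, "role": "p", "preferred_role": "p",
    "mode": "s", "status": "u", "port": 6000,
    "hostname": "sdw1", "address": "sdw1",
}
```

For `example_row`, `segment_gauges` yields status 1, role 1, and mode 1, and `segment_labels` omits `replication_port` because that column is absent from the sample row.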

3.2 Building a Visual Status Dashboard with Grafana

With the metrics above, you can configure dashboards in Grafana. For details, see:
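
Grafana panels backed by a Prometheus data source are driven by PromQL expressions over these metrics. Before building a panel, it can help to check an expression directly against the Prometheus HTTP API (`/api/v1/query`). The sketch below does that with the standard library only. The base URL `http://localhost:9090`, the expression in `CONNECTION_USAGE_PERCENT`, and the helper names are assumptions for illustration, and the parser assumes the query returns an instant vector.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical panel expression: share of max_connections currently in use.
CONNECTION_USAGE_PERCENT = (
    "greenplum_cluster_total_connections / greenplum_cluster_max_connections * 100"
)


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_vector(response_body):
    """Extract (labels, value) pairs from an instant-vector JSON response."""
    doc = json.loads(response_body)
    if doc.get("status") != "success":
        raise ValueError("query failed: " + str(doc.get("error")))
    # Each sample's value is a [timestamp, "string value"] pair.
    return [
        (item["metric"], float(item["value"][1]))
        for item in doc["data"]["result"]
    ]


def instant_query(base_url, promql, timeout=5):
    """Run the query against a live Prometheus server and parse the result."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

For example, `instant_query("http://localhost:9090", CONNECTION_USAGE_PERCENT)` would return one pair per scraped exporter instance, and the same expression can then be pasted into a Grafana panel query.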

Source: reposted from https://blog.csdn.net/inrgihc/article/details/108686638
