

ceph pg inconsistent

The Ceph cluster status suddenly went into an error state:

[root@ceph-6-11 ~]# ceph health detail

HEALTH_ERR 1 pgs inconsistent; 1 scrub errors;

pg 2.37c is active+clean+inconsistent, acting [75,6,35]

1 scrub errors

Summary of the error:

Problem PG: 2.37c

Acting OSDs: 75, 6, 35
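Before repairing, it can help to see exactly which object the scrub flagged. A minimal sketch using the rados inconsistency listing (the pool name is a placeholder for whichever pool PG 2.37c belongs to):

# list PGs with inconsistencies in a given pool (replace <pool-name> with your pool)
rados list-inconsistent-pg <pool-name>

# show the inconsistent object(s) and which shard/OSD reported the error
rados list-inconsistent-obj 2.37c --format=json-pretty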

Run the standard repair:

ceph pg repair 2.37c
At this point individual OSDs may restart and the PG gets remapped; after waiting a short while, the cluster usually returns to OK.
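One way to follow the repair as it runs is to watch the cluster log and filter for the problem PG:

# follow the cluster log, filtering for PG 2.37c (stop with Ctrl-C)
ceph -w | grep 2.37c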

If you check the result and the status is still not healthy:

[root@ceph-6-11 ~]# ceph health detail

HEALTH_ERR 1 pgs inconsistent; 1 scrub errors

pg 2.37c is active+clean+inconsistent, acting [75,6,35]

1 scrub errors

The problem persists; the inconsistent PG has not been repaired.

Next, scrub the PG by running:

ceph pg scrub 2.37c

ceph pg deep-scrub 2.37c

ceph pg repair 2.37c
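Note that scrub and deep-scrub are asynchronous; one way to confirm they actually completed before re-checking health is to look at the scrub timestamps in the PG query output, for example:

# the stamps should advance once the (deep-)scrub has finished
ceph pg 2.37c query | grep scrub_stamp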

None of the commands above fixed the PG; the same error kept being reported. The relevant OSD log shows:

2017-07-24 17:31:10.585305 7f72893c4700 0 log_channel(cluster) log [INF] : 2.37c repair starts

2017-07-24 17:31:10.710517 7f72893c4700 -1 log_channel(cluster) log [ERR] : 2.37c repair 1 errors, 0 fixed
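The "1 errors, 0 fixed" line means the OSD found a problem it could not repair automatically. The full OSD log usually names the affected object; assuming the default log location, something like the following will pull out the details:

# search the primary OSD's log for scrub/repair errors on this PG
grep ERR /var/log/ceph/ceph-osd.75.log | grep 2.37c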

Next, instruct each of the three OSDs in the PG's acting set to repair, with the following commands:

ceph osd repair 75

ceph osd repair 6

ceph osd repair 35

Finally, resort to the bluntest approach: stop the primary OSD (osd.75) that serves the problem PG.

Query which OSD is the PG's primary:

ceph pg 2.37c query |grep primary

            "blocked_by": [],

            "up_primary": 75,

            "acting_primary": 75

Then stop that OSD:

systemctl stop ceph-osd@75

Ceph now starts data recovery, rebuilding the data that was on osd.75 on other nodes. After waiting a while for the recovery to finish, check the cluster state again.
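Note that stopping the daemon only marks osd.75 down; recovery onto other OSDs begins once it is also marked out (which happens automatically after mon_osd_down_out_interval, typically 600 seconds, or can be forced by hand). A sketch of how to force and monitor that:

# optionally mark the OSD out right away instead of waiting for the timeout
ceph osd out 75

# watch recovery progress until the degraded/misplaced object counts reach 0
ceph -s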

[root@ceph-6-11 ~]# ceph health detail

HEALTH_ERR 1 pgs inconsistent; 1 scrub errors

pg 2.37c is active+clean+inconsistent, acting [8,38,17]

1 scrub errors

[root@ceph-6-11 ~]# ceph pg repair 2.37c

instructing pg 2.37c on osd.8 to repair

Then check the cluster status:

[root@ceph-6-11 ~]# ceph health detail

HEALTH_OK
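Once the cluster is healthy again, osd.75 can be brought back (assuming the underlying disk itself is sound); it will rejoin and backfill as needed:

systemctl start ceph-osd@75
ceph osd in 75        # only needed if the OSD was marked out earlier
ceph osd tree         # verify osd.75 shows as up and in again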
