Ceph cluster error: HEALTH_ERR 1 pgs inconsistent; 1 scrub errors

阿新 • Published: 2017-07-25


The cluster reported the following error:

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors;
pg 2.37c is active+clean+inconsistent, acting [75,6,35]
1 scrub errors

Summary of the error:

Problem PG: 2.37c

OSDs in the acting set: 75, 6, 35
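The PG ID and acting set can also be pulled out of the health output with a small filter instead of being read off by eye. The following is only a sketch: the function name `list_inconsistent_pgs` is made up for illustration, and it assumes the `pg <id> is ...inconsistent, acting [...]` line format shown in the output above.

```shell
# Hypothetical helper (not part of Ceph): read `ceph health detail` output
# on stdin and print "<pgid> <acting set>" for each inconsistent PG.
# Assumes the line format shown in this post.
list_inconsistent_pgs() {
    awk '$1 == "pg" && $0 ~ /inconsistent/ {
        acting = $NF                 # last field, e.g. [75,6,35]
        gsub(/\[|\]/, "", acting)    # strip the square brackets
        print $2, acting
    }'
}

# Usage on a live cluster (not executed here):
#   ceph health detail | list_inconsistent_pgs
```

Fed the output above, this prints the PG ID followed by the comma-separated OSD IDs of its acting set.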

Run the standard repair:

ceph pg repair 2.37c

Check the result:

# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 2.37c is active+clean+inconsistent, acting [75,6,35]
1 scrub errors

The problem was still there; the inconsistent PG had not been repaired.

Next I scrubbed the PG, deep-scrubbed it, and tried the repair again:

ceph pg scrub 2.37c

ceph pg deep-scrub 2.37c

ceph pg repair 2.37c

None of these commands fixed it, and the same error was still reported. The log of the OSD involved showed:

2017-07-24 17:31:10.585305 7f72893c4700 0 log_channel(cluster) log [INF] : 2.37c repair starts

2017-07-24 17:31:10.710517 7f72893c4700 -1 log_channel(cluster) log [ERR] : 2.37c repair 1 errors, 0 fixed
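The key detail in that log is the pair of numbers at the end: errors found versus errors fixed. A rough sketch of a filter that picks those out is shown below. The function name `repair_summary` is hypothetical, the pattern only matches the `repair N errors, M fixed` line format seen above, and the log path in the usage comment is the usual default location, which is an assumption about this cluster.

```shell
# Hypothetical helper (not part of Ceph): read OSD log lines on stdin and
# print "<pgid> <errors> <fixed>" for every completed repair.
# Only matches the "<pgid> repair N errors, M fixed" format shown above.
repair_summary() {
    sed -n 's/.* \([0-9][0-9]*\.[0-9a-f][0-9a-f]*\) repair \([0-9][0-9]*\) errors, \([0-9][0-9]*\) fixed.*/\1 \2 \3/p'
}

# Usage (path is an assumption; adjust to the OSD's actual log file):
#   repair_summary < /var/log/ceph/ceph-osd.75.log
```

A third column of 0 next to a non-zero second column means the repair ran but fixed nothing, which is the situation described here.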

By this point I had been stuck for quite a while, so I decided to repair the three OSDs that the PG maps to:

ceph osd repair 75

ceph osd repair 6

ceph osd repair 35

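After running the repair commands I waited for the OSD repairs to finish, only to find that the error was still there! At this point I was considering two options:

1: Find the PG's object information, delete the data on the primary OSD, and then let the cluster repair it;

2: Change the primary OSD the PG currently uses, which is osd 75, to a different disk (I could not find a way to do this).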

Then I came across a bug report in the Ceph tracker:

http://tracker.ceph.com/issues/12577

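It turned out that others had already tried some of these approaches, and that this was yet another bug!

In the end I settled on the bluntest fix: stop osd 75, the primary OSD of the problem PG.

Query the PG's primary OSD: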

ceph pg 2.37c query |grep primary

"blocked_by": [],

"up_primary": 75,

"acting_primary": 75
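If the primary's ID is needed in a script rather than read off the screen, it can be extracted from the same query output. This is a minimal sketch: the function name `acting_primary` is made up for illustration, and it assumes the `"acting_primary": <id>` line format shown above.

```shell
# Hypothetical helper (not part of Ceph): read `ceph pg <pgid> query`
# output on stdin and print the first acting_primary OSD ID found.
# Assumes the '"acting_primary": <id>' format shown in this post.
acting_primary() {
    sed -n 's/.*"acting_primary": *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage on a live cluster (not executed here):
#   ceph pg 2.37c query | acting_primary
```

Text matching is used only to avoid extra dependencies; a JSON-aware tool would be more robust if one is available.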

Then stop that OSD:

systemctl stop ceph-osd@75.service

Ceph then started data recovery, rebuilding the data from osd 75 on other nodes. After waiting a while for the recovery to finish, I checked the cluster state:

# ceph health detail

HEALTH_ERR 1 pgs inconsistent; 1 scrub errors

pg 2.37c is active+clean+inconsistent, acting [8,38,17]

1 scrub errors

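Seeing this nearly broke my heart. Why was it still the same? The PG had moved to a new acting set, [8,38,17], yet it was still inconsistent. Without much hope, I ran the standard repair once more:

# ceph pg repair 2.37c
instructing pg 2.37c on osd.8 to repair

Then check the cluster state:

# ceph health detail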

HEALTH_OK

Yo, yo, check it out now! Fixed. Nothing more to say, time to go home!

This article is from the "康建華" blog; please do not repost.
