
2. Ceph Basics - Cluster Deployment and Troubleshooting

Article reposted from: https://mp.weixin.qq.com/s?__biz=MzI1MDgwNzQ1MQ==&mid=2247485243&idx=1&sn=e425c31af90c72c75d535e16d71f728b&chksm=e9fdd2cfde8a5bd9423b9b15f69e305fc5fa30c543f941f57c8b456d28496e871a46b7faebd7&scene=178&cur_album_id=1600845417376776197#rd

Before Deployment

Installation Methods

ceph-deploy: an automated deployment tool. The official site no longer carries its deployment page; it works for releases up to and including Nautilus (N) but is not supported by later releases. This is the method we use here.

cephadm: a newer installation method that requires a CentOS 8 environment and supports both graphical (dashboard) and command-line installation. From Octopus (O) onward, cephadm is the recommended way to install.

Manual installation: a step-by-step install that gives a clear picture of the deployment details and of how the Ceph cluster components relate to each other.

Rook: integrates with an existing Kubernetes cluster and installs Ceph into that cluster.

ceph-ansible: automated installation driven by Ansible playbooks.

Server Planning

Architecture Diagram

Ceph Concepts

Installation and Deployment

1. Set the hostname and add hosts entries (all machines)

# Set the hostname
hostnamectl set-hostname ceph-node01

# Edit /etc/hosts
[root@ceph-node01 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
100.73.18.152 ceph-node01
100.73.18.153 ceph-node02
100.73.18.128 ceph-node03
[root@ceph-node01 ~]#

2. Set up SSH trust between the ceph-admin node and the other nodes

Because ceph-deploy cannot prompt for passwords while running, you must generate an SSH key on the admin node (ceph-admin) and distribute it to every node in the Ceph cluster.
# Generate the SSH key pair
ssh-keygen -t rsa -P ""

# Copy the public key to each node
ssh-copy-id -i .ssh/id_rsa.pub <node-name>

3. Install the NTP service and synchronize time

yum -y install ntp

After installation, just configure /etc/ntp.conf. If your company runs its own NTP server, point to that; otherwise pick a public one.
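
A minimal sketch of the relevant /etc/ntp.conf lines, assuming the Aliyun public NTP servers are used (swap in whatever time source you actually choose):

# /etc/ntp.conf (illustrative; replace with your own NTP source)
server ntp.aliyun.com iburst
server ntp1.aliyun.com iburst

# restart and enable the service so the change takes effect
systemctl restart ntpd
systemctl enable ntpd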

[root@ceph-node01 ~]# ntpq -p
     remote refid st t when poll reach delay offset jitter
==============================================================================
*100.100.1.2 202.28.93.5 2 u 665 1024 377 1.268 -7.338 1.523
-100.100.1.2 202.28.116.236 2 u 1015 1024 377 0.805 -12.547 0.693
+100.100.1.3 203.159.70.33 2 u 117 1024 377 0.742 -5.007 1.814
+100.100.1.4 203.159.70.33 2 u 19 1024 377 0.731 -5.770 2.652
[root@ceph-node01 ~]#

Use the command above to verify that NTP is working. You only need to configure one node against the external source; the other nodes can simply point at it (see the sketch below).
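
A sketch of what that looks like on the other nodes, assuming ceph-node01 is used as the internal time source:

# /etc/ntp.conf on ceph-node02 / ceph-node03
server ceph-node01 iburst

systemctl restart ntpd
ntpq -p    # ceph-node01 should show up as the remote peer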

4. Stop iptables or firewalld (alternatively, keep the firewall running and open only the required ports; see the sketch after the commands below)

systemctl stop firewalld.service
systemctl stop iptables.service
systemctl disable firewalld.service
systemctl disable iptables.service
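
If you would rather keep firewalld running, opening the default Ceph ports is enough. The port numbers below are the upstream defaults (3300/6789 for MON, 6800-7300 for OSD/MGR daemons, matching the mon dump and osd dump output later in this post), so treat this as a sketch and adjust to your environment:

firewall-cmd --zone=public --permanent --add-port=3300/tcp --add-port=6789/tcp
firewall-cmd --zone=public --permanent --add-port=6800-7300/tcp
firewall-cmd --reload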

5. Disable SELinux

# Edit the config so the change persists across reboots
sed -i 's@^\(SELINUX=\).*@\1disabled@' /etc/sysconfig/selinux

# Take effect immediately for the current boot
setenforce 0

# Verify
getenforce

6. Configure the yum repositories (sync to all machines)

Remove the original repo files and download the latest ones from the Aliyun mirror. The Aliyun mirror portal is https://developer.aliyun.com/mirror/; links to pretty much all the relevant mirrors can be found there.

wget -O /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo

Note that the epel-7 repo must be downloaded as well, because the Ceph version in the default repos is too old. The Ceph repo itself has to be written by hand, based on what Aliyun provides, as follows:

Ceph mirror: https://mirrors.aliyun.com/ceph/?spm=a2c6h.13651104.0.0.435f22d16X5Jk7

[root@ceph-node01 yum.repos.d]# cat ceph.repo
[noarch]
name=noarch
baseurl=https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch/
enabled=1
gpgcheck=0

[x86_64]
name=x86_64
baseurl=https://mirrors.aliyun.com/ceph/rpm-nautilus/el7/x86_64/
enabled=1
gpgcheck=0
[root@ceph-node01 yum.repos.d]#

Then sync the repo files to all machines.

# List the configured yum repos
yum repolist
yum repolist all

yum makecache downloads the package metadata from the repo servers and caches it locally; later yum install searches this cache, which speeds things up. It pairs well with yum -C search xxx (search the cache only).

yum makecache
yum -C search xxx
yum clean all

7. Install ceph-deploy on the admin node

The entire Ceph storage cluster deployment can be driven from the admin node with ceph-deploy, so start by installing ceph-deploy and its dependencies there. Note in particular that the python-setuptools package is required.

yum install ceph-deploy python-setuptools python2-subprocess32

8. Deploy the RADOS storage cluster

Create a dedicated working directory:

mkdir ceph-deploy && cd ceph-deploy

Initialize the first MON node to bootstrap the cluster:

ceph-deploy new --cluster-network 100.73.18.0/24 --public-network 100.73.18.0/24 <node-name>
  • --cluster-network: used for internal replication and data-sync traffic;
  • --public-network: used for serving clients.

This generates three files: ceph.conf (the configuration file), ceph-deploy-ceph.log (the log file), and ceph.mon.keyring (the monitor authentication key).
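
For reference, the generated ceph.conf looks roughly like this (the values below are reconstructed from this cluster's parameters rather than copied verbatim from the file):

[global]
fsid = cc10b0cb-476f-420c-b1d6-e48c1dc929af
public_network = 100.73.18.0/24
cluster_network = 100.73.18.0/24
mon_initial_members = ceph-node01
mon_host = 100.73.18.152
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx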

9. Install the Ceph packages on the cluster nodes

ceph-deploy install {ceph-node} {....}

Installing this way pushes the packages to every node automatically, but it is not ideal: ceph-deploy rewrites the yum configuration, including our epel and Ceph repos, pointing them at its built-in upstream sources. Downloads then go to servers overseas and become very slow. Manual installation is recommended instead; simply run the following on each node:

[root@ceph-node01 ceph-deploy]# yum -y install ceph ceph-mds ceph-mgr ceph-osd ceph-radosgw ceph-mon

10. Copy the config file and admin keyring to every cluster node

[root@ceph-node01 ceph-deploy]# ceph -s
[errno 2] error connecting to the cluster
[root@ceph-node01 ceph-deploy]#
# Cause: the admin keyring is missing; copy it with the ceph-deploy admin command below
[root@ceph-node01 ceph-deploy]# ceph-deploy admin ceph-node01 ceph-node02 ceph-node03
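
To confirm the copy worked, check /etc/ceph on each node (a quick sketch; the chmod is only needed if non-root users should be able to run ceph commands, which is an extra step, not part of the original walkthrough):

ls -l /etc/ceph/
# ceph.conf and ceph.client.admin.keyring should now be present
chmod +r /etc/ceph/ceph.client.admin.keyring    # optional, for non-root access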

Check the cluster status again:

[root@ceph-node01 ceph-deploy]# ceph -s
  cluster:
    id: cc10b0cb-476f-420c-b1d6-e48c1dc929af
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum ceph-node01 (age 2m)
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in

  data:
    pools: 0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage: 0 B used, 0 B / 0 B avail
    pgs:

[root@ceph-node01 ceph-deploy]#

Notice that services lists only a single mon; there is no mgr and no osd yet.

11. Deploy the mgr

[root@ceph-node01 ceph-deploy]# ceph-deploy mgr create ceph-node01
[root@ceph-node01 ceph-deploy]# ceph -s
  cluster:
    id: cc10b0cb-476f-420c-b1d6-e48c1dc929af
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum ceph-node01 (age 4m)
    mgr: ceph-node01(active, since 84s)
    osd: 0 osds: 0 up, 0 in

  data:
    pools: 0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage: 0 B used, 0 B / 0 B avail
    pgs:

[root@ceph-node01 ceph-deploy]#

12. Add OSDs to the RADOS cluster

[root@ceph-node01 ceph-deploy]# ceph-deploy osd list ceph-node01

The ceph-deploy disk command can inspect and list information about all available disks on an OSD node.
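
For example, the list subcommand prints the block devices it finds on a node (shown as a sketch; the exact output depends on the ceph-deploy version):

[root@ceph-node01 ceph-deploy]# ceph-deploy disk list ceph-node01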

[root@ceph-node01 ceph-deploy]# ceph-deploy disk zap ceph-node01 /dev/vdb

On the admin node, use ceph-deploy to wipe all partition tables and data from the disks intended for OSD use. The command format is ceph-deploy disk zap {osd-server-name} {disk-name}. Note that this step destroys all data on the target device.

# Check the disks
[root@ceph-node01 ceph-deploy]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sr0 11:0 1 1024M 0 rom
vda 252:0 0 50G 0 disk
├─vda1 252:1 0 500M 0 part /boot
└─vda2 252:2 0 49.5G 0 part
  ├─centos-root 253:0 0 44.5G 0 lvm /
  └─centos-swap 253:1 0 5G 0 lvm [SWAP]
vdb 252:16 0 100G 0 disk
vdc 252:32 0 100G 0 disk
[root@ceph-node01 ceph-deploy]#
# Add the OSDs
[root@ceph-node01 ceph-deploy]# ceph-deploy osd create ceph-node01 --data /dev/vdb
...
[root@ceph-node01 ceph-deploy]# ceph-deploy osd create ceph-node02 --data /dev/vdb
...
[root@ceph-node01 ceph-deploy]# ceph-deploy osd create ceph-node03 --data /dev/vdb
...
[root@ceph-node01 ceph-deploy]# ceph-deploy osd list ceph-node01

The ceph-deploy osd list command lists the OSD information on the specified node.

13. Inspect the OSDs

View the OSD-related information:

[root@ceph-node01 ceph-deploy]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.39067 root default
-3 0.09769 host ceph-node01
 0 hdd 0.09769 osd.0 up 1.00000 1.00000
-5 0.09769 host ceph-node02
 1 hdd 0.09769 osd.1 up 1.00000 1.00000
-7 0.19530 host ceph-node03
 2 hdd 0.19530 osd.2 up 1.00000 1.00000
[root@ceph-node01 ceph-deploy]#
[root@ceph-node01 ceph-deploy]# ceph osd stat
3 osds: 3 up (since 2d), 3 in (since 2d); epoch: e26
[root@ceph-node01 ceph-deploy]# ceph osd ls
0
1
2
[root@ceph-node01 ceph-deploy]# ceph osd dump
epoch 26
fsid cc10b0cb-476f-420c-b1d6-e48c1dc929af
created 2020-09-29 09:14:30.781641
modified 2020-09-29 10:14:06.100849
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 7
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release nautilus
pool 1 'ceph-demo' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode warn last_change 25 lfor 0/0/20 flags hashpspool,selfmanaged_snaps stripe_width 0
  removed_snaps [1~3]
max_osd 3
osd.0 up in weight 1 up_from 5 up_thru 22 down_at 0 last_clean_interval [0,0) [v2:100.73.18.152:6802/11943,v1:100.73.18.152:6803/11943] [v2:100.73.18.152:6804/11943,v1:100.73.18.152:6805/11943] exists,up 136f6cf7-05a0-4325-aa92-ad316560edff
osd.1 up in weight 1 up_from 9 up_thru 22 down_at 0 last_clean_interval [0,0) [v2:100.73.18.153:6800/10633,v1:100.73.18.153:6801/10633] [v2:100.73.18.153:6802/10633,v1:100.73.18.153:6803/10633] exists,up 79804c00-2662-47a1-9987-95579afa10b6
osd.2 up in weight 1 up_from 13 up_thru 22 down_at 0 last_clean_interval [0,0) [v2:100.73.18.128:6800/10558,v1:100.73.18.128:6801/10558] [v2:100.73.18.128:6802/10558,v1:100.73.18.128:6803/10558] exists,up f15cacec-fdcd-4d3c-8bb8-ab3565cb4d0b
[root@ceph-node01 ceph-deploy]#

14. Scale out the MONs

[root@ceph-node01 ceph-deploy]# ceph-deploy mon add ceph-node02
[root@ceph-node01 ceph-deploy]# ceph-deploy mon add ceph-node03

Because the MONs elect a leader using the Paxos algorithm, you can check the election status:

[root@ceph-node01 ceph-deploy]# ceph quorum_status
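
The command prints JSON; a quick way to pull out just the elected leader (the json-pretty format and the grep filter are my own addition, and the output line below is illustrative):

[root@ceph-node01 ceph-deploy]# ceph quorum_status --format json-pretty | grep quorum_leader_name
    "quorum_leader_name": "ceph-node01",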

Check the MON status:

[root@ceph-node01 ceph-deploy]# ceph mon stat
e3: 3 mons at {ceph-node01=[v2:100.73.18.152:3300/0,v1:100.73.18.152:6789/0],ceph-node02=[v2:100.73.18.153:3300/0,v1:100.73.18.153:6789/0],ceph-node03=[v2:100.73.18.128:3300/0,v1:100.73.18.128:6789/0]}, election epoch 12, leader 0 ceph-node01, quorum 0,1,2 ceph-node01,ceph-node02,ceph-node03
[root@ceph-node01 ceph-deploy]#

View the MON details:

[root@ceph-node01 ceph-deploy]# ceph mon dump
dumped monmap epoch 3
epoch 3
fsid cc10b0cb-476f-420c-b1d6-e48c1dc929af
last_changed 2020-09-29 09:28:35.692432
created 2020-09-29 09:14:30.493476
min_mon_release 14 (nautilus)
0: [v2:100.73.18.152:3300/0,v1:100.73.18.152:6789/0] mon.ceph-node01
1: [v2:100.73.18.153:3300/0,v1:100.73.18.153:6789/0] mon.ceph-node02
2: [v2:100.73.18.128:3300/0,v1:100.73.18.128:6789/0] mon.ceph-node03
[root@ceph-node01 ceph-deploy]#

15. Scale out the MGRs

[root@ceph-node01 ceph-deploy]# ceph-deploy mgr create ceph-node02 ceph-node03

16. Check the cluster status

[root@ceph-node01 ceph-deploy]# ceph -s
  cluster:
    id: cc10b0cb-476f-420c-b1d6-e48c1dc929af
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-node01,ceph-node02,ceph-node03 (age 9m)
    mgr: ceph-node01(active, since 20m), standbys: ceph-node02, ceph-node03
    osd: 3 osds: 3 up (since 13m), 3 in (since 13m)

  data:
    pools: 0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage: 3.0 GiB used, 397 GiB / 400 GiB avail
    pgs:

[root@ceph-node01 ceph-deploy]#

The RADOS cluster with 3 MONs, 3 MGRs, and 3 OSDs has been created successfully.

17. Remove a failed OSD

An OSD in a Ceph cluster usually corresponds to one device and runs as a dedicated daemon. When an OSD device fails, or an administrator needs to remove a specific OSD for operational reasons, the related daemon must be stopped first, and only then can the device be removed.

  1. Take the device out of the cluster: ceph osd out {osd-num}

  2. Stop the daemon: sudo systemctl stop ceph-osd@{osd-num}

  3. Purge the device: ceph osd purge {id} --yes-i-really-mean-it

[root@ceph-node01 ceph-deploy]# ceph osd out 0
marked out osd.0.
[root@ceph-node01 ceph-deploy]# systemctl stop ceph-osd@0
[root@ceph-node01 ceph-deploy]# ceph osd purge 0 --yes-i-really-mean-it
purged osd.0
[root@ceph-node01 ceph-deploy]#

Check the status after removal:

[root@ceph-node01 ceph-deploy]# ceph -s
  cluster:
    id: cc10b0cb-476f-420c-b1d6-e48c1dc929af
    health: HEALTH_WARN
            2 daemons have recently crashed
            OSD count 2 < osd_pool_default_size 3

  services:
    mon: 3 daemons, quorum ceph-node01,ceph-node02,ceph-node03 (age 15h)
    mgr: ceph-node01(active, since 15h)
    osd: 2 osds: 2 up (since 37h), 2 in (since 37h)

  data:
    pools: 1 pools, 128 pgs
    objects: 54 objects, 137 MiB
    usage: 2.3 GiB used, 298 GiB / 300 GiB avail
    pgs: 128 active+clean

[root@ceph-node01 ceph-deploy]#

Error while zapping a disk

[root@ceph-node01 ceph-deploy]# ceph-deploy disk zap ceph-node01 /dev/vdb
...
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: /usr/sbin/ceph-volume lvm zap /dev/vdb

If zapping the disk fails with this error, you can wipe the beginning of the disk with dd and then reboot:

[root@ceph-node01 ceph-deploy]# dd if=/dev/zero of=/dev/vdb bs=512K count=1
[root@ceph-node01 ceph-deploy]# reboot
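
As an alternative to dd (my own suggestion, not part of the original workflow), wipefs can clear the filesystem and LVM signatures that commonly make the zap fail:

wipefs -a /dev/vdb                              # remove all signatures from the device
ceph-deploy disk zap ceph-node01 /dev/vdb       # then retry the zap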

Once the disk is fixed, add it back to the cluster:

[root@ceph-node01 ceph-deploy]# ceph-deploy osd create ceph-node01 --data /dev/vdb
...
[ceph-node01][WARNIN] Running command: /usr/bin/systemctl enable --runtime ceph-osd@0
[ceph-node01][WARNIN] stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/[email protected] to /usr/lib/systemd/system/[email protected].
[ceph-node01][WARNIN] Running command: /usr/bin/systemctl start ceph-osd@0
[ceph-node01][WARNIN] --> ceph-volume lvm activate successful for osd ID: 0
[ceph-node01][WARNIN] --> ceph-volume lvm create successful for: /dev/vdb
[ceph-node01][INFO ] checking OSD status...
[ceph-node01][DEBUG ] find the location of an executable
[ceph-node01][INFO ] Running command: /bin/ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host ceph-node01 is now ready for osd use.
[root@ceph-node01 ceph-deploy]#

Check the cluster status; data migration (recovery) is in progress:

[root@ceph-node01 ceph-deploy]# ceph -s
  cluster:
    id: cc10b0cb-476f-420c-b1d6-e48c1dc929af
    health: HEALTH_WARN
            Degraded data redundancy: 9/88 objects degraded (10.227%), 7 pgs degraded
            2 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph-node01,ceph-node02,ceph-node03 (age 15h)
    mgr: ceph-node01(active, since 15h)
    osd: 3 osds: 3 up (since 6s), 3 in (since 6s)

  data:
    pools: 1 pools, 128 pgs
    objects: 44 objects, 105 MiB
    usage: 3.3 GiB used, 397 GiB / 400 GiB avail
    pgs: 24.219% pgs not active
             9/88 objects degraded (10.227%)
             1/88 objects misplaced (1.136%)
             90 active+clean
             31 peering
             6 active+recovery_wait+degraded
             1 active+recovering+degraded

  io:
    recovery: 1.3 MiB/s, 1 keys/s, 1 objects/s

[root@ceph-node01 ceph-deploy]#

After waiting a while, the data migration finishes, but two daemons are reported as having recently crashed:

[root@ceph-node01 ceph-deploy]# ceph -s
  cluster:
    id: cc10b0cb-476f-420c-b1d6-e48c1dc929af
    health: HEALTH_WARN
            2 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph-node01,ceph-node02,ceph-node03 (age 15h)
    mgr: ceph-node01(active, since 15h), standbys: ceph-node02, ceph-node03
    osd: 3 osds: 3 up (since 30m), 3 in (since 30m)

  data:
    pools: 1 pools, 128 pgs
    objects: 54 objects, 137 MiB
    usage: 3.3 GiB used, 397 GiB / 400 GiB avail
    pgs: 128 active+clean

[root@ceph-node01 ceph-deploy]#

Inspect the problem with ceph health detail:

[root@ceph-node01 ceph-deploy]# ceph health
HEALTH_WARN 2 daemons have recently crashed
[root@ceph-node01 ceph-deploy]# ceph health detail
HEALTH_WARN 2 daemons have recently crashed
RECENT_CRASH 2 daemons have recently crashed
    mgr.ceph-node02 crashed on host ceph-node02 at 2020-10-03 01:53:00.058389Z
    mgr.ceph-node03 crashed on host ceph-node03 at 2020-10-03 03:33:30.776755Z
[root@ceph-node01 ceph-deploy]# ceph crash ls
ID ENTITY NEW
2020-10-03_01:53:00.058389Z_c26486ef-adab-4a1f-9b94-68953571e8d3 mgr.ceph-node02 *
2020-10-03_03:33:30.776755Z_88464c4c-0711-42fa-ae05-6196180cfe31 mgr.ceph-node03 *
[root@ceph-node01 ceph-deploy]#

Restarting with systemctl restart ceph-mgr@ceph-node02 did not work (the root cause still needs to be investigated; a way to dig into it is sketched below), so the mgr daemons were simply recreated:
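
To investigate why the restart failed, a reasonable first step (my suggestion, not part of the original troubleshooting) is to check the unit state and logs on the affected node:

[root@ceph-node02 ~]# systemctl status ceph-mgr@ceph-node02
[root@ceph-node02 ~]# journalctl -u ceph-mgr@ceph-node02 --since "1 hour ago"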

[root@ceph-node01 ceph-deploy]# ceph-deploy mgr create ceph-node02 ceph-node03

Checking the cluster status again shows the standby mgr daemons are back, but the warning about two recently crashed daemons remains:

[root@ceph-node01 ceph-deploy]# ceph -s
  cluster:
    id: cc10b0cb-476f-420c-b1d6-e48c1dc929af
    health: HEALTH_WARN
            2 daemons have recently crashed

  services:
    mon: 3 daemons, quorum ceph-node01,ceph-node02,ceph-node03 (age 15h)
    mgr: ceph-node01(active, since 15h), standbys: ceph-node02, ceph-node03
    osd: 3 osds: 3 up (since 30m), 3 in (since 30m)

  data:
    pools: 1 pools, 128 pgs
    objects: 54 objects, 137 MiB
    usage: 3.3 GiB used, 397 GiB / 400 GiB avail
    pgs: 128 active+clean

[root@ceph-node01 ceph-deploy]#

These crash reports can be acknowledged either individually by ID (a sketch of that form follows) or all at once with ceph crash archive-all. Archive them and check the cluster status again:
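
A sketch of the per-ID form, using one of the IDs from the ceph crash ls output above (ceph crash info shows the crash details, ceph crash archive acknowledges a single crash):

[root@ceph-node01 ceph-deploy]# ceph crash info 2020-10-03_01:53:00.058389Z_c26486ef-adab-4a1f-9b94-68953571e8d3
[root@ceph-node01 ceph-deploy]# ceph crash archive 2020-10-03_01:53:00.058389Z_c26486ef-adab-4a1f-9b94-68953571e8d3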

[root@ceph-node01 ceph-deploy]# ceph crash archive-all
[root@ceph-node01 ceph-deploy]#
[root@ceph-node01 ceph-deploy]# ceph -s
  cluster:
    id: cc10b0cb-476f-420c-b1d6-e48c1dc929af
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-node01,ceph-node02,ceph-node03 (age 15h)
    mgr: ceph-node01(active, since 15h), standbys: ceph-node02, ceph-node03
    osd: 3 osds: 3 up (since 33m), 3 in (since 33m)

  data:
    pools: 1 pools, 128 pgs
    objects: 54 objects, 137 MiB
    usage: 3.3 GiB used, 397 GiB / 400 GiB avail
    pgs: 128 active+clean

[root@ceph-node01 ceph-deploy]#

1. Pre-installation preparation

Server planning, SSH trust between servers, hostname resolution (hosts), NTP synchronization, disabling firewalld/iptables, disabling SELinux, configuring the yum repos, and so on.

2. Ceph cluster deployment

mon creation: ceph-deploy new --cluster-network 100.73.18.0/24 --public-network 100.73.18.0/24 (creates the first mon)

Config copy: ceph-deploy admin ceph-node01 ceph-node02 ceph-node03

mon scale-out: ceph-deploy mon add ceph-node02

mgr creation: ceph-deploy mgr create ceph-node01

mgr scale-out: ceph-deploy mgr create ceph-node02 ceph-node03

osd creation: ceph-deploy osd create ceph-node01 --data /dev/vdb

3. Viewing cluster information

List a node's OSDs: ceph-deploy osd list ceph-node01

View disk information: lsblk

Wipe an existing disk so it can join the cluster as an OSD: ceph-deploy disk zap ceph-node01 /dev/vdb

Inspect OSDs: ceph osd tree, ceph osd stat, ceph osd ls, ceph osd dump, etc.

Inspect MON election: ceph quorum_status, ceph mon stat, ceph mon dump, etc.

4. Cluster failure handling

Mark a failed OSD out: ceph osd out {osd-num}

Stop the failed OSD daemon: systemctl stop ceph-osd@{osd-num}

Purge the failed OSD: ceph osd purge {id} --yes-i-really-mean-it

View failure information: ceph health, ceph health detail

List crashed daemons: ceph crash ls

Acknowledge after fixing: ceph crash archive-all