# Using HDFS for Elasticsearch 7.2 Disaster Recovery
[toc]
# Preface
Elasticsearch replicas provide high availability: they let you tolerate the occasional loss of a node without interrupting service. Replicas do not, however, protect against a catastrophic failure. For that you need a true backup of the cluster, i.e. a complete copy to fall back on when something really does go wrong.
This walkthrough sets up a simulated Elasticsearch 7.2 cluster and backs it up with the `snapshot` API.
HDFS, a distributed file system, serves as the example snapshot repository.
# Snapshot Version Compatibility
![](http://bed.thunisoft.com:9000/ibed/2020/10/11/AtK7rrrai.png)
# Backing Up the Cluster
## HDFS File System
### Downloading the Software
[Download link](https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz)
```
hadoop-3.3.0.tar.gz
```
### JDK Environment
> Hadoop is written in Java, so a JVM is required to run it.
```
jdk-8u161-linux-x64.tar.gz
```
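Both archives are assumed to be unpacked under /home/hadoop, since all paths below point there; a minimal sketch:
```shell
# Unpack the JDK and Hadoop under /home/hadoop
# (the install prefix assumed by the environment variables below)
cd /home/hadoop
tar -zxf jdk-8u161-linux-x64.tar.gz
tar -zxf hadoop-3.3.0.tar.gz
```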
### Configuring System Environment Variables
```shell
#JAVA
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
#hadoop
export HADOOP_HOME=/home/hadoop/hadoop-3.3.0
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
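Reload the profile and sanity-check the result (assuming the variables above went into `~/.bash_profile`):
```shell
source ~/.bash_profile   # or start a new login shell
java -version            # expect 1.8.0_161
hadoop version           # expect Hadoop 3.3.0
```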
### Hadoop Configuration
> All of the files below live under hadoop-3.3.0/etc/hadoop.
#### Configuring JAVA_HOME
hadoop-env.sh
```shell
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
```
#### Configuring the Core Component File
In core-site.xml, add the following between the `<configuration>` and `</configuration>` tags:
```xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.16.176.103:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/data</value>
</property>
```
#### Configuring the File System
In hdfs-site.xml, add the following between the `<configuration>` and `</configuration>` tags:
```xml
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/datanode</value>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
```
#### Configuring MapReduce
In mapred-site.xml:
```xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
```
#### Configuring yarn-site.xml
In yarn-site.xml:
```xml
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>elasticsearch01</value>
</property>
```
### Formatting the File System
```shell
hdfs namenode -format
```
### Starting HDFS
start-dfs.sh
```shell
$ start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [host103]
Starting datanodes
Starting secondary namenodes [host103]
```
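To confirm all three HDFS daemons actually came up, `jps` (shipped with the JDK) should list them; the PIDs below are illustrative:
```shell
$ jps
12001 NameNode             # PIDs will differ
12189 DataNode
12410 SecondaryNameNode
```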
### Accessing the Web UI
```
http://localhost:9870/
```
## Installing the ES Plugin
The repository-hdfs plugin must be installed on every node in the cluster; `restart ES` after installation.
### Downloading the Plugin
> The plugin version must match the ES version.
[Download link](https://artifacts.elastic.co/downloads/elasticsearch-plugins/repository-hdfs/repository-hdfs-7.2.0.zip)
```
repository-hdfs-7.2.0.zip
```
### Installing the Plugin
> Download the package in advance for an offline install.
Install on each node in the cluster in turn: `sudo bin/elasticsearch-plugin install file:///path/to/plugin.zip`
```shell
$ ./elasticsearch-plugin install file:///home/es/repository-hdfs-7.2.0.zip
-> Downloading file:///home/es/repository-hdfs-7.2.0.zip
[=================================================] 100%
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission accessClassInPackage.sun.security.krb5
* java.lang.RuntimePermission accessDeclaredMembers
* java.lang.RuntimePermission getClassLoader
* java.lang.RuntimePermission loadLibrary.jaas
* java.lang.RuntimePermission loadLibrary.jaas_nt
* java.lang.RuntimePermission loadLibrary.jaas_unix
* java.lang.RuntimePermission setContextClassLoader
* java.lang.RuntimePermission shutdownHooks
* java.lang.reflect.ReflectPermission suppressAccessChecks
* java.net.SocketPermission * connect,resolve
* java.net.SocketPermission localhost:0 listen,resolve
* java.security.SecurityPermission insertProvider.SaslPlainServer
* java.security.SecurityPermission putProviderProperty.SaslPlainServer
* java.util.PropertyPermission * read,write
* javax.security.auth.AuthPermission doAs
* javax.security.auth.AuthPermission getSubject
* javax.security.auth.AuthPermission modifyPrincipals
* javax.security.auth.AuthPermission modifyPrivateCredentials
* javax.security.auth.AuthPermission modifyPublicCredentials
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KerberosTicket * "*" read
* javax.security.auth.PrivateCredentialPermission javax.security.auth.kerberos.KeyTab * "*" read
* javax.security.auth.PrivateCredentialPermission org.apache.hadoop.security.Credentials * "*" read
* javax.security.auth.kerberos.ServicePermission * initiate
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
Continue with installation? [y/N]y
-> Installed repository-hdfs
$
```
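After restarting every node, you can confirm the plugin is loaded cluster-wide with the cat plugins API; each node should report repository-hdfs 7.2.0:
```json
GET _cat/plugins?v
```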
## Creating the Repository
- Create
```json
PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs", -- repository type
  "settings": {
    "uri": "hdfs://172.16.176.103:9000/", -- HDFS access URI
    "path": "/data",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}
```
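- Verify
Repository registration can also be checked explicitly; this asks every data node to write a test file to the repository and reports any node that cannot:
```json
POST _snapshot/my_hdfs_repository/_verify
```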
- View
```json
GET /_snapshot
{
  "my_hdfs_repository" : {
    "type" : "hdfs",
    "settings" : {
      "path" : "/data",
      "uri" : "hdfs://172.16.176.103:9000/",
      "conf" : {
        "dfs" : {
          "client" : {
            "read" : {
              "shortcircuit" : "false"
            }
          }
        }
      }
    }
  }
}
```
## Creating a Snapshot
- Create the snapshot
The call returns immediately; it does not wait for the snapshot to complete.
```json
PUT _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12
{
  "indices": "i_xfjbblxt_cxfw_xfj_d12"
}
```
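To block until the snapshot finishes instead, the same request accepts the `wait_for_completion` query parameter:
```json
PUT _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12?wait_for_completion=true
{
  "indices": "i_xfjbblxt_cxfw_xfj_d12"
}
```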
- Check the snapshot's current state
```json
GET _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_i_xfjbblxt_cxfw_xfj_d12",
      "uuid" : "-BS9XjxvS1Sp6wW_bT02lA",
      "version_id" : 7020099,
      "version" : "7.2.0",
      "indices" : [
        "i_xfjbblxt_cxfw_xfj_d12"
      ],
      "include_global_state" : true,
      "state" : "IN_PROGRESS", -- snapshot still running
      "start_time" : "2020-10-12T14:04:49.425Z", -- start time
      "start_time_in_millis" : 1602511489425,
      "end_time" : "1970-01-01T00:00:00.000Z",
      "end_time_in_millis" : 0,
      "duration_in_millis" : -1602511489425,
      "failures" : [ ],
      "shards" : {
        "total" : 0,
        "failed" : 0,
        "successful" : 0
      }
    }
  ]
}
```
- Completed state
```json
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_i_xfjbblxt_cxfw_xfj_d12", -- snapshot name
      "uuid" : "-BS9XjxvS1Sp6wW_bT02lA",
      "version_id" : 7020099,
      "version" : "7.2.0",
      "indices" : [
        "i_xfjbblxt_cxfw_xfj_d12" -- index
      ],
      "include_global_state" : true,
      "state" : "SUCCESS", -- snapshot succeeded
      "start_time" : "2020-10-12T14:04:49.425Z", -- start time
      "start_time_in_millis" : 1602511489425, -- start timestamp
      "end_time" : "2020-10-12T14:24:33.942Z", -- end time
      "end_time_in_millis" : 1602512673942, -- end timestamp
      "duration_in_millis" : 1184517, -- duration (ms)
      "failures" : [ ],
      "shards" : {
        "total" : 5, -- total shards
        "failed" : 0,
        "successful" : 5 -- successful shards
      }
    }
  ]
}
```
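- List snapshots
Once several backups accumulate, everything in the repository can be listed in one call:
```json
GET _snapshot/my_hdfs_repository/_all
```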
## Restoring a Snapshot
**If you restore a snapshot into the original index, you must close or delete that index before restoring.**
- Restore the snapshot
```json
POST _snapshot/my_hdfs_repository/snapshot_i_xfjbblxt_cxfw_xfj_d12/_restore
{
  "indices": "i_xfjbblxt_cxfw_xfj_d12", -- index name inside the snapshot
  "rename_pattern": "i_xfjbblxt_cxfw_xfj_d12", -- pattern matched against index names
  "rename_replacement": "restored_i_xfjbblxt_cxfw_xfj_d12" -- new name for matched indices
}
```
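The per-shard restore progress shown in the status view below comes from the index recovery API:
```json
GET restored_i_xfjbblxt_cxfw_xfj_d12/_recovery
```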
- Check status
```json
{
  "restored_i_xfjbblxt_cxfw_xfj_d12" : {
    "shards" : [
      {
        "id" : 4,
        "type" : "SNAPSHOT",
        "stage" : "INDEX",
        "primary" : true,
        "start_time_in_millis" : 1602571287856,
        "total_time_in_millis" : 1249147,
        "source" : {
          "repository" : "my_hdfs_repository",
          "snapshot" : "snapshot_i_xfjbblxt_cxfw_xfj_d12",
          "version" : "7.2.0",
          "index" : "i_xfjbblxt_cxfw_xfj_d12",
          "restoreUUID" : "KM1EaKsAQkO4OxB0PwKe0Q"
        },
        "target" : {
          "id" : "DWvUrfqQRxGLIWm6SQmunA",
          "host" : "172.16.176.104",
          "transport_address" : "172.16.176.104:9300",
          "ip" : "172.16.176.104",
          "name" : "node-104"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 8312825377,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 6781859331,
            "percent" : "81.6%"
          },
          "files" : {
            "total" : 104,
            "reused" : 0,
            "recovered" : 86,
            "percent" : "82.7%"
          },
          "total_time_in_millis" : 1249039,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 0
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      },
      -- remainder omitted
```
# Backup and Restore Times
## Snapshot Details
> First snapshot
| Nodes | Primary shards | Replica shards | Doc count | Index size | Snapshot size | Snapshot time |
| ----- | -------------- | -------------- | --------- | ---------- | ------------- | ------------- |
| 3 | 5 | 1 | 5,149,535 | 77.4 GB | 40 GB | 19.74 min |
## Snapshot Restore Details
**Shards are restored in parallel.** Primary shards are restored from the repository, while replicas are then rebuilt from the recovered primaries, which helps explain why the replica times below are shorter.
| Shard | Restore time | Bytes recovered |
| ----------- | --------- | ------- |
| 0 (primary) | 27.42 min | 7.75 GB |
| 1 (primary) | 27.14 min | 7.72 GB |
| 2 (primary) | 27.45 min | 7.75 GB |
| 3 (primary) | 25.89 min | 7.74 GB |
| 4 (primary) | 25.5 min | 7.74 GB |
| 0 (replica) | 18.65 min | 7.75 GB |
| 1 (replica) | 10.3 min | 7.72 GB |
| 2 (replica) | 17.21 min | 7.75 GB |
| 3 (replica) | 10.6 min | 7.74 GB |
| 4 (replica) | 18.32 min | 7.74 GB |
# Common Problems
## Starting HDFS
### Problem 1
```shell
$ start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [host103]
Last login: Sun Oct 11 22:32:11 CST 2020 from 172.16.176.46 on pts/1
host103: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
Last login: Sun Oct 11 22:32:23 CST 2020 on pts/1
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [host103]
Last login: Sun Oct 11 22:32:24 CST 2020 on pts/1
host103: ERROR: JAVA_HOME is not set and could not be found.
```
- Fix
Configure the Java environment variables:
```shell
export JAVA_HOME=/home/hadoop/jdk1.8.0_161
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
```
### Problem 2
```shell
$ start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [host103]
host103: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting datanodes
localhost: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
Starting secondary namenodes [host103]
host103: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
```
- Fix
> Run as the hadoop user (start-dfs.sh needs passwordless SSH to each host):
```shell
[hadoop@host103 ~]$ ssh-copy-id hadoop@host103
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/hadoop/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@host103's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'hadoop@host103'"
and check to make sure that only the key(s) you wanted were added.
```
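If `ssh-copy-id` reports that no key exists, generate a key pair for the hadoop user first and rerun it:
```shell
# Create an RSA key pair with no passphrase (only if ~/.ssh/id_rsa is missing)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
```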
## Creating the Repository
### Problem 1
- Create
```json
PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://172.16.176.103:9000/",
    "path": "/",
    "conf.dfs.client.read.shortcircuit": "false"
  }
}
```
- Error
```shell
"error": {
"root_cause": [
{
"type": "repository_exception",
"reason": "[my_hdfs_repository] cannot create blob store"
}
],
"type": "repository_exception",
"reason": "[my_hdfs_repository] cannot create blob store",
"caused_by": {
"type": "unchecked_i_o_exception",
"reason": "Cannot create HDFS repository for uri [hdfs://172.16.176.103:9000/]",
"caused_by": {
"type": "access_control_exception",
"reason": "Permission denied: user=es, access=WRITE, inode=\"/\":hadoop:supergroup:drwxr-xr-x\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:496)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:336)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:360)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:239)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1909)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1893)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1852)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:60)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3407)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1161)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:739)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:532)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1020)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:948)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2952)\n",
```
- Fix
Add the following to hdfs-site.xml (between the `<configuration>` tags) and restart HDFS:
```xml
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
```
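Disabling HDFS permission checks cluster-wide is the bluntest fix. A narrower alternative, assuming Elasticsearch runs as the OS user `es` (as the error above shows), is to pre-create the repository path and hand ownership to that user:
```shell
# Run as the hadoop superuser: pre-create the repository path
hdfs dfs -mkdir -p /data
# Give the es user (which the ES nodes run as) write access to it
hdfs dfs -chown -R es /data
```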
# References
- repository-hdfs plugin
https://www.elastic.co/guide/en/elasticsearch/plugins/7.2/repository-hdfs.html
- Hadoop single-node cluster setup
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html