# Environment: Deploying Atlas 2.1.0 with CDH 6.3.2 Compatibility
阿新 • Published: 2020-12-16
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505173158110-2132862529.png)
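> What is Atlas?
> Atlas is a scalable and extensible set of core foundational governance services that lets enterprises meet their compliance requirements within Hadoop effectively and integrates with the whole enterprise data ecosystem.
>
> Apache Atlas gives organizations open metadata management and governance capabilities to build a catalog of their data assets, classify and govern those assets, and give data scientists, analysts, and data governance teams a way to collaborate around them.
> Without Atlas:
>
> Table dependencies in a big data platform are hard to untangle, and metadata management has to be developed in-house, for example a Hive lineage graph.
>
> There is no tool for querying table dependencies, which makes it hard to locate errors during business SQL development.
- Official site: [http://atlas.apache.org](http://atlas.apache.org/)
- Lineage dependencies between tables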
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505173428100-1858217094.png)
- Lineage dependencies between fields
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505173445712-2066238416.png)
## 1 Atlas Architecture
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505173620236-637016777.png)
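## 2 Installing and Using Atlas
> Required components: HDFS, YARN, ZooKeeper, Kafka, HBase, Solr, Hive, and a Python 2.7 environment.
>
> The build requires Maven 3.5.0 or later, JDK 8u151 or later, and Python 2.7.
### 2.1 Download the 2.1.0 source package and open it in IDEA
- Because Atlas is being integrated with CDH, edit the `pom` file
- Add the CDH repository inside the `repositories` tag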
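```xml
<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>false</enabled></snapshots>
</repository>
```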
### 2.2 Align component versions with the CDH versions
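```xml
<lucene-solr.version>7.4.0</lucene-solr.version>
<hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
<hbase.version>2.1.0-cdh6.3.2</hbase.version>
<solr.version>7.4.0-cdh6.3.2</solr.version>
<hive.version>2.1.1-cdh6.3.2</hive.version>
<kafka.version>2.2.1-cdh6.3.2</kafka.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<zookeeper.version>3.4.5-cdh6.3.2</zookeeper.version>
<sqoop.version>1.4.7-cdh6.3.2</sqoop.version>
```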
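If you prefer to script the version changes rather than edit them by hand, the idea can be sketched as below. This is a minimal sketch in Python 3, not part of the Atlas build: the property names are assumed to match the ones in the Atlas root `pom.xml`, the `set_pom_properties` helper is hypothetical, and the sample string stands in for the real file.

```python
import re

# Target CDH 6.3.2 versions, keyed by the pom property assumed to hold each one.
CDH_VERSIONS = {
    "lucene-solr.version": "7.4.0",
    "hadoop.version": "3.0.0-cdh6.3.2",
    "hbase.version": "2.1.0-cdh6.3.2",
    "solr.version": "7.4.0-cdh6.3.2",
    "hive.version": "2.1.1-cdh6.3.2",
    "kafka.version": "2.2.1-cdh6.3.2",
    "kafka.scala.binary.version": "2.11",
    "zookeeper.version": "3.4.5-cdh6.3.2",
    "sqoop.version": "1.4.7-cdh6.3.2",
}


def set_pom_properties(pom_text, versions):
    """Replace the text inside each <name>...</name> property element."""
    for name, version in versions.items():
        pattern = re.compile(r"(<{0}>)[^<]*(</{0}>)".format(re.escape(name)))
        pom_text = pattern.sub(
            lambda match, v=version: match.group(1) + v + match.group(2),
            pom_text,
        )
    return pom_text


# Stand-in for the real pom.xml content; the starting versions are illustrative.
sample = (
    "<properties>"
    "<hadoop.version>3.1.1</hadoop.version>"
    "<hive.version>3.1.0</hive.version>"
    "<junit.version>4.13</junit.version>"
    "</properties>"
)
patched = set_pom_properties(sample, CDH_VERSIONS)
print(patched)
```

Properties that are not listed in `CDH_VERSIONS` are left untouched, so the same function can be run against the full file content after reading it from disk.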
### 2.3 Make Atlas compatible with Hive 2.1.1
- Project location to modify: `apache-atlas-sources-2.1.0\addons\hive-bridge`
① `org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java`, line 577
```java
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
```
Change to:
```java
String catalogName = null;
```
② `org/apache/atlas/hive/hook/AtlasHiveHookContext.java`, line 81
```java
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
```
Change to:
```java
this.metastoreHandler = null;
```
### 2.4 Build
```
mvn clean -DskipTests package -Pdist -X -T 8
```
- The build output is in the directory `apache-atlas-sources-2.1.0\distro\target`
![](https://img2020.cnblogs.com/blog/1235870/202012/1235870-20201216135706379-224518859.png)
![](https://img2020.cnblogs.com/blog/1235870/202012/1235870-20201216135715526-2016758261.png)
### 2.5 Install
```
mkdir /usr/local/src/atlas
cd /usr/local/src/atlas
# Copy apache-atlas-2.1.0-bin.tar.gz into the installation directory
tar -zxvf apache-atlas-2.1.0-bin.tar.gz
cd apache-atlas-2.1.0/
```
### 2.6 Edit the configuration files
> vim conf/atlas-application.properties
```properties
# HBase integration
atlas.graph.storage.hostname=cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181
# Solr integration
atlas.graph.index.search.solr.zookeeper-url=cdh01.cm:2181/solr,cdh02.cm:2181/solr,cdh03.cm:2181/solr
# Kafka integration
# false = use an external Kafka
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181
atlas.kafka.bootstrap.servers=cdh01.cm:9092,cdh02.cm:9092,cdh03.cm:9092
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.connection.timeout.ms=30000
atlas.kafka.enable.auto.commit=true
# Other settings
# Access address and port. Changing this value has no effect: the server defaults to local port 21000, which conflicts with Impala
atlas.rest.address=http://cdh01.cm:21000
# If enabled and set to true, the setup steps run when the server starts
atlas.server.run.setup.on.start=false
atlas.audit.hbase.zookeeper.quorum=cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181
# Hive hook settings (append at the end of the file)
# Every operation in Hive is picked up by the hook, which sends a matching event to the Kafka topic Atlas subscribes to; Atlas then generates and stores the metadata
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
# Username and password (optional)
# Enable or disable each of the three authentication methods
atlas.authentication.method.kerberos=true|false
atlas.authentication.method.ldap=true|false
atlas.authentication.method.file=true
# vim users-credentials.properties (edit this file)
#>>> original file
#username=group::sha256-password
admin=ADMIN::8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
rangertagsync=RANGER_TAG_SYNC::e3f67240f5117d1753c940dae9eea772d36ed5fe9bd9c94a300e40413f1afb9d
#<<<
#>>> changed to username bigdata123, password bigdata123
#username=group::sha256-password
bigdata123=ADMIN::aa0336d976ba6db36f33f75a20f68dd9035b1e0e2315c331c95c2dc19b2aac13
rangertagsync=RANGER_TAG_SYNC::e3f67240f5117d1753c940dae9eea772d36ed5fe9bd9c94a300e40413f1afb9d
#<<<
# Compute the sha256: echo -n "bigdata123"|sha256sum
```
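The `username=group::sha256-password` format used in `users-credentials.properties` can also be generated programmatically. The following is a minimal Python 3 sketch of the same calculation as `echo -n ... | sha256sum`; the `credentials_line` helper is a hypothetical name introduced here for illustration.

```python
import hashlib


def credentials_line(username, password, group="ADMIN"):
    """Build one users-credentials.properties entry: username=group::sha256(password)."""
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return "{0}={1}::{2}".format(username, group, digest)


# The default admin/admin account reproduces the entry shipped in the original file.
print(credentials_line("admin", "admin"))
```

Calling it with the default `admin` password yields the same hash as the `admin=ADMIN::...` line in the original file, which is a quick way to confirm that the hash is computed over the password alone, without a trailing newline.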
> vim conf/atlas-env.sh
```bash
# HBase integration: the directory below is the HBase config directory under Atlas; the cluster's HBase configuration is linked into it in a later step
export HBASE_CONF_DIR=/usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase/conf
#export HBASE_CONF_DIR=/etc/hbase/conf
# false = use the external ZooKeeper and HBase
export MANAGE_LOCAL_HBASE=false
# false = use the external Solr
export MANAGE_LOCAL_SOLR=false
# Memory settings (tune to the production machine)
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0
-XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=dumps/atlas_server.hprof
-Xloggc:logs/gc-worker.log -verbose:gc
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC
-XX:+PrintGCTimeStamps"
# Tuning for JDK 1.8 (the settings below require 16 GB of memory)
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m
-XX:MaxNewSize=5120m -XX:MetaspaceSize=100M
-XX:MaxMetaspaceSize=512m"
```
> vim conf/atlas-log4j.xml
```java
# Uncomment the following code (enable it)
```
### 2.7 Integrate HBase
- Add the HBase cluster configuration files under apache-atlas-2.1.0/conf/hbase (the linked path must match the one set in atlas-env.sh above)
```bash
ln -s /etc/hbase/conf/ /usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase/
```
### 2.8 Integrate Solr
- Copy the apache-atlas-2.1.0/conf/solr directory into the Solr installation directory on every Solr node and rename it to `atlas-solr`
```bash
scp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr [email protected]:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
scp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr [email protected]:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
scp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr [email protected]:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
# On each Solr node
cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
mv solr/ atlas-solr
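# On any Solr node, change the login shell of the solr user
vi /etc/passwd
# On the solr line, change /sbin/nologin to /bin/bash
# Switch to the solr user and run the commands below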
su solr
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c vertex_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
# If a collection was created incorrectly, delete it with /opt/cloudera/parcels/CDH/lib/solr/bin/solr delete -c ${collection_name}
# Switch back to root to continue with the remaining configuration
su root
```
- Solr web console: open http://cdh01.cm:8983 to verify that the collections started successfully
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508215733689-1113328901.png)
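Besides checking the web console, the three collections can be verified from a script. The sketch below (Python 3) parses the JSON returned by the Solr Collections API `LIST` action, which is served at `/solr/admin/collections?action=LIST&wt=json`, and reports any index that is still missing. The `missing_collections` helper is a hypothetical name, and the sample response is hand-written for illustration rather than captured from a real cluster.

```python
import json

# The three collections created for Atlas in this section.
REQUIRED = ["vertex_index", "edge_index", "fulltext_index"]


def missing_collections(list_response, required=REQUIRED):
    """Return the required collections absent from a Collections API LIST response."""
    existing = set(json.loads(list_response).get("collections", []))
    return [name for name in required if name not in existing]


# Hand-written sample response in which fulltext_index has not been created yet.
sample = '{"responseHeader": {"status": 0}, "collections": ["vertex_index", "edge_index"]}'
print(missing_collections(sample))
```

An empty list means all three indexes exist; anything else names the collection whose `solr create` command needs to be rerun.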
### 2.9 Integrate Kafka
- Create the Kafka topics
```bash
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
```
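The three commands differ only in the topic name, so they can be generated from a list. This is a small Python 3 sketch that only builds the argument lists; the `create_topic_command` helper is hypothetical, while the flags and values are taken from the commands above.

```python
ZOOKEEPER = "cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181"
TOPICS = ["_HOATLASOK", "ATLAS_ENTITIES", "ATLAS_HOOK"]


def create_topic_command(topic, zookeeper=ZOOKEEPER, replication=3, partitions=3):
    """Build the kafka-topics argument list for creating one topic."""
    return [
        "kafka-topics", "--zookeeper", zookeeper, "--create",
        "--replication-factor", str(replication),
        "--partitions", str(partitions),
        "--topic", topic,
    ]


commands = [create_topic_command(topic) for topic in TOPICS]
for command in commands:
    print(" ".join(command))
```

On a node where `kafka-topics` is on the `PATH`, each list could be passed to `subprocess.run(command, check=True)` to actually create the topic.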
### 2.10 Start and test
```bash
cd /usr/local/src/atlas/apache-atlas-2.1.0/
./bin/atlas_start.py
# Stop: ./bin/atlas_stop.py
```
- Open http://cdh01.cm:21000
- The default username and password are both: admin
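Apart from opening the page in a browser, startup can be checked against the Atlas REST API. The following Python 3 sketch builds a request for the `/api/atlas/admin/version` endpoint using HTTP basic authentication with the default credentials. The helper names are hypothetical, the host is the one used throughout this article, and the request is only sent when `fetch_version` is called.

```python
import base64
import json
import urllib.request


def build_version_request(base_url="http://cdh01.cm:21000", user="admin", password="admin"):
    """Prepare a basic-auth GET request for the Atlas version endpoint."""
    token = base64.b64encode("{0}:{1}".format(user, password).encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/api/atlas/admin/version")
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_version(request, timeout=10):
    """Send the request and return the decoded JSON body (needs a running Atlas)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_version_request()
print(request.full_url)
```

Running `fetch_version(request)` from a machine that can reach the Atlas host should return a JSON document once the server has finished starting; a connection error usually means startup is still in progress or the port is taken.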
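### 2.11 Integrate Hive
- Add the atlas-application.properties configuration file to atlas-plugin-classloader-2.1.0.jar
```bash
# The zip command must be run from this directory so that the file lands at the top level of the jar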
cd /usr/local/src/atlas/apache-atlas-2.1.0/conf
zip -u /usr/local/src/atlas/apache-atlas-2.1.0/hook/hive/atlas-plugin-classloader-2.1.0.jar atlas-application.properties
```
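Whether the file really ended up at the top level of the jar can be checked by listing the archive entries. The sketch below (Python 3) shows the check on a throwaway archive built in a temporary directory, so it does not touch the real jar; `has_top_level_entry` is a hypothetical helper name.

```python
import os
import tempfile
import zipfile


def has_top_level_entry(jar_path, entry="atlas-application.properties"):
    """Return True if the jar contains the entry at its root (no directory prefix)."""
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Throwaway archive that first holds the file only under conf/, as happens
# when the zip command is run from the wrong directory.
workdir = tempfile.mkdtemp()
demo_jar = os.path.join(workdir, "demo.jar")
with zipfile.ZipFile(demo_jar, "w") as jar:
    jar.writestr("conf/atlas-application.properties", "atlas.cluster.name=primary\n")
nested_only = has_top_level_entry(demo_jar)

# Adding the file with no directory prefix mirrors running zip from inside conf/.
with zipfile.ZipFile(demo_jar, "a") as jar:
    jar.writestr("atlas-application.properties", "atlas.cluster.name=primary\n")

print(nested_only, has_top_level_entry(demo_jar))
```

Pointing `has_top_level_entry` at the real `atlas-plugin-classloader-2.1.0.jar` after the `zip -u` step gives the same yes or no answer for the actual deployment.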
- Edit hive-site.xml
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505203647134-1118916738.png)
```xml
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
```
- Edit the Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200516172231145-509856612.png)
```bash
HIVE_AUX_JARS_PATH=/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive
```
- Set HIVE_AUX_JARS_PATH
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508221245626-1522493547.png)
- Edit the HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200516172017283-1084600594.png)
```xml
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
<property>
    <name>hive.reloadable.aux.jars.path</name>
    <value>/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive</value>
</property>
```
- Edit the HiveServer2 Environment Advanced Configuration Snippet
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200516172042491-1270046401.png)
```bash
HIVE_AUX_JARS_PATH=/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive
```
- Send the configured Atlas package to every Hive node, then restart the cluster
```bash
scp -r /usr/local/src/atlas/apache-atlas-2.1.0 [email protected]:/usr/local/src/atlas/
scp -r /usr/local/src/atlas/apache-atlas-2.1.0 [email protected]:/usr/local/src/atlas/
```
> Update the configuration and restart the cluster
- Copy the Atlas configuration file to /etc/hive/conf on every node in the cluster
```bash
scp /usr/local/src/atlas/apache-atlas-2.1.0/conf/atlas-application.properties [email protected]:/etc/hive/conf
scp /usr/local/src/atlas/apache-atlas-2.1.0/conf/atlas-application.properties [email protected]:/etc/hive/conf
scp /usr/local/src/atlas/apache-atlas-2.1.0/conf/atlas-application.properties [email protected]:/etc/hive/conf
```
### 2.12 Start Atlas again
```bash
# Start
./bin/atlas_start.py
# Stop: ./bin/atlas_stop.py
```
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505210349074-368340091.png)
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505210612356-131502284.png)
> Monitor the logs for errors. The main log file is application.log.
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508223553755-843949455.png)
### 2.13 Import Hive metadata into Atlas
- Add the Hive environment variables on the Atlas node
```bash
vim /etc/profile
#>>>
#hive
export HIVE_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive
export HIVE_CONF_DIR=/etc/hive/conf
export PATH=$HIVE_HOME/bin:$PATH
#<<<
source /etc/profile
```
- Run the Atlas import script
```bash
./bin/import-hive.sh
# Enter username admin and password admin (use your own credentials if you changed them)
```
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508224442954-256592723.png)
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508224402902-11640994.png)
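To confirm from a script that the import produced entities, the Atlas basic search endpoint `/api/atlas/v2/search/basic` can be queried for the `hive_table` type. The Python 3 sketch below builds the search URL and extracts qualified names from a response body. The helper names are hypothetical, and the sample response is hand-written with an illustrative table name, not output from a real cluster.

```python
import json
from urllib.parse import urlencode


def search_url(base_url="http://cdh01.cm:21000", type_name="hive_table", limit=10):
    """Build the basic-search URL for one entity type."""
    query = urlencode({"typeName": type_name, "limit": limit})
    return base_url.rstrip("/") + "/api/atlas/v2/search/basic?" + query


def qualified_names(search_response):
    """Extract the qualifiedName attribute of every entity in a search response."""
    entities = json.loads(search_response).get("entities") or []
    return [entity.get("attributes", {}).get("qualifiedName") for entity in entities]


# Hand-written sample response; "default.demo" is an illustrative table name.
sample = (
    '{"queryType": "BASIC", "entities": ['
    '{"typeName": "hive_table", "attributes": {"qualifiedName": "default.demo@primary"}}]}'
)
print(search_url())
print(qualified_names(sample))
```

The part after `@` in a qualified name is the value of `atlas.cluster.name`, `primary` in this setup. An empty list after running `import-hive.sh` suggests that the import did not reach Atlas.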
> Try it out