
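# Environment Series: Deploying Atlas 2.1.0 on CDH 6.3.2

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505173158110-2132862529.png)

> What is Atlas?
>
> Atlas is a scalable and extensible set of core foundational governance services that lets enterprises meet their compliance requirements within Hadoop effectively and integrate with the whole enterprise data ecosystem.
>
> Apache Atlas gives organizations open metadata management and governance capabilities to build a catalog of their data assets, classify and govern those assets, and provide collaboration features around them for data scientists, analysts, and data governance teams.

> Without Atlas
>
> Dependencies between big-data tables are hard to untangle, and metadata management has to be developed in-house, for example a Hive lineage graph.
>
> There is no tool for looking up table dependencies, which makes it hard to locate errors while developing business SQL.

- Official site: [http://atlas.apache.org](http://atlas.apache.org/)
- Lineage between tables

  ![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505173428100-1858217094.png)
- Lineage between columns

  ![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505173445712-2066238416.png)

## 1 Atlas architecture

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505173620236-637016777.png)

## 2 Installing and using Atlas

> Required components: HDFS, Yarn, Zookeeper, Kafka, Hbase, Solr, Hive, and a Python 2.7 environment.
>
> The build needs Maven 3.5.0 or later, JDK 8u151 or later, and Python 2.7.

### 2.1 Download the 2.1.0 source package and open it in IDEA

- Because Atlas has to integrate with CDH, edit the `pom` file.
- Add the CDH repository inside the `repositories` tag.

```xml
<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    <releases>
        <enabled>true</enabled>
    </releases>
    <snapshots>
        <enabled>false</enabled>
    </snapshots>
</repository>
```

### 2.2 Align the component versions with CDH

```xml
<lucene-solr.version>7.4.0</lucene-solr.version>
<hadoop.version>3.0.0-cdh6.3.2</hadoop.version>
<hbase.version>2.1.0-cdh6.3.2</hbase.version>
<solr.version>7.4.0-cdh6.3.2</solr.version>
<hive.version>2.1.1-cdh6.3.2</hive.version>
<kafka.version>2.2.1-cdh6.3.2</kafka.version>
<kafka.scala.binary.version>2.11</kafka.scala.binary.version>
<zookeeper.version>3.4.5-cdh6.3.2</zookeeper.version>
<sqoop.version>1.4.7-cdh6.3.2</sqoop.version>
```

### 2.3 Hive 2.1.1 compatibility

- Project to modify: `apache-atlas-sources-2.1.0\addons\hive-bridge`

① `org/apache/atlas/hive/bridge/HiveMetaStoreBridge.java`, line 577

```java
String catalogName = hiveDB.getCatalogName() != null ? hiveDB.getCatalogName().toLowerCase() : null;
```

Change to:

```java
String catalogName = null;
```

② `org/apache/atlas/hive/hook/AtlasHiveHookContext.java`, line 81

```java
this.metastoreHandler = (listenerEvent != null) ? metastoreEvent.getIHMSHandler() : null;
```

Change to:

```java
this.metastoreHandler = null;
```

### 2.4 Build

```
mvn clean -DskipTests package -Pdist -X -T 8
```

- The build output is written to `apache-atlas-sources-2.1.0\distro\target`.

![](https://img2020.cnblogs.com/blog/1235870/202012/1235870-20201216135706379-224518859.png)

![](https://img2020.cnblogs.com/blog/1235870/202012/1235870-20201216135715526-2016758261.png)

### 2.5 Install

```
mkdir /usr/local/src/atlas
cd /usr/local/src/atlas
# Copy apache-atlas-2.1.0-bin.tar.gz into the install directory
tar -zxvf apache-atlas-2.1.0-bin.tar.gz
cd apache-atlas-2.1.0/
```

### 2.6 Edit the configuration files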
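One of the files edited in this section, `users-credentials.properties`, stores each account as `username=group::sha256-password`. As a minimal sketch of how such an entry could be generated, the Python snippet below hashes a password the same way `echo -n "<password>" | sha256sum` does. The helper name `credential_line` is hypothetical and not part of Atlas.

```python
import hashlib


def credential_line(username: str, group: str, password: str) -> str:
    """Build one entry in the username=group::sha256-password format.

    Hypothetical helper for illustration; Atlas itself only reads the file.
    """
    # Hash the raw password bytes with no trailing newline, like `echo -n`.
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return f"{username}={group}::{digest}"


if __name__ == "__main__":
    # The stock account: user "admin", group "ADMIN", password "admin".
    print(credential_line("admin", "ADMIN", "admin"))
```

Run against the stock `admin` account, this reproduces the hash that ships in the original file, which is a quick way to confirm the hashing is done correctly before replacing the entry with your own user.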
> vim conf/atlas-application.properties

```properties
# HBase integration
atlas.graph.storage.hostname=cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181

# Solr integration
atlas.graph.index.search.solr.zookeeper-url=cdh01.cm:2181/solr,cdh02.cm:2181/solr,cdh03.cm:2181/solr

# Kafka integration
# false: use an external Kafka
atlas.notification.embedded=false
atlas.kafka.zookeeper.connect=cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181
atlas.kafka.bootstrap.servers=cdh01.cm:9092,cdh02.cm:9092,cdh03.cm:9092
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.connection.timeout.ms=30000
atlas.kafka.enable.auto.commit=true

# Other settings
# Access address and port. Changing this value has no effect: Atlas uses local port 21000 by default, which conflicts with Impala.
atlas.rest.address=http://cdh01.cm:21000
# If enabled and set to true, the setup steps run when the server starts
atlas.server.run.setup.on.start=false
atlas.audit.hbase.zookeeper.quorum=cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181

# Hive hook settings (append at the end of the file)
# Every operation in Hive is picked up by the hook, which sends a matching event to the Kafka topic Atlas subscribes to; Atlas then generates, stores, and manages the metadata.
######### Hive Hook Configs #######
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary

# Username and password (optional)
# Enable or disable the three authentication methods
atlas.authentication.method.kerberos=true|false
atlas.authentication.method.ldap=true|false
atlas.authentication.method.file=true

# vim users-credentials.properties (edit this file)
# >>> original file
#username=group::sha256-password
admin=ADMIN::8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
rangertagsync=RANGER_TAG_SYNC::e3f67240f5117d1753c940dae9eea772d36ed5fe9bd9c94a300e40413f1afb9d
# <<<
# >>> changed to username bigdata123, password bigdata123
#username=group::sha256-password
bigdata123=ADMIN::aa0336d976ba6db36f33f75a20f68dd9035b1e0e2315c331c95c2dc19b2aac13
rangertagsync=RANGER_TAG_SYNC::e3f67240f5117d1753c940dae9eea772d36ed5fe9bd9c94a300e40413f1afb9d
# <<<
# Compute the sha256: echo -n "bigdata123"|sha256sum
```

> vim conf/atlas-env.sh

```bash
# HBase integration: the directory below is the HBase config directory inside Atlas; the cluster's HBase config is added to it later
export HBASE_CONF_DIR=/usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase/conf
#export HBASE_CONF_DIR=/etc/hbase/conf

# false: use external ZooKeeper and HBase
export MANAGE_LOCAL_HBASE=false
# false: use external Solr
export MANAGE_LOCAL_SOLR=false

# Memory settings (adjust to the production machine)
export ATLAS_SERVER_OPTS="-server -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+PrintTenuringDistribution -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=dumps/atlas_server.hprof -Xloggc:logs/gc-worker.log -verbose:gc -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps"
# Tuning for JDK 1.8 (the values below need 16 GB of memory)
export ATLAS_SERVER_HEAP="-Xms15360m -Xmx15360m -XX:MaxNewSize=5120m -XX:MetaspaceSize=100M -XX:MaxMetaspaceSize=512m"
```

> vim conf/atlas-log4j.xml

```xml
<!-- Uncomment the following block to enable it -->
```

### 2.7 HBase integration

- Add the HBase cluster configuration under `apache-atlas-2.1.0/conf/hbase` (the linked path must match the one set in `atlas-env.sh` above).

```bash
ln -s /etc/hbase/conf/ /usr/local/src/atlas/apache-atlas-2.1.0/conf/hbase/
```

### 2.8 Solr integration

- Copy `apache-atlas-2.1.0/conf/solr` into the Solr install directory on every Solr node and rename it to `atlas-solr`.

```bash
scp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr [email protected]:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
scp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr [email protected]:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
scp -r /usr/local/src/atlas/apache-atlas-2.1.0/conf/solr [email protected]:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/

# On each Solr node
cd /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/
mv solr/ atlas-solr

# On any Solr node, change the login shell of the solr user
vi /etc/passwd
# change /sbin/nologin to /bin/bash

# Switch to the solr user and create the collections
su solr
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c vertex_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c edge_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2
/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/bin/solr create -c fulltext_index -d /opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/solr/atlas-solr -shards 3 -replicationFactor 2

# If a collection was created incorrectly, delete it with: /opt/cloudera/parcels/CDH/lib/solr/bin/solr delete -c ${collection_name}
# Switch back to root to continue with the remaining configuration
su root
```

- Open the Solr web console at http://cdh01.cm:8983 to verify that Solr started successfully.

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508215733689-1113328901.png)

### 2.9 Kafka integration

- Create the Kafka topics.

```bash
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --create --replication-factor 3 --partitions 3 --topic _HOATLASOK
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_ENTITIES
kafka-topics --zookeeper cdh01.cm:2181,cdh02.cm:2181,cdh03.cm:2181 --create --replication-factor 3 --partitions 3 --topic ATLAS_HOOK
```

### 2.10 Start and test

```bash
cd /usr/local/src/atlas/apache-atlas-2.1.0/
./bin/atlas_start.py
# Stop: ./bin/atlas_stop.py
```

- http://cdh01.cm:21000
- The default username and password are both `admin`.

### 2.11 Hive integration

- Add the `atlas-application.properties` file to `atlas-plugin-classloader-2.1.0.jar`.

```bash
# Run zip from this directory so the file lands at the top level of the jar
cd /usr/local/src/atlas/apache-atlas-2.1.0/conf
zip -u /usr/local/src/atlas/apache-atlas-2.1.0/hook/hive/atlas-plugin-classloader-2.1.0.jar atlas-application.properties
```

- Edit hive-site.xml.

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505203647134-1118916738.png)

```xml
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
```

- Edit the Gateway Client Environment Advanced Configuration Snippet (Safety Valve) for hive-env.sh.

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200516172231145-509856612.png)

```bash
HIVE_AUX_JARS_PATH=/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive
```

- Edit HIVE_AUX_JARS_PATH.

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508221245626-1522493547.png)

- Edit the HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml.

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200516172017283-1084600594.png)

```xml
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
<property>
    <name>hive.reloadable.aux.jars.path</name>
    <value>/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive</value>
</property>
```

- Edit the HiveServer2 Environment Advanced Configuration Snippet.

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200516172042491-1270046401.png)

```bash
HIVE_AUX_JARS_PATH=/usr/local/src/atlas/apache-atlas-2.1.0/hook/hive
```

- Send the configured Atlas package to every Hive node, then restart the cluster.

```bash
scp -r /usr/local/src/atlas/apache-atlas-2.1.0 [email protected]:/usr/local/src/atlas/
scp -r /usr/local/src/atlas/apache-atlas-2.1.0 [email protected]:/usr/local/src/atlas/
```

> Deploy the updated configuration and restart the cluster.

- Copy the Atlas configuration file to `/etc/hive/conf` on every node of the cluster.

```bash
scp /usr/local/src/atlas/apache-atlas-2.1.0/conf/atlas-application.properties [email protected]:/etc/hive/conf
scp /usr/local/src/atlas/apache-atlas-2.1.0/conf/atlas-application.properties [email protected]:/etc/hive/conf
scp /usr/local/src/atlas/apache-atlas-2.1.0/conf/atlas-application.properties [email protected]:/etc/hive/conf
```

### 2.12 Start Atlas again

```bash
# Start
./bin/atlas_start.py
# Stop: ./bin/atlas_stop.py
```

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505210349074-368340091.png)
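The web UI may not answer immediately after `atlas_start.py` returns. As a rough sketch, the Python snippet below polls the server until it responds. It assumes the address `http://cdh01.cm:21000` used in this article, the default `admin`/`admin` account with file-based authentication, and that the `/api/atlas/admin/version` endpoint is reachable; the function names `atlas_is_up` and `wait_until` are hypothetical.

```python
import base64
import time
import urllib.error
import urllib.request
from typing import Callable

ATLAS_URL = "http://cdh01.cm:21000"  # assumption: the address used in this article


def atlas_is_up(base_url: str = ATLAS_URL, user: str = "admin",
                password: str = "admin") -> bool:
    """Return True if the admin version endpoint answers with HTTP 200."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{base_url}/api/atlas/admin/version",
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: treat as not ready.
        return False


def wait_until(probe: Callable[[], bool], attempts: int = 60, delay: float = 10.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until(atlas_is_up)` from a deployment script would block for up to ten minutes with these defaults and return `False` if the server never answers, at which point the logs are the next place to look.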
![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200505210612356-131502284.png)

> Watch the logs for errors. The main log is application.log.

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508223553755-843949455.png)

### 2.13 Import Hive metadata into Atlas

- Add the Hive environment variables on the Atlas node.

```bash
vim /etc/profile
#>>>
#hive
export HIVE_HOME=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hive
export HIVE_CONF_DIR=/etc/hive/conf
export PATH=$HIVE_HOME/bin:$PATH
#<<<

source /etc/profile
```

- Run the Atlas import script.

```bash
./bin/import-hive.sh
# Username: admin; password: admin (use your own values if you changed them)
```

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508224442954-256592723.png)

![img](https://img2020.cnblogs.com/blog/1235870/202005/1235870-20200508224402902-11640994.png)

> Try it out
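Besides browsing the UI, the import can be spot-checked over REST. The sketch below queries the Atlas v2 basic search endpoint for `hive_table` entities. It assumes the server address and default credentials used in this article and that each returned entity header carries a `qualifiedName` attribute; the helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import List, Optional


def build_search_request(base_url: str, type_name: str, user: str,
                         password: str, limit: int = 10) -> urllib.request.Request:
    """Prepare a GET request for /api/atlas/v2/search/basic with basic auth."""
    query = urllib.parse.urlencode({"typeName": type_name, "limit": limit})
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/api/atlas/v2/search/basic?{query}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def qualified_names(search_result: dict) -> List[Optional[str]]:
    """Pull the qualifiedName attribute out of each entity in a search result."""
    entities = search_result.get("entities") or []
    return [(e.get("attributes") or {}).get("qualifiedName") for e in entities]


def list_entities(base_url: str, type_name: str = "hive_table",
                  user: str = "admin", password: str = "admin") -> List[Optional[str]]:
    """Run the search against a live Atlas server and return the qualified names."""
    request = build_search_request(base_url, type_name, user, password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return qualified_names(json.load(response))
```

For example, `list_entities("http://cdh01.cm:21000")` should return a non-empty list once `import-hive.sh` has finished and the cluster has at least one Hive table; an empty list suggests the import or the hook configuration needs another look.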