Apache Atlas Installation and HiveHook Configuration
Download the source code
The Apache Atlas website only offers source packages for download: Download
Download the latest version, 2.2.0, directly:
venn@venn git % wget https://downloads.apache.org/atlas/2.2.0/apache-atlas-2.2.0-sources.tar.gz # download
venn@venn git % tar -zxvf apache-atlas-2.2.0-sources.tar.gz # extract
venn@venn git % ls
apache-atlas-2.2.0-sources.tar.gz apache-atlas-sources-2.2.0
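The download and extraction steps can be wrapped in a small shell function so that re-running them does not fetch the archive again. This is a minimal sketch, assuming wget and tar are available; the function name fetch_atlas_src is hypothetical, and the URL pattern is the one from the wget command above.

```shell
# Hypothetical helper (not part of Atlas): download the source release
# if it is not already present, then unpack it.
fetch_atlas_src() {
  version="$1"        # release version, e.g. 2.2.0
  workdir="${2:-.}"   # where the archive is stored and unpacked
  archive="apache-atlas-${version}-sources.tar.gz"
  url="https://downloads.apache.org/atlas/${version}/${archive}"

  mkdir -p "$workdir" || return 1
  if [ ! -f "$workdir/$archive" ]; then
    # Only touch the network when the archive is missing;
    # remove a partial download so the next run retries it
    wget -O "$workdir/$archive" "$url" || { rm -f "$workdir/$archive"; return 1; }
  fi
  # The source release unpacks into apache-atlas-sources-<version>
  tar -zxf "$workdir/$archive" -C "$workdir" || return 1
  [ -d "$workdir/apache-atlas-sources-${version}" ]
}
```

For example, fetch_atlas_src 2.2.0 ~/git reproduces the session above in ~/git.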
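Build
For build instructions, see the official guide: Building & Installing Apache Atlas
Because Atlas depends on several external components, the build can bundle embedded versions of some of them, such as HBase and Solr.
Here the package is built directly with the "Packaging Apache Atlas with embedded Apache HBase & Apache Solr" option: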
mvn clean -DskipTests package -Pdist,embedded-hbase-solr
....
[INFO] Reactor Summary for apache-atlas 2.2.0:
[INFO]
[INFO] Apache Atlas Server Build Tools .................... SUCCESS [ 1.190 s]
[INFO] apache-atlas ....................................... SUCCESS [ 4.594 s]
[INFO] Apache Atlas Integration ........................... SUCCESS [ 12.596 s]
[INFO] Apache Atlas Test Utility Tools .................... SUCCESS [ 3.830 s]
[INFO] Apache Atlas Common ................................ SUCCESS [ 2.827 s]
[INFO] Apache Atlas Client ................................ SUCCESS [ 0.331 s]
[INFO] atlas-client-common ................................ SUCCESS [ 1.281 s]
[INFO] atlas-client-v1 .................................... SUCCESS [ 1.968 s]
[INFO] Apache Atlas Server API ............................ SUCCESS [ 1.484 s]
[INFO] Apache Atlas Notification .......................... SUCCESS [ 3.790 s]
[INFO] atlas-client-v2 .................................... SUCCESS [ 0.969 s]
[INFO] Apache Atlas Graph Database Projects ............... SUCCESS [ 0.198 s]
[INFO] Apache Atlas Graph Database API .................... SUCCESS [ 1.180 s]
[INFO] Graph Database Common Code ......................... SUCCESS [ 1.120 s]
[INFO] Apache Atlas JanusGraph-HBase2 Module .............. SUCCESS [ 0.988 s]
[INFO] Apache Atlas JanusGraph DB Impl .................... SUCCESS [ 5.341 s]
[INFO] Apache Atlas Graph DB Dependencies ................. SUCCESS [ 1.406 s]
[INFO] Apache Atlas Authorization ......................... SUCCESS [ 1.509 s]
[INFO] Apache Atlas Repository ............................ SUCCESS [ 9.290 s]
[INFO] Apache Atlas UI .................................... SUCCESS [ 22.949 s]
[INFO] Apache Atlas New UI ................................ SUCCESS [ 22.439 s]
[INFO] Apache Atlas Web Application ....................... SUCCESS [ 51.105 s]
[INFO] Apache Atlas Documentation ......................... SUCCESS [ 0.996 s]
[INFO] Apache Atlas FileSystem Model ...................... SUCCESS [ 1.768 s]
[INFO] Apache Atlas Plugin Classloader .................... SUCCESS [ 0.791 s]
[INFO] Apache Atlas Hive Bridge Shim ...................... SUCCESS [ 1.902 s]
[INFO] Apache Atlas Hive Bridge ........................... SUCCESS [ 4.811 s]
[INFO] Apache Atlas Falcon Bridge Shim .................... SUCCESS [ 27.805 s]
[INFO] Apache Atlas Falcon Bridge ......................... SUCCESS [ 3.164 s]
[INFO] Apache Atlas Sqoop Bridge Shim ..................... SUCCESS [ 3.344 s]
[INFO] Apache Atlas Sqoop Bridge .......................... SUCCESS [ 8.621 s]
[INFO] Apache Atlas Storm Bridge Shim ..................... SUCCESS [ 48.489 s]
[INFO] Apache Atlas Storm Bridge .......................... SUCCESS [ 4.718 s]
[INFO] Apache Atlas Hbase Bridge Shim ..................... SUCCESS [ 2.068 s]
[INFO] Apache Atlas Hbase Bridge .......................... SUCCESS [01:13 min]
[INFO] Apache HBase - Testing Util ........................ SUCCESS [ 3.748 s]
[INFO] Apache Atlas Kafka Bridge .......................... SUCCESS [ 28.061 s]
[INFO] Apache Atlas classification updater ................ SUCCESS [ 0.906 s]
[INFO] Apache Atlas index repair tool ..................... SUCCESS [ 3.032 s]
[INFO] Apache Atlas Impala Hook API ....................... SUCCESS [ 0.309 s]
[INFO] Apache Atlas Impala Bridge Shim .................... SUCCESS [ 0.348 s]
[INFO] Apache Atlas Impala Bridge ......................... SUCCESS [ 3.057 s]
[INFO] Apache Atlas Distribution .......................... SUCCESS [15:53 min]
[INFO] atlas-examples ..................................... SUCCESS [ 0.386 s]
[INFO] sample-app ......................................... SUCCESS [ 3.217 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 22:12 min
[INFO] Finished at: 2022-03-21T15:24:37+08:00
[INFO] ------------------------------------------------------------------------
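- The embedded-hbase-solr profile configures Apache Atlas so that an Apache HBase instance and an Apache Solr instance are started and stopped together with the Apache Atlas server.
- Note: this distribution profile is intended only for single-node development, not for production.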
After the build, the packages are in apache-atlas-sources-2.2.0/distro/target. Copy the generated installation package apache-atlas-2.2.0-server.tar.gz to /opt and extract it:
venn@venn target % pwd
/Users/venn/git/apache-atlas-sources-2.2.0/distro/target
venn@venn target % ls
META-INF apache-atlas-2.2.0-kafka-hook.tar.gz hbase
antrun apache-atlas-2.2.0-server.tar.gz hbase.temp
apache-atlas-2.2.0-atlas-index-repair.zip apache-atlas-2.2.0-sources.tar.gz maven-archiver
apache-atlas-2.2.0-bin.tar.gz apache-atlas-2.2.0-sqoop-hook.tar.gz maven-shared-archive-resources
apache-atlas-2.2.0-classification-updater.zip apache-atlas-2.2.0-storm-hook.tar.gz rat.txt
apache-atlas-2.2.0-falcon-hook.tar.gz archive-tmp solr
apache-atlas-2.2.0-hbase-hook.tar.gz atlas-distro-2.2.0.jar solr.temp
apache-atlas-2.2.0-hive-hook.tar.gz bin test-classes
apache-atlas-2.2.0-impala-hook.tar.gz conf
venn@venn /opt % ls
apache-atlas-2.2.0 apache-atlas-2.2.0-server.tar.gz
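Copying and unpacking the server package can be scripted the same way. The sketch below assumes the build layout shown above (the server tarball in distro/target, unpacking to apache-atlas-<version>); install_atlas_server is a hypothetical helper, not part of Atlas.

```shell
# Hypothetical helper: copy the server tarball from the build output into
# an install prefix, unpack it there and check that the start script exists.
install_atlas_server() {
  target_dir="$1"          # e.g. .../apache-atlas-sources-2.2.0/distro/target
  prefix="$2"              # e.g. /opt
  version="${3:-2.2.0}"
  tarball="$target_dir/apache-atlas-${version}-server.tar.gz"

  if [ ! -f "$tarball" ]; then
    echo "missing $tarball; did the -Pdist build finish?" >&2
    return 1
  fi
  cp "$tarball" "$prefix/" || return 1
  tar -zxf "$prefix/apache-atlas-${version}-server.tar.gz" -C "$prefix" || return 1
  # The server package unpacks into apache-atlas-<version>, as in the /opt listing
  [ -f "$prefix/apache-atlas-${version}/bin/atlas_start.py" ]
}
```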
Modify the configuration
Go to the conf directory:
vi atlas-env.sh
Set JAVA_HOME (the embedded HBase/Solr are started by default):
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true
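Editing atlas-env.sh by hand is fine for one machine; if the change has to be repeatable, a helper along these lines rewrites an existing export of the same variable (commented out or not) or appends a new one. This is a sketch: set_env_var is hypothetical, and it assumes values without backslashes, since awk -v interprets escape sequences.

```shell
# Hypothetical helper: make FILE contain exactly one "export KEY=VALUE" line.
set_env_var() {
  file="$1"; key="$2"; value="$3"
  tmpfile="$file.tmp.$$"
  awk -v k="$key" -v v="$value" '
    # An existing export of the same key, commented out or not
    $0 ~ ("^[# \t]*export[ \t]+" k "=") {
      if (!done) { print "export " k "=" v; done = 1 }
      next   # drop any further duplicates of this key
    }
    { print }
    # No existing line: append one at the end
    END { if (!done) print "export " k "=" v }
  ' "$file" > "$tmpfile" && mv "$tmpfile" "$file"
}
```

Usage, mirroring the three settings above: set_env_var conf/atlas-env.sh MANAGE_LOCAL_HBASE true, and likewise for MANAGE_LOCAL_SOLR and JAVA_HOME.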
Start Atlas
venn@venn atlas-2.22 % bin/atlas_start.py
venn@venn atlas-2.22 % bin/atlas_stop.py
No process ID file found. Server not running?
venn@venn atlas-2.22 % bin/atlas_start.py
Configured for local HBase.
Starting local HBase...
Local HBase started!
Configured for local Solr.
Starting local Solr...
Local Solr started!
Creating Solr collections for Atlas using config: /opt/atlas-2.22/conf/solr
Starting Atlas server on host: localhost
Starting Atlas server on port: 21000
Apache Atlas Server started!!!
After a successful start, open the web UI at http://localhost:21000:
- Username/password: admin/admin
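Atlas may not answer requests immediately after atlas_start.py returns, because the embedded HBase and Solr come up first. A polling sketch, assuming curl is installed and that GET /api/atlas/admin/status returns a JSON body containing ACTIVE once the server is ready; wait_for_atlas is a hypothetical helper.

```shell
# Hypothetical helper: poll the (assumed) admin status endpoint until it
# reports ACTIVE or the timeout, in seconds, expires.
wait_for_atlas() {
  base_url="${1:-http://localhost:21000}"
  timeout_s="${2:-120}"
  user_pass="${3:-admin:admin}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # -s quiet, -f fail on HTTP errors; any failure just means "not ready yet"
    if curl -sf --max-time 5 -u "$user_pass" "$base_url/api/atlas/admin/status" 2>/dev/null | grep -q ACTIVE; then
      echo "Atlas is ACTIVE at $base_url"
      return 0
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "Atlas did not become ACTIVE within ${timeout_s}s" >&2
  return 1
}
```

Running bin/atlas_start.py && wait_for_atlas makes the web UI available as soon as the loop reports ACTIVE.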
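Configure the Hive hook
Official documentation: HookHive
Hive version: 3.1.2
- Note: I originally ran Hive 2.3.3 but kept hitting jar conflicts. Building Atlas against Hive 2.3.3 failed, so I installed Hive 3.1.2 instead.
- Add the following property to hive-site.xml to register the Atlas hook: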
<property>
<name>hive.exec.post.hooks</name>
<value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
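To apply the hive-site.xml change from a script, the property can be inserted just before </configuration>. A minimal sketch, assuming a hive-site.xml whose </configuration> sits on its own line; add_atlas_hook is hypothetical. It deliberately refuses to touch a file that already sets hive.exec.post.hooks to other hooks, because that property takes a comma-separated list of hook classes that should be merged by hand.

```shell
# Hypothetical helper: register HiveHook in hive-site.xml if it is not there yet.
add_atlas_hook() {
  site_xml="$1"
  hook_class="org.apache.atlas.hive.hook.HiveHook"
  # Already registered: nothing to do
  grep -qF "$hook_class" "$site_xml" && return 0
  # Post hooks already configured for something else: merge manually
  if grep -qF "<name>hive.exec.post.hooks</name>" "$site_xml"; then
    echo "hive.exec.post.hooks already set in $site_xml; add $hook_class to it by hand" >&2
    return 1
  fi
  tmpfile="$site_xml.tmp.$$"
  awk -v cls="$hook_class" '
    # Emit the property right before the first closing </configuration> line
    /<\/configuration>/ && !done {
      print "  <property>"
      print "    <name>hive.exec.post.hooks</name>"
      print "    <value>" cls "</value>"
      print "  </property>"
      done = 1
    }
    { print }
  ' "$site_xml" > "$tmpfile" && mv "$tmpfile" "$site_xml" || return 1
  # Fails if there was no </configuration> line to insert before
  grep -qF "$hook_class" "$site_xml"
}
```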
- Extract apache-atlas-2.2.0-hive-hook.tar.gz (it unpacks into apache-atlas-hive-hook-2.2.0):
venn@venn target % pwd
/Users/venn/git/apache-atlas-sources-2.2.0/distro/target
venn@venn target % ls
META-INF apache-atlas-2.2.0-kafka-hook.tar.gz conf
antrun apache-atlas-2.2.0-server.tar.gz hbase
apache-atlas-2.2.0-atlas-index-repair.zip apache-atlas-2.2.0-sources.tar.gz hbase.temp
apache-atlas-2.2.0-bin.tar.gz apache-atlas-2.2.0-sqoop-hook.tar.gz maven-archiver
apache-atlas-2.2.0-classification-updater.zip apache-atlas-2.2.0-storm-hook.tar.gz maven-shared-archive-resources
apache-atlas-2.2.0-falcon-hook.tar.gz apache-atlas-hive-hook-2.2.0 rat.txt
apache-atlas-2.2.0-hbase-hook.tar.gz archive-tmp solr
apache-atlas-2.2.0-hive-hook.tar.gz atlas-distro-2.2.0.jar solr.temp
apache-atlas-2.2.0-impala-hook.tar.gz bin test-classes
- Copy apache-atlas-hive-hook-2.2.0/hook/hive to the Atlas installation directory: /opt/atlas-2.22/hook/hive
- Add the Atlas Hive hook to HIVE_AUX_JARS_PATH in hive-env.sh:
export HIVE_AUX_JARS_PATH=/opt/atlas-2.22/hook/hive
- Copy /opt/atlas-2.22/conf/atlas-application.properties to the Hive conf directory
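The three file-level steps above (copy the hook directory, extend HIVE_AUX_JARS_PATH, copy atlas-application.properties) can be combined as follows. This is a sketch under assumed arguments: the unpacked hook package, the Atlas install directory (/opt/atlas-2.22 in this post) and Hive's conf directory (/opt/hive-3.1.2/conf here); setup_hive_hook_files is hypothetical.

```shell
# Hypothetical helper: perform the hook copy, hive-env.sh and properties steps.
setup_hive_hook_files() {
  hook_pkg="$1"     # e.g. .../distro/target/apache-atlas-hive-hook-2.2.0
  atlas_home="$2"   # e.g. /opt/atlas-2.22
  hive_conf="$3"    # e.g. /opt/hive-3.1.2/conf

  # 1. Copy hook/hive into the Atlas install (re-running merges into it)
  mkdir -p "$atlas_home/hook" || return 1
  cp -R "$hook_pkg/hook/hive" "$atlas_home/hook/" || return 1

  # 2. Point HIVE_AUX_JARS_PATH at the hook jars. This appends a new export;
  #    if hive-env.sh already sets HIVE_AUX_JARS_PATH, the later line wins,
  #    so merge existing entries by hand in that case.
  env_file="$hive_conf/hive-env.sh"
  line="export HIVE_AUX_JARS_PATH=$atlas_home/hook/hive"
  touch "$env_file" || return 1
  grep -qxF "$line" "$env_file" || echo "$line" >> "$env_file"

  # 3. Put atlas-application.properties next to Hive's own configuration
  cp "$atlas_home/conf/atlas-application.properties" "$hive_conf/"
}
```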
Import existing Hive metadata into Atlas
Copy import-hive.sh from the hook package into the Atlas bin directory:
venn@venn hook-bin % pwd
/Users/venn/git/apache-atlas-sources-2.2.0/distro/target/apache-atlas-hive-hook-2.2.0/hook-bin
venn@venn hook-bin % ls
import-hive.sh
venn@venn hook-bin % cp import-hive.sh /opt/atlas-2.22/bin
venn@venn hook-bin % ls /opt/atlas-2.22/bin
atlas_admin.py atlas_config.pyc atlas_start.py cputil.py quick_start_v1.py
atlas_client_cmdline.py atlas_kafka_setup.py atlas_stop.py import-hive.sh
atlas_config.py atlas_kafka_setup_hook.py atlas_update_simple_auth_json.py quick_start.py
venn@venn hook-bin %
venn@venn atlas-2.22 % sh bin/import-hive.sh
Using Hive configuration directory [/opt/hive-3.1.2/conf]
Log file for import is /var/log/atlas/import-hive.log
...
2022-03-23T09:48:30,575 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-cache-size = 15000
2022-03-23T09:48:30,575 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-dirty-size = 120
Enter username for atlas :- admin
Enter password for atlas :-
2022-03-23T09:48:34,444 INFO [main] org.apache.atlas.AtlasBaseClient - Client has only one service URL, will use that for all actions: http://localhost:21000
2022-03-23T09:48:34,483 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Found configuration file file:/opt/hive-3.1.2/conf/hive-site.xml
...
2022-03-23T09:48:44,204 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Created hive_db entity: name=default@primary, guid=1ceabd72-505f-4338-b9eb-c4e1511fd882
2022-03-23T09:48:44,247 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - No tables to import in database default
Hive Meta Data imported successfully!!!
The import succeeded; check the result in the Atlas web UI:
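To see what the import created without opening the UI, the import log (/var/log/atlas/import-hive.log above) can be scanned for "Created ... entity" lines. This sketch assumes the log wording shown in the sample output; other Atlas versions may phrase it differently, and list_created_entities is hypothetical.

```shell
# Hypothetical helper: print "<type> <qualifiedName> <guid>" for every
# "Created <type> entity: name=..., guid=..." line in the import log.
list_created_entities() {
  log_file="${1:-/var/log/atlas/import-hive.log}"
  sed -n 's/.*Created \([a-z_]*\) entity: name=\([^,]*\), guid=\([0-9a-f-]*\).*/\1 \2 \3/p' "$log_file"
}
```

With the log above this should print a single hive_db line for default@primary, since the default database had no tables to import.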
Capturing Hive metadata and lineage on the fly
create table as
hive> use atlas1;
OK
Time taken: 0.634 seconds
hive> show tables;
OK
tab_name
t_a
Time taken: 0.255 seconds, Fetched: 1 row(s)
hive> create table t_b as select * from t_a;
Query ID = venn_20220324151022_2b39072e-5137-4544-b77f-a616b5713314
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
2022-03-24 15:10:25,785 INFO [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-03-24 15:10:26,049 INFO [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Starting Job = job_1648105770329_0001, Tracking URL = http://venn.local:8088/proxy/application_1648105770329_0001/
Kill Command = /opt/hadoop-3.2.2/bin/mapred job -kill job_1648105770329_0001
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2022-03-24 15:10:38,140 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1648105770329_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://venn:9000/user/hive/warehouse/atlas1.db/.hive-staging_hive_2022-03-24_15-10-22_500_6973722723439996636-1/-ext-10002
Moving data to directory hdfs://venn:9000/user/hive/warehouse/atlas1.db/t_b
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
aa
Time taken: 20.185 seconds
Metadata:
Table lineage:
Column lineage:
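The lineage shown in the UI is also available over the Atlas v2 REST API. The sketch below assumes the endpoints GET /api/atlas/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=... and GET /api/atlas/v2/lineage/{guid}, the admin/admin login, and the Hive qualifiedName convention <db>.<table>@<cluster> with the cluster name primary (as in default@primary in the import log). All function names are hypothetical.

```shell
# Assumed Atlas address; override by exporting ATLAS_URL beforehand
ATLAS_URL="${ATLAS_URL:-http://localhost:21000}"

# Pure helpers (hypothetical): build the request URLs, no server needed
table_qualified_name() { printf '%s\n' "$1.$2@${3:-primary}"; }
table_lookup_url() {
  printf '%s\n' "$ATLAS_URL/api/atlas/v2/entity/uniqueAttribute/type/hive_table?attr:qualifiedName=$(table_qualified_name "$@")"
}
lineage_url() {
  # direction: INPUT, OUTPUT or BOTH; depth: number of hops to follow
  printf '%s\n' "$ATLAS_URL/api/atlas/v2/lineage/$1?direction=${2:-BOTH}&depth=${3:-3}"
}

# Network helpers (hypothetical): need curl and a running Atlas server
get_table_json()   { curl -sf -u admin:admin "$(table_lookup_url "$@")"; }
get_lineage_json() { curl -sf -u admin:admin "$(lineage_url "$@")"; }
```

For example, get_table_json atlas1 t_b should return the t_b entity; its entity.guid field (for instance via jq -r '.entity.guid', if jq is installed) can then be passed to get_lineage_json to fetch the CTAS lineage from t_a.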
insert into
hive> create table t_c(aa string);
OK
Time taken: 0.45 seconds
hive> insert into t_c select aa from t_b;
Query ID = venn_20220324151139_074245e5-efe8-4ae7-9893-7ec07d06f242
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
2022-03-24 15:11:39,848 INFO [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-03-24 15:11:39,874 INFO [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Starting Job = job_1648105770329_0002, Tracking URL = http://venn.local:8088/proxy/application_1648105770329_0002/
Kill Command = /opt/hadoop-3.2.2/bin/mapred job -kill job_1648105770329_0002
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2022-03-24 15:11:48,811 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1648105770329_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://venn:9000/user/hive/warehouse/atlas1.db/t_c/.hive-staging_hive_2022-03-24_15-11-39_101_3895814859882010053-1/-ext-10000
Loading data to table atlas1.t_c
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
aa
Time taken: 13.734 seconds
Table lineage:
Column lineage:
Follow the Flink菜鳥 WeChat official account for occasional posts on Flink development.