Apache Atlas: Installation and Hive Hook Configuration


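Download the source

The Apache Atlas website only offers source packages for download: Download

I downloaded the latest version at the time of writing: 2.2.0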

venn@venn git % wget https://downloads.apache.org/atlas/2.2.0/apache-atlas-2.2.0-sources.tar.gz  # download
venn@venn git % tar -zxvf apache-atlas-2.2.0-sources.tar.gz # extract
venn@venn git % ls
apache-atlas-2.2.0-sources.tar.gz
apache-atlas-sources-2.2.0                         
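
Before starting a 20-minute build it can be worth confirming that the archive downloaded intact. The following is a minimal Python sketch, assuming you have copied the expected SHA-512 digest from the Apache download page yourself; the helper names `sha512_of` and `matches` are my own and not part of Atlas.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare with a published digest, ignoring case and embedded whitespace."""
    normalized = "".join(expected_hex.split()).lower()
    return sha512_of(path) == normalized
```

Usage would look like `matches("apache-atlas-2.2.0-sources.tar.gz", "<digest copied from the download page>")`; only the digest itself should be passed, not a trailing file name.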

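Build

For the build, see the official documentation: Building & Installing Apache Atlas

Because Atlas depends on several other components, the build offers profiles that bundle embedded versions of them, such as HBase and Solr.

Here I build the variant the documentation calls "Packaging Apache Atlas with Embedded Apache HBase & Apache Solr":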

mvn clean -DskipTests package -Pdist,embedded-hbase-solr

....

[INFO] Reactor Summary for apache-atlas 2.2.0:
[INFO] 
[INFO] Apache Atlas Server Build Tools .................... SUCCESS [  1.190 s]
[INFO] apache-atlas ....................................... SUCCESS [  4.594 s]
[INFO] Apache Atlas Integration ........................... SUCCESS [ 12.596 s]
[INFO] Apache Atlas Test Utility Tools .................... SUCCESS [  3.830 s]
[INFO] Apache Atlas Common ................................ SUCCESS [  2.827 s]
[INFO] Apache Atlas Client ................................ SUCCESS [  0.331 s]
[INFO] atlas-client-common ................................ SUCCESS [  1.281 s]
[INFO] atlas-client-v1 .................................... SUCCESS [  1.968 s]
[INFO] Apache Atlas Server API ............................ SUCCESS [  1.484 s]
[INFO] Apache Atlas Notification .......................... SUCCESS [  3.790 s]
[INFO] atlas-client-v2 .................................... SUCCESS [  0.969 s]
[INFO] Apache Atlas Graph Database Projects ............... SUCCESS [  0.198 s]
[INFO] Apache Atlas Graph Database API .................... SUCCESS [  1.180 s]
[INFO] Graph Database Common Code ......................... SUCCESS [  1.120 s]
[INFO] Apache Atlas JanusGraph-HBase2 Module .............. SUCCESS [  0.988 s]
[INFO] Apache Atlas JanusGraph DB Impl .................... SUCCESS [  5.341 s]
[INFO] Apache Atlas Graph DB Dependencies ................. SUCCESS [  1.406 s]
[INFO] Apache Atlas Authorization ......................... SUCCESS [  1.509 s]
[INFO] Apache Atlas Repository ............................ SUCCESS [  9.290 s]
[INFO] Apache Atlas UI .................................... SUCCESS [ 22.949 s]
[INFO] Apache Atlas New UI ................................ SUCCESS [ 22.439 s]
[INFO] Apache Atlas Web Application ....................... SUCCESS [ 51.105 s]
[INFO] Apache Atlas Documentation ......................... SUCCESS [  0.996 s]
[INFO] Apache Atlas FileSystem Model ...................... SUCCESS [  1.768 s]
[INFO] Apache Atlas Plugin Classloader .................... SUCCESS [  0.791 s]
[INFO] Apache Atlas Hive Bridge Shim ...................... SUCCESS [  1.902 s]
[INFO] Apache Atlas Hive Bridge ........................... SUCCESS [  4.811 s]
[INFO] Apache Atlas Falcon Bridge Shim .................... SUCCESS [ 27.805 s]
[INFO] Apache Atlas Falcon Bridge ......................... SUCCESS [  3.164 s]
[INFO] Apache Atlas Sqoop Bridge Shim ..................... SUCCESS [  3.344 s]
[INFO] Apache Atlas Sqoop Bridge .......................... SUCCESS [  8.621 s]
[INFO] Apache Atlas Storm Bridge Shim ..................... SUCCESS [ 48.489 s]
[INFO] Apache Atlas Storm Bridge .......................... SUCCESS [  4.718 s]
[INFO] Apache Atlas Hbase Bridge Shim ..................... SUCCESS [  2.068 s]
[INFO] Apache Atlas Hbase Bridge .......................... SUCCESS [01:13 min]
[INFO] Apache HBase - Testing Util ........................ SUCCESS [  3.748 s]
[INFO] Apache Atlas Kafka Bridge .......................... SUCCESS [ 28.061 s]
[INFO] Apache Atlas classification updater ................ SUCCESS [  0.906 s]
[INFO] Apache Atlas index repair tool ..................... SUCCESS [  3.032 s]
[INFO] Apache Atlas Impala Hook API ....................... SUCCESS [  0.309 s]
[INFO] Apache Atlas Impala Bridge Shim .................... SUCCESS [  0.348 s]
[INFO] Apache Atlas Impala Bridge ......................... SUCCESS [  3.057 s]
[INFO] Apache Atlas Distribution .......................... SUCCESS [15:53 min]
[INFO] atlas-examples ..................................... SUCCESS [  0.386 s]
[INFO] sample-app ......................................... SUCCESS [  3.217 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  22:12 min
[INFO] Finished at: 2022-03-21T15:24:37+08:00
[INFO] ------------------------------------------------------------------------

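  • The embedded-hbase-solr profile configures Apache Atlas so that an Apache HBase instance and an Apache Solr instance are started and stopped together with the Apache Atlas server.

  • Note: this distribution profile is intended only for single-node development, not for production.

After the build completes, the packages are in apache-atlas-sources-2.2.0/distro/target. Copy the generated package apache-atlas-2.2.0-server.tar.gz to /opt and extract it.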

venn@venn target % pwd
/Users/venn/git/apache-atlas-sources-2.2.0/distro/target
venn@venn target % ls
META-INF                                      apache-atlas-2.2.0-kafka-hook.tar.gz          hbase
antrun                                        apache-atlas-2.2.0-server.tar.gz              hbase.temp
apache-atlas-2.2.0-atlas-index-repair.zip     apache-atlas-2.2.0-sources.tar.gz             maven-archiver
apache-atlas-2.2.0-bin.tar.gz                 apache-atlas-2.2.0-sqoop-hook.tar.gz          maven-shared-archive-resources
apache-atlas-2.2.0-classification-updater.zip apache-atlas-2.2.0-storm-hook.tar.gz          rat.txt
apache-atlas-2.2.0-falcon-hook.tar.gz         archive-tmp                                   solr
apache-atlas-2.2.0-hbase-hook.tar.gz          atlas-distro-2.2.0.jar                        solr.temp
apache-atlas-2.2.0-hive-hook.tar.gz           bin                                           test-classes
apache-atlas-2.2.0-impala-hook.tar.gz         conf

venn@venn /opt % ls
apache-atlas-2.2.0              
apache-atlas-2.2.0-server.tar.gz

Edit the configuration

Go to the conf directory:

vi  atlas-env.sh 

Set JAVA_HOME (the embedded HBase and Solr are started by default):

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home
export MANAGE_LOCAL_HBASE=true
export MANAGE_LOCAL_SOLR=true 
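
A quick way to double-check the file after editing is to parse the `export NAME=value` lines and see whether the three settings above are present. This is only a sketch: `parse_exports` and `missing_settings` are hypothetical helper names, and the parser handles simple one-line exports, not arbitrary shell.

```python
import re

# Matches lines such as: export MANAGE_LOCAL_HBASE=true
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*?)\s*$")

REQUIRED = ("JAVA_HOME", "MANAGE_LOCAL_HBASE", "MANAGE_LOCAL_SOLR")


def parse_exports(text):
    """Collect simple `export NAME=value` lines; commented-out lines do not match."""
    found = {}
    for line in text.splitlines():
        m = EXPORT_RE.match(line)
        if m:
            found[m.group(1)] = m.group(2).strip("\"'")
    return found


def missing_settings(text):
    """Return the required settings that are absent or empty."""
    exports = parse_exports(text)
    return [name for name in REQUIRED if not exports.get(name)]
```

Calling `missing_settings(open("conf/atlas-env.sh").read())` from the Atlas directory should return an empty list once all three lines are in place.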

Start Atlas

venn@venn atlas-2.22 % bin/atlas_start.py
venn@venn atlas-2.22 % bin/atlas_stop.py 
No process ID file found. Server not running?
venn@venn atlas-2.22 % bin/atlas_start.py 

Configured for local HBase.
Starting local HBase...
Local HBase started!

Configured for local Solr.
Starting local Solr...
Local Solr started!

Creating Solr collections for Atlas using config: /opt/atlas-2.22/conf/solr

Starting Atlas server on host: localhost
Starting Atlas server on port: 21000

Apache Atlas Server started!!!

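After a successful start, open the web UI (the startup log above shows the server on localhost, port 21000):

  • Username / password: admin/admin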
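
The server can also be checked without a browser by calling the REST endpoint `/api/atlas/admin/version` with the same credentials. Below is a minimal sketch using only the Python standard library; the host, port, and admin/admin login come from the output above, while the function names `version_request` and `fetch_version` are hypothetical.

```python
import base64
import json
import urllib.request


def version_request(base_url="http://localhost:21000",
                    user="admin", password="admin"):
    """Build a GET request for the admin/version endpoint with HTTP Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(base_url.rstrip("/") + "/api/atlas/admin/version")
    req.add_header("Authorization", "Basic " + token)
    return req


def fetch_version(timeout=10, **kwargs):
    """Send the request and return the decoded JSON body (needs a running server)."""
    with urllib.request.urlopen(version_request(**kwargs), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With Atlas running, `fetch_version()` should return a small JSON document describing the build; a connection error usually means the server is still starting.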

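Configure the Hive hook

Official documentation: Hive Hook

Hive version: 3.1.2

  • Note: I originally used Hive 2.3.3 but kept hitting jar conflicts. I tried to build Atlas against Hive 2.3.3, which failed, so I installed Hive 3.1.2 instead.
  1. Add the following property to hive-site.xml to register the Atlas hook: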
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
  2. Extract apache-atlas-2.2.0-hive-hook.tar.gz
venn@venn target % pwd
/Users/venn/git/apache-atlas-sources-2.2.0/distro/target
venn@venn target % ls
META-INF                                      apache-atlas-2.2.0-kafka-hook.tar.gz          conf
antrun                                        apache-atlas-2.2.0-server.tar.gz              hbase
apache-atlas-2.2.0-atlas-index-repair.zip     apache-atlas-2.2.0-sources.tar.gz             hbase.temp
apache-atlas-2.2.0-bin.tar.gz                 apache-atlas-2.2.0-sqoop-hook.tar.gz          maven-archiver
apache-atlas-2.2.0-classification-updater.zip apache-atlas-2.2.0-storm-hook.tar.gz          maven-shared-archive-resources
apache-atlas-2.2.0-falcon-hook.tar.gz         apache-atlas-hive-hook-2.2.0                  rat.txt
apache-atlas-2.2.0-hbase-hook.tar.gz          archive-tmp                                   solr
apache-atlas-2.2.0-hive-hook.tar.gz           atlas-distro-2.2.0.jar                        solr.temp
apache-atlas-2.2.0-impala-hook.tar.gz         bin                                           test-classes

  3. Copy apache-atlas-hive-hook-2.2.0/hook/hive into the Atlas installation directory: /opt/atlas-2.22/hook/hive

  4. In hive-env.sh, add the Atlas Hive hook directory to HIVE_AUX_JARS_PATH

export HIVE_AUX_JARS_PATH=/Library/Java/JavaVirtualMachines/jdk1.8.0_321.jdk/Contents/Home,/opt/atlas-2.22/hook/hive

  5. Copy /opt/atlas-2.22/conf/atlas-application.properties to the Hive conf directory
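
Since a missing or misspelled hook property fails silently (Hive simply never reports to Atlas), it may help to verify step 1 programmatically. The sketch below parses the text of hive-site.xml and checks whether `org.apache.atlas.hive.hook.HiveHook` is among the comma-separated values of `hive.exec.post.hooks`. The function names are hypothetical; the property and class names are the ones shown above.

```python
import xml.etree.ElementTree as ET

HOOK_PROPERTY = "hive.exec.post.hooks"
ATLAS_HOOK = "org.apache.atlas.hive.hook.HiveHook"


def configured_hooks(hive_site_xml):
    """Return the post-execution hook classes listed in hive-site.xml text."""
    root = ET.fromstring(hive_site_xml)
    hooks = []
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == HOOK_PROPERTY:
            value = prop.findtext("value") or ""
            # Keep the last definition: a later property overrides an earlier one.
            hooks = [h.strip() for h in value.split(",") if h.strip()]
    return hooks


def atlas_hook_enabled(hive_site_xml):
    """True if the Atlas Hive hook is registered as a post-execution hook."""
    return ATLAS_HOOK in configured_hooks(hive_site_xml)
```

For the setup in this post, that would be `atlas_hook_enabled(open("/opt/hive-3.1.2/conf/hive-site.xml").read())`. If other post hooks are already configured, the Atlas class is appended to the same value, separated by a comma.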

Import existing Hive metadata into Atlas

Copy import-hive.sh to the Atlas bin directory:

venn@venn hook-bin % pwd
/Users/venn/git/apache-atlas-sources-2.2.0/distro/target/apache-atlas-hive-hook-2.2.0/hook-bin
venn@venn hook-bin % ls
import-hive.sh
venn@venn hook-bin % cp import-hive.sh /opt/atlas-2.22/bin 
venn@venn hook-bin % ls /opt/atlas-2.22/bin
atlas_admin.py                   atlas_config.pyc                 atlas_start.py                   cputil.py                        quick_start_v1.py
atlas_client_cmdline.py          atlas_kafka_setup.py             atlas_stop.py                    import-hive.sh
atlas_config.py                  atlas_kafka_setup_hook.py        atlas_update_simple_auth_json.py quick_start.py
venn@venn hook-bin % 

venn@venn atlas-2.22 % sh bin/import-hive.sh                                                                                                           
Using Hive configuration directory [/opt/hive-3.1.2/conf]
Log file for import is /var/log/atlas/import-hive.log

...

2022-03-23T09:48:30,575 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-cache-size = 15000
2022-03-23T09:48:30,575 INFO [main] org.apache.atlas.ApplicationProperties - Property (set to default) atlas.graph.cache.tx-dirty-size = 120
Enter username for atlas :- admin
Enter password for atlas :- 
2022-03-23T09:48:34,444 INFO [main] org.apache.atlas.AtlasBaseClient - Client has only one service URL, will use that for all actions: http://localhost:21000
2022-03-23T09:48:34,483 INFO [main] org.apache.hadoop.hive.conf.HiveConf - Found configuration file file:/opt/hive-3.1.2/conf/hive-site.xml
2

...

2022-03-23T09:48:44,204 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - Created hive_db entity: name=default@primary, guid=1ceabd72-505f-4338-b9eb-c4e1511fd882
2022-03-23T09:48:44,247 INFO [main] org.apache.atlas.hive.bridge.HiveMetaStoreBridge - No tables to import in database default
Hive Meta Data imported successfully!!!

The import succeeded; check the Atlas admin page:
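
The same check can be done against the REST API by searching for entities of type `hive_db` through the v2 basic search endpoint. The sketch below only builds the URL and summarizes a response; send the request with the admin/admin credentials used earlier. The response layout (an `entities` list of headers carrying `typeName`, `guid`, and `attributes`) is how I understand the basic search result and should be verified against your server; the function names are hypothetical.

```python
import urllib.parse


def basic_search_url(type_name, base_url="http://localhost:21000", limit=25):
    """Build the v2 basic-search URL for all entities of one type."""
    query = urllib.parse.urlencode({"typeName": type_name, "limit": limit})
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{query}"


def summarize(search_result):
    """Return (typeName, qualifiedName, guid) for each entity header in a result."""
    rows = []
    # 'entities' may be missing when nothing matched, so fall back to an empty list.
    for header in search_result.get("entities") or []:
        attrs = header.get("attributes") or {}
        rows.append((header.get("typeName"),
                     attrs.get("qualifiedName"),
                     header.get("guid")))
    return rows
```

After the import above, a search for `hive_db` would be expected to list the `default@primary` database that the log reports as created.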

Capturing Hive metadata and lineage dynamically

create table as


hive> use atlas1;
OK
Time taken: 0.634 seconds
hive> show tables;
OK
tab_name
t_a
Time taken: 0.255 seconds, Fetched: 1 row(s)
hive> create table t_b as select * from t_a;
Query ID = venn_20220324151022_2b39072e-5137-4544-b77f-a616b5713314
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
2022-03-24 15:10:25,785 INFO  [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-03-24 15:10:26,049 INFO  [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Starting Job = job_1648105770329_0001, Tracking URL = http://venn.local:8088/proxy/application_1648105770329_0001/
Kill Command = /opt/hadoop-3.2.2/bin/mapred job  -kill job_1648105770329_0001
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2022-03-24 15:10:38,140 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1648105770329_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://venn:9000/user/hive/warehouse/atlas1.db/.hive-staging_hive_2022-03-24_15-10-22_500_6973722723439996636-1/-ext-10002
Moving data to directory hdfs://venn:9000/user/hive/warehouse/atlas1.db/t_b
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
aa
Time taken: 20.185 seconds

Metadata:

Table lineage:

Column lineage:
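
The lineage graph shown in the UI is also available from the v2 lineage endpoint, `/api/atlas/v2/lineage/{guid}`, given the GUID of an entity such as t_b. The sketch below builds that URL and flattens a lineage response into readable edges. It assumes the response carries a `guidEntityMap` and a `relations` list with `fromEntityId` and `toEntityId`, which is my understanding of the lineage format; the function names are hypothetical.

```python
import urllib.parse


def lineage_url(guid, base_url="http://localhost:21000", direction="BOTH", depth=3):
    """Build the lineage URL; direction is INPUT, OUTPUT, or BOTH."""
    query = urllib.parse.urlencode({"direction": direction, "depth": depth})
    return f"{base_url.rstrip('/')}/api/atlas/v2/lineage/{urllib.parse.quote(guid)}?{query}"


def lineage_edges(lineage):
    """Turn the 'relations' list into (from, to) pairs of readable names."""
    entities = lineage.get("guidEntityMap") or {}

    def label(guid):
        # Prefer qualifiedName, then displayText, then fall back to the raw GUID.
        header = entities.get(guid) or {}
        attrs = header.get("attributes") or {}
        return attrs.get("qualifiedName") or header.get("displayText") or guid

    return [(label(r["fromEntityId"]), label(r["toEntityId"]))
            for r in lineage.get("relations") or []]
```

For the `create table t_b as select * from t_a` statement above, the edges would be expected to run from t_a to a process entity and from that process to t_b.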

insert into

hive> create table t_c(aa string);
OK
Time taken: 0.45 seconds
hive> insert into t_c select aa from t_b;
Query ID = venn_20220324151139_074245e5-efe8-4ae7-9893-7ec07d06f242
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
2022-03-24 15:11:39,848 INFO  [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2022-03-24 15:11:39,874 INFO  [692022c8-8306-4f56-83cb-17f671c866ce main] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Starting Job = job_1648105770329_0002, Tracking URL = http://venn.local:8088/proxy/application_1648105770329_0002/
Kill Command = /opt/hadoop-3.2.2/bin/mapred job  -kill job_1648105770329_0002
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2022-03-24 15:11:48,811 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1648105770329_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://venn:9000/user/hive/warehouse/atlas1.db/t_c/.hive-staging_hive_2022-03-24_15-11-39_101_3895814859882010053-1/-ext-10000
Loading data to table atlas1.t_c
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
aa
Time taken: 13.734 seconds

Table lineage:

Column lineage:
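
To inspect t_c outside the UI you first need its entity, which Atlas can look up by unique attribute. The Hive bridge names tables `<db>.<table>@<cluster>`, and the import log earlier shows the cluster name `primary`. The sketch below builds the lookup URL for the v2 entity endpoint; the function names are hypothetical, and the `entity.guid` path in the response is my reading of the entity format rather than something shown in this post.

```python
import urllib.parse


def table_qualified_name(db, table, cluster="primary"):
    """Build the hive_table qualifiedName; db and table names are lower-cased."""
    return f"{db.lower()}.{table.lower()}@{cluster}"


def table_lookup_url(db, table, base_url="http://localhost:21000", cluster="primary"):
    """URL that fetches a hive_table entity by its qualifiedName."""
    query = urllib.parse.urlencode(
        {"attr:qualifiedName": table_qualified_name(db, table, cluster)})
    return (f"{base_url.rstrip('/')}"
            f"/api/atlas/v2/entity/uniqueAttribute/type/hive_table?{query}")


def guid_of(entity_response):
    """Extract the GUID from an entity response (assumed layout: entity.guid)."""
    return (entity_response.get("entity") or {}).get("guid")
```

`table_lookup_url("atlas1", "t_c")` gives the URL for the table created above; the returned GUID can then be used with the lineage endpoint.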
