Apache Atlas 2.1.0 Build and Deployment Guide

Environment Preparation

Component Versions

Component   Deployed Version   Source Version
os          CentOS 7.6.1810    --
java        1.8.0_252          --
zookeeper   3.4.14             3.4.6
kafka       2.11-2.0.0         2.11-2.0.0
hadoop      3.1.1              3.1.1
hbase       2.0.2              2.0.2
solr        7.5.0              7.5.0
hive        3.1.0              3.1.0
atlas       2.1.0              2.1.0

Role Assignment

Component              n1 (192.168.222.11)   n2 (192.168.222.12)   n3 (192.168.222.13)
JDK                    √                     √                     √
zookeeper              √                     √                     √
kafka                  √                     √                     √
NameNode               √                     --                    --
SecondaryNameNode      --                    --                    √
MR JobHistory Server   --                    --                    √
DataNode               √                     √                     √
ResourceManager        --                    √                     --
NodeManager            √                     √                     √
hbase                  √                     √                     √ (Master)
solr                   √                     √                     √
hive                   √                     --                    --
MySQL                  √                     --                    --
atlas                  √                     --                    --

Configure Hostname Resolution

Add the following entries to the /etc/hosts file on every node:

192.168.222.11 n1
192.168.222.12 n2
192.168.222.13 n3


Configure Maven

Modify the following settings in the conf/settings.xml file:

<!-- Local repository path for downloaded artifacts -->
<localRepository>/home/atlas/maven_packages</localRepository>

<!-- Configure mirrors (the first entry is the commented-out template that ships with settings.xml) -->
<!--
<mirror>
    <id>mirrorId</id>
    <mirrorOf>repositoryId</mirrorOf>
    <name>Human Readable Name for this Mirror.</name>
    <url>http://my.repository.com/repo/path</url>
</mirror>
-->
<mirror>
    <id>alimaven</id>
    <name>aliyun maven</name>
    <url>https://maven.aliyun.com/repository/public</url>
    <mirrorOf>central</mirrorOf>
</mirror>
<mirror>
    <id>Central</id>
    <mirrorOf>central</mirrorOf>
    <name>Central Maven</name>
    <url>https://repo1.maven.org/maven2</url>
</mirror>

Environment Variables

export MAVEN_OPTS="-Xms4g -Xmx4g"
export MAVEN_HOME=/home/atlas/maven-3.6.3
export PATH=$MAVEN_HOME/bin:$PATH

Configure Passwordless SSH

  1. On each node, run ssh-keygen -t rsa and press Enter three times to accept the defaults.
  2. Copy /root/.ssh/id_rsa.pub from n2 and n3 to n1, renaming each copy after its node:
    scp n2:/root/.ssh/id_rsa.pub /root/n2
    scp n3:/root/.ssh/id_rsa.pub /root/n3
    
  3. On n1, write the id_rsa.pub content of every node into n1's /root/.ssh/authorized_keys file:
    cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
    cat /root/n2 >> /root/.ssh/authorized_keys
    cat /root/n3 >> /root/.ssh/authorized_keys
    
  4. From n1, ssh to every node (including n1 itself) once to populate the known_hosts file.
  5. Copy authorized_keys and known_hosts from n1 to the /root/.ssh/ directory on the other nodes:
    scp /root/.ssh/authorized_keys n2:/root/.ssh
    scp /root/.ssh/authorized_keys n3:/root/.ssh
    scp /root/.ssh/known_hosts n2:/root/.ssh
    scp /root/.ssh/known_hosts n3:/root/.ssh
    
    Then verify on every node that passwordless login works (a quick loop is sketched below).
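A simple way to verify: run the loop below on each node; every command should print the remote hostname without prompting for a password.

    for h in n1 n2 n3; do ssh $h hostname; done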

Configure Time Synchronization

  1. Run rpm -qa | grep chrony to check whether chrony is already installed; if not, install it with yum -y install chrony.
  2. Edit /etc/chrony.conf with vim (a sketch of typical changes follows this list).
  3. Synchronize the chrony.conf configuration to all nodes.
  4. Start the chronyd service and enable it at boot:
    systemctl enable chronyd.service
    systemctl start chronyd.service
    systemctl status chronyd.service
    
  5. Verify synchronization with timedatectl (check for "NTP synchronized").
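Step 2 leaves the exact chrony.conf changes open; a minimal sketch, assuming n1 serves time to the other nodes (the subnet and stratum values are assumptions, adjust to your environment):

    # on n1: allow the cluster subnet to sync from this node and
    # keep serving time even if upstream servers are unreachable
    allow 192.168.222.0/24
    local stratum 10

    # on n2 and n3: comment out the default "server ..." lines and point at n1
    server n1 iburst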

Java Environment Variables

export JAVA_HOME=/home/atlas/jdk8
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HOME/.local/bin:$HOME/bin
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH

Configure a Local yum Repository

  1. On n1, create the /etc/yum.repos.d/base.repo file and add the following repository definition:
    [Local_ISO]
    name=Local ISO
    baseurl=file:///mnt
    gpgcheck=0
    enabled=1
    
  2. On n1, run mount /dev/sr0 /mnt to mount the OS installation disc at /mnt.
  3. Upload createrepo-0.9.9-28.el7.noarch.rpm to /root/files/ on n1 and run yum -y localinstall /root/files/createrepo-0.9.9-28.el7.noarch.rpm; the two required dependency packages can be found on the installation disc.
  4. On n1, create the /root/rpms directory and upload all required rpm packages into it.
  5. Append the following to /etc/yum.repos.d/base.repo:
    [Local_RPM]
    name=Local RPM
    baseurl=http://n1:10040/rpms
    gpgcheck=0
    enabled=1
    
  6. In the /root directory on n1, run python -m SimpleHTTPServer 10040.


Build and Package Atlas

Build Atlas

mvn clean -DskipTests install -e

npm-6.13.7.tgz cannot be downloaded

Download npm-6.13.7.tgz manually, place it in the /home/atlas/maven_packages/com/github/eirslett/npm/6.13.7/ directory, and rename it to npm-6.13.7.tar.gz.

Log message: Downloading http://registry.npmjs.org/npm/-/npm-6.13.7.tgz to /home/atlas/maven_packages/com/github/eirslett/npm/6.13.7/npm-6.13.7.tar.gz
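The manual download and rename can be scripted; a minimal sketch (the registry URL is the one shown in the log message above):

    wget http://registry.npmjs.org/npm/-/npm-6.13.7.tgz
    mkdir -p /home/atlas/maven_packages/com/github/eirslett/npm/6.13.7/
    mv npm-6.13.7.tgz /home/atlas/maven_packages/com/github/eirslett/npm/6.13.7/npm-6.13.7.tar.gz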


node-sass cannot be installed

Create a .npmrc file in the user's home directory and add the following domestic mirror sources to it:

registry=https://registry.npm.taobao.org/
sass_binary_site=https://npm.taobao.org/mirrors/node-sass
chromedriver_cdnurl=https://npm.taobao.org/mirrors/chromedriver
phantomjs_cdnurl=https://npm.taobao.org/mirrors/phantomjs
electron_mirror=https://npm.taobao.org/mirrors/electron

See the linked reference for more background on the cause.


Package Atlas

# without the embedded hbase and solr
mvn clean -DskipTests package -Pdist
# with the embedded hbase and solr
mvn clean -DskipTests package -Pdist,embedded-hbase-solr

After packaging completes, the distribution archives (including apache-atlas-2.1.0-server.tar.gz and apache-atlas-2.1.0-hive-hook.tar.gz) are produced under distro/target/.


Upload the Build Artifacts

Upload the apache-atlas-2.1.0-server.tar.gz file, then extract and rename it:

tar -zxf apache-atlas-2.1.0-server.tar.gz
mv apache-atlas-2.1.0/ atlas-2.1.0/
cd atlas-2.1.0/


Install Required Components

Install Zookeeper 3.4.14

  1. Upload zookeeper-3.4.14.tar.gz and extract it.
  2. Create the zookeeper-3.4.14/zkData directory.
  3. Create a myid file in the zookeeper-3.4.14/zkData directory.
  4. Distribute the zookeeper-3.4.14 directory to all nodes.
  5. On each node, set the integer in zookeeper-3.4.14/zkData/myid to that node's number; the value must be unique per node (see the sketch after this list).
  6. Enter zookeeper-3.4.14/conf and rename zoo_sample.cfg to zoo.cfg.
  7. Modify the following parameters in zoo.cfg:
    dataDir=/root/zookeeper-3.4.14/zkData
    server.1=n1:2888:3888
    server.2=n2:2888:3888
    server.3=n3:2888:3888
    
    server.A=B:C:D
    • A: an integer identifying the server. In cluster mode each node has a myid file under dataDir containing this value; at startup Zookeeper reads it and compares it with the entries in zoo.cfg to determine which server it is.
    • B: the server's IP address or hostname
    • C: the port this server uses to exchange data with the cluster Leader
    • D: the port used for leader election
  8. Synchronize zookeeper-3.4.14/conf/zoo.cfg to all nodes.
  9. Start, stop, and check status on each node:
    • start: zookeeper-3.4.14/bin/zkServer.sh start
    • stop: zookeeper-3.4.14/bin/zkServer.sh stop
    • status: zookeeper-3.4.14/bin/zkServer.sh status
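A minimal sketch of step 5; the myid value on each node must match the server.N entry that refers to that node in zoo.cfg:

    echo 1 > /root/zookeeper-3.4.14/zkData/myid   # on n1
    echo 2 > /root/zookeeper-3.4.14/zkData/myid   # on n2
    echo 3 > /root/zookeeper-3.4.14/zkData/myid   # on n3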

Install kafka_2.11-2.0.0

  1. Add the following variables to .bash_profile:
    export KAFKA_HOME=/root/kafka_2.11-2.0.0
    export PATH=$PATH:${KAFKA_HOME}/bin
    
  2. Create the kafka_2.11-2.0.0/kfData directory for Kafka data.
  3. Open config/server.properties; the main parameters to modify are:
    broker.id=1
    delete.topic.enable=true
    listeners=PLAINTEXT://:9092
    log.dirs=/root/kafka_2.11-2.0.0/kfData
    zookeeper.connect=n1:2181,n2:2181,n3:2181
    
    • broker.id: a unique integer for each broker
    • advertised.listeners: if Kafka is only used internally, configuring listeners is enough; set this when internal and external access must be controlled separately
    • delete.topic.enable: allow topics to be deleted
    • log.dirs: Kafka data directory
  4. Distribute config/server.properties to every broker and adjust the broker.id value.
  5. On each node, run ./bin/kafka-server-start.sh -daemon ./config/server.properties to start Kafka (a quick smoke test follows this list).
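A quick smoke test once all brokers are running; the topic name test is arbitrary and the ZooKeeper quorum is the one configured above:

    # create a 3-replica test topic, list topics, then remove the test topic
    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --create --topic test --partitions 1 --replication-factor 3
    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --list
    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --delete --topic test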

Install hadoop-3.1.1

  1. Configure system environment variables
    Add the following to .bash_profile:

    export HADOOP_HOME=/root/hadoop-3.1.1
    export PATH=$PATH:${HADOOP_HOME}/bin
    
  2. Core configuration file
    Modify the following settings in hadoop-3.1.1/etc/hadoop/core-site.xml:

    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://n1:9000</value>
    </property>
    <!-- Storage directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop-3.1.1/data/tmp</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
    
  3. HDFS configuration files
    Modify the following in hadoop-3.1.1/etc/hadoop/hadoop-env.sh:

    export JAVA_HOME=/root/jdk8
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root
    

    Modify the following in hadoop-3.1.1/etc/hadoop/hdfs-site.xml:

    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- SecondaryNameNode address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>n3:50090</value>
    </property>
    <!-- Local path where the NameNode stores the namespace and transaction logs -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/root/hadoop-3.1.1/data/namenode</value>
    </property>
    <!-- 32MB -->
    <property>
        <name>dfs.blocksize</name>
        <value>33554432</value>
    </property>
    <!-- Local storage path for DataNode data -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/root/hadoop-3.1.1/data/datanode</value>
    </property>
    
  4. YARN configuration file
    Modify the following in hadoop-3.1.1/etc/hadoop/yarn-site.xml:

    <!-- How reducers fetch data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Hostname of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>n2</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    
  5. MapReduce configuration file
    Modify the following in hadoop-3.1.1/etc/hadoop/mapred-site.xml:

    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/root/hadoop-3.1.1</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/root/hadoop-3.1.1</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/root/hadoop-3.1.1</value>
    </property>
    <!-- JobHistory Server IPC address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>n3:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    <!-- JobHistory Server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>n3:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    <!-- Where completed MapReduce jobs are stored -->
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/job/history/done</value>
    </property>
    <!-- Where in-progress jobs are stored -->
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/job/history/done_intermediate</value>
    </property>
    <!-- Counter limit per job -->
    <property>
        <name>mapreduce.job.counters.limit</name>
        <value>500</value>
    </property>
    <!-- Memory limit per map task -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>2048</value>
    </property>
    <!-- Map task JVM heap; roughly 80% of mapreduce.map.memory.mb is recommended -->
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1638m</value>
    </property>
    <!-- Memory limit per reduce task -->
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>2048</value>
    </property>
    <!-- Reduce task JVM heap; roughly 80% of mapreduce.reduce.memory.mb is recommended -->
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx1638m</value>
    </property>
    
  6. workers file
    Add the data nodes to hadoop-3.1.1/etc/hadoop/workers. Entries must not have trailing spaces and the file must not contain blank lines.

    n1
    n2
    n3
    
  7. Distribute the Hadoop directory to all nodes.

  8. Before starting the cluster for the first time, format the NameNode:

    hadoop-3.1.1/bin/hdfs namenode -format
    
  9. On n1, run /root/hadoop-3.1.1/sbin/start-dfs.sh to start HDFS.

  10. On n2, run /root/hadoop-3.1.1/sbin/start-yarn.sh to start YARN.

  11. On n3, run /root/hadoop-3.1.1/bin/mapred --daemon start historyserver to start the MR Job History Server.

  12. Run the following commands to test HDFS and MapReduce (a check of the job output follows the commands):

    hadoop fs -mkdir -p /tmp/input
    hadoop fs -put $HADOOP_HOME/README.txt /tmp/input
    export hadoop_version=`hadoop version | head -n 1 | awk '{print $2}'`
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-$hadoop_version.jar wordcount /tmp/input /tmp/output
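If the job succeeds, the word counts are written to /tmp/output; a quick way to inspect the result:

    hadoop fs -ls /tmp/output
    hadoop fs -cat /tmp/output/part-r-00000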
    

Install hbase-2.0.2

  1. Configure system variables
    Add the following environment variables to .bash_profile:
    export HBASE_HOME=/root/hbase-2.0.2
    export PATH=$PATH:${HBASE_HOME}/bin
    
  2. Changes to hbase-env.sh:
    export JAVA_HOME=/root/jdk8
    # Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
    # export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
    # export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
    export HBASE_MANAGES_ZK=false
    
  3. Changes to hbase-site.xml:
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://n1:9000/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <!-- New since 0.98: earlier versions had no .port suffix and the default port was 60000 -->
    <!-- 16000 is the default, so this can be omitted; the web UI port is 16010 -->
    <property>
        <name>hbase.master.port</name>
        <value>16000</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>n1,n2,n3</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/root/zookeeper-3.4.14/zkData</value>
    </property>
    <property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
    </property>
    
  4. Changes to regionservers:
    n1
    n2
    n3
    
  5. Distribute the hbase directory to all nodes.
  6. On each node, symlink the Hadoop configuration files into hbase (start commands are sketched after this list):
    ln -s /root/hadoop-3.1.1/etc/hadoop/core-site.xml /root/hbase-2.0.2/conf/core-site.xml
    ln -s /root/hadoop-3.1.1/etc/hadoop/hdfs-site.xml /root/hbase-2.0.2/conf/hdfs-site.xml
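The section above does not show a start command; a minimal sketch, assuming the cluster is started from the node that should host the HMaster (n3 in the startup order later in this guide):

    # starts the HMaster locally and RegionServers on the hosts listed in regionservers
    /root/hbase-2.0.2/bin/start-hbase.sh
    # quick status check
    echo "status" | /root/hbase-2.0.2/bin/hbase shell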
    

Install Solr 7.5.0

  1. Run tar -zxf solr-7.5.0.tgz.
  2. Enter the solr directory and set the following parameters in bin/solr.in.sh:
    ZK_HOST="n1:2181,n2:2181,n3:2181"
    # set a different SOLR_HOST on each node
    SOLR_HOST="n1"
    
  3. Distribute the /opt/solr directory to the other nodes and adjust the SOLR_HOST value on each.
  4. On each node, add the following to /etc/security/limits.conf; the change takes effect after a reboot:
    root    hard    nofile  65000
    root    soft    nofile  65000
    root    hard    nproc   65000
    root    soft    nproc   65000
    
  5. On each node, run bin/solr start to start Solr:
    /opt/solr/bin/solr start
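To confirm the local instance is up (in cloud mode the output also shows the ZooKeeper ensemble it joined), the bundled status command can be used:

    /opt/solr/bin/solr status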
    

MySQL 5.7.30

  1. Run rpm -qa | grep mariadb to check whether mariadb is installed; if it is, remove it with rpm -e --nodeps <package>.
  2. Upload mysql-5.7.26-1.el7.x86_64.rpm-bundle.tar to the /root/rpms directory on n1 and extract it.
  3. Run createrepo -d /root/rpms/ && yum clean all.
  4. Run yum -y install mysql-community-server mysql-community-client.
  5. Modify /etc/my.cnf:
    [mysqld]
    # Remove leading # and set to the amount of RAM for the most important data
    # cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
    # innodb_buffer_pool_size = 128M
    #
    # Remove leading # to turn on a very important data integrity option: logging
    # changes to the binary log between backups.
    log_bin=/var/lib/mysql/mysql_binary_log
    #
    # Remove leading # to set options mainly useful for reporting servers.
    # The server defaults are faster for transactions and fast SELECTs.
    # Adjust sizes as needed, experiment to find the optimal values.
    # join_buffer_size = 128M
    # sort_buffer_size = 2M
    # read_rnd_buffer_size = 2M
    datadir=/var/lib/mysql
    socket=/var/lib/mysql/mysql.sock
    transaction-isolation = READ-COMMITTED
    # Disabling symbolic-links is recommended to prevent assorted security risks
    symbolic-links=0
    
    #In later versions of MySQL, if you enable the binary log and do not set
    ##a server_id, MySQL will not start. The server_id must be unique within
    ##the replicating group.
    server_id=1
    
    key_buffer_size = 32M
    max_allowed_packet = 32M
    thread_stack = 256K
    thread_cache_size = 64
    query_cache_limit = 8M
    query_cache_size = 64M
    query_cache_type = 1
    
    max_connections = 250
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    character-set-server=utf8
    
    binlog_format = mixed
    read_buffer_size = 2M
    read_rnd_buffer_size = 16M
    sort_buffer_size = 8M
    join_buffer_size = 8M
    
    # InnoDB settings
    innodb_file_per_table = 1
    innodb_flush_log_at_trx_commit  = 2
    innodb_log_buffer_size = 64M
    innodb_buffer_pool_size = 4G
    innodb_thread_concurrency = 8
    innodb_flush_method = O_DIRECT
    innodb_log_file_size = 512M
    
    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    sql_mode=STRICT_ALL_TABLES
    
    [client]
    default-character-set=utf8
    
  6. Enable MySQL at boot and start it:
    systemctl enable mysqld.service
    systemctl start mysqld.service
    systemctl status mysqld.service
    
  7. Run grep password /var/log/mysqld.log to obtain the initial root password.
  8. Run mysql_secure_installation to perform the basic MySQL hardening:
    Securing the MySQL server deployment.
    
    Enter password for user root: <enter the initial password>
    
    The existing password for the user account root has expired. Please set a new password.
    
    New password: <enter the new password; Root123! is used in this guide>
    
    Re-enter new password: Root123!
    The 'validate_password' plugin is installed on the server.
    The subsequent steps will run with the existing configuration
    of the plugin.
    Using existing password for root.
    
    Estimated strength of the password: 100
    Change the password for root ? ((Press y|Y for Yes, any other key for No) : n
    Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y
    By default, a MySQL installation has an anonymous user,
    allowing anyone to log into MySQL without having to have
    a user account created for them. This is intended only for
    testing, and to make the installation go a bit smoother.
    You should remove them before moving into a production
    environment.
    
    Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
    Success.
    
    
    Normally, root should only be allowed to connect from
    'localhost'. This ensures that someone cannot guess at
    the root password from the network.
    
    Disallow root login remotely? (Press y|Y for Yes, any other key for No) : y
    Success.
    
    By default, MySQL comes with a database named 'test' that
    anyone can access. This is also intended only for testing,
    and should be removed before moving into a production
    environment.
    
    
    Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
    - Dropping test database...
    Success.
    
    - Removing privileges on test database...
    Success.
    
    Reloading the privilege tables will ensure that all changes
    made so far will take effect immediately.
    
    Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
    Success.
    
    All done!
    
    Log in to MySQL and run show variables like "%char%"; to verify that the character set is utf8.

Install Hive 3.1.0

  1. Configure system variables
    Add the following environment variables to .bash_profile:

    export HIVE_HOME=/root/apache-hive-3.1.0-bin
    export PATH=$PATH:${HIVE_HOME}/bin
    
  2. Configure Hive environment variables
    Modify the following in the apache-hive-3.1.0-bin/conf/hive-env.sh file:

    HADOOP_HOME=${HADOOP_HOME}
    export HADOOP_HEAPSIZE=2048
    export HIVE_CONF_DIR=${HIVE_HOME}/conf
    
  3. Create the Hive database and user in MySQL:

    CREATE DATABASE hive DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
    GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'Hive123!';
    flush privileges;
    
  4. Copy mysql-connector-java-5.1.47-bin.jar into the apache-hive-3.1.0-bin/lib/ directory.

  5. Modify the following in the apache-hive-3.1.0-bin/conf/hive-site.xml file:

    <property>
        <name>system:java.io.tmpdir</name>
        <value>/tmp/tmpdir</value>
    </property>
    <property>
        <name>system:user.name</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://n1:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
        <description>
            JDBC connect string for a JDBC metastore.
            To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
            For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>Hive123!</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.server2.authentication</name>
        <value>NONE</value>
        <description>
        Expects one of [nosasl, none, ldap, kerberos, pam, custom].
        Client authentication types.
            NONE: no authentication check
            LDAP: LDAP/AD based authentication
            KERBEROS: Kerberos/GSSAPI authentication
            CUSTOM: Custom authentication provider
                    (Use with property hive.server2.custom.authentication.class)
            PAM: Pluggable authentication module
            NOSASL:  Raw transport
        </description>
    </property>
    <!-- The user configured here must have execute permission on /tmp/hive -->
    <property>
        <name>hive.server2.thrift.client.user</name>
        <value>root</value>
        <description>Username to use against thrift client</description>
    </property>
    <property>
        <name>hive.server2.thrift.client.password</name>
        <value>Root23!</value>
        <description>Password to use against thrift client</description>
    </property>
    <property>
        <name>hive.metastore.db.type</name>
        <value>mysql</value>
        <description>
            Expects one of [derby, oracle, mysql, mssql, postgres].
            Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it.
        </description>
    </property>
    
  6. Run schematool -initSchema -dbType mysql to initialize the metastore schema in MySQL.

  7. Run the following statements in the Hive database in MySQL to prevent garbled Chinese comments on Hive tables, columns, partitions, and indexes:

    alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
    alter table TABLE_PARAMS modify column PARAM_VALUE mediumtext character set utf8;
    alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
    alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
    alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
    
  8. Run mkdir -p /root/hive-3.1.0/logs.

  9. Run cp hive-log4j2.properties.template hive-log4j2.properties and set the following property:

    property.hive.log.dir = /root/hive-3.1.0/logs
    
  10. Start HiveServer2 with nohup hiveserver2 1>/dev/null 2>&1 & echo $! > /root/hive-3.1.0/logs/hiveserver2.pid.

  11. Start Beeline with beeline -u jdbc:hive2://n1:10000/default -n root -p Root123! (a smoke test that also prepares data for the later Atlas import is sketched below).
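A minimal Beeline smoke test, assuming HiveServer2 runs on n1; hive_testdb matches the database name used in the import-hive.sh example later, and the table t1 is just an illustrative name:

    beeline -u jdbc:hive2://n1:10000/default -n root -p Root123! \
      -e "CREATE DATABASE IF NOT EXISTS hive_testdb" \
      -e "CREATE TABLE IF NOT EXISTS hive_testdb.t1 (id INT, name STRING)" \
      -e "SHOW TABLES IN hive_testdb"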



Configure Atlas

Configure Solr for Atlas

  1. Modify the following settings in atlas-application.properties:
    atlas.graph.index.search.backend=solr
    atlas.graph.index.search.solr.mode=cloud
    # ZK quorum setup for solr as comma separated value.
    atlas.graph.index.search.solr.zookeeper-url=n1:2181,n2:2181,n3:2181
    atlas.graph.index.search.solr.wait-searcher=true
    
  2. Copy atlas's conf/solr directory to the /root/solr-7.5.0 directory on each Solr server node and rename it to atlas_solr/.
  3. On a Solr server node, create the collections:
    ./solr create -c vertex_index -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
    ./solr create -c edge_index -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
    ./solr create -c fulltext_index -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
    
  4. To delete a collection, paste the corresponding URL below into the browser address bar:
    http://n1:8983/solr/admin/collections?action=DELETE&name=vertex_index
    http://n1:8983/solr/admin/collections?action=DELETE&name=edge_index
    http://n1:8983/solr/admin/collections?action=DELETE&name=fulltext_index
    

Configure HBase for Atlas

  1. Modify the following settings in atlas-2.1.0/conf/atlas-application.properties:

    atlas.graph.storage.backend=hbase2
    atlas.graph.storage.hbase.table=atlas
    atlas.graph.storage.hostname=n1:2181,n2:2181,n3:2181
    
  2. Modify the following in atlas-env.sh:

    export HBASE_CONF_DIR=/root/hbase-2.0.2/conf
    
  3. Copy the hbase configuration files into Atlas's conf/hbase directory:

    cp /root/hbase-2.0.2/conf/* /root/atlas-2.1.0/conf/hbase/
    
  4. Delete the copied core-site.xml and hdfs-site.xml and recreate them as symlinks:

    ln -s /root/hadoop-3.1.1/etc/hadoop/core-site.xml /root/atlas-2.1.0/conf/hbase/core-site.xml
    ln -s /root/hadoop-3.1.1/etc/hadoop/hdfs-site.xml /root/atlas-2.1.0/conf/hbase/hdfs-site.xml
    

Configure Kafka for Atlas

  1. Modify the following settings in atlas-application.properties:

    atlas.notification.embedded=false
    atlas.kafka.data=/root/atlas-2.1.0/data/kafka
    atlas.kafka.zookeeper.connect=n1:2181,n2:2181,n3:2181
    atlas.kafka.bootstrap.servers=n1:9092,n2:9092,n3:9092
    atlas.kafka.zookeeper.session.timeout.ms=4000
    atlas.kafka.zookeeper.connection.timeout.ms=2000
    atlas.kafka.enable.auto.commit=true
    
  2. Create the topics:

    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --create --topic ATLAS_HOOK --partitions 1 --replication-factor 3
    kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --create --topic ATLAS_ENTITIES --partitions 1 --replication-factor 3
    

    The topic names can be found in the get_topics_to_create method of atlas-2.1.0/bin/atlas_config.py; the Kafka setup script is atlas-2.1.0/bin/atlas_kafka_setup.py.


Configure LDAP

  1. Add or modify the following settings in atlas-application.properties:

    atlas.authentication.method.ldap=true
    atlas.authentication.method.ldap.type=ldap
    atlas.authentication.method.ldap.url=ldap://xx.xx.xx.xx:389
    atlas.authentication.method.ldap.userDNpattern=uid={0},ou=employee,dc=xx,dc=xxxx,dc=com
    atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=employee,dc=xx,dc=xxxx,dc=com)
    atlas.authentication.method.ldap.groupRoleAttribute=cn
    atlas.authentication.method.ldap.base.dn=dc=xx,dc=xxxx,dc=com
    atlas.authentication.method.ldap.bind.dn=ou=employee,dc=xx,dc=xxxx,dc=com
    
  2. See the Apache Atlas authentication documentation for an explanation of these LDAP settings.


Other Atlas Settings

  1. Modify the following settings in atlas-application.properties:

    atlas.rest.address=http://n1:21000
    atlas.server.run.setup.on.start=false
    atlas.audit.hbase.tablename=apache_atlas_entity_audit
    atlas.audit.hbase.zookeeper.quorum=n1:2181,n2:2181,n3:2181
    
  2. Uncomment the following block in atlas-log4j.xml:

    <appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
        <param name="file" value="${atlas.log.dir}/atlas_perf.log" />
        <param name="datePattern" value="'.'yyyy-MM-dd" />
        <param name="append" value="true" />
        <layout class="org.apache.log4j.PatternLayout">
            <param name="ConversionPattern" value="%d|%t|%m%n" />
        </layout>
    </appender>
    
    <logger name="org.apache.atlas.perf" additivity="false">
        <level value="debug" />
        <appender-ref ref="perf_appender" />
    </logger>
    


Start Atlas

  1. Start the components in the following order:
    Order   Node   Component
    1       n1     zookeeper
    2       n1     kafka
    3       n1     hdfs
    4       n2     yarn
    5       n3     jobhistoryserver
    6       n3     hbase
    7       n1     solr
    8       n1     mysql
    9       n1     hive
    10      n1     atlas
  2. Run bin/atlas_start.py.
  3. Open http://n1:21000 in a browser (a command-line check is sketched below).
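Once the web UI responds, the server can also be checked from the command line; a minimal sketch, assuming the admin account used elsewhere in this guide (adjust the credentials to your authentication setup):

    curl -u admin:admin http://n1:21000/api/atlas/admin/version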


Configure the Hive Hook

  1. Modify the following setting in hive-site.xml:

    <property>
        <name>hive.exec.post.hooks</name>
        <value>org.apache.atlas.hive.hook.HiveHook</value>
    </property>
    
  2. Extract apache-atlas-2.1.0-hive-hook.tar.gz and enter the apache-atlas-hive-hook-2.1.0 directory.

  3. Copy the entire contents of apache-atlas-hive-hook-2.1.0/hook/hive into atlas-2.1.0/hook/hive.

  4. Modify the following in hive-env.sh:

    export HIVE_AUX_JARS_PATH=/root/atlas-2.1.0/hook/hive
    
  5. Add the following settings to atlas-application.properties:

    atlas.hook.hive.synchronous=false
    atlas.hook.hive.numRetries=3
    atlas.hook.hive.queueSize=10000
    atlas.cluster.name=primary
    atlas.kafka.zookeeper.connect=n1:2181,n2:2181,n3:2181
    atlas.kafka.zookeeper.connection.timeout.ms=30000
    atlas.kafka.zookeeper.session.timeout.ms=60000
    atlas.kafka.zookeeper.sync.time.ms=20
    
  6. Copy atlas-application.properties into hive's conf directory.

  7. Run ./hook-bin/import-hive.sh to import the Hive metadata into Atlas; the username and password are the ones used to log in to Atlas:

    ./hook-bin/import-hive.sh -d hive_testdb
    ……
    Enter username for atlas :- admin
    Enter password for atlas :-
    ……
    Hive Meta Data imported successfully!!!
    
  8. Open or refresh the Atlas page; the Search panel on the left now shows the imported Hive data.

  9. Select hive_db and click Search to list the imported databases.

  10. View table lineage for the imported tables.