Apache Atlas 2.1.0 Build and Deployment Manual
Environment Preparation
Component Versions
Component | Deployed version | Source version |
---|---|---|
os | CentOS 7.6.1810 | -- |
java | 1.8.0_252 | -- |
zookeeper | 3.4.14 | 3.4.6 |
kafka | 2.11-2.0.0 | 2.11-2.0.0 |
hadoop | 3.1.1 | 3.1.1 |
hbase | 2.0.2 | 2.0.2 |
solr | 7.5.0 | 7.5.0 |
hive | 3.1.0 | 3.1.0 |
atlas | 2.1.0 | 2.1.0 |
Role Assignment
Component | n1 192.168.222.11 | n2 192.168.222.12 | n3 192.168.222.13 |
---|---|---|---|
JDK | √ | √ | √ |
zookeeper | √ | √ | √ |
kafka | √ | √ | √ |
NameNode | √ | -- | -- |
SecondaryNameNode | -- | -- | √ |
MR JobHistory Server | -- | -- | √ |
DataNode | √ | √ | √ |
ResourceManager | -- | √ | -- |
NodeManager | √ | √ | √ |
hbase | √ | √ | √(Master) |
solr | √ | √ | √ |
hive | √ | -- | -- |
MySQL | √ | -- | -- |
atlas | √ | -- | -- |
Configure Hostname Resolution
Add the following entries to /etc/hosts on each node
192.168.222.11 n1
192.168.222.12 n2
192.168.222.13 n3
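The three entries above follow a simple pattern (192.168.222.1N maps to nN), so they can be generated rather than typed; a minimal sketch:

```shell
# Generate the /etc/hosts entries for nodes n1..n3.
# The 192.168.222.1<i> IP pattern matches the role-assignment table above.
for i in 1 2 3; do
  printf '192.168.222.1%s n%s\n' "$i" "$i"
done
```

Appending the output to /etc/hosts on each node gives the same result as editing by hand.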
Configure Maven
Modify conf/settings.xml as follows
<!-- Change the local repository path -->
<localRepository>/home/atlas/maven_packages</localRepository>
<!-- Add mirrors -->
<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>https://maven.aliyun.com/repository/public</url>
  <mirrorOf>central</mirrorOf>
</mirror>
<mirror>
  <id>Central</id>
  <mirrorOf>central</mirrorOf>
  <name>Central Maven</name>
  <url>https://repo1.maven.org/maven2</url>
</mirror>
Environment variables
export MAVEN_OPTS="-Xms4g -Xmx4g"
export MAVEN_HOME=/home/atlas/maven-3.6.3
export PATH=$MAVEN_HOME/bin:$PATH
Configure Passwordless SSH
- On each node, run ssh-keygen -t rsa and press Enter three times to accept the defaults
- Copy /root/.ssh/id_rsa.pub from n2 and n3 to n1, renaming each copy to its node name
scp n2:/root/.ssh/id_rsa.pub /root/n2
scp n3:/root/.ssh/id_rsa.pub /root/n3
- On n1, write every node's id_rsa.pub into n1's /root/.ssh/authorized_keys file
cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
cat /root/n2 >> /root/.ssh/authorized_keys
cat /root/n3 >> /root/.ssh/authorized_keys
- From n1, ssh to every node (including n1 itself) to populate the known_hosts file
- Copy authorized_keys and known_hosts from n1 into /root/.ssh/ on the other nodes
scp /root/.ssh/authorized_keys n2:/root/.ssh
scp /root/.ssh/authorized_keys n3:/root/.ssh
scp /root/.ssh/known_hosts n2:/root/.ssh
scp /root/.ssh/known_hosts n3:/root/.ssh
- On each node, test that passwordless login works
Configure Time Synchronization
- Run rpm -qa | grep chrony to check whether chrony is installed; if not, install it with yum -y install chrony
- Edit /etc/chrony.conf (vim /etc/chrony.conf) as needed
- Sync the chrony.conf configuration to every node
- Start the chrony service and enable it at boot
systemctl enable chronyd.service
systemctl start chronyd.service
systemctl status chronyd.service
- Verify synchronization with timedatectl (look for "NTP synchronized")
Java Environment Variables
export JAVA_HOME=/home/atlas/jdk8
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HOME/.local/bin:$HOME/bin
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
Configure a Local yum Repository
- On n1, create the file /etc/yum.repos.d/base.repo with the following content
[Local_ISO]
name=Local ISO
baseurl=file:///mnt
gpgcheck=0
enabled=1
- On n1, mount the OS installation disc to /mnt
mount /dev/sr0 /mnt
- Upload createrepo-0.9.9-28.el7.noarch.rpm to /root/files/ on n1 and run
yum -y localinstall /root/files/createrepo-0.9.9-28.el7.noarch.rpm
The two dependencies it needs can be found on the OS disc
- On n1, create the /root/rpms directory and upload the required rpm packages into it
- Append the following to /etc/yum.repos.d/base.repo
[Local_RPM]
name=Local RPM
baseurl=http://n1:10040/rpms
gpgcheck=0
enabled=1
- In /root on n1, run
python -m SimpleHTTPServer 10040
Build and Package Atlas
Building Atlas
mvn clean -DskipTests install -e
npm-6.13.7.tgz fails to download
Download npm-6.13.7.tgz manually, place it in the /home/atlas/maven_packages/com/github/eirslett/npm/6.13.7/ directory, and rename it to npm-6.13.7.tar.gz
The build log shows: Downloading http://registry.npmjs.org/npm/-/npm-6.13.7.tgz to /home/atlas/maven_packages/com/github/eirslett/npm/6.13.7/npm-6.13.7.tar.gz
node-sass fails to install
Create a .npmrc file in the user's home directory and write the following Chinese mirror sources into it
registry=https://registry.npm.taobao.org/
sass_binary_site=https://npm.taobao.org/mirrors/node-sass
chromedriver_cdnurl=https://npm.taobao.org/mirrors/chromedriver
phantomjs_cdnurl=https://npm.taobao.org/mirrors/phantomjs
electron_mirror=https://npm.taobao.org/mirrors/electron
For more on the causes, see here
Packaging Atlas
# without the embedded hbase and solr
mvn clean -DskipTests package -Pdist
# with the embedded hbase and solr
mvn clean -DskipTests package -Pdist,embedded-hbase-solr
Packaging produces the following files
Upload the Build Artifacts
Upload the apache-atlas-2.1.0-server.tar.gz file
tar -zxf apache-atlas-2.1.0-server.tar.gz
mv apache-atlas-2.1.0/ atlas-2.1.0/
cd atlas-2.1.0/
Install Required Components
Install Zookeeper-3.4.14
- Upload zookeeper-3.4.14.tar.gz and extract it
- Create the zookeeper-3.4.14/zkData directory
- Create a myid file in the zookeeper-3.4.14/zkData directory
- Distribute the zookeeper-3.4.14 directory to every node
- On each node, set the integer in zookeeper-3.4.14/zkData/myid to that node's number; it must be unique per node
- In the zookeeper-3.4.14/conf directory, rename zoo_sample.cfg to zoo.cfg
- Modify the following parameters in zoo.cfg
# server.A=B:C:D
dataDir=/root/zookeeper-3.4.14/zkData
server.1=n1:2888:3888
server.2=n2:2888:3888
server.3=n3:2888:3888
- A: an integer identifying the server. In cluster mode each server keeps a myid file in its dataDir containing this value; at startup Zookeeper reads it and compares it against the entries in zoo.cfg to determine which server it is.
- B: the server's IP address or hostname
- C: the port this server uses to exchange data with the cluster's Leader
- D: the port servers use to communicate during leader election
- Sync zookeeper-3.4.14/conf/zoo.cfg to every node
- Start/stop/status, run on each node
- start: zookeeper-3.4.14/bin/zkServer.sh start
- stop: zookeeper-3.4.14/bin/zkServer.sh stop
- status: zookeeper-3.4.14/bin/zkServer.sh status
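Since the hostnames here follow an n&lt;digit&gt; pattern, the per-node myid value from the steps above can be derived from the hostname instead of edited by hand; a minimal sketch (shown for n2, in practice use host=$(hostname)):

```shell
# Derive the zookeeper myid from an n<digit> hostname (assumes the n1..n3 naming).
host=n2                 # in practice: host=$(hostname)
myid=${host#n}          # strip the leading "n" -> "2"
echo "$myid"
# On a real node, write it to the data directory instead:
# echo "$myid" > /root/zookeeper-3.4.14/zkData/myid
```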
Install kafka_2.11-2.0.0
- Add the following variables to .bash_profile
export KAFKA_HOME=/root/kafka_2.11-2.0.0
export PATH=$PATH:${KAFKA_HOME}/bin
- Create the kafka_2.11-2.0.0/kfData directory to hold kafka data
- Open config/server.properties; the main parameters to modify are
broker.id=1
delete.topic.enable=true
listeners=PLAINTEXT://:9092
log.dirs=/root/kafka_2.11-2.0.0/kfData
zookeeper.connect=n1:2181,n2:2181,n3:2181
- broker.id: a unique integer per broker
- advertised.listeners: if kafka is only used internally, configuring listeners is enough; set this parameter when internal and external traffic must be handled separately
- delete.topic.enable: allow topics to be deleted
- log.dirs: kafka data directory
- Distribute config/server.properties to every broker and adjust broker.id on each
- On each node, start kafka with
./bin/kafka-server-start.sh -daemon ./config/server.properties
Install hadoop-3.1.1
- Configure system environment variables. Add the following to .bash_profile
export HADOOP_HOME=/root/hadoop-3.1.1
export PATH=$PATH:${HADOOP_HOME}/bin
- Core configuration file. Modify hadoop-3.1.1/etc/hadoop/core-site.xml as follows
<!-- NameNode address -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://n1:9000</value>
</property>
<!-- Directory for files Hadoop generates at runtime -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/root/hadoop-3.1.1/data/tmp</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
- HDFS configuration files. Modify hadoop-3.1.1/etc/hadoop/hadoop-env.sh as follows
export JAVA_HOME=/root/jdk8
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Modify hadoop-3.1.1/etc/hadoop/hdfs-site.xml as follows
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<!-- SecondaryNameNode address -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>n3:50090</value>
</property>
<!-- Local path for the NameNode's namespace and transaction logs -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/root/hadoop-3.1.1/data/namenode</value>
</property>
<!-- 32MB -->
<property>
  <name>dfs.blocksize</name>
  <value>33554432</value>
</property>
<!-- Local DataNode storage path -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/root/hadoop-3.1.1/data/datanode</value>
</property>
- YARN configuration file. Modify hadoop-3.1.1/etc/hadoop/yarn-site.xml as follows
<!-- How Reducers fetch data -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- ResourceManager address -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>n2</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
- MapReduce configuration file. Modify hadoop-3.1.1/etc/hadoop/mapred-site.xml as follows
<!-- Run MR on Yarn -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/root/hadoop-3.1.1</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/root/hadoop-3.1.1</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/root/hadoop-3.1.1</value>
</property>
<!-- jobhistory address -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>n3:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>
<!-- jobhistory web UI address -->
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>n3:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>
<!-- where finished MapReduce jobs are stored -->
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>/job/history/done</value>
</property>
<!-- where in-flight jobs are stored -->
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>/job/history/done_intermediate</value>
</property>
<!-- number of counters allowed per job -->
<property>
  <name>mapreduce.job.counters.limit</name>
  <value>500</value>
</property>
<!-- memory limit per Map task -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<!-- Map task heap; about 80% of mapreduce.map.memory.mb is recommended -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1638m</value>
</property>
<!-- memory limit per Reduce task -->
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
<!-- Reduce task heap; about 80% of mapreduce.reduce.memory.mb is recommended -->
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1638m</value>
</property>
- workers file. Add the data nodes to hadoop-3.1.1/etc/hadoop/workers. Entries must have no trailing spaces and the file must contain no blank lines.
n1
n2
n3
- Distribute Hadoop to every node
- Before the first cluster start, format the NameNode
hadoop-3.1.1/bin/hdfs namenode -format
- On n1, start HDFS
/root/hadoop-3.1.1/sbin/start-dfs.sh
- On n2, start Yarn
/root/hadoop-3.1.1/sbin/start-yarn.sh
- On n3, start the MR Job History Server
/root/hadoop-3.1.1/bin/mapred --daemon start historyserver
- Test HDFS and MapReduce with the following commands
hadoop fs -mkdir -p /tmp/input
hadoop fs -put $HADOOP_HOME/README.txt /tmp/input
export hadoop_version=`hadoop version | head -n 1 | awk '{print $2}'`
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-$hadoop_version.jar wordcount /tmp/input /tmp/output
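The hadoop_version extraction above works because the first line of `hadoop version` output is `Hadoop <version>`; a quick sketch of the parsing against a sample line:

```shell
# Parse the version number out of the first line of `hadoop version` output.
# "Hadoop 3.1.1" is a sample of what the live command prints as its first line.
sample='Hadoop 3.1.1'
echo "$sample" | head -n 1 | awk '{print $2}'   # -> 3.1.1
```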
Install hbase-2.0.2
- Configure system variables. Add the following to .bash_profile
export HBASE_HOME=/root/hbase-2.0.2
export PATH=$PATH:${HBASE_HOME}/bin
- Changes to hbase-env.sh
export JAVA_HOME=/root/jdk8
# Configure PermSize. Only needed in JDK7. You can safely remove it for JDK8+
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
export HBASE_MANAGES_ZK=false
- Changes to hbase-site.xml
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://n1:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<!-- New since 0.98; earlier versions had no .port and defaulted to 60000 -->
<!-- 16000 is the default and may be omitted; the web UI port is 16010 -->
<property>
  <name>hbase.master.port</name>
  <value>16000</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>n1,n2,n3</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/root/zookeeper-3.4.14/zkData</value>
</property>
<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>false</value>
</property>
- Changes to regionservers
n1
n2
n3
- Distribute hbase to every node
- On each node, symlink the hadoop configuration files into hbase
ln -s /root/hadoop-3.1.1/etc/hadoop/core-site.xml /root/hbase-2.0.2/conf/core-site.xml
ln -s /root/hadoop-3.1.1/etc/hadoop/hdfs-site.xml /root/hbase-2.0.2/conf/hdfs-site.xml
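The two symlinks differ only in the file name, so a loop avoids copy-paste mistakes; a dry-run sketch that just prints the commands (remove the echo to execute on each node):

```shell
# Dry run: print the symlink commands for hadoop's client configs into hbase.
for f in core-site.xml hdfs-site.xml; do
  echo ln -s /root/hadoop-3.1.1/etc/hadoop/"$f" /root/hbase-2.0.2/conf/"$f"
done
```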
Install Solr-7.5.0
- Run tar -zxf solr-7.5.0.tgz
- In the solr directory, modify the following parameters in bin/solr.in.sh
ZK_HOST="n1:2181,n2:2181,n3:2181"
# set a different SOLR_HOST on each node
SOLR_HOST="n1"
- Distribute the /opt/solr directory to the other nodes and adjust the SOLR_HOST value
- On each node, add the following to /etc/security/limits.conf; it takes effect after a reboot
root hard nofile 65000
root soft nofile 65000
root hard nproc 65000
root soft nproc 65000
- On each node, start solr with bin/solr start
/opt/solr/bin/solr start
MySQL-5.7.30
- Run rpm -qa | grep mariadb to check whether mariadb is installed; if it is, remove it with rpm -e --nodeps xxx
- Upload mysql-5.7.26-1.el7.x86_64.rpm-bundle.tar to the /root/rpms directory on n1 and unpack it
- Run
createrepo -d /root/rpms/ && yum clean all
- Run
yum -y install mysql-community-server mysql-community-client
- Modify /etc/my.cnf
[mysqld]
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
log_bin=/var/lib/mysql/mysql_binary_log
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# In later versions of MySQL, if you enable the binary log and do not set
# a server_id, MySQL will not start. The server_id must be unique within
# the replicating group.
server_id=1
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 250
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
character-set-server=utf8
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
sql_mode=STRICT_ALL_TABLES
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

[client]
default-character-set=utf8
- Enable MySQL at boot and start it
systemctl enable mysqld.service
systemctl start mysqld.service
systemctl status mysqld.service
- Run
grep password /var/log/mysqld.log
to obtain the initial password
- Run
mysql_secure_installation
to perform the basic MySQL setup
Log in to MySQL and run show variables like "%char%"; to check that the character set is utf8. The mysql_secure_installation dialogue looks like this:
Securing the MySQL server deployment.
Enter password for user root: (enter the initial password)
The existing password for the user account root has expired. Please set a new password.
New password: (enter the new password Root123!)
Re-enter new password: Root123!
The 'validate_password' plugin is installed on the server.
The subsequent steps will run with the existing configuration of the plugin.
Using existing password for root.
Estimated strength of the password: 100
Change the password for root ? ((Press y|Y for Yes, any other key for No) : n
Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y
By default, a MySQL installation has an anonymous user, allowing anyone to log into MySQL without having to have a user account created for them. This is intended only for testing, and to make the installation go a bit smoother. You should remove them before moving into a production environment.
Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
Success.
Normally, root should only be allowed to connect from 'localhost'. This ensures that someone cannot guess at the root password from the network.
Disallow root login remotely? (Press y|Y for Yes, any other key for No) : y
Success.
By default, MySQL comes with a database named 'test' that anyone can access. This is also intended only for testing, and should be removed before moving into a production environment.
Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
- Dropping test database... Success.
- Removing privileges on test database... Success.
Reloading the privilege tables will ensure that all changes made so far will take effect immediately.
Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
Success.
All done!
Install Hive-3.1.0
- Configure system variables. Add the following to .bash_profile
export HIVE_HOME=/root/apache-hive-3.1.0-bin
export PATH=$PATH:${HIVE_HOME}/bin
- Configure Hive environment variables. Modify apache-hive-3.1.0-bin/conf/hive-env.sh as follows
HADOOP_HOME=${HADOOP_HOME}
export HADOOP_HEAPSIZE=2048
export HIVE_CONF_DIR=${HIVE_HOME}/conf
- Create the database and user in MySQL
CREATE DATABASE hive DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'Hive123!';
flush privileges;
- Copy mysql-connector-java-5.1.47-bin.jar into the apache-hive-3.1.0-bin/lib/ directory
- Modify apache-hive-3.1.0-bin/conf/hive-site.xml as follows
<property>
  <name>system:java.io.tmpdir</name>
  <value>/tmp/tmpdir</value>
</property>
<property>
  <name>system:user.name</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://n1:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
  <description>
    JDBC connect string for a JDBC metastore.
    To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
    For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
  </description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>Hive123!</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>hive.server2.authentication</name>
  <value>NONE</value>
  <description>
    Expects one of [nosasl, none, ldap, kerberos, pam, custom].
    Client authentication types.
    NONE: no authentication check
    LDAP: LDAP/AD based authentication
    KERBEROS: Kerberos/GSSAPI authentication
    CUSTOM: Custom authentication provider (Use with property hive.server2.custom.authentication.class)
    PAM: Pluggable authentication module
    NOSASL: Raw transport
  </description>
</property>
<!-- the user configured here needs execute permission on inode="/tmp/hive" -->
<property>
  <name>hive.server2.thrift.client.user</name>
  <value>root</value>
  <description>Username to use against thrift client</description>
</property>
<property>
  <name>hive.server2.thrift.client.password</name>
  <value>Root23!</value>
  <description>Password to use against thrift client</description>
</property>
<property>
  <name>hive.metastore.db.type</name>
  <value>mysql</value>
  <description>
    Expects one of [derby, oracle, mysql, mssql, postgres].
    Type of database used by the metastore. Information schema &amp; JDBCStorageHandler depend on it.
  </description>
</property>
- Run schematool -initSchema -dbType mysql to initialize the MySQL schema
- Run the following statements in the hive database in MySQL to avoid mojibake in Chinese comments on Hive tables, columns, partitions, and indexes
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
alter table TABLE_PARAMS modify column PARAM_VALUE mediumtext character set utf8;
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8;
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8;
- Run mkdir -p hive-3.1.0/logs
- Run cp hive-log4j2.properties.template hive-log4j2.properties and modify the following property
property.hive.log.dir = /root/hive-3.1.0/logs
- Start Hiveserver2 with
nohup hiveserver2 1>/dev/null 2>&1 & echo $! > /root/hive-3.1.0/logs/hiveserver2.pid
- Start Beeline with
beeline -u jdbc:hive2://n1:10000/default -n root -p Root123!
Configure Atlas
Configure Solr for Atlas
- Modify the following settings in atlas-application.properties
atlas.graph.index.search.backend=solr
atlas.graph.index.search.solr.mode=cloud
# ZK quorum setup for solr as comma separated value.
atlas.graph.index.search.solr.zookeeper-url=n1:2181,n2:2181,n3:2181
atlas.graph.index.search.solr.wait-searcher=true
- Copy atlas's conf/solr directory to /root/solr-7.5.0 on every solr server node and rename it atlas_solr/
- On a solr server node, create the collections
./solr create -c vertex_index -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
./solr create -c edge_index -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
./solr create -c fulltext_index -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
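The three create commands differ only in the collection name, so they can be driven by a loop; a sketch that just prints the commands (remove the echo to actually run them from solr's bin directory):

```shell
# Dry run: print one create command per Atlas index collection.
for c in vertex_index edge_index fulltext_index; do
  echo ./solr create -c "$c" -d /root/solr-7.5.0/atlas_solr -shards 1 -replicationFactor 3 -force
done
```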
- To delete a collection, paste the corresponding URL into the browser's address bar
http://n1:8983/solr/admin/collections?action=DELETE&name=vertex_index
http://n1:8983/solr/admin/collections?action=DELETE&name=edge_index
http://n1:8983/solr/admin/collections?action=DELETE&name=fulltext_index
Configure HBase for Atlas
- Modify the following settings in atlas-2.1.0/conf/atlas-application.properties
atlas.graph.storage.backend=hbase2
atlas.graph.storage.hbase.table=atlas
atlas.graph.storage.hostname=n1:2181,n2:2181,n3:2181
- Modify the following in atlas-env.sh
export HBASE_CONF_DIR=/root/hbase-2.0.2/conf
- Copy the hbase configuration files into atlas's conf/hbase
cp /root/hbase-2.0.2/conf/* /root/atlas-2.1.0/conf/hbase/
- Delete the copied core-site.xml and hdfs-site.xml and recreate them as symlinks
ln -s /root/hadoop-3.1.1/etc/hadoop/core-site.xml /root/atlas-2.1.0/conf/hbase/core-site.xml
ln -s /root/hadoop-3.1.1/etc/hadoop/hdfs-site.xml /root/atlas-2.1.0/conf/hbase/hdfs-site.xml
Configure Kafka for Atlas
- Modify the following settings in atlas-application.properties
atlas.notification.embedded=false
atlas.kafka.data=/root/atlas-2.1.0/data/kafka
atlas.kafka.zookeeper.connect=n1:2181,n2:2181,n3:2181
atlas.kafka.bootstrap.servers=n1:9092,n2:9092,n3:9092
atlas.kafka.zookeeper.session.timeout.ms=4000
atlas.kafka.zookeeper.connection.timeout.ms=2000
atlas.kafka.enable.auto.commit=true
- Create the topics
kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --create --topic ATLAS_HOOK --partitions 1 --replication-factor 3
kafka-topics.sh --zookeeper n1:2181,n2:2181,n3:2181 --create --topic ATLAS_ENTITIES --partitions 1 --replication-factor 3
The topic names can be found in the get_topics_to_create method of atlas-2.1.0/bin/atlas_config.py; the kafka setup script is atlas-2.1.0/bin/atlas_kafka_setup.py
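The two create commands differ only in the topic name; a loop sketch that prints them (remove the echo to execute):

```shell
# Dry run: print one kafka-topics.sh create command per Atlas notification topic.
ZK=n1:2181,n2:2181,n3:2181
for t in ATLAS_HOOK ATLAS_ENTITIES; do
  echo kafka-topics.sh --zookeeper "$ZK" --create --topic "$t" --partitions 1 --replication-factor 3
done
```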
Configure LDAP
- Add/modify the following settings in atlas-application.properties
atlas.authentication.method.ldap=true
atlas.authentication.method.ldap.type=ldap
atlas.authentication.method.ldap.url=ldap://xx.xx.xx.xx:389
atlas.authentication.method.ldap.userDNpattern=uid={0},ou=employee,dc=xx,dc=xxxx,dc=com
atlas.authentication.method.ldap.groupSearchFilter=(member=uid={0},ou=employee,dc=xx,dc=xxxx,dc=com)
atlas.authentication.method.ldap.groupRoleAttribute=cn
atlas.authentication.method.ldap.base.dn=dc=xx,dc=xxxx,dc=com
atlas.authentication.method.ldap.bind.dn=ou=employee,dc=xx,dc=xxxx,dc=com
- For an explanation of the LDAP settings, see here
Other Atlas Settings
- Modify the following settings in atlas-application.properties
atlas.rest.address=http://n1:21000
atlas.server.run.setup.on.start=false
atlas.audit.hbase.tablename=apache_atlas_entity_audit
atlas.audit.hbase.zookeeper.quorum=n1:2181,n2:2181,n3:2181
- Uncomment the following in atlas-log4j.xml
<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
  <param name="file" value="${atlas.log.dir}/atlas_perf.log" />
  <param name="datePattern" value="'.'yyyy-MM-dd" />
  <param name="append" value="true" />
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d|%t|%m%n" />
  </layout>
</appender>
<logger name="org.apache.atlas.perf" additivity="false">
  <level value="debug" />
  <appender-ref ref="perf_appender" />
</logger>
Start Atlas
- Start the components in the following order
Order | Node | Component |
---|---|---|
1 | n1 | zookeeper |
2 | n1 | kafka |
3 | n1 | hdfs |
4 | n2 | yarn |
5 | n3 | jobhistoryserver |
6 | n3 | hbase |
7 | n1 | solr |
8 | n1 | mysql |
9 | n1 | hive |
10 | n1 | atlas |
- Run bin/atlas_start.py
- Open http://n1:21000 in a browser
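The start order above is easy to lose track of across three nodes; a dry-run sketch that prints each step as "start &lt;component&gt; on &lt;node&gt;" (the echo would be replaced by the actual start commands from the earlier sections, e.g. run over ssh from n1):

```shell
# Dry run: print the component start order; each entry is node:component.
for step in \
  n1:zookeeper n1:kafka n1:hdfs n2:yarn n3:jobhistoryserver \
  n3:hbase n1:solr n1:mysql n1:hive n1:atlas; do
  echo "start ${step#*:} on ${step%%:*}"
done
```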
Configure the Hive Hook
- Modify the following in hive-site.xml
<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
- Extract apache-atlas-2.1.0-hive-hook.tar.gz and enter the apache-atlas-hive-hook-2.1.0 directory
- Copy everything in apache-atlas-hive-hook-2.1.0/hook/hive into atlas-2.1.0/hook/hive
- Modify the following in hive-env.sh
export HIVE_AUX_JARS_PATH=/root/atlas-2.1.0/hook/hive
- Add the following settings to atlas-application.properties
atlas.hook.hive.synchronous=false
atlas.hook.hive.numRetries=3
atlas.hook.hive.queueSize=10000
atlas.cluster.name=primary
atlas.kafka.zookeeper.connect=n1:2181,n2:2181,n3:2181
atlas.kafka.zookeeper.connection.timeout.ms=30000
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.sync.time.ms=20
- Copy atlas-application.properties into hive's conf directory
- Run ./hook-bin/import-hive.sh to import the hive metadata into atlas; the username and password are the atlas login credentials
./hook-bin/import-hive.sh -d hive_testdb
……
Enter username for atlas :- admin
Enter password for atlas :- ……
Hive Meta Data imported successfully!!!
- Open/refresh the atlas page; hive data now appears under search on the left
- Select hive_db(1) and click Search; the results are shown below
- View table lineage