
CDH Cluster Deployment and Configuration

1. ctdn-1
vi /etc/hosts
#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.11.8.20 ctdn-1
10.11.8.28 ctdn-2
10.11.8.31 ctdn-3
10.11.8.16 ctdn-4
10.11.8.32 ctdn-5
10.11.8.35 ctdn-6

2.ALL
ssh-keygen -t rsa
cd .ssh

cat id_rsa.pub

3.ctdn-1
Copy the contents of each host's id_rsa.pub into this host's authorized_keys file.
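One way to collect all the keys onto ctdn-1 (a sketch; assumes password-based root SSH is still enabled on every host):
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
for h in ctdn-2 ctdn-3 ctdn-4 ctdn-5 ctdn-6; do
  ssh root@$h cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
done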

4.ctdn-1
scp the authorized_keys file to every server:
scp authorized_keys root@ctdn-2:/root/.ssh
scp authorized_keys root@ctdn-3:/root/.ssh
...
scp authorized_keys root@ctdn-6:/root/.ssh
scp /etc/hosts root@ctdn-2:/etc/
...
scp /etc/hosts root@ctdn-6:/etc/
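The repeated scp commands above can be collapsed into one loop (a sketch assuming the same host list):
for h in ctdn-2 ctdn-3 ctdn-4 ctdn-5 ctdn-6; do
  scp /root/.ssh/authorized_keys root@$h:/root/.ssh/
  scp /etc/hosts root@$h:/etc/
done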

5.ctdn-1
vi /etc/sysconfig/network
+NETWORKING_IPV6=no
scp /etc/sysconfig/network root@ctdn-2:/etc/sysconfig
...
scp /etc/sysconfig/network root@ctdn-6:/etc/sysconfig

6. Disable the firewall ALL

Verified that neither the iptables nor the firewalld service is installed.
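If either service does turn out to be present, it can be stopped and disabled (standard systemd commands on CentOS 7):
systemctl status firewalld iptables
systemctl stop firewalld && systemctl disable firewalld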

7. Configure time synchronization
ALL:
yum -y install chrony
systemctl start chronyd
ctdn-1:
vi /etc/chrony.conf
allow 10.11.8.0/24
# Listen for commands only on localhost.
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
# Serve time even if not synchronized to any NTP server.
local stratum 10
others:
vi /etc/chrony.conf
+server 10.11.8.20 iburst
ALL:

systemctl restart chronyd.service
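To confirm that the clients are actually syncing against ctdn-1, check the chrony status on each node (standard chronyc commands):
chronyc sources -v
chronyc tracking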

8. System tuning ALL
Minimize swapping:
sysctl -w vm.swappiness=0
Disable transparent hugepage defrag:
echo never > /sys/kernel/mm/transparent_hugepage/defrag
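Both settings are lost on reboot. Step 22 below persists the swappiness value via /etc/sysctl.conf; the THP setting can be persisted with rc.local (a sketch; Cloudera also recommends disabling the "enabled" toggle, not just "defrag"):
echo 'echo never > /sys/kernel/mm/transparent_hugepage/defrag' >> /etc/rc.local
echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.local
chmod +x /etc/rc.local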

9. Remove the bundled JDK
Verified that no Java packages remain.
ctdn-1:
Download jdk-8u121-linux-x64.tar.gz to /opt, then push it to the other hosts:
scp jdk-8u121-linux-x64.tar.gz root@ctdn-2:/opt
...
scp jdk-8u121-linux-x64.tar.gz root@ctdn-6:/opt

10. Install the JDK ALL
tar zxvf jdk-8u121-linux-x64.tar.gz
ln -s /opt/jdk1.8.0_121 /opt/jdk
mkdir /usr/java

ln -s /opt/jdk /usr/java/default

11. Set the Java environment variables ctdn-1
vi /etc/profile
+
export JAVA_HOME=/opt/jdk
export PATH="$JAVA_HOME/bin:$PATH"

scp /etc/profile root@ctdn-2:/etc
...
scp /etc/profile root@ctdn-6:/etc

ALL:
source /etc/profile
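A quick check on every node that the new JDK is picked up:
which java
java -version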

12. ctdn-1:
Remove MariaDB:
rpm -qa|grep mariadb
rpm -e mariadb-libs-5.5.44-2.el7.centos.x86_64 --nodeps
wget http://ftp.tu-chemnitz.de/pub/linux/dag/redhat/el6/en/x86_64/rpmforge/RPMS/axel-2.4-1.el6.rf.x86_64.rpm
rpm -ivh axel-2.4-1.el6.rf.x86_64.rpm
axel -n 20 https://cdn.mysql.com//Downloads/MySQL-5.7/mysql-5.7.20-1.el7.x86_64.rpm-bundle.tar
tar -xvf mysql-5.7.20-1.el7.x86_64.rpm-bundle.tar

rpm -ivh mysql-community-common-5.7.20-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-5.7.20-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-5.7.20-1.el7.x86_64.rpm
yum install -y libaio
yum install -y libaio-devel
rpm -ivh mysql-community-server-5.7.20-1.el7.x86_64.rpm
systemctl start mysqld.service
systemctl enable mysqld.service
Change the root password.
Get the temporary password:
grep 'temporary password' /var/log/mysqld.log
mysql -uroot -p
mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('[email protected]');
[ ctdn-5:
mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('[email protected]');
]
Create databases for hive, oozie, hue, and amon:
mysql> CREATE DATABASE hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
mysql> CREATE DATABASE oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
mysql> CREATE DATABASE hue DEFAULT CHARSET utf8 COLLATE utf8_general_ci;  
mysql> CREATE DATABASE amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
mysql> grant all privileges on *.* to 'cdh'@'%' identified by '[email protected]' with grant option;
mysql> flush privileges;
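A sanity check that the new cdh account works from a remote node (illustrative; any cluster host will do):
mysql -ucdh -p -h ctdn-1 -e 'show databases;'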

13. Create the scm user all
useradd --system --home=/opt/cm-5.8.0/run/cloudera-scm-server/ --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm

14. Create data directories ctdn-1
# rm -rf /user/hive/warehouse
# mkdir -p /user/hive/warehouse
# chown cloudera-scm:cloudera-scm /user/hive/warehouse
# rm -rf /var/lib/cloudera-host-monitor
# mkdir -p /var/lib/cloudera-host-monitor
# chown cloudera-scm:cloudera-scm /var/lib/cloudera-host-monitor
# rm -rf  /var/lib/cloudera-service-monitor
# mkdir -p /var/lib/cloudera-service-monitor

# chown cloudera-scm:cloudera-scm /var/lib/cloudera-service-monitor

15. Install packages all
yum install -y psmisc libxslt libxslt-python

16.ctdn-1
Download cloudera-manager-centos7-cm5.8.0_x86_64.tar.gz:
axel -n 50 http://archive.cloudera.com/cm5/cm/5/cloudera-manager-centos7-cm5.8.0_x86_64.tar.gz
Download the CDH Parcel files:
axel -n 20 http://archive.cloudera.com/cdh5/parcels/5.8.0/CDH-5.8.0-1.cdh5.8.0.p0.42-el7.parcel
axel -n 10 http://archive.cloudera.com/cdh5/parcels/5.8.0/CDH-5.8.0-1.cdh5.8.0.p0.42-el7.parcel.sha1
wget http://archive.cloudera.com/cdh5/parcels/5.8.0/manifest.json
Download the MySQL JDBC driver:
wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.44.tar.gz

17. Install
tar zxvf cloudera-manager-centos7-cm5.8.0_x86_64.tar.gz
tar zxvf mysql-connector-java-5.1.44.tar.gz
cd mysql-connector-java-5.1.44
cp mysql-connector-java-5.1.44-bin.jar /opt/cm-5.8.0/share/cmf/lib/
Initialize the Cloudera Manager database:
cd /opt/cm-5.8.0/share/cmf/schema
./scm_prepare_database.sh mysql cm -hlocalhost -ucdh -p'[email protected]' scm '[email protected]'
Edit the agent configuration:
cd /opt/cm-5.8.0/etc/cloudera-scm-agent
vi config.ini
+server_host=10.11.8.20
Sync the cm-5.8.0 directory to the other servers:
scp -r cm-5.8.0 root@ctdn-2:/opt/
......
scp -r cm-5.8.0 root@ctdn-6:/opt/

18.parcel ctdn-1:
cd /opt/cloudera/parcel-repo/
cp /opt/CDH-5.8.0-1.cdh5.8.0.p0.42-el7.parcel .
cp /opt/CDH-5.8.0-1.cdh5.8.0.p0.42-el7.parcel.sha1 .
cp /opt/manifest.json .
mv CDH-5.8.0-1.cdh5.8.0.p0.42-el7.parcel.sha1 CDH-5.8.0-1.cdh5.8.0.p0.42-el7.parcel.sha
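Before letting CM distribute the parcel, the download can be verified against the hash file (the .sha file holds the expected SHA-1; compare the two values by eye):
sha1sum CDH-5.8.0-1.cdh5.8.0.p0.42-el7.parcel
cat CDH-5.8.0-1.cdh5.8.0.p0.42-el7.parcel.sha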

19. Start the CM services
ctdn-1:
/opt/cm-5.8.0/etc/init.d/cloudera-scm-server start
hostnamectl set-hostname ctdn-1
vim /etc/cloud/cloud.cfg   (comment out the two lines below)
+# - set_hostname
+# - update_hostname
mkdir /opt/cm-5.8.0/run/cloudera-scm-agent
/opt/cm-5.8.0/etc/init.d/cloudera-scm-server restart
/opt/cm-5.8.0/etc/init.d/cloudera-scm-agent restart
ctdn-2~ctdn-6:
hostnamectl set-hostname ctdn-*   (use the matching hostname on each host: ctdn-2 ... ctdn-6)
vim /etc/cloud/cloud.cfg   (comment out the two lines below)
+# - set_hostname
+# - update_hostname
mkdir -p /opt/cm-5.8.0/run/cloudera-scm-agent
/opt/cm-5.8.0/etc/init.d/cloudera-scm-agent start
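If a host does not show up in CM, check its agent log first (path assumed from the tarball layout used here):
tail -n 100 /opt/cm-5.8.0/log/cloudera-scm-agent/cloudera-scm-agent.log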

20. Install the cluster; log in at:
http://10.11.8.20:7180
Username/password: admin/[email protected]

21.
The "Installing Selected Parcels" step failed to distribute; take the following measures:
ALL: edit /etc/hosts and remove the # comment from the line shown below:
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

22. Host inspector checks; make the following changes as prompted:
ALL:
echo 'vm.swappiness=10'>> /etc/sysctl.conf
echo never > /sys/kernel/mm/transparent_hugepage/defrag

23. Cluster settings
dfs.data.dir, dfs.datanode.data.dir: /data/dfs/dn
dfs.name.dir, dfs.namenode.name.dir: /data/dfs/nn
fs.checkpoint.dir, dfs.namenode.checkpoint.dir: /data/dfs/snn
Oozie Server data directory: /var/lib/oozie/data
ZooKeeper Znode: /solr
HDFS data directory: /solr
NodeManager local directories yarn.nodemanager.local-dirs: /data/yarn/nm

24. The following error appeared while starting the cluster; after fixing clock synchronization it no longer occurred:
Can't open /opt/cm-5.8.0/run/cloudera-scm-agent/process/46-hbase-MASTER/supervisor.conf: Permission denied.
ctdn-2
cd /opt/cm-5.8.0/run/cloudera-scm-agent/process/46-hbase-MASTER
Found:
-rw------- 1 root  root  3406 Nov 21 01:47 supervisor.conf
Action:
chown -R hbase:hbase *
That did not solve the problem; a fresh 48-hbase-MASTER directory was then generated. The relevant script is hbase.sh:
/opt/cm-5.8.0/lib64/cmf/service/hbase/hbase.sh

Addendum:
cp mysql-connector-java-5.1.44-bin.jar /opt/cloudera/parcels/CDH/lib/hive/lib
cp mysql-connector-java-5.1.44-bin.jar /opt/cloudera/parcels/CDH/lib/hadoop
cp mysql-connector-java-5.1.44-bin.jar /var/lib/oozie

1.all
sudo timedatectl set-timezone 'Asia/Shanghai'
yum install ntp -y
ctdn-1
systemctl restart ntpd.service
vi /etc/ntp.conf
+
server 0.cn.pool.ntp.org
server 0.asia.pool.ntp.org
server 3.asia.pool.ntp.org

# allow the upstream time servers to adjust the local clock
restrict 0.cn.pool.ntp.org nomodify notrap noquery
restrict 0.asia.pool.ntp.org nomodify notrap noquery
restrict 3.asia.pool.ntp.org nomodify notrap noquery

# Undisciplined Local Clock. This is a fake driver intended for backup
# and when no outside source of synchronized time is available.
server  127.127.1.0     # local clock
fudge   127.127.1.0 stratum 10

systemctl restart ntpd.service
chkconfig --level 35 ntpd on
netstat -tlunp | grep ntp   (check that port 123 is listening and that it uses UDP)

ctdn-2~ctdn-6:
vi /etc/ntp.conf
+ server 10.51.120.12 prefer
chkconfig ntpd on
systemctl restart ntpd.service
ntpdate -u 10.51.120.12
hwclock --systohc   # write the system time to the hardware clock (BIOS)
2. DataNode fails to start
Check stderr:
Can't open /opt/cm-5.8.0/run/cloudera-scm-agent/process/345-hdfs-DATANODE/supervisor.conf: Permission denied.
We long assumed this was a permissions problem, but neither chmod nor chown fixed it, and some DataNodes started while others failed.
cd /var/log/hadoop-hdfs
tail -f -n 2000 hadoop-cmf-hdfs-DATANODE-ctdn-1.log.out
Check /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-master1.log.out:
WARN  Failed to add storage directory [DISK]file:/data/dfs/dn/
java.io.IOException: Incompatible clusterIDs in /data/dfs/dn: namenode clusterID = cluster34; datanode clusterID = cluster21
FATAL Initialization failed for Block pool <registering> (Datanode Uuid 357950bd-2e5b-4e89-b731-f58694461c55) service to ctdn-2/10.11.8.28:8022. Exiting.
Solution: the clusterIDs must match; the usual remedy is to clear the stale DataNode data directory (/data/dfs/dn) so the node re-registers with the new namespace.
3. NameNode
Not formatted:
hadoop namenode -format
chown -R hdfs:hadoop /data/dfs/
4. NFS Gateway fails to start
Error: No portmap or rpcbind service is running on this host
all
yum install rpcbind -y

hdfs-site.xml
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
Otherwise:
Permission denied: user=mapred, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
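Disabling permissions cluster-wide is a blunt instrument; a narrower fix for this particular error is to give the writing user its own HDFS home directory (a sketch; the paths are illustrative):
sudo -u hdfs hdfs dfs -mkdir -p /user/mapred
sudo -u hdfs hdfs dfs -chown mapred:mapred /user/mapred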
+++++++++++++++++++++++++++++++++++++++++++++++
pip3 install flask-appbuilder
+++++++++++++++++++++++++++++++++++++++++
ctdn01:
hadoop distcp hdfs://ctdn/code hdfs://10.11.8.31:8022/ success
hadoop distcp hdfs://ctdn/data hdfs://10.11.8.31:8022/success
hadoop distcp hdfs://ctdn/hbase/data hdfs://10.11.8.31:8022/ doing
Progress reached: hadoop distcp hdfs://ctdn/hbase/data/default/sinacommentstotal hdfs://10.11.8.31:8022/hbase/data/default
After the full migration completes, run:
sudo -u hbase hbase hbck  -fixAssignments -fixMeta
hadoop distcp hdfs://ctdn/nash hdfs://10.11.8.31:8022/success
hadoop distcp hdfs://ctdn/root hdfs://10.11.8.31:8022/success
hadoop distcp hdfs://ctdn/test hdfs://10.11.8.31:8022/success
hadoop distcp hdfs://ctdn/usr hdfs://10.11.8.28:8022/quit
hadoop distcp hdfs://ctdn/var hdfs://10.11.8.28:8022/ quit
hadoop distcp hdfs://ctdn/key hdfs://10.11.8.31:8022/success
hadoop distcp hdfs://ctdn/tmp hdfs://10.11.8.28:8022/ quit
hadoop distcp hdfs://ctdn/user/hive/warehouse hdfs://10.11.8.31:8022/ failed,retry
hadoop distcp hdfs://ctdn/user/fanxing hdfs://10.11.8.31:8022/user/ success
hadoop distcp hdfs://ctdn/user/ctdn hdfs://10.11.8.31:8022/ success
drwxr-xr-x   - hbase hadoop          0 2017-11-06 02:39 /hbase/archive
drwxr-xr-x   - hbase hadoop          0 2017-04-01 14:39 /hbase/corrupt
drwxr-xr-x   - hbase hadoop          0 2017-04-01 14:39 /hbase/data
hdfs haadmin -getServiceState namenode94
hdfs haadmin -getServiceState namenode131
hdfs haadmin -transitionToActive --forcemanual namenode131
ctdn01:
mysqldump -uxhhlhive [email protected]!X19 -h 10.10.0.144 -t hive > hive.sql
[email protected]!X19i4n9g
mysqldump -urecomm -p -h 10.11.8.20 -t recomm --table project_inverst_bak > project_inverst.sql
ctdn-1:
mysql>source /opt/hive.sql
Configure NameNode HA:
Go to the HDFS page; under "Actions" in the top-right corner, click "Enable High Availability".
Enter the NameService name, here: ctdn, and click Continue.
Choose the second NameNode, here: cdh-node3.grc. Choose the JournalNodes, here: cdh-node[2-4].grc, three in total. Note: there must be at least 3 JournalNodes.
Set the JournalNode directory on cdh-node[2-4], here: /data/dfs/jn
Enable HDFS High Availability.
If a "NameNode format failed" error appears at this point, it can be ignored.
HA enabled successfully.
Update the Hive Metastore NameNodes.
Restarting the cluster failed with: Journal Storage Directory /data/dfs/jn/ctdn not formatted. Go to the NameNode configuration page and, under "Actions" in the top-right corner, click "Initialize Shared Edits Directory".
JournalNode sync:
hdfs namenode -bootstrapStandby

Hue:
Username/password: ctdnadhoc/Aaddhmoi

Configure Hue to support DB queries
1. ctdn-5:
cd /etc/hue/conf
vi hue.ini
[librdbms]
[[databases]]
[[[mysql]]] (uncomment)
nice_name="My SQL DB" (uncomment)
name=ctdn (uncomment)
engine=mysql (uncomment)
host=10.9.130.142 (change)
port=3306 (uncomment)
user=root (change)
password=IhNtPz6E2V34 (change)
2. Restart Hue from Cloudera Manager.
3. Error: Error loading MySQLdb module: libmysqlclient.so.18: cannot open shared object file: No such file or directory
4. Check: the MySQL client libraries were not installed on ctdn-6.
axel -n 50 https://cdn.mysql.com/archives/mysql-5.7/mysql-5.7.12-1.el7.x86_64.rpm-bundle.tar
tar -xvf mysql-5.7.12-1.el7.x86_64.rpm-bundle.tar
rpm -qa|grep mariadb
rpm -e mariadb-libs-5.5.44-2.el7.centos.x86_64 --nodeps
wget ftp://mirror.switch.ch/pool/4/mirror/mysql/Downloads/MySQL-5.5/MySQL-shared-5.5.57-1.el7.x86_64.rpm
rpm -ivh MySQL-shared-5.5.57-1.el7.x86_64.rpm
[root@ctdn-6 lib64]# find / -name libmysqlclient.so.18
/usr/lib64/libmysqlclient.so.18
5. Hue: rdbms was not configured.
Go to Cloudera Manager:
Hue -> Configuration
Scope -> Hue Server
Category -> Advanced
Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini:
[librdbms]
# The RDBMS app can have any number of databases configured in the databases
# section. A database is known by its section name
# (IE sqlite, mysql, psql, and oracle in the list below).
[[databases]]
# sqlite configuration.
## [[[sqlite]]]
# Name to show in the UI.
## nice_name=SQLite
# For SQLite, name defines the path to the database.
## name=/tmp/sqlite.db
# Database backend to use.
## engine=sqlite
# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en/1.4/ref/databases/
## options={}
# mysql, oracle, or postgresql configuration.
[[[mysql]]]
# Name to show in the UI.
nice_name="My SQL DB"
# For MySQL and PostgreSQL, name is the name of the database.
# For Oracle, Name is instance of the Oracle server. For express edition
# this is 'xe' by default.
name=ctdn
# Database backend to use. This can be:
# 1. mysql
# 2. postgresql
# 3. oracle
engine=mysql
# IP or hostname of the database to connect to.
host=10.9.130.142
# Port the database server is listening to. Defaults are:
# 1. MySQL: 3306
# 2. PostgreSQL: 5432
# 3. Oracle Express Edition: 1521
port=3306
# Username to authenticate with when connecting to the database.
user=root
# Password matching the username to authenticate with when
# connecting to the database.
password=IhNtPz6E
# Database options to send to the server when connecting.
# https://docs.djangoproject.com/en/1.4/ref/databases/
## options={}
6. Restart Hue from Cloudera Manager.

Hue with Spark support
1.ctdn-5
cd /etc/yum.repos.d/
curl https://bintray.com/sbt/rpm/rpm > bintray-sbt-rpm.repo
yum -y install sbt
2. 
cd /data/tools
git clone https://github.com/ooyala/spark-jobserver.git
cd spark-jobserver
sbt
Alternative approach
1. wget http://archive.cloudera.com/beta/livy/livy-server-0.2.0.zip
unzip livy-server-0.2.0.zip
2. vi /etc/profile
set JAVA_HOME and SPARK_HOME
3. su hdfs
cd /data/tools/livy-server-0.2.0
nohup bin/livy-server &
4. Switch back to the root user
vi /etc/hue/conf/hue.ini
[spark]
  # Host address of the Livy Server.
   livy_server_host=ctdn-5
+++++++++++++++++++++++++++++++++++++++++++
Kafka
Servers: ctdn-2~ctdn-6
1. Create the log persistence directory
mkdir /data/kafkaLogs
2. ctdn-2:
cd /opt
wget http://mirrors.hust.edu.cn/apache/kafka/1.0.0/kafka_2.11-1.0.0.tgz
tar zxvf kafka_2.11-1.0.0.tgz
mv kafka_2.11-1.0.0 kafka
cd kafka/config
vi server.properties and make the following changes (a consolidated snippet follows this list):
Confirm the broker id: broker.id=0
Enable the listener (un-comment this line): listeners=PLAINTEXT://:9092
Set zookeeper.connect: zookeeper.connect=ctdn-3:2181,ctdn-4:2181,ctdn-5:2181
Confirm the timeout: zookeeper.connection.timeout.ms=6000
Enable topic deletion: delete.topic.enable=true
Disable automatic topic creation: auto.create.topics.enable=false
Change the log directory: log.dirs=/data/kafkaLogs
One partition per topic by default (confirm): num.partitions=1
num.recovery.threads.per.data.dir
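Put together, the edited fragment of server.properties looks roughly like this (a consolidation of the values above, not a complete file):
broker.id=0
listeners=PLAINTEXT://:9092
zookeeper.connect=ctdn-3:2181,ctdn-4:2181,ctdn-5:2181
zookeeper.connection.timeout.ms=6000
delete.topic.enable=true
auto.create.topics.enable=false
log.dirs=/data/kafkaLogs
num.partitions=1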
3. Sync to the other nodes:
scp -r kafka root@ctdn-3:/opt/
......
scp -r kafka root@ctdn-6:/opt/
4. ctdn-2~ctdn-6:
vi /etc/profile
+export KAFKA_HOME=/opt/kafka
+export PATH="$JAVA_HOME/bin:$KAFKA_HOME/bin:$PATH"
source /etc/profile
5.ctdn-3~ctdn-6:
cd /opt/kafka/config
vi server.properties
+broker.id=1 (ctdn-3: 1, ..., ctdn-6: 4)
6. Start Kafka (ctdn-2~ctdn-6)
cd /opt/kafka
kafka-server-start.sh config/server.properties &
(optional: JMX_PORT=9997 bin/kafka-server-start.sh -daemon config/server.properties &)
To stop:
jps
kill -9 ****
7. Test the cluster
Create topic test (ctdn-2):
bin/kafka-topics.sh --create --zookeeper ctdn-3:2181,ctdn-4:2181,ctdn-5:2181 --replication-factor 1 --partitions 1 --topic test
List all topics:
bin/kafka-topics.sh --list --zookeeper ctdn-3:2181,ctdn-4:2181,ctdn-5:2181
Send messages as a producer (ctdn-2):
bin/kafka-console-producer.sh --broker-list ctdn-2:9092 --topic test
Start consumers (ctdn-3, ctdn-4):
bin/kafka-console-consumer.sh --zookeeper ctdn-3:2181,ctdn-4:2181,ctdn-5:2181 --topic test --from-beginning 
8. Python support
http://pykafka.readthedocs.io/en/latest/usage.html
pip3 install pykafka
------producer pd.py-------
# -*- coding: utf-8 -*-
from pykafka import KafkaClient
host = '10.11.8.16'  # broker IP
client = KafkaClient(hosts="%s:9092" % host)
print(client.topics)
topicdocu = client.topics[b'test']
producer = topicdocu.get_producer()
for i in range(4):
    print(i)
    producer.produce('test message '.encode('utf-8') + str(i ** 2).encode('utf-8'))
producer.stop()
------consumer cm.py-------
# -*- coding: utf-8 -*-
from pykafka import KafkaClient
host = '10.11.8.16'  # broker IP
client = KafkaClient(hosts="%s:9092" % host)
print(client.topics)
topic = client.topics[b'test']
consumer = topic.get_simple_consumer(consumer_group=b'test', auto_commit_enable=True, consumer_id=b'test')
for message in consumer:
    if message is not None:
        print(message.offset, message.value)
 
python3 cm.py
python3 pd.py

9.KSQL
ctdn-5
https://github.com/confluentinc/ksql
http://geek.csdn.net/news/detail/235801
bin/kafka-topics.sh --create --zookeeper ctdn-3:2181,ctdn-4:2181,ctdn-5:2181 --replication-factor 1 --partitions 1 --topic pageviews
bin/kafka-topics.sh --create --zookeeper ctdn-3:2181,ctdn-4:2181,ctdn-5:2181 --replication-factor 1 --partitions 1 --topic users

1. git clone https://github.com/confluentinc/ksql.git
2. cd ksql
mvn clean compile install -DskipTests
3. Enter the KSQL environment
Standalone mode: the KSQL client and server run on the same machine and share one JVM:
./bin/ksql-cli local
./bin/ksql-cli local --bootstrap-server kafka-broker-1:9092 \
                       --properties-file path/to/ksql-cli.properties
   
Client-server mode: start a pool of KSQL servers on remote servers, VMs, or containers; the CLI connects to them over HTTP.
Start a server node:
./bin/ksql-server-start
./bin/ksql-server-start ksql-server.properties
ksql-server.properties contains:
# You must set at least the following two properties
bootstrap.servers=kafka-broker-1:9092
# Note: `application.id` is not really needed but you must set it
#       because of a known issue in the KSQL Developer Preview
application.id=app-id-setting-is-ignored
# Optional settings below, only for illustration purposes
# The hostname/port on which the server node will listen for client connections
listeners=http://0.0.0.0:8090

Start the client, pointing at the KSQL server address:
./bin/ksql-cli remote http://my-ksql-server:8090
10. KSQL experiments
Produce data
topics: pageviews, users
Generate data for pageviews:
java -jar ksql-examples/target/ksql-examples-4.1.0-SNAPSHOT-standalone.jar quickstart=pageviews format=delimited topic=pageviews maxInterval=10000
Generate data for users:
java -jar ksql-examples/target/ksql-examples-4.1.0-SNAPSHOT-standalone.jar quickstart=users format=json topic=users maxInterval=10000
Command-line alternative:
kafka-console-producer --broker-list localhost:9092  \
                         --topic t1 \
                         --property parse.key=true \
                         --property key.separator=:
Create tables
ksql> 
CREATE STREAM pageviews_original (viewtime bigint, userid varchar, pageid varchar) WITH (kafka_topic='pageviews', value_format='DELIMITED');
DESCRIBE pageviews_original;
CREATE TABLE users_original (registertime bigint, gender varchar, regionid varchar, userid varchar) WITH (kafka_topic='users', value_format='JSON', key = 'userid');
DESCRIBE users_original;

SHOW STREAMS;
SHOW TABLES;

Queries
ksql> SELECT pageid FROM pageviews_original LIMIT 3;
ksql> CREATE STREAM pageviews_female AS SELECT users_original.userid AS userid, pageid, regionid, gender FROM pageviews_original LEFT JOIN users_original ON pageviews_original.userid = users_original.userid WHERE gender = 'FEMALE';
ksql> DESCRIBE pageviews_female;
ksql> SELECT * FROM pageviews_female;
ksql> CREATE STREAM pageviews_female_like_89 WITH (kafka_topic='pageviews_enriched_r8_r9', value_format='DELIMITED') AS SELECT * FROM pageviews_female WHERE regionid LIKE '%_8' OR regionid LIKE '%_9';
ksql> CREATE TABLE pageviews_regions WITH (value_format='avro') AS SELECT gender, regionid , COUNT(*) AS numusers FROM pageviews_female WINDOW TUMBLING (size 30 second) GROUP BY gender, regionid HAVING COUNT(*) > 1;
ksql> DESCRIBE pageviews_regions;
ksql> SHOW QUERIES;
bin/ksql-cli local --exec "SELECT * FROM pageviews_original LIMIT 5;"
KSQL configuration
SET 'auto.offset.reset'='earliest';
The default is latest, which reads from the current offset; change it as above.
SET 'commit.interval.ms'='5000';
The default is 2000.
++++++++++++++ Scheduled log cleanup ++++++++++++++++++++
ctdn-2~ctdn-6:
1. 
cd /root/nash
vi clean_cloudslog.sh
+
#!/bin/bash
###Description:This script is used to clear kafka logs, not message file.
#####1.kafka
# log file dir.
logDir=/opt/kafka/logs
# Reserved 7 files.
COUNT=7
ls -t $logDir/server.log* | tail -n +$[$COUNT+1] | xargs rm -f
ls -t $logDir/controller.log* | tail -n +$[$COUNT+1] | xargs rm -f
ls -t $logDir/state-change.log* | tail -n +$[$COUNT+1] | xargs rm -f
ls -t $logDir/log-cleaner.log* | tail -n +$[$COUNT+1] | xargs rm -f
#####2.hbase
hbaseDir=/var/log/hbase
2.
crontab -e
+
0 0 * * 0 /root/nash/clean_cloudslog.sh
################kafka manager################################
Install sbt
cd /etc/yum.repos.d/
curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
Download Yahoo kafka-manager
cd /opt
git clone https://github.com/yahoo/kafka-manager.git
cd kafka-manager
sbt clean dist
You should see: [info] Your package is ready in /opt/kafka-manager/target/universal/kafka-manager-1.3.3.14.zip
Packaging succeeded.
cd /opt/kafka-manager/target/universal
cp kafka-manager-1.3.3.14.zip ~/
unzip -oq kafka-manager-1.3.3.14.zip
cd kafka-manager-1.3.3.14
vim conf/application.conf
+
#kafka-manager.zkhosts="kafka-manager-zookeeper:2181"
#kafka-manager.zkhosts=${?ZK_HOSTS}
kafka-manager.zkhosts="10.11.8.31:2181,10.11.8.16:2181,10.11.8.32:2181"
Start:
bin/kafka-manager 
This application is already running (Or delete /root/kafka-manager-1.3.3.14/RUNNING_PID file).
You can also specify the config file and listen port at startup:
# bin/kafka-manager -Dconfig.file=/root/kafka-manager-1.3.3.14/conf/application.conf -Dhttp.port=8088
+++++++++++++++++python ctdn-1~ctdn-6++++++++++++
Initially Python was deployed only on ctdn-6; running pyspark failed with: ImportError: No module named numpy.
The problem was not this server's installation but that the other nodes had no Python environment.
https://www.python.org/ftp/python
yum -y install gcc-c++
yum -y install gcc
mkdir -p /usr/local/python3
cd /usr/local/python3
wget https://www.python.org/ftp/python/3.4.4/Python-3.4.4.tar.xz
tar xvf Python-3.4.4.tar.xz
cd Python-3.4.4
./configure --prefix=/usr/local/python3/python344
make
make install
ln -s /usr/local/python3/python344/bin/python3 /usr/local/bin/python3
ln -s /usr/local/python3/python344/bin/pip3    /usr/local/bin/pip3
pip3 install numpy
Do not run: pip3 install --upgrade pip
pip3 install pandas
pip3 install scipy-1.0.0-cp34-cp34m-manylinux1_x86_64.whl
pip3 install scikit_learn-0.19.1-cp34-cp34m-manylinux1_x86_64.whl
pip3 install matplotlib
cd /opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark/python
cp -r pyspark /usr/local/python3/python344/lib/python3.4/site-packages/
The steps above still did not solve the problem; setting the following environment variables did:
vi /etc/profile
+
export PYTHONPATH=/usr/local/python3/python344/lib/python3.4
export PYSPARK_PYTHON=/usr/local/bin/python3
After that, plain python stopped working; removing PYTHONPATH from /etc/profile did not help either. This fixed it:
unset PYTHONPATH
+++++++++++++++ Upgrading CDH to Spark2 ++++++++++++++++++++++++++++
1. Required software
http://archive.cloudera.com/spark2/csd/
Download SPARK2_ON_YARN-2.1.0.cloudera1.jar:
axel -n 20 http://archive.cloudera.com/spark2/csd/SPARK2_ON_YARN-2.1.0.cloudera1.jar
http://archive.cloudera.com/spark2/parcels/2.1.0.cloudera1/
Download SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel,
SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel.sha1
manifest.json
axel -n 50 http://archive.cloudera.com/spark2/parcels/2.1.0.cloudera1/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel
wget http://archive.cloudera.com/spark2/parcels/2.1.0.cloudera1/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel.sha1
wget http://archive.cloudera.com/spark2/parcels/2.1.0.cloudera1/manifest.json
2.ctdn-1 
cp SPARK2_ON_YARN-2.1.0.cloudera1.jar /data/cloudera/csd
cp SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel /data/cloudera/parcel-repo
cp SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel.sha1 /data/cloudera/parcel-repo
cd /data/cloudera/parcel-repo/
mv SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel.sha1 SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel.sha
mv manifest.json manifest.json.bak
cd /opt
cp manifest.json /opt/cloudera/parcel-repo/

ctdn-2~ctdn-6:
cd /opt/cloudera 
mkdir csd
mkdir parcel-repo

ctdn-1:
scp SPARK2_ON_YARN-2.1.0.cloudera1.jar root@ctdn-2:/opt/cloudera/csd
......
scp SPARK2_ON_YARN-2.1.0.cloudera1.jar root@ctdn-6:/opt/cloudera/csd
scp SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel root@ctdn-2:/opt/cloudera/parcel-repo/
......
scp SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel root@ctdn-6:/opt/cloudera/parcel-repo/
scp SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel.sha root@ctdn-2:/opt/cloudera/parcel-repo/
......
scp SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904-el7.parcel.sha root@ctdn-6:/opt/cloudera/parcel-repo/
The file owner and group should be changed to match the other files in the same directory; I skipped this step and had no problems.

3. Stop CM and the cluster,
then restart CM.

4. Log in to CM.
Hosts -> Parcels: find Spark2 in the left-hand list, click it, then click Distribute and Activate in the upper right.

5. Return to the home page.
Cluster -> Add Service: add the spark2 service.
http://blog.csdn.net/u010936936/article/details/73650417
cd /opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/etc/spark2/conf.dist/
cp /etc/spark/conf/spark-env.sh .
cp /etc/spark/conf/classpath.txt .

vi spark-env.sh
+
#export SPARK_HOME=/opt/cloudera/parcels/CDH-5.8.0-1.cdh5.8.0.p0.42/lib/spark
export SPARK_HOME=/opt/cloudera/parcels/SPARK2-2.1.0.cloudera1-1.cdh5.7.0.p0.120904/lib/spark2
export PYSPARK_PYTHON=/usr/local/bin/python3
export PYSPARK_DRIVER_PYTHON=python3
Addendum:
Change spark_master: ctdn-4:7077 to spark_master: yarn (set it to yarn when running on YARN; standalone (master-slave) mode uses masterIp:7077).
+++++++++++ Root partition full +++++++++
Check inode usage: df -i
df -lh
cd /
du -h -x --max-depth=1
Found /opt using 5 GB
cd /opt
du -h -x --max-depth=1
The culprit: /var/lib/cloudera-service-monitor/ts
+++++++++++ Agent hung (dead but pid file exists) +++++++++
[[email protected] init.d]# /opt/cm-5.8.0/etc/init.d/cloudera-scm-agent stop
Usage: grep [OPTION]... PATTERN [FILE]...                  [FAILED]
Try 'grep --help' for more information.
[[email protected] init.d]# /opt/cm-5.8.0/etc/init.d/cloudera-scm-agent start
cloudera-scm-agent is already running
[[email protected] init.d]# /opt/cm-5.8.0/etc/init.d/cloudera-scm-agent status
cloudera-scm-agent dead but pid file exists
[[email protected] cloudera-scm-agent]# find / -name cloudera-scm-agent.pid
find: ‘/proc/28696’: No such file or directory
/opt/cm-5.8.0/run/cloudera-scm-agent/cloudera-scm-agent.pid
[[email protected] cloudera-scm-agent]# cd /opt/cm-5.8.0/run/cloudera-scm-agent/
[[email protected] cloudera-scm-agent]# ll
total 4
drwxr-x--x   2 root         root            6 Nov 20 19:02 cgroups
-rw-r--r--   1 root         root            1 Dec  1 10:27 cloudera-scm-agent.pid
[[email protected] cloudera-scm-agent]# rm -f cloudera-scm-agent.pid
+++++++++++++++++++++++++HUE+++++++++++++++++++++++++
Could not start SASL: Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found
Solution:
yum -y install cyrus-sasl-plain
++++++++++++++++++++++++percona+++++++++++++++++++++++++++++++++
yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm
yum install percona-toolkit -y
