RabbitMQ production problems and pitfalls encountered during cluster deployment
OS distribution: CentOS 7
RabbitMQ version: 3.6.11
Server host plan:
10.168.17.102 mq07.mq-cluster.mall.lt.com
10.168.17.98 mq08.mq-cluster.mall.lt.com
10.168.17.64 mq09.mq-cluster.mall.lt.com
1. On each of the three servers, edit the following file:
vim /etc/rabbitmq/rabbitmq-env.conf
It is best to set NODENAME explicitly here, in the form name@short-hostname.
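The NODENAME values themselves are not preserved in this post, so the following is only a sketch of what each server's entry might look like. It assumes RabbitMQ's default node-name prefix `rabbit` and the short host names from the host plan; `nodename_line` is a hypothetical helper, not part of RabbitMQ.

```shell
# Sketch: build the NODENAME line for rabbitmq-env.conf from a short host name.
# ASSUMPTION: the prefix "rabbit" is RabbitMQ's default; the prefix actually
# used on these servers is not shown in this post.
nodename_line() {
    short_host="$1"
    printf 'NODENAME=rabbit@%s\n' "$short_host"
}

# One line per server; put the matching one in /etc/rabbitmq/rabbitmq-env.conf.
for h in mq07-mq-cluster mq08-mq-cluster mq09-mq-cluster; do
    nodename_line "$h"
done
```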
2. Add name resolution by editing /etc/hosts:
10.168.17.102 mq07.mq-cluster.mall.lt.com mq07-mq-cluster
10.168.17.98 mq08.mq-cluster.mall.lt.com mq08-mq-cluster
10.168.17.64 mq09.mq-cluster.mall.lt.com mq09-mq-cluster
Note: the short host names at the end of these /etc/hosts entries must match the string after the @ in the NODENAME variable above.
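This rule is easy to break with a one-character typo, so it may be worth checking mechanically. The sketch below extracts the host part of NODENAME and looks for it among the names in a hosts file. The function names are hypothetical, and both paths are parameters so the check can be tried on copies first.

```shell
# Sketch: confirm the host part of NODENAME is listed in the hosts file.
nodename_host() {
    # Print whatever follows "@" on the NODENAME= line of the given env file.
    sed -n 's/^NODENAME=.*@//p' "$1"
}

hosts_has_name() {
    # Succeed if $2 appears as a host name or alias (any field after the IP).
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$1"
}

check_nodename() {
    # $1 = rabbitmq-env.conf, $2 = hosts file
    host=$(nodename_host "$1")
    if [ -n "$host" ] && hosts_has_name "$2" "$host"; then
        echo "ok: $host is listed in $2"
    else
        echo "mismatch: '$host' not found in $2"
        return 1
    fi
}

# Usage on a real server:
# check_nodename /etc/rabbitmq/rabbitmq-env.conf /etc/hosts
```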
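3. Edit /usr/lib/systemd/system/rabbitmq-server.service
Important: on CentOS 7, the service files for RabbitMQ, Elasticsearch, and similar services must set the two parameters LimitNOFILE and LimitNPROC shown below. Otherwise the limits in /etc/security/limits.conf have no effect, because systemd does not read that file for the services it starts. Once a RabbitMQ disk node is exhausted, the whole cluster can go down. This is exactly the production problem we hit today: the available file descriptors ran out and the RabbitMQ cluster went down, then went down again immediately after each restart, because the workload is busy enough to use up all 1024 descriptors right away.
Explanation: after a default installation, RabbitMQ starts with a file descriptor limit of 1024 and a process limit of 1024. This stays true even after you raise the values to 65536 in /etc/security/limits.conf and in the files under /etc/security/limits.d, so keep it in mind.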
[Unit]
Description=RabbitMQ broker
After=syslog.target network.target
[Service]
Type=notify
User=rabbitmq
Group=rabbitmq
LimitNOFILE=65536
LimitNPROC=65535
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/sbin/rabbitmq-server
ExecStop=/usr/sbin/rabbitmqctl stop
ExecStop=/bin/sh -c "while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done"
NotifyAccess=all
TimeoutStartSec=3600
[Install]
WantedBy=multi-user.target
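After changing the unit file, systemd has to reload it (`systemctl daemon-reload`) and the service has to be restarted before the new limits apply. One way to confirm the result is to read the limits of the running process from /proc. This is a minimal sketch; `max_open_files` is a hypothetical helper name.

```shell
# Sketch: read the soft "Max open files" limit of a process from /proc.
max_open_files() {
    # $1 is a PID; the soft limit is the 4th whitespace-separated field.
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Usage on a server after "systemctl daemon-reload" and a restart:
# pid=$(systemctl show -p MainPID rabbitmq-server.service | cut -d= -f2)
# max_open_files "$pid"    # should report 65536 if LimitNOFILE took effect
```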
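4. Configuration file
The default vm_memory_high_watermark is 0.4; here it is raised to 0.8. The machine has 64 GB of RAM.
Create or edit the configuration file:
/etc/rabbitmq/rabbitmq.config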
[
{rabbit,
[
{vm_memory_high_watermark, 0.8}
%% {vm_memory_high_watermark, {absolute, "40G"}}
]
}
].
Note: the file must end with the trailing period ".".
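A relative watermark is a fraction of installed RAM, so the absolute threshold is simply the fraction multiplied by the memory size. The sketch below works this out for the 64 GB machine described above; `watermark_gb` is a hypothetical helper used only for the arithmetic.

```shell
# Sketch: convert a relative vm_memory_high_watermark into gigabytes.
watermark_gb() {
    # $1 = watermark fraction, $2 = installed RAM in GB
    awk -v f="$1" -v ram="$2" 'BEGIN { printf "%.1f\n", f * ram }'
}

watermark_gb 0.4 64   # default setting: prints 25.6
watermark_gb 0.8 64   # setting used here: prints 51.2
```

With 0.8, roughly 12.8 GB remain for the operating system and everything else on the host.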
5. Problem:
# journalctl -xe
Oct 19 19:48:04 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: Error: Failed to initialize erlang distribution: {{shutdown,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {failed_to_start_child,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: auth,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {"Cookie file ./.erlang.cookie must be accessible by owner only",
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{auth,init_cookie,0,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"auth.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,286}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {auth,init,1,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"auth.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,140}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {gen_server,init_it,2,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: "gen_server.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,365}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {gen_server,init_it,6,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: "gen_server.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,333}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {proc_lib,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: init_p_do_apply,3,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"proc_lib.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,247}]}]}}},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {child,undefined,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: net_sup_dynamic,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {erl_distribution,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: start_link,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [['rabbitmq-cli-27',
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: shortnames],
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: false]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: permanent,1000,supervisor,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [erl_distribution]}}.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service: control process exited, code=exited status=75
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: Failed to start RabbitMQ broker.
-- Subject: Unit rabbitmq-server.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rabbitmq-server.service has failed.
--
-- The result is failed.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: Unit rabbitmq-server.service entered failed state.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service failed.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com polkitd[1055]: Unregistered Authentication Agent for unix-process:4237:24929114 (system bus name :1.6179, object path /org/freedesktop/PolicyKit1/AuthenticationAgen
Solution:
chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
chmod 600 /var/lib/rabbitmq/.erlang.cookie
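The error message says the cookie file must be accessible by its owner only, so any group or other permission bits will trigger it. A quick check can be sketched as follows. `cookie_owner_only` is a hypothetical helper, and `stat -c` assumes GNU coreutils as shipped on CentOS 7.

```shell
# Sketch: report whether an Erlang cookie file is restricted to its owner.
cookie_owner_only() {
    # %a prints the octal permission bits, for example 600 or 644.
    mode=$(stat -c '%a' "$1") || return 2
    case "$mode" in
        ?00) return 0 ;;   # no group/other bits, e.g. 600 or 400
        *)   return 1 ;;
    esac
}

# Usage on a server (the owner should also be the rabbitmq user):
# cookie_owner_only /var/lib/rabbitmq/.erlang.cookie && echo "permissions ok"
# stat -c '%U:%G' /var/lib/rabbitmq/.erlang.cookie
```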
6. Create an account
rabbitmq-plugins enable rabbitmq_management
rabbitmqctl add_user limu 123456
rabbitmqctl set_user_tags limu administrator
rabbitmqctl set_permissions -p / limu ".*" ".*" ".*"
7. Problem
# systemctl status rabbitmq-server.service
● rabbitmq-server.service - RabbitMQ broker
Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2018-10-19 20:02:17 CST; 9s ago
Process: 20821 ExecStop=/bin/sh -c while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done (code=exited, status=0/SUCCESS)
Process: 20481 ExecStop=/usr/sbin/rabbitmqctl stop (code=exited, status=0/SUCCESS)
Process: 20202 ExecStart=/usr/sbin/rabbitmq-server (code=exited, status=1/FAILURE)
Main PID: 20202 (code=exited, status=1/FAILURE)
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: attempted to contact: ['[email protected]']
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: [email protected]:
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: * unable to connect to epmd (port 4369) on mq07-mq-cluster: address (cannot connect to host/port)
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: current node details:
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - node name: '[email protected]'
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - home dir: .
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - cookie hash: 5lJVl9Km+lOXAsr8i4xIVA==
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: Failed to start RabbitMQ broker.
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: Unit rabbitmq-server.service entered failed state.
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service failed.
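Root cause:
This error means that the host name mq07-mq-cluster cannot be resolved, or that it resolves to an IP address that does not belong to this machine.
Solution:
Case 1: the machine's IP is 10.168.17.102, but /etc/hosts was mistakenly configured as 10.168.17.10 mq07.mq-cluster.mall.lt.com mq07-mq-cluster.
Correct the IP: 10.168.17.102 mq07.mq-cluster.mall.lt.com mq07-mq-cluster
Case 2: in /etc/rabbitmq/rabbitmq-env.conf the host part of NODENAME (after the @) is mq09-mq-cluster, but /etc/hosts contains:
10.168.17.64 mq09.mq-cluster.mall.lt.com mq09-cluster
Fix: change mq09-cluster to mq09-mq-cluster in /etc/hosts.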
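Both cases come down to the same question: which IP does the node's host name map to, and is that IP one of this machine's addresses? The sketch below answers it from a hosts-format file. The function names are hypothetical, and `hostname -I` in the usage note is the Linux way of listing local addresses.

```shell
# Sketch: find the IP a hosts-format file assigns to a name, then check
# whether that IP is one of this machine's addresses.
hosts_ip_for() {
    # $1 = hosts file, $2 = host name or alias; prints the first matching IP.
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

ip_is_local() {
    # $1 = IP to test, $2 = space-separated list of local IPs.
    for addr in $2; do
        [ "$addr" = "$1" ] && return 0
    done
    return 1
}

# Usage on a server:
# ip=$(hosts_ip_for /etc/hosts mq07-mq-cluster)
# ip_is_local "$ip" "$(hostname -I)" || echo "mq07-mq-cluster -> $ip is not local"
```

An empty result from `hosts_ip_for` corresponds to the second case (the name is missing), while a non-local IP corresponds to the first.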
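8. Add a mirrored-queue policy
Policies are defined per vhost, so this command has to be run again for every vhost you add. The "^" argument is the queue-name pattern and matches every queue.
rabbitmqctl set_policy -p /admin "ha-allqueue" "^" '{"ha-mode":"all","ha-sync-mode":"automatic"}'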
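Since the policy has to be repeated for each vhost, a small loop can generate the commands. This is only a sketch: `policy_commands` is a hypothetical helper, the queue-name pattern `^` (match every queue) is an assumption chosen to fit the policy name, and vhost names containing spaces or quotes are not handled.

```shell
# Sketch: emit one set_policy command per vhost name read from stdin.
# Printing the commands first (a dry run) makes them easy to review.
policy_commands() {
    while IFS= read -r vhost; do
        [ -n "$vhost" ] || continue
        printf "rabbitmqctl set_policy -p %s ha-allqueue '^' '%s'\n" \
            "$vhost" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
    done
}

# Usage on a cluster node: review the output, then pipe it to sh to apply.
# rabbitmqctl -q list_vhosts | policy_commands
# rabbitmqctl -q list_vhosts | policy_commands | sh
```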