RabbitMQ: problems hit in production and pitfalls of cluster deployment

Operating system: CentOS 7

RabbitMQ version: 3.6.11

Server hosts:

10.168.17.102 mq07.mq-cluster.mall.lt.com

10.168.17.98 mq08.mq-cluster.mall.lt.com

10.168.17.64 mq09.mq-cluster.mall.lt.com

 

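1. On each of the three servers, edit /etc/rabbitmq/rabbitmq-env.conf:

On mq07:

vim /etc/rabbitmq/rabbitmq-env.conf

NODENAME=rabbit@mq07-mq-cluster

On mq08:

vim /etc/rabbitmq/rabbitmq-env.conf

NODENAME=rabbit@mq08-mq-cluster

On mq09:

vim /etc/rabbitmq/rabbitmq-env.conf

NODENAME=rabbit@mq09-mq-cluster

Setting NODENAME explicitly is recommended; if it is not set, the node name defaults to rabbit@ followed by the machine's short host name.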
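If you script this step, a small helper can write the NODENAME line idempotently. This is a minimal sketch, not part of the original setup: set_nodename is a hypothetical function, and it assumes the node-name prefix rabbit (RabbitMQ's default). It overwrites an existing NODENAME line instead of appending a second one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set NODENAME=rabbit@<short-name> in a rabbitmq-env.conf.
# Assumption: the "rabbit" prefix (RabbitMQ's default node-name prefix).
set_nodename() {
  local conf=$1 short=$2
  local line="NODENAME=rabbit@${short}"
  touch "$conf"
  if grep -q '^NODENAME=' "$conf"; then
    # Overwrite the existing assignment instead of adding a second one.
    sed -i "s|^NODENAME=.*|${line}|" "$conf"
  else
    printf '%s\n' "$line" >> "$conf"
  fi
}

# Usage, once per node:
#   set_nodename /etc/rabbitmq/rabbitmq-env.conf mq07-mq-cluster   # on mq07
#   set_nodename /etc/rabbitmq/rabbitmq-env.conf mq08-mq-cluster   # on mq08
#   set_nodename /etc/rabbitmq/rabbitmq-env.conf mq09-mq-cluster   # on mq09
```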

 

2. Add name resolution by editing /etc/hosts:

10.168.17.102 mq07.mq-cluster.mall.lt.com mq07-mq-cluster

10.168.17.98 mq08.mq-cluster.mall.lt.com mq08-mq-cluster

10.168.17.64 mq09.mq-cluster.mall.lt.com mq09-mq-cluster

Note: the short host name at the end of each of these entries must exactly match the string after @ in the NODENAME variable above.
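Because the same three entries must exist on every node, appending them behind a guard avoids duplicates when the step is re-run. A minimal sketch; add_host_entry is a hypothetical helper, and it only checks whether the short name is already present, not whether the IP next to it is correct.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "IP FQDN SHORT" to a hosts file unless SHORT
# already appears as a host name on some non-comment line.
add_host_entry() {
  local hosts=$1 ip=$2 fqdn=$3 short=$4
  if awk -v n="$short" '
      !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$hosts"; then
    echo "skip: $short is already in $hosts"
  else
    printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$hosts"
  fi
}

# Usage, identical on all three nodes:
#   add_host_entry /etc/hosts 10.168.17.102 mq07.mq-cluster.mall.lt.com mq07-mq-cluster
#   add_host_entry /etc/hosts 10.168.17.98  mq08.mq-cluster.mall.lt.com mq08-mq-cluster
#   add_host_entry /etc/hosts 10.168.17.64  mq09.mq-cluster.mall.lt.com mq09-mq-cluster
```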

 

 

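3. Edit /usr/lib/systemd/system/rabbitmq-server.service

Important: on CentOS 7, the systemd unit files for RabbitMQ (and for similar services such as Elasticsearch) must set the two parameters LimitNOFILE and LimitNPROC shown below. systemd does not apply /etc/security/limits.conf to services, so without these settings the limits configured there simply do not take effect. Once a RabbitMQ disk node runs out of file descriptors, it brings down the whole cluster. That is exactly the production incident we hit today: the open file descriptors were exhausted and the RabbitMQ cluster went down. It also died again immediately after every restart, because traffic was heavy enough that the restarted broker used up all 1024 descriptors right away.

Note: with a default installation, RabbitMQ started as-is gets a file descriptor limit of 1024 and a process limit (nproc) of 1024, even if you have raised both to 65536 in /etc/security/limits.conf and in the files under /etc/security/limits.d. Keep this in mind.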

[Unit]
Description=RabbitMQ broker
After=syslog.target network.target

[Service]
Type=notify
User=rabbitmq
Group=rabbitmq
# Set here because systemd services do not read /etc/security/limits.conf
LimitNOFILE=65536
LimitNPROC=65535
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/sbin/rabbitmq-server
ExecStop=/usr/sbin/rabbitmqctl stop
ExecStop=/bin/sh -c "while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done"
NotifyAccess=all
TimeoutStartSec=3600

[Install]
WantedBy=multi-user.target
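After editing the unit file, systemd has to reload it (systemctl daemon-reload) before the restart; otherwise the old unit definition, and its limits, stay in effect. One way to confirm that the new limits actually reached the broker is to read /proc/&lt;pid&gt;/limits of the Erlang VM (beam.smp). The sketch below parses that file; max_open_files and max_processes are hypothetical names, and the pgrep pattern in the usage comment is an assumption about how the process appears on your hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: read soft limits from a /proc/<pid>/limits-format file.
# Columns: limit name, soft limit, hard limit, units.
max_open_files() {
  awk '/^Max open files/ { print $4 }' "$1"   # "Max open files <soft> <hard> files"
}
max_processes() {
  awk '/^Max processes/ { print $3 }' "$1"    # "Max processes <soft> <hard> processes"
}

# Usage on a node after editing the unit file:
#   systemctl daemon-reload && systemctl restart rabbitmq-server
#   pid=$(pgrep -u rabbitmq -f beam.smp | head -n 1)
#   max_open_files "/proc/$pid/limits"   # expect 65536 if LimitNOFILE took effect
#   max_processes  "/proc/$pid/limits"   # expect 65535 if LimitNPROC took effect
```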

 

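4. Configuration file

The default vm_memory_high_watermark is 0.4; here it is raised to 0.8, since the machines have 64 GB of RAM.

Create or edit the configuration file:

/etc/rabbitmq/rabbitmq.config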

[
  {rabbit,
    [
      {vm_memory_high_watermark, 0.8}
      %% {vm_memory_high_watermark, {absolute, "40G"}}
    ]
  }
].

 

Note: the file is an Erlang term and must end with a period (".").
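For a sense of scale, a relative watermark is a fraction of total RAM: with exactly 64 GiB, 0.4 corresponds to 25.6 GiB and 0.8 to 51.2 GiB. When the broker's memory use crosses that threshold, RabbitMQ raises a memory alarm and blocks publishing connections. The sketch below does the same arithmetic from a meminfo-format file; watermark_gib is a hypothetical helper, and a real machine usually reports slightly less than its nominal RAM in MemTotal.

```bash
#!/usr/bin/env bash
# Hypothetical helper: memory threshold (GiB) for a relative
# vm_memory_high_watermark, computed from the MemTotal line (in kB)
# of a meminfo-format file.
watermark_gib() {
  local meminfo=$1 fraction=$2
  awk -v f="$fraction" '/^MemTotal:/ { printf "%.1f\n", $2 / 1048576 * f }' "$meminfo"
}

# Usage on a node:
#   watermark_gib /proc/meminfo 0.4   # old default
#   watermark_gib /proc/meminfo 0.8   # value configured above
```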

 

5. Problem: the broker fails to start on mq08

# journalctl -xe

Oct 19 19:48:04 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: Error: Failed to initialize erlang distribution: {{shutdown,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {failed_to_start_child,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: auth,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {"Cookie file ./.erlang.cookie must be accessible by owner only",
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{auth,init_cookie,0,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"auth.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,286}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {auth,init,1,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"auth.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,140}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {gen_server,init_it,2,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: "gen_server.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,365}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {gen_server,init_it,6,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: "gen_server.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,333}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {proc_lib,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: init_p_do_apply,3,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"proc_lib.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,247}]}]}}},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {child,undefined,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: net_sup_dynamic,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {erl_distribution,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: start_link,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [['rabbitmq-cli-27',
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: shortnames],
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: false]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: permanent,1000,supervisor,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [erl_distribution]}}.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service: control process exited, code=exited status=75
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: Failed to start RabbitMQ broker.
-- Subject: Unit rabbitmq-server.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rabbitmq-server.service has failed.
--
-- The result is failed.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: Unit rabbitmq-server.service entered failed state.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service failed.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com polkitd[1055]: Unregistered Authentication Agent for unix-process:4237:24929114 (system bus name :1.6179, object path /org/freedesktop/PolicyKit1/AuthenticationAgen

 

Solution: the cookie file must be owned by the rabbitmq user and accessible by its owner only:

chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie

chmod 600 /var/lib/rabbitmq/.erlang.cookie
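To check a node before restarting it, you can test the cookie's permission bits directly. This is a minimal sketch assuming GNU stat (as on CentOS 7); cookie_private is a hypothetical helper that treats "owner only" as "no group or other permission bits set", which is our reading of the error message above rather than a statement about Erlang's internals.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether a cookie file grants no access to
# group or others (permission mask 077), and show its owner.
cookie_private() {
  local mode owner
  mode=$(stat -c '%a' "$1") || return 2
  owner=$(stat -c '%U' "$1")
  # Leading 0 makes the shell read the mode as octal, e.g. 0600 & 077 = 0.
  if [ $(( 0$mode & 077 )) -eq 0 ]; then
    echo "ok: $1 mode $mode owner $owner"
  else
    echo "bad: $1 mode $mode owner $owner (run chmod 600)"
    return 1
  fi
}

# Usage on each node:
#   cookie_private /var/lib/rabbitmq/.erlang.cookie
```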

 

6. Enable the management plugin and create an account

rabbitmq-plugins enable rabbitmq_management

rabbitmqctl add_user limu 123456

rabbitmqctl set_user_tags limu administrator

rabbitmqctl set_permissions -p / limu ".*" ".*" ".*"
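Running rabbitmqctl add_user for a user that already exists fails, so a setup script that may run more than once can check first. A minimal sketch with a hypothetical ensure_admin function; it assumes that in 3.6.x the global -q option suppresses the "Listing users" banner, so the first column of each list_users line is a user name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create an administrator only if the user is missing,
# then (re)apply tags and permissions, which is safe to repeat.
ensure_admin() {
  local user=$1 pass=$2 vhost=${3:-/}
  # Assumes "-q" drops the "Listing users ..." banner (3.6.x), leaving
  # one "name<TAB>[tags]" line per user.
  if ! rabbitmqctl -q list_users | awk '{ print $1 }' | grep -qx -- "$user"; then
    rabbitmqctl add_user "$user" "$pass"
  fi
  rabbitmqctl set_user_tags "$user" administrator
  rabbitmqctl set_permissions -p "$vhost" "$user" ".*" ".*" ".*"
}

# Usage (user name and password are placeholders):
#   ensure_admin limu 'change-me' /
```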

 

7. Problem: the broker fails to start on mq07

# systemctl status rabbitmq-server.service

● rabbitmq-server.service - RabbitMQ broker
   Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Fri 2018-10-19 20:02:17 CST; 9s ago
  Process: 20821 ExecStop=/bin/sh -c while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done (code=exited, status=0/SUCCESS)
  Process: 20481 ExecStop=/usr/sbin/rabbitmqctl stop (code=exited, status=0/SUCCESS)
  Process: 20202 ExecStart=/usr/sbin/rabbitmq-server (code=exited, status=1/FAILURE)
 Main PID: 20202 (code=exited, status=1/FAILURE)

Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: attempted to contact: ['[email protected]']
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: [email protected]:
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: * unable to connect to epmd (port 4369) on mq07-mq-cluster: address (cannot connect to host/port)
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: current node details:
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - node name: '[email protected]'
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - home dir: .
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - cookie hash: 5lJVl9Km+lOXAsr8i4xIVA==
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: Failed to start RabbitMQ broker.
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: Unit rabbitmq-server.service entered failed state.
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service failed.

 

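Root cause:

The error means that the host name mq07-mq-cluster either cannot be resolved, or resolves to an IP address that does not belong to this machine, so the connection to epmd on port 4369 fails.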

Solutions:

Scenario 1: the machine's IP is 10.168.17.102, but /etc/hosts mistakenly contains 10.168.17.10 mq07.mq-cluster.mall.lt.com mq07-mq-cluster.

Fix: correct the IP, i.e. 10.168.17.102 mq07.mq-cluster.mall.lt.com mq07-mq-cluster

 

Scenario 2: /etc/rabbitmq/rabbitmq-env.conf contains NODENAME=rabbit@mq09-mq-cluster, but /etc/hosts contains

10.168.17.64 mq09.mq-cluster.mall.lt.com mq09-cluster

Fix: in /etc/hosts, change mq09-cluster to mq09-mq-cluster.
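Both scenarios can be caught before starting the broker by looking up the NODENAME host part in /etc/hosts and comparing the result with the machine's own addresses. A minimal sketch; check_node_host is a hypothetical helper that only reads a hosts-format file (it does not consult DNS), and the usage line assumes hostname -I, which CentOS 7 provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: look NAME up in a hosts-format file and check that it
# maps to one of the addresses passed after it (this machine's own IPs).
check_node_host() {
  local hosts=$1 name=$2 ip addr
  shift 2
  ip=$(awk -v n="$name" '
      !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }' "$hosts")
  if [ -z "$ip" ]; then
    echo "missing: $name is not in $hosts"      # scenario 2
    return 1
  fi
  for addr in "$@"; do
    if [ "$ip" = "$addr" ]; then
      echo "ok: $name -> $ip"
      return 0
    fi
  done
  echo "wrong-ip: $name -> $ip, not one of: $*"  # scenario 1
  return 1
}

# Usage on mq07:
#   check_node_host /etc/hosts mq07-mq-cluster $(hostname -I)
```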

 

8. Add a mirrored-queue policy

Policies are defined per vhost, so this command has to be run again for every vhost you add:

rabbitmqctl set_policy -p /admin "ha-allqueue" "^" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
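Since the policy has to be repeated per vhost, a loop over the existing vhosts keeps them consistent. A minimal sketch with a hypothetical function name; it assumes the 3.6.x behavior where rabbitmqctl -q list_vhosts prints one vhost name per line, and it uses the pattern ^ so that every queue in each vhost matches. Re-running it should be harmless, because set_policy with an existing policy name updates that policy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply the mirrored-queue policy to every existing vhost.
apply_ha_policy_all_vhosts() {
  local def='{"ha-mode":"all","ha-sync-mode":"automatic"}'
  # Assumes 3.6.x output: with -q, one vhost name per line and no banner.
  rabbitmqctl -q list_vhosts | while IFS= read -r vhost; do
    [ -n "$vhost" ] || continue
    # "^" matches every queue name; </dev/null keeps rabbitmqctl off the loop's stdin.
    rabbitmqctl set_policy -p "$vhost" ha-allqueue '^' "$def" </dev/null
  done
}

# Usage on any cluster node, after adding a vhost:
#   apply_ha_policy_all_vhosts
```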