1. 程式人生 > 其它 >[OPENSTACK] 新建例項無限SCHEDULING

[OPENSTACK] 新建例項無限SCHEDULING

症狀:

  • 建立例項時一直處在build狀態,無限scheduling,且無論多長時間Horizon都不顯示任何錯誤
  • neutron-server.log無ERROR或WARNING

出現這種情況還有可能是其他原因導致的,請根據自己的情況和以下的日誌來判斷。

原因:

  • nova-conductor無法連線到rabbitmq訊息佇列,導致其無法與其他元件聯絡,無法控制nova-compute部署例項。
  • nova-conductor雖然程序顯示配置檔案為/etc/nova/nova.conf,但是實際上不從nova.conf中讀取配置,nova-conductor直接從mysql資料庫中讀取自己的配置。

診斷過程:

一般來說我們遇到Horizon建立例項的時候,如果長時間卡在Scheduling或者Spawning,最後會扔一個錯誤,在我接觸過的案例中,十有八九都是neutron-server的鍋,然而這次不是。

  • 按照我以往的經驗去尋找neutron-server.log中的錯誤訊息,結果一無所獲,只找到了以下這一條訊息:

2018-01-23 16:16:20.657 18341 WARNING keystonemiddleware.auth_token [-] A valid token was submitted as a service token, but it was not a valid service token. This is incorrect but backwards compatible behaviour. This will be removed in future releases.

但這並不是問題的原因,這條警告訊息是OpenStack的一個bug,可自行Google,在devstack中已沒有這條警告訊息。

此時已經可以排除neutron-server的問題了,沒有更多的警告訊息,且當前警告訊息無用

  • 接下來能做的就是尋找其他元件的日誌,來尋找問題
  • 實際上我在這裡犯了個錯誤
  • 因為是遠端除錯,延遲比較大,所以我就直接tail了nova各個元件的日誌,又因為我更改systemctl restart nova-*.service,所以沒有看到任何日誌。
  • nova-conductor這個元件又比較有意思,他啟動的時候不會驗證當前配置,他啟動的時候只會打印出以下日誌,所以我忽略了

# ...
2018-01-23 06:40:16.981 8090 INFO nova.service [-] Starting conductor node (version 16.0.3)
2018-01-23 06:40:16.991 8086 INFO nova.service [-] Starting conductor node (version 16.0.3)
2018-01-23 06:40:16.995 8091 INFO nova.service [-] Starting conductor node (version 16.0.3)
2018-01-23 06:40:16.996 8092 INFO nova.service [-] Starting conductor node (version 16.0.3)
# End

後來一個一個log檔案cat,的確是看到這個了,去檢查了一遍nova.conf和neutron.conf,發現rabbitmq憑據配置無誤,就跳過了

最後重新啟動一個例項,重新tail nova* 才發現了nova-conductor又報這個錯誤,這就很有意思了,錯誤如下:

2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server [req-eb9ca3b8-58d8-4103-b8b0-d5b9a3d706b1 244f2e998d2e424d8da726ccec82588b 74ff9f41f424407d99df3e6607de120f - default default] Exception during message handling: AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/conductor/manager.py", line 1172, in schedule_and_build_instances
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server limits=host['limits'])
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/nova/compute/rpcapi.py", line 1124, in build_and_run_instance
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server limits=limits)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 147, in cast
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server self.transport._send(self.target, msg_ctxt, msg, retry=self.retry)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 123, in _send
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server timeout=timeout, retry=retry)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 578, in send
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server retry=retry)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 539, in _send
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server with self._get_connection(rpc_common.PURPOSE_SEND) as conn:
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 494, in _get_connection
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server purpose=purpose)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/common.py", line 399, in __init__
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server self.connection = connection_pool.get()
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/pool.py", line 107, in get
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server return self.create()
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/pool.py", line 144, in create
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server return self.connection_cls(self.conf, self.url, purpose)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py", line 614, in __init__
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server self.ensure_connection()
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py", line 713, in ensure_connection
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server self.ensure(method=self.connection.connect)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py", line 807, in ensure
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server ret, channel = autoretry_method()
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 494, in _ensured
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server return fun(*args, **kwargs)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 569, in __call__
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server self.revive(self.connection.default_channel)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 819, in default_channel
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server self.connection
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 802, in connection
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server self._connection = self._establish_connection()
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 757, in _establish_connection
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server conn = self.transport.establish_connection()
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 130, in establish_connection
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server conn.connect()
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 300, in connect
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server self.drain_events(timeout=self.connect_timeout)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 464, in drain_events
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server return self.blocking_read(timeout)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 469, in blocking_read
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server return self.on_inbound_frame(frame)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 68, in on_frame
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server callback(channel, method_sig, buf, None)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 473, in on_inbound_method
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server method_sig, payload, content,
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 142, in dispatch_method
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server listener(*args)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/amqp/connection.py", line 595, in _on_close
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server (class_id, method_id), ConnectionError)
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.
2018-01-23 06:26:18.952 7478 ERROR oslo_messaging.rpc.server

nova-conductor在連線到RabbitMQ訊息佇列的時候,被拒絕訪問了

AccessRefused: (0, 0): (403) ACCESS_REFUSED - Login was refused using authentication mechanism AMQPLAIN. For details see the broker logfile.

根據這個提示我們來找rabbitmq的日誌,[email protected]

發現其中有如下錯誤資訊

=ERROR REPORT==== 23-Jan-2018::04:11:23 ===
closing AMQP connection <0.6917.0> (10.0.0.30:59652 -> 10.0.0.30:5672):
{handshake_error,starting,0,
{amqp_error,access_refused,
"AMQPLAIN login refused: user 'openstack' - invalid credentials",
'connection.start_ok'}}

=INFO REPORT==== 23-Jan-2018::04:29:59 ===
accepting AMQP connection <0.7038.0> (10.0.0.30:34068 -> 10.0.0.30:5672)

=INFO REPORT==== 23-Jan-2018::04:30:00 ===
accepting AMQP connection <0.7055.0> (10.0.0.30:34096 -> 10.0.0.30:5672)

=ERROR REPORT==== 23-Jan-2018::04:30:03 ===
closing AMQP connection <0.7055.0> (10.0.0.30:34096 -> 10.0.0.30:5672):
{handshake_error,starting,0,
{amqp_error,access_refused,
"AMQPLAIN login refused: user 'openstack' - invalid credentials",
'connection.start_ok'}}

到這裡問題就很明瞭了,但是我們仍然不知道為什麼會拒絕連線,上面說到過的,檢查nova.conf和neutron.conf沒有任何問題,rabbitmq連線測試正常

已經沒有招了。

問題根源/解決問題:

如果你systemctl status nova-conductor.service, 你會發現他有指向配置檔案--config-file=/etc/nova/nova.conf

● nova-conductor.service - OpenStack Compute Conductor
Loaded: loaded (/lib/systemd/system/nova-conductor.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-01-23 06:40:14 PST; 1 day 13h ago
Main PID: 8036 (nova-conductor)
Tasks: 25
Memory: 1.8G
CPU: 1h 32min 57.730s
CGroup: /system.slice/nova-conductor.service
├─8036 /usr/bin/python /usr/bin/nova-conductor --config-file=/etc/nova/nova.conf --log-file=/var/log/nova/nova-conductor.log
├─......
└─8098 /usr/bin/python /usr/bin/nova-conductor --config-file=/etc/nova/nova.conf --log-file=/var/log/nova/nova-conductor.log

Jan 23 06:40:14 controller systemd[1]: Starting OpenStack Compute Conductor...
Jan 23 06:40:14 controller systemd[1]: Started OpenStack Compute Conductor.

但是他並不從nova.conf中讀取transport_url!!!!

他只從nova.conf中讀取connection = mysql+pymysql:欄位,獲取到連線資料庫的憑據,然後從資料庫讀取配置!!

WTF???(我相信大部分人和我的反應是一樣的)

實際上如果小心翼翼的按照教程一步一步來或許就不會出現這樣的問題,在配置nova的時候,會有一段往mysql資料庫中寫入資料,然後sync_db的配置步驟

這一步中,有一句

su -s /bin/bash nova -c "nova-manage cell_v2 create_cell --name cell1 \
--database_connection mysql+pymysql://nova:[email protected]/nova \
--transport-url rabbit://openstack:[email protected]:5672"

如果這裡寫錯了transport-url,那就完蛋了,因為他直接把這個寫到資料庫裡了

但是幸運的是,他明文儲存的密碼和使用者名稱,那就讓我們來跑一下聯合查詢,先以root身份登陸mysql server,然後執行以下語句

此語句是從資料庫中查詢’openstack’欄位,當然這已經不算是個聯合查詢了,因為我已經知道了nova_api.cell_mappings了,直接從這裡面找即可

SELECT * FROM `nova_api`.`cell_mappings` WHERE `created_at` LIKE BINARY '%openstack%' OR `updated_at` LIKE BINARY '%openstack%' OR `id` LIKE BINARY '%openstack%' OR `uuid` LIKE BINARY '%openstack%' OR `name` LIKE BINARY '%openstack%' OR `transport_url` LIKE BINARY '%openstack%' OR `database_connection` LIKE BINARY '%openstack%' ;

永遠不要嘗試在大規模生產環境中執行聯合查詢!!

結果如下

+---------------------+------------+----+--------------------------------------+-------+--------------------------------------------------------+----------------------------------------------------------+
| created_at | updated_at | id | uuid | name | transport_url | database_connection |
+---------------------+------------+----+--------------------------------------+-------+--------------------------------------------------------+----------------------------------------------------------+
| 2018-01-23 08:37:50 | NULL | 2 | 4952f0f2-***** | cell1 | rabbit://openstack:******@10.0.0.30:5672 | mysql+pymysql://nova:********@10.0.0.30/nova |
+---------------------+------------+----+--------------------------------------+-------+--------------------------------------------------------+----------------------------------------------------------+
1 row in set (0.00 sec)

發現rabbit://openstack:******@10.0.0.30:5672的密碼是錯誤的,修改即可

來源:https://blog.silversky.moe/works/openstack-lanuch-instance-infinite-scheduling