【openstack】【nova】Nova instance deletion: flow and code analysis
Published: 2019-02-20
Under normal circumstances, deleting an instance takes one of two paths:

1. is_local_delete = True: _local_delete() is used
2. is_local_delete = False: compute_rpcapi.terminate_instance() is used

Instances in the vm_states.SHELVED or vm_states.SHELVED_OFFLOADED state are handled differently and are not covered here.

The delete flow above is hit in two common scenarios:

1) nova-compute has been down for some time, and when nova-api checks the service state (nova service-list) it sees nova-compute as down (although the instance processes on that node are still running).
2) When nova-api checks the service state (nova service-list), nova-compute looks up (the service may really be up, or it may in fact already be down).

How the check works: nova-compute writes a heartbeat timestamp to the database every 10 seconds, and nova-api judges the service state from that timestamp; once the timestamp is older than service_down_time (60s), nova-compute is considered down. A minimal sketch of this check is given after this overview.

With a Ceph backend, the two scenarios lead to the following problems:

Scenario 1: if the instance was booted from a cinder volume, deleting it leaves the root disk behind. The reason is that the rbd image backing the root disk is locked.
Scenario 2: nova-api sees the compute service as up although it is actually down. The instance then stays in the deleting state and is only really deleted once nova-compute comes back.

Note: Ceph does not lock an rbd image when it is created; the lock is only taken once the image is used. In scenario 1 the root disk is attached to the instance right after creation, so it gets locked. A data disk attached to an instance is normally not locked until it is mounted inside the guest.
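Before walking through the Nova code, here is a minimal sketch of the liveness check described above. It mirrors the idea of the database servicegroup driver rather than Nova's exact source; the hard-coded 60 seconds corresponds to CONF.service_down_time and the 10-second heartbeat to CONF.report_interval.

import datetime

SERVICE_DOWN_TIME = 60  # seconds; CONF.service_down_time

def service_is_up(service_record):
    # service_record is a dict-like service row whose 'updated_at' field is
    # the last heartbeat written by nova-compute (every 10 seconds).
    last_heartbeat = service_record['updated_at'] or service_record['created_at']
    elapsed = (datetime.datetime.utcnow() - last_heartbeat).total_seconds()
    # If the heartbeat is older than service_down_time, the service is down.
    return abs(elapsed) <= SERVICE_DOWN_TIME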
Key parts of the code (Liberty), nova/compute/api.py:

def _delete_instance(self, context, instance):
    self._delete(context, instance, 'delete', self._do_delete,
                 task_state=task_states.DELETING)
    # cb = self._do_delete, instance_attrs = {'task_state': task_states.DELETING}

def _delete(self, context, instance, delete_type, cb, **instance_attrs):
    if instance.disable_terminate:
        LOG.info(_LI('instance termination disabled'), instance=instance)
        return

    bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance.uuid)

    project_id, user_id = quotas_obj.ids_from_instance(context, instance)

    # At these states an instance has a snapshot associate.
    if instance.vm_state in (vm_states.SHELVED,
                             vm_states.SHELVED_OFFLOADED):
        snapshot_id = instance.system_metadata.get('shelved_image_id')
        LOG.info(_LI("Working on deleting snapshot %s "
                     "from shelved instance..."),
                 snapshot_id, instance=instance)
        try:
            self.image_api.delete(context, snapshot_id)
        except (exception.ImageNotFound,
                exception.ImageNotAuthorized) as exc:
            LOG.warning(_LW("Failed to delete snapshot "
                            "from shelved instance (%s)."),
                        exc.format_message(), instance=instance)
        except Exception:
            LOG.exception(_LE("Something wrong happened when trying to "
                              "delete snapshot from shelved instance."),
                          instance=instance)

    original_task_state = instance.task_state
    quotas = None
    try:
        # NOTE(maoy): no expected_task_state needs to be set
        instance.update(instance_attrs)
        instance.progress = 0

        # NOTE(comstud): If we delete the instance locally, we'll
        # commit the reservations here. Otherwise, the manager side
        # will commit or rollback the reservations based on success.
        quotas = self._create_reservations(context, instance,
                                           original_task_state,
                                           project_id, user_id)  # quota reservations

        if self.cell_type == 'api':
            # NOTE(comstud): If we're in the API cell, we need to
            # skip all remaining logic and just call the callback,
            # which will cause a cast to the child cell. Also,
            # commit reservations here early until we have a better
            # way to deal with quotas with cells.
            cb(context, instance, bdms, reservations=None)
            quotas.commit()
            return

        shelved_offloaded = (instance.vm_state
                             == vm_states.SHELVED_OFFLOADED)
        if not instance.host and not shelved_offloaded:
            try:
                compute_utils.notify_about_instance_usage(
                    self.notifier, context, instance,
                    "%s.start" % delete_type)
                instance.destroy()
                compute_utils.notify_about_instance_usage(
                    self.notifier, context, instance,
                    "%s.end" % delete_type,
                    system_metadata=instance.system_metadata)
                quotas.commit()
                return
            except exception.ObjectActionError:
                instance.refresh()

        if instance.vm_state == vm_states.RESIZED:
            self._confirm_resize_on_deleting(context, instance)

        is_local_delete = True
        try:
            if not shelved_offloaded:
                service = objects.Service.get_by_compute_host(
                    context.elevated(), instance.host)
                is_local_delete = not self.servicegroup_api.service_is_up(
                    service)  # check whether nova-compute is up
            if not is_local_delete:
                if original_task_state in (task_states.DELETING,
                                           task_states.SOFT_DELETING):
                    LOG.info(_LI('Instance is already in deleting state, '
                                 'ignoring this request'),
                             instance=instance)
                    quotas.rollback()
                    return
                self._record_action_start(context, instance,
                                          instance_actions.DELETE)

                # NOTE(snikitin): If instance's vm_state is 'soft-delete',
                # we should not count reservations here, because instance
                # in soft-delete vm_state have already had quotas
                # decremented. More details:
                # https://bugs.launchpad.net/nova/+bug/1333145
                if instance.vm_state == vm_states.SOFT_DELETED:
                    quotas.rollback()

                # Scenario 2: the callback is _do_delete(), shown below
                cb(context, instance, bdms,
                   reservations=quotas.reservations)
        except exception.ComputeHostNotFound:
            pass

        if is_local_delete:
            # If instance is in shelved_offloaded state or compute node
            # isn't up, delete instance from db and clean bdms info and
            # network info
            self._local_delete(context, instance, bdms, delete_type, cb)  # scenario 1
            quotas.commit()

    except exception.InstanceNotFound:
        # NOTE(comstud): Race condition. Instance already gone.
        if quotas:
            quotas.rollback()
    except Exception:
        with excutils.save_and_reraise_exception():
            if quotas:
                quotas.rollback()

# Scenario 2:
def _do_delete(self, context, instance, bdms, reservations=None,
               local=False):
    if local:
        instance.vm_state = vm_states.DELETED
        instance.task_state = None
        instance.terminated_at = timeutils.utcnow()
        instance.save()
    else:
        # The request is forwarded through the message queue to the target
        # compute node, which then calls the (libvirt) driver to delete the
        # instance.
        self.compute_rpcapi.terminate_instance(context, instance, bdms,
                                               reservations=reservations,
                                               delete_type='delete')
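terminate_instance() is an asynchronous cast, which is exactly what produces the scenario-2 behaviour: the delete request is dropped onto the compute host's queue and nobody waits for a reply, so when nova-compute is really down the instance just sits in the deleting state until the service comes back and consumes the message. A rough sketch of such a cast with oslo.messaging follows; the topic, version and routing values are illustrative assumptions, not the exact Liberty rpcapi code.

import oslo_messaging as messaging
from oslo_config import cfg

transport = messaging.get_transport(cfg.CONF)  # transport_url comes from the config
target = messaging.Target(topic='compute', version='4.0')
client = messaging.RPCClient(transport, target)

def terminate_instance(ctxt, instance, bdms, reservations=None):
    # Route the cast to the queue of the host that owns the instance.
    cctxt = client.prepare(server=instance['host'])
    # cast() is fire-and-forget: no reply, no timeout on the caller side.
    cctxt.cast(ctxt, 'terminate_instance',
               instance=instance, bdms=bdms, reservations=reservations)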
# Scenario 1:
def _local_delete(self, context, instance, bdms, delete_type, cb):
    if instance.vm_state == vm_states.SHELVED_OFFLOADED:
        LOG.info(_LI("instance is in SHELVED_OFFLOADED state, cleanup"
                     " the instance's info from database."),
                 instance=instance)
    else:
        LOG.warning(_LW("instance's host %s is down, deleting from "
                        "database"), instance.host, instance=instance)
    if instance.info_cache is not None:
        instance.info_cache.delete()
    else:
        # NOTE(yoshimatsu): Avoid AttributeError if instance.info_cache
        # is None. When the root cause that instance.info_cache becomes
        # None is fixed, the log level should be reconsidered.
        LOG.warning(_LW("Info cache for instance could not be found. "
                        "Ignore."), instance=instance)
    compute_utils.notify_about_instance_usage(
        self.notifier, context, instance, "%s.start" % delete_type)

    elevated = context.elevated()
    if self.cell_type != 'api':
        # NOTE(liusheng): In nova-network multi_host scenario, deleting
        # network info of the instance may need instance['host'] as
        # destination host of RPC call. If instance in SHELVED_OFFLOADED
        # state, instance['host'] is None, here, use shelved_host as host
        # to deallocate network info and reset instance['host'] after that.
        # Here we shouldn't use instance.save(), because this will mislead
        # user who may think the instance's host has been changed, and
        # actually, the instance.host is always None.
        orig_host = instance.host
        try:
            if instance.vm_state == vm_states.SHELVED_OFFLOADED:
                sysmeta = getattr(instance,
                                  obj_base.get_attrname('system_metadata'))
                instance.host = sysmeta.get('shelved_host')
            self.network_api.deallocate_for_instance(elevated, instance)
        finally:
            instance.host = orig_host

    # cleanup volumes
    for bdm in bdms:
        if bdm.is_volume:
            # NOTE(vish): We don't have access to correct volume
            #             connector info, so just pass a fake
            #             connector. This can be improved when we
            #             expose get_volume_connector to rpc.
            connector = {'ip': '127.0.0.1', 'initiator': 'iqn.fake'}
            try:
                self.volume_api.terminate_connection(context,
                                                     bdm.volume_id,
                                                     connector)
                self.volume_api.detach(elevated, bdm.volume_id)
                if bdm.delete_on_termination:
                    # this should delete the root disk, but the call fails
                    # here because the rbd image is still locked
                    self.volume_api.delete(context, bdm.volume_id)
            except Exception as exc:
                err_str = _LW("Ignoring volume cleanup failure due to %s")
                LOG.warn(err_str % exc, instance=instance)
        bdm.destroy()
    cb(context, instance, bdms, local=True)
    sys_meta = instance.system_metadata
    instance.destroy()
    compute_utils.notify_about_instance_usage(
        self.notifier, context, instance, "%s.end" % delete_type,
        system_metadata=sys_meta)

The leftover root disk can be dealt with by releasing the lock (break_lock) in the delete path of the rbd driver, cinder/volume/drivers/rbd.py:

def delete_volume(self, volume):
    """Deletes a logical volume."""
    # NOTE(dosaboy): this was broken by commit cbe1d5f. Ensure names are
    # utf-8 otherwise librbd will barf.
    volume_name = utils.convert_str(volume['name'])
    ...

    def _try_remove_volume(client, volume_name):
        with RBDVolumeProxy(self, volume_name) as volume:
            locker_info = volume.list_lockers()
            if locker_info:
                LOG.debug(_("Unlock the rbd volume %s firstly."),
                          volume_name)
                locker_client, locker_cookie, locker_address = \
                    locker_info["lockers"][0]
                volume.break_lock(client=locker_client,
                                  cookie=locker_cookie)
            self.RBDProxy().remove(client.ioctx, volume_name)
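For a root disk that has already been left behind, the same unlock can be done by hand with the rbd Python bindings; list_lockers() and break_lock() are the same calls the cinder snippet above uses, while the pool name ('volumes') and image name below are example values.

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('volumes')        # example pool name
image = rbd.Image(ioctx, 'volume-0a1b2c3d')  # example image name
try:
    lockers = image.list_lockers()
    # list_lockers() returns an empty result when unlocked, otherwise a dict
    # whose 'lockers' entry is a list of (client, cookie, address) tuples.
    for client, cookie, address in (lockers or {}).get('lockers', []):
        image.break_lock(client, cookie)
finally:
    image.close()
    ioctx.close()
    cluster.shutdown()

Once the lock is broken the image can be removed, e.g. with rbd.RBD().remove(ioctx, image_name), which is essentially what the modified cinder driver above does.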
Note: even though the root disk can now be deleted, the corresponding qemu-kvm process will still exist once the compute service comes back up; nova-compute, however, runs a periodic task that cleans up such instances.
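In Liberty that cleanup is the _cleanup_running_deleted_instances periodic task in nova/compute/manager.py, driven by the nova.conf options below on the compute node. The values shown are only an example of the usual defaults, not a recommendation.

[DEFAULT]
# What to do with instances that are still running on the hypervisor but
# already deleted from the database: noop, log, shutdown or reap.
running_deleted_instance_action = reap
# How often (seconds) the cleanup periodic task runs.
running_deleted_instance_poll_interval = 1800
# Grace period (seconds) before a running-but-deleted instance is handled.
running_deleted_instance_timeout = 0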
A follow-up post will look at Nova's periodic tasks in more detail.