
OpenStack Source Code Analysis: VM Creation

This article walks through the OpenStack source code for the virtual machine creation flow, based on the Icehouse release.

Let's start with the overall architecture. A request originates at nova-api, goes to nova-conductor, and then to the scheduler for host selection. Once the scheduler picks a host, an RPC request is sent to that host to run the instance-creation method. Along the way, Glance is consulted for the image from which the disk files are generated, and Neutron is consulted for network information; finally libvirt is invoked to create the VM. We will walk through each step in the source code below.
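In outline (each hop after nova-api is an asynchronous RPC cast):

HTTP request
  -> nova-api        (validate parameters, create the instance DB record)
  -> nova-conductor  (build_instances)
  -> nova-scheduler  (pick a host from the host queue)
  -> nova-compute    (on the chosen host)
       -> Glance   (fetch the image, build the disk files)
       -> Neutron  (allocate MAC and IP)
       -> libvirt  (define and boot the domain)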


nova-api

For instance creation we start the analysis at the Nova layer: an HTTP request carrying the creation parameters reaches nova-api.

   nova/api/openstack/compute/servers.py

def create(self, req, body):
    if body and 'servers' in body:
        context = req.environ['nova.context']
        servers = body['servers']
        return self.create_servers(context, req, servers)
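For illustration only, a request body for this customized handler might look like the following (all values hypothetical; note that the stock API reads a singular 'server' key, whereas this code reads 'servers'):

POST /v2/{tenant_id}/servers
{
    "servers": {
        "name": "demo-vm",
        "imageRef": "<image-uuid>",
        "flavorRef": "2"
    }
}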

Following create_servers down, we reach the _create method. It extracts the request parameters (instance name, flavor, and so on), performs some basic validation, and calls compute_api's create method:

def _create(self, context, body, password):
            return self.compute_api.create(context,..

compute_api here refers to nova.compute.api.API

   nova/compute/api.py

def create(self, context, instance_type,
               image_href, kernel_id=None, ramdisk_id=None,
               min_count=None, max_count=None,
               display_name=None, display_description=None,
               key_name=None, key_data=None, security_group=None,
               availability_zone=None, user_data=None, metadata=None,
               injected_files=None, admin_password=None,
               block_device_mapping=None, access_ip_v4=None,
               access_ip_v6=None, requested_networks=None, config_drive=None,
               auto_disk_config=None, scheduler_hints=None, legacy_bdm=True):
               ...
               return self._create_instance(
                               context, instance_type,
                               image_href, kernel_id, ramdisk_id,
                               min_count, max_count,
                               display_name, display_description,
                               key_name, key_data, security_group,
                               availability_zone, user_data, metadata,
                               injected_files, admin_password,
                               access_ip_v4, access_ip_v6,
                               requested_networks, config_drive,
                               block_device_mapping, auto_disk_config,
                               scheduler_hints=scheduler_hints,
                               legacy_bdm=legacy_bdm)
Following _create_instance further: it performs a series of parameter validations and packaging, inserts the instance record into the database, and then issues the RPC request:
 def _create_instance(self, context, instance_type,
               image_href, kernel_id, ramdisk_id,
               min_count, max_count,
               display_name, display_description,
               key_name, key_data, security_groups,
               availability_zone, user_data, metadata,
               injected_files, admin_password,
               access_ip_v4, access_ip_v6,
               requested_networks, config_drive,
               block_device_mapping, auto_disk_config,
               reservation_id=None, scheduler_hints=None,
               legacy_bdm=True):
               ...
               for instance in instances:
                 self._record_action_start(context, instance,
                                      instance_actions.CREATE)
               self.compute_task_api.build_instances...
This request lands in build_instances in nova/conductor/rpcapi.py:
def build_instances(self, context, instances, image, filter_properties,
            admin_password, injected_files, requested_networks,
            security_groups, block_device_mapping, legacy_bdm=True):
        image_p = jsonutils.to_primitive(image)
        cctxt = self.client.prepare(version='1.5')
        cctxt.cast(context, 'build_instances',
                   instances=instances, image=image_p,
                   filter_properties=filter_properties,
                   admin_password=admin_password,
                   injected_files=injected_files,
                   requested_networks=requested_networks,
                   security_groups=security_groups,
                   block_device_mapping=block_device_mapping,
                   legacy_bdm=legacy_bdm)

At this point an RPC request has been sent. We use ZeroMQ point-to-point messaging, and the message goes to the conductor node. Note that cast is asynchronous (fire-and-forget): nova-api does not wait for a reply. To see how the RPC client behind cctxt.cast is set up, look at the __init__ method in this same nova/conductor/rpcapi.py:

def __init__(self):
        super(ComputeTaskAPI, self).__init__()
        target = messaging.Target(topic=CONF.conductor.topic,
                                  namespace='compute_task',
                                  version='1.0')
        serializer = objects_base.NovaObjectSerializer()
        self.client = rpc.get_client(target, serializer=serializer)

nova-conductor

Execution now enters the build_instances method in nova/conductor/manager.py:

def build_instances(self, context, instances, image, filter_properties,
            admin_password, injected_files, requested_networks,
            security_groups, block_device_mapping, legacy_bdm=True):
            ...
            self.scheduler_rpcapi.new_run_instance(context,
                    request_spec=request_spec, admin_password=admin_password,
                    injected_files=injected_files,
                    requested_networks=requested_networks, is_first_time=True,
                    filter_properties=filter_properties,
                    legacy_bdm_in_spec=legacy_bdm)
We customized this part to call the new_run_instance method directly. Stepping into nova/scheduler/rpcapi.py:
def new_run_instance(self, ctxt, request_spec, admin_password,
            injected_files, requested_networks, is_first_time,
            filter_properties, legacy_bdm_in_spec=True):

        msg_kwargs = {'request_spec': request_spec,
                      'admin_password': admin_password,
                      'injected_files': injected_files,
                      'requested_networks': requested_networks,
                      'is_first_time': is_first_time,
                      'filter_properties': filter_properties,
                      'legacy_bdm_in_spec': legacy_bdm_in_spec}
        cctxt = self.client.prepare()
        cctxt.cast(ctxt, 'new_run_instance', **msg_kwargs)

At this point a ZeroMQ request has been sent to the scheduler; for the details of how it is sent, just look at this class's __init__ method.
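For reference, a minimal sketch of what that __init__ plausibly looks like, assuming it mirrors the conductor client shown earlier (the topic option and version string here are illustrative, not taken from the source):

def __init__(self):
    super(SchedulerAPI, self).__init__()
    # illustrative: topic/version may differ in the actual file
    target = messaging.Target(topic=CONF.scheduler_topic, version='1.0')
    serializer = objects_base.NovaObjectSerializer()
    self.client = rpc.get_client(target, serializer=serializer)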

nova-scheduler

The call arrives in nova/scheduler/manager.py; look at the new_run_instance method of the SchedulerManager class:

def new_run_instance(self, context, request_spec, admin_password,
            injected_files, requested_networks, is_first_time,
            filter_properties, legacy_bdm_in_spec=True):
            ...
            return self.driver.new_schedule_run_instance(context,
                        request_spec, admin_password, injected_files,
                        requested_networks, is_first_time, filter_properties,
                        legacy_bdm_in_spec)
This uses a driver, i.e. the scheduling strategy in effect, which may filter hosts on memory, CPU, or disk. The driver we selected is nova.scheduler.filter_scheduler.FilterScheduler. Step into this driver and look at the new_schedule_run_instance method of the FilterScheduler class in nova/scheduler/filter_scheduler.py:
def new_schedule_run_instance(self, context, request_spec,
                              admin_password, injected_files,
                              requested_networks, is_first_time,
                              filter_properties, legacy_bdm_in_spec):
        ...
        try:
            self._new_schedule_run_instance(context, request_spec,
                    admin_password, injected_files,
                    requested_networks, is_first_time,
                    filter_properties, legacy_bdm_in_spec)
        ..

A word about host_queue: it is rebuilt by a periodic task, every 10 seconds by default, in nova/scheduler/manager.py:

    @periodic_task.periodic_task(spacing=CONF.new_scheduler_build_queue_period,
                                 run_immediately=True)
    def build_queue(self, context):
        current = host_queue.QueueManager()
        current.init_host_queue(context)
        current.build_queue()
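The spacing comes from a config option. Its definition is not shown in the source here, so the following oslo.config declaration is only a plausible sketch (option name taken from the decorator above, the 10-second default from the text):

from oslo.config import cfg   # Icehouse-era import path

scheduler_opts = [
    cfg.IntOpt('new_scheduler_build_queue_period',
               default=10,
               help='Seconds between rebuilds of the scheduler host queue'),
]

CONF = cfg.CONF
CONF.register_opts(scheduler_opts)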
Here is the implementation of build_queue in nova/scheduler/host_queue.py:
def build_queue(self):
    ...
    # Read the compute nodes from the database
    self.compute_nodes = db.compute_node_get_all(elevated)
    for compute in self.compute_nodes:
        # Get the extra_resources info
        extra_resources = compute.get('extra_resources')
        # Get the hostname
        hostname = compute.get('hypervisor_hostname')
        # Get the queue_name; the default is kvm
        queue_name = extra_resources.get('queue_name')
        new_queue = []
        if not queue_name:
            queue_name = CONF.default_queue
        ...
        # Skip hosts whose compute service is disabled
        if service['disabled']:
            LOG.warn("Compute service disabled %s", hostname)
            continue
        ...
        # Get the disk, CPU and memory overcommit ratios. Each compute
        # node reports these values from its config file to the database
        # via a periodic task; see resource_tracker for the details.
        disk_allocation_ratio = extra_resources.get('disk_allocation_ratio', 1.0)
        cpu_allocation_ratio = extra_resources.get('cpu_allocation_ratio', 1.0)
        ram_allocation_ratio = extra_resources.get('ram_allocation_ratio', 1.0)
        ...
        # CPU: total (with overcommit), used and free
        vcpus = compute['vcpus'] * cpu_allocation_ratio
        vcpus_used = compute['vcpus_used']
        free_vcpus = vcpus - compute['vcpus_used']
        limits['vcpu'] = vcpus

        local_gb = compute['local_gb'] * disk_allocation_ratio
        free_local_gb = local_gb - \
                        (compute['local_gb'] - compute['free_disk_gb'])
        limits['disk_gb'] = local_gb

        # memory_mb
        memory_mb = compute['memory_mb'] * ram_allocation_ratio
        free_memory_mb = memory_mb - \
                        (compute['memory_mb'] - compute['free_ram_mb'])
        limits['memory_mb'] = memory_mb
        ...
        # Build the per-host object and store it in QueueManager.host_info
        QueueManager.host_info[hostname] = BaseQueue(
                     hostname=hostname,
                     vcpus=vcpus, vcpus_used=vcpus_used, free_vcpus=free_vcpus,
                     memory_mb=memory_mb,
                     free_memory_mb=free_memory_mb, local_gb=local_gb,
                     free_local_gb=free_local_gb, net_bandwidth=net_bandwidth,
                     net_bandwidth_used=net_bandwidth_used,
                     free_net_bandwidth=free_net_bandwidth,
                     disk_bandwidth=disk_bandwidth,
                     disk_bandwidth_used=disk_bandwidth_used,
                     free_disk_bandwidth=free_disk_bandwidth,
                     multi_disk_info=multi_disk_info,
                     updated_at=updated_at, queue_name=queue_name,
                     limits=limits)
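BaseQueue itself is not shown in the article. Conceptually it is just a per-host snapshot of the reported resources; a hypothetical minimal sketch (attribute names taken from the call above):

class BaseQueue(object):
    """Per-host resource snapshot kept in QueueManager.host_info."""

    def __init__(self, **kwargs):
        # hostname, free_vcpus, free_memory_mb, free_local_gb, limits, ...
        for key, value in kwargs.items():
            setattr(self, key, value)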

Now back to scheduling. With the host_queue in place, we continue in nova/scheduler/filter_scheduler.py:
def _new_schedule_run_instance(self, context, request_spec,
                              admin_password, injected_files,
                              requested_networks, is_first_time,
                              filter_properties, legacy_bdm_in_spec):
        ## extract the parameters
        ..
        ## If scheduler_host was specified in the parameters, schedule
        ## directly onto that physical host to create the instance.
        if scheduler_host:
            self.schedule_instance_to_assigned_host(context, request_spec,
                admin_password, injected_files,
                requested_networks, is_first_time,
                filter_properties, legacy_bdm_in_spec,
                scheduler_host, disk_shares,
                instance_uuids, scheduler_hints)
            return
        ..
        ## The default queue_name is kvm. Get the hosts under that queue
        ## name; the queue was built when host_queue was initialized.
        host_queue = self.get_host_queue(queue_name)

        # If scheduler_host_match is set, regex-match hosts whose names
        # contain that value
        if scheduler_host_match:
            host_queue = self._get_matched_host_queue(host_queue,
                                                      scheduler_host_match)
            LOG.debug("matched host queue (%s): %s length is: %d",
                      scheduler_host_match, queue_name, len(host_queue))
        ...
        # requested_disk is the size of the VM's root partition plus the
        # ephemeral (user) partition plus swap; it is compared against
        # host capacity later
        req_res['requested_disk'] = 1024 * (instance_type['root_gb'] +
                      instance_type['ephemeral_gb']) + \
                      instance_type['swap']
        # This call picks a host matching the requested parameters;
        # it is explained below
        host = self._new_schedule(context, host_queue,
                          req_res, request_spec,
                          copy_filter_properties,
                          instance_uuid, retry,
                          different_host_flag,
                          different_host, disk_shares,
                          try_different_host, sign, boundary_host)

        # A host has been chosen; send a point-to-point request to it
        # to create the virtual machine
        self.pool.spawn(self.compute_rpcapi.new_run_instance,
                          context, instance_uuid, host.hostname,
                          request_spec, copy_filter_properties,
                          requested_networks, injected_files,
                          admin_password, is_first_time,
                          host.hostname, legacy_bdm_in_spec, self._disk_info)
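To make the requested_disk formula concrete: root_gb and ephemeral_gb are flavor values in GB, while swap is already in MB. For a hypothetical flavor with root_gb=20, ephemeral_gb=80 and swap=1024:

requested_disk = 1024 * (20 + 80) + 1024 = 103424 MB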
Let's continue with _new_schedule, still in this file:
def _new_schedule(self, context, host_queue, req_res,
        request_spec, filter_properties,
        instance_uuid, retry=None,
        different_host_flag=None,
        different_host=None,
        disk_shares=None,
        try_different_host=None,
        sign=1,
        boundary_host=None):
        ..
        # If different_host is true, the instances must be scheduled onto
        # different physical hosts. This is implemented via
        # check_host_different_from_uuids: every chosen host is recorded
        # in a list, and each candidate host is checked against that list.
        if different_host:
            LOG.debug('instance %s different_host: %s', instance_uuid,
                different_host)
            if not self.check_host_different_from_uuids(context,
                instance_uuid, host, different_host):
                self._find_pos = self._find_pos + sign * 1
                continue
        # Check whether the host has enough resources
        resource_check = self.check_host_resource(context,
            host=host,
            req_res=req_res,
            disk_shares=disk_shares)
        # If it passes, return the host
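check_host_different_from_uuids is not shown in the article. Based on the description above, the idea can be sketched roughly like this (the database lookup is an assumption, not the actual implementation):

from nova import db   # assumed to be imported at module level

def check_host_different_from_uuids(self, context, instance_uuid,
                                    host, different_host):
    # Collect the hosts already used by the given instance uuids and
    # accept this host only if it has not been used before.
    used_hosts = set()
    for uuid in different_host:
        instance = db.instance_get_by_uuid(context, uuid)
        used_hosts.add(instance['host'])
    return host.hostname not in used_hosts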

Going one level deeper, here is what check_host_resource does (still in the same file):

def check_host_resource(self, context, host, req_res,
                    disk_shares=0):
        ...
        # Check whether the requested disk space exceeds the free disk on
        # the host; if it does, return False to signal the check failed
        usable_disk_mb = host.free_local_gb * 1024
        if not usable_disk_mb >= req_res['requested_disk']:
            return False

        # check memory
        if req_res['requested_ram'] > 0:
            usable_ram = host.free_memory_mb
            if not usable_ram >= req_res['requested_ram']:
                return False

        # check vcpus
        if req_res['requested_vcpus'] > 0:
            if host.free_vcpus < req_res['requested_vcpus']:
                return False
        return True
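Continuing the hypothetical flavor from above: a host reporting free_local_gb=100 yields usable_disk_mb = 100 * 1024 = 102400 MB, which is less than the requested 103424 MB, so check_host_resource returns False and _new_schedule moves on to the next host in the queue.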

nova-compute

The RPC call reaches the chosen host node, which executes the new_run_instance method:

def new_run_instance(self, context, instance_uuid, request_spec,
                 filter_properties, requested_networks,
                 injected_files, admin_password,
                 is_first_time, node, legacy_bdm_in_spec,
                 disk_info=None):


    # Update the database state on the one hand, and the resource usage
    # on the other
    if disk_info:
        instance = self._instance_update(
                    context, instance_uuid,
                    disk_shares=disk_info['disk_shares'],
                    selected_dir=disk_info['selected_dir'])
    else:
        instance = self._instance_update(context,
                instance_uuid)


    self.run_instance(context, instance, request_spec,
                filter_properties, requested_networks, injected_files,
                admin_password, is_first_time, node,
                legacy_bdm_in_spec)
Continuing down into run_instance, which in turn calls _run_instance:
def _run_instance(self, context, request_spec,
                  filter_properties, requested_networks, injected_files,
                  admin_password, is_first_time, node, instance,
                  legacy_bdm_in_spec):

      # First check whether the instance name exists, then update the
      # database, setting the state to building
      self._prebuild_instance(context, instance)
    
      ...
      instance, network_info = self._build_instance(context,
              request_spec, filter_properties, requested_networks,
              injected_files, admin_password, is_first_time, node,
              instance, image_meta, legacy_bdm_in_spec)

Inside this method, MAC and IP information is obtained from the Neutron service (we won't go into detail on that here); straight to the code:

def _build_instance(self, context, request_spec, filter_properties,
            requested_networks, injected_files, admin_password, is_first_time,
            node, instance, image_meta, legacy_bdm_in_spec):
    ..
    # Look up the block devices attached to this instance
    bdms = block_device_obj.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance['uuid'])
    ..
    # Claim resource usage: CPU, memory, disk
    with rt.instance_claim(context, instance, limits):
        ...
        # Neutron allocates the MAC and IP for the VM
        network_info = self._allocate_network(context, instance,
                          requested_networks, macs, security_groups,
                          dhcp_options)

        instance = self._spawn(context, instance, image_meta,
                                          network_info, block_device_info,
                                          injected_files, admin_password,
                                          set_access_ip=set_access_ip)
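The with rt.instance_claim(...) statement relies on the claim object being a context manager: entering it tests and reserves CPU/RAM/disk against the resource tracker, and if anything inside the block raises, the claim is rolled back. The core of that behavior in nova/compute/claims.py looks roughly like this:

class NopClaim(object):
    """Base claim; subclasses test and reserve the actual resources."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None:
            # creation failed inside the with-block: give the
            # reserved resources back
            self.abort()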
The spawn method is fairly low-level; it deals with the image and actually builds the VM:
def spawn(self, context, instance, image_meta, injected_files,
          admin_password, network_info=None, block_device_info=None):
    disk_info = blockinfo.get_disk_info(CONF.libvirt.virt_type,
                                        instance,
                                        block_device_info,
                                        image_meta)
    # File contents to be injected; they are ultimately written into
    # injected_files as strings
    if CONF.libvirt.inject_nifcfg_file:
        self._mk_inject_files(image_meta, network_info, injected_files)
    
    # Create the disk image files (disk, disk.local, etc.)
    self._create_image(context, instance,
                       disk_info['mapping'],
                       network_info=network_info,
                       block_device_info=block_device_info,
                       files=injected_files,
                       admin_pass=admin_password)
    # Generate the libvirt.xml file
    xml = self.to_xml(context, instance, network_info,
                      disk_info, image_meta,
                      block_device_info=block_device_info,
                      write_to_disk=True)

    # Create the real VM instance: the libvirt domain
    self._create_domain_and_network(context, xml, instance, network_info,
                                    block_device_info)

    LOG.debug(_("Instance is running"), instance=instance)

    # Monitor the state; once it is running, return
    def _wait_for_boot():
        """Called at an interval until the VM is running."""
        state = self.get_info(instance)['state']

        if state == power_state.RUNNING:
            LOG.info(_("Instance spawned successfully."),
                     instance=instance)
            raise loopingcall.LoopingCallDone()

    timer = loopingcall.FixedIntervalLoopingCall(_wait_for_boot)
    timer.start(interval=0.5).wait()
This concludes the whole flow. Many finer details were not covered; they will be addressed in other chapters of this source-code analysis series.