OpenStack Source Code Analysis: VM Creation
This article walks through the OpenStack source code for the virtual machine creation flow, based on the Icehouse release.
Let's start with the architecture. The request originates in nova-api, goes to nova-conductor, and then to the scheduler for scheduling. Once the scheduler has picked a host, an RPC request is sent to that host to run the instance-creation method. Along the way, glance is queried to fetch the image and generate the disk files, and neutron is queried for the network information; finally libvirt is invoked to create the VM. Each step is walked through in the source code below.
nova-api
We start the analysis at the Nova layer: an HTTP request carrying the instance parameters arrives at nova-api (an illustrative request body is shown after the code below).
nova/api/openstack/compute/servers.py
def create(self, req, body):
    if body and 'servers' in body:
        context = req.environ['nova.context']
        servers = body['servers']
        return self.create_servers(context, req, servers)
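For reference, a request body that satisfies the 'servers' check above might look roughly like the following; apart from the servers key, which comes from the code, the field names are only illustrative and depend on the deployment's API:

body = {
    'servers': [{
        'name': 'test-vm-01',          # instance display name (illustrative)
        'imageRef': '<image uuid>',    # glance image to boot from
        'flavorRef': '<flavor id>',    # flavor / instance_type
    }]
}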
Following create_servers down, we reach the _create method, which extracts the request parameters (instance name, flavor and so on), performs some basic validation, and then calls the create method of compute_api.
def _create(self, context, body, password):
    ...
    return self.compute_api.create(context,..
Here compute_api refers to nova.compute.api.API.
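As a side note, this indirection is normally resolved by a small factory in nova/compute/__init__.py driven by the compute_api_class option, which defaults to nova.compute.api.API; a simplified sketch of that mechanism (not the verbatim Icehouse code):

from oslo.config import cfg
from nova.openstack.common import importutils

CONF = cfg.CONF
CONF.register_opts([cfg.StrOpt('compute_api_class',
                               default='nova.compute.api.API')])

def API(*args, **kwargs):
    # import and instantiate whatever class the option points at
    return importutils.import_object(CONF.compute_api_class, *args, **kwargs)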
nova/compute/api.py
The create method of the compute API simply forwards all of its arguments to _create_instance. Following into _create_instance, it performs a series of parameter validations and wrapping, inserts the instance record into the database, and then issues the RPC request.

def create(self, context, instance_type, image_href, kernel_id=None,
           ramdisk_id=None, min_count=None, max_count=None,
           display_name=None, display_description=None, key_name=None,
           key_data=None, security_group=None, availability_zone=None,
           user_data=None, metadata=None, injected_files=None,
           admin_password=None, block_device_mapping=None,
           access_ip_v4=None, access_ip_v6=None, requested_networks=None,
           config_drive=None, auto_disk_config=None, scheduler_hints=None,
           legacy_bdm=True):
    ...
    return self._create_instance(
        context, instance_type, image_href, kernel_id, ramdisk_id,
        min_count, max_count, display_name, display_description,
        key_name, key_data, security_group, availability_zone,
        user_data, metadata, injected_files, admin_password,
        access_ip_v4, access_ip_v6, requested_networks, config_drive,
        block_device_mapping, auto_disk_config,
        scheduler_hints=scheduler_hints, legacy_bdm=legacy_bdm)
def _create_instance(self, context, instance_type,
                     image_href, kernel_id, ramdisk_id,
                     min_count, max_count,
                     display_name, display_description,
                     key_name, key_data, security_groups,
                     availability_zone, user_data, metadata,
                     injected_files, admin_password,
                     access_ip_v4, access_ip_v6,
                     requested_networks, config_drive,
                     block_device_mapping, auto_disk_config,
                     reservation_id=None, scheduler_hints=None,
                     legacy_bdm=True):
    ...
    for instance in instances:
        self._record_action_start(context, instance,
                                  instance_actions.CREATE)

    self.compute_task_api.build_instances...
This request lands in the build_instances method of nova/conductor/rpcapi.py:

def build_instances(self, context, instances, image, filter_properties,
                    admin_password, injected_files, requested_networks,
                    security_groups, block_device_mapping, legacy_bdm=True):
    image_p = jsonutils.to_primitive(image)
    cctxt = self.client.prepare(version='1.5')
    cctxt.cast(context, 'build_instances',
               instances=instances, image=image_p,
               filter_properties=filter_properties,
               admin_password=admin_password,
               injected_files=injected_files,
               requested_networks=requested_networks,
               security_groups=security_groups,
               block_device_mapping=block_device_mapping,
               legacy_bdm=legacy_bdm)
At this point an RPC request goes out. We use ZeroMQ point-to-point messaging, so the message is delivered to the conductor node. To see how the request is sent, step into cctxt.cast and look at how the client is built in nova/conductor/rpcapi.py:
def __init__(self):
    super(ComputeTaskAPI, self).__init__()
    target = messaging.Target(topic=CONF.conductor.topic,
                              namespace='compute_task',
                              version='1.0')
    serializer = objects_base.NovaObjectSerializer()
    self.client = rpc.get_client(target, serializer=serializer)
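Note that cast is the fire-and-forget flavour of oslo.messaging RPC: it returns as soon as the message has been handed to the transport and no result is expected, whereas call would block until the remote method returns. A short contrast, reusing the cctxt built in build_instances above (the commented-out call line is purely illustrative):

# cast(): send 'build_instances' to the conductor and return immediately
cctxt.cast(context, 'build_instances', instances=instances, image=image_p)
# call(): would block here until the server-side method returned a value
# result = cctxt.call(context, 'some_synchronous_method', arg=value)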
nova-conductor
Next, we enter the build_instances method in nova/conductor/manager.py:
def build_instances(self, context, instances, image, filter_properties,
                    admin_password, injected_files, requested_networks,
                    security_groups, block_device_mapping, legacy_bdm=True):
    ...
    self.scheduler_rpcapi.new_run_instance(context,
        request_spec=request_spec, admin_password=admin_password,
        injected_files=injected_files,
        requested_networks=requested_networks, is_first_time=True,
        filter_properties=filter_properties,
        legacy_bdm_in_spec=legacy_bdm)
We have customized this part and call the new_run_instance method directly. Stepping into nova/scheduler/rpcapi.py:
def new_run_instance(self, ctxt, request_spec, admin_password,
                     injected_files, requested_networks, is_first_time,
                     filter_properties, legacy_bdm_in_spec=True):
    msg_kwargs = {'request_spec': request_spec,
                  'admin_password': admin_password,
                  'injected_files': injected_files,
                  'requested_networks': requested_networks,
                  'is_first_time': is_first_time,
                  'filter_properties': filter_properties,
                  'legacy_bdm_in_spec': legacy_bdm_in_spec}
    cctxt = self.client.prepare()
    cctxt.cast(ctxt, 'new_run_instance', **msg_kwargs)
At this point a ZeroMQ request has been sent to the scheduler; for the details of how it is sent, just look at the __init__ method of this class.
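That __init__ follows the same pattern as the conductor client shown earlier; in Icehouse it looks roughly like the following (the exact version string may differ):

def __init__(self):
    super(SchedulerAPI, self).__init__()
    target = messaging.Target(topic=CONF.scheduler_topic, version='3.0')
    serializer = objects_base.NovaObjectSerializer()
    self.client = rpc.get_client(target, serializer=serializer)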
nova-scheduler
The call now lands in nova/scheduler/manager.py; let's look at the new_run_instance method of the SchedulerManager class:
def new_run_instance(self, context, request_spec, admin_password,
                     injected_files, requested_networks, is_first_time,
                     filter_properties, legacy_bdm_in_spec=True):
    ...
    return self.driver.new_schedule_run_instance(context,
        request_spec, admin_password, injected_files,
        requested_networks, is_first_time, filter_properties,
        legacy_bdm_in_spec)
This goes through the driver, i.e. the scheduler driver you have configured, which filters hosts on criteria such as memory, CPU or disk. The driver we use is nova.scheduler.filter_scheduler.FilterScheduler (how the driver is selected in nova.conf is shown after the code below). Let's open that driver and look at the new_schedule_run_instance method of the FilterScheduler class in nova/scheduler/filter_scheduler.py:
def new_schedule_run_instance(self, context, request_spec,
                              admin_password, injected_files,
                              requested_networks, is_first_time,
                              filter_properties, legacy_bdm_in_spec):
    ...
    try:
        self._new_schedule_run_instance(context, request_spec,
                                        admin_password, injected_files,
                                        requested_networks, is_first_time,
                                        filter_properties, legacy_bdm_in_spec)
    ..
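The driver class that gets loaded is controlled by the scheduler_driver option on the scheduler node; in Icehouse the default is exactly the FilterScheduler used here, e.g. in nova.conf:

[DEFAULT]
# scheduler driver loaded by nova-scheduler
scheduler_driver = nova.scheduler.filter_scheduler.FilterScheduler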
A quick word about host_queue: it is rebuilt periodically, every 10 seconds by default, in nova/scheduler/manager.py:
@periodic_task.periodic_task(spacing=CONF.new_scheduler_build_queue_period,
                             run_immediately=True)
def build_queue(self, context):
    current = host_queue.QueueManager()
    current.init_host_queue(context)
    current.build_queue()
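new_scheduler_build_queue_period is a deployment-specific option rather than a stock Nova one; registering it with oslo.config would look something like the sketch below (the option name comes from the code above and the 10-second default from the text; the help string is made up):

from oslo.config import cfg

CONF = cfg.CONF
CONF.register_opts([
    cfg.IntOpt('new_scheduler_build_queue_period',
               default=10,  # rebuild the host queue every 10 seconds
               help='Interval, in seconds, of the build_queue periodic task'),
])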
Now look at the actual implementation of build_queue in nova/scheduler/host_queue.py:
def build_queue(self):
    ...
    # read the compute nodes from the database
    self.compute_nodes = db.compute_node_get_all(elevated)
    for compute in self.compute_nodes:
        # fetch the extra_resources info
        extra_resources = compute.get('extra_resources')
        # fetch the hostname
        hostname = compute.get('hypervisor_hostname')
        # fetch the queue_name, which defaults to kvm
        queue_name = extra_resources.get('queue_name')
        new_queue = []
        if not queue_name:
            queue_name = CONF.default_queue
        ...
        # skip hosts whose compute service is disabled
        if service['disabled']:
            LOG.warn("Compute service disabled %s", hostname)
            continue
        ...
        # fetch the disk, cpu and memory over-commit ratios; the compute
        # nodes report these values from their config files to the database
        # via a periodic task, namely the resource_tracker
        disk_allocation_ratio = extra_resources.get('disk_allocation_ratio', 1.0)
        cpu_allocation_ratio = extra_resources.get('cpu_allocation_ratio', 1.0)
        ram_allocation_ratio = extra_resources.get('ram_allocation_ratio', 1.0)
        ...
        # total, used and free vcpus
        vcpus = compute['vcpus'] * cpu_allocation_ratio
        vcpus_used = compute['vcpus_used']
        free_vcpus = vcpus - compute['vcpus_used']
        limits['vcpu'] = vcpus
        # local disk
        local_gb = compute['local_gb'] * disk_allocation_ratio
        free_local_gb = local_gb - \
            (compute['local_gb'] - compute['free_disk_gb'])
        limits['disk_gb'] = local_gb
        # memory_mb
        memory_mb = compute['memory_mb'] * ram_allocation_ratio
        free_memory_mb = memory_mb - \
            (compute['memory_mb'] - compute['free_ram_mb'])
        limits['memory_mb'] = memory_mb
        ...
        # build the per-host object and store it in QueueManager.host_info
        QueueManager.host_info[hostname] = BaseQueue(
            hostname=hostname,
            vcpus=vcpus, vcpus_used=vcpus_used, free_vcpus=free_vcpus,
            memory_mb=memory_mb,
            free_memory_mb=free_memory_mb, local_gb=local_gb,
            free_local_gb=free_local_gb, net_bandwidth=net_bandwidth,
            net_bandwidth_used=net_bandwidth_used,
            free_net_bandwidth=free_net_bandwidth,
            disk_bandwidth=disk_bandwidth,
            disk_bandwidth_used=disk_bandwidth_used,
            free_disk_bandwidth=free_disk_bandwidth,
            multi_disk_info=multi_disk_info,
            updated_at=updated_at, queue_name=queue_name,
            limits=limits)
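To make the over-commit arithmetic in build_queue concrete, here is a small standalone example with made-up numbers (a host with 32 physical cores, 128 GB of RAM, a CPU over-commit ratio of 4.0 and a RAM ratio of 1.5):

# made-up node values, following the formulas in build_queue above
cpu_allocation_ratio = 4.0
vcpus = 32 * cpu_allocation_ratio          # 128 schedulable vcpus
vcpus_used = 100
free_vcpus = vcpus - vcpus_used            # 28 vcpus left for new instances

ram_allocation_ratio = 1.5
memory_mb = 131072 * ram_allocation_ratio              # 196608 MB schedulable
free_ram_mb = 81072                                    # reported by the hypervisor
free_memory_mb = memory_mb - (131072 - free_ram_mb)    # 146608 MB still free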
Now back to the scheduling path. With host_queue in place, we continue in nova/scheduler/filter_scheduler.py:
def _new_schedule_run_instance(self, context, request_spec,
                               admin_password, injected_files,
                               requested_networks, is_first_time,
                               filter_properties, legacy_bdm_in_spec):
    ## extract the parameters
    ..
    ## if scheduler_host is given in the parameters, dispatch the request
    ## directly to that physical host and create the instance there
    if scheduler_host:
        self.schedule_instance_to_assigned_host(context, request_spec,
            admin_password, injected_files,
            requested_networks, is_first_time,
            filter_properties, legacy_bdm_in_spec,
            scheduler_host, disk_shares,
            instance_uuids, scheduler_hints)
        return
    ..
    ## the default queue_name is kvm; fetch the hosts under that queue name,
    ## which were built when the host_queue module was initialized
    host_queue = self.get_host_queue(queue_name)
    # if scheduler_host_match is set, keep only hosts whose names match it
    # (a regular-expression match on the hostname)
    if scheduler_host_match:
        host_queue = self._get_matched_host_queue(host_queue, scheduler_host_match)
        LOG.debug("matched host queue (%s): %s length is: %d", scheduler_host_match,
                  queue_name, len(host_queue))
    ...
    # requested_disk is the size of the VM's root partition plus the user
    # (ephemeral) partition plus the swap space; it is used in the
    # comparison later on
    req_res['requested_disk'] = 1024 * (instance_type['root_gb'] +
                                        instance_type['ephemeral_gb']) + \
                                instance_type['swap']
    # this method does the actual scheduling and returns a host matching the
    # given parameters; it is explained below
    host = self._new_schedule(context, host_queue,
                              req_res, request_spec,
                              copy_filter_properties,
                              instance_uuid, retry,
                              different_host_flag,
                              different_host, disk_shares,
                              try_different_host, sign, boundary_host)
    # a host has been selected; now send a point-to-point request to that
    # host to create the virtual machine
    self.pool.spawn(self.compute_rpcapi.new_run_instance,
                    context, instance_uuid, host.hostname,
                    request_spec, copy_filter_properties,
                    requested_networks, injected_files,
                    admin_password, is_first_time,
                    host.hostname, legacy_bdm_in_spec, self._disk_info)
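As a quick sanity check of the requested_disk formula above: for a flavor with root_gb=20, ephemeral_gb=80 and swap=2048 (swap is already in MB), the value compared against the host later on is

requested_disk = 1024 * (20 + 80) + 2048   # = 104448 MB, roughly 102 GB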
Let's continue with _new_schedule, still in the same file:
def _new_schedule(self, context, host_queue, req_res,
                  request_spec, filter_properties,
                  instance_uuid, retry=None,
                  different_host_flag=None,
                  different_host=None,
                  disk_shares=None,
                  try_different_host=None,
                  sign=1,
                  boundary_host=None):
    ..
    # If different_host is set to true, the instances must be scheduled onto
    # different physical hosts. This is implemented by
    # check_host_different_from_uuids: each selected host is appended to an
    # array, and the next selected host is checked against that array.
    if different_host:
        LOG.debug('instance %s different_host: %s', instance_uuid,
                  different_host)
        if not self.check_host_different_from_uuids(context,
                instance_uuid, host, different_host):
            self._find_pos = self._find_pos + sign * 1
            continue
    # check whether the host has enough resources
    resource_check = self.check_host_resource(context,
                                              host=host,
                                              req_res=req_res,
                                              disk_shares=disk_shares)
    # if the resources fit, return the host
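The body of check_host_different_from_uuids is not shown here; going only by the description above (every selected host is recorded, and the next candidate is rejected if it was already recorded), a minimal sketch of the idea could look like the following, where every name except the method name is an assumption:

# hypothetical sketch of the anti-affinity bookkeeping described above
_hosts_for_request = {}

def check_host_different_from_uuids(request_id, candidate_hostname):
    chosen = _hosts_for_request.setdefault(request_id, set())
    if candidate_hostname in chosen:
        return False        # host already used for this request: reject it
    chosen.add(candidate_hostname)
    return True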
Let's go one level deeper and see what check_host_resource does (still in this file):
def check_host_resource(self, context, host, req_res,
                        disk_shares=0):
    ...
    # check whether the requested disk space exceeds the free disk on the
    # host; if so, return False to signal that the check failed
    usable_disk_mb = host.free_local_gb * 1024
    if not usable_disk_mb >= req_res['requested_disk']:
        return False
    # check memory
    if req_res['requested_ram'] > 0:
        usable_ram = host.free_memory_mb
        if not usable_ram >= req_res['requested_ram']:
            return False
    # check vcpus
    if req_res['requested_vcpus'] > 0:
        if host.free_vcpus < req_res['requested_vcpus']:
            return False
    return True
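Putting the pieces together, what this check effectively verifies for, say, a flavor with a 20 GB root disk, 512 MB of swap, 4096 MB of RAM and 2 vcpus is the following (the hostname and flavor numbers are illustrative):

host = QueueManager.host_info['compute-node-01']     # hypothetical hostname
req_res = {'requested_disk': 1024 * (20 + 0) + 512,  # 20992 MB
           'requested_ram': 4096,
           'requested_vcpus': 2}
fits = (host.free_local_gb * 1024 >= req_res['requested_disk']
        and host.free_memory_mb >= req_res['requested_ram']
        and host.free_vcpus >= req_res['requested_vcpus'])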
nova-compute
Via RPC the request reaches the selected host, which executes the new_run_instance method:
def new_run_instance(self, context, instance_uuid, request_spec,
                     filter_properties, requested_networks,
                     injected_files, admin_password,
                     is_first_time, node, legacy_bdm_in_spec,
                     disk_info=None):
    # update the instance state in the database and, at the same time,
    # the resource usage
    if disk_info:
        instance = self._instance_update(
            context, instance_uuid,
            disk_shares=disk_info['disk_shares'],
            selected_dir=disk_info['selected_dir'])
    else:
        instance = self._instance_update(context,
                                         instance_uuid)
    self.run_instance(context, instance, request_spec,
                      filter_properties, requested_networks, injected_files,
                      admin_password, is_first_time, node,
                      legacy_bdm_in_spec)
Continuing downwards, run_instance leads into _run_instance:
def _run_instance(self, context, request_spec,
                  filter_properties, requested_networks, injected_files,
                  admin_password, is_first_time, node, instance,
                  legacy_bdm_in_spec):
    # first check whether the instance name already exists, then update the
    # database record to the building state
    self._prebuild_instance(context, instance)
    ...
    instance, network_info = self._build_instance(context,
        request_spec, filter_properties, requested_networks,
        injected_files, admin_password, is_first_time, node,
        instance, image_meta, legacy_bdm_in_spec)
Inside this method the MAC and IP information is obtained from the neutron service (we won't go into those details here); let's look straight at the code:
def _build_instance(self, context, request_spec, filter_properties,
                    requested_networks, injected_files, admin_password,
                    is_first_time, node, instance, image_meta,
                    legacy_bdm_in_spec):
    ..
    # query how many block devices are attached to this instance
    bdms = block_device_obj.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance['uuid'])
    ..
    # claim and update the resource usage: cpu, memory, disk
    with rt.instance_claim(context, instance, limits):
        ...
        # neutron allocates the MAC and IP for the VM
        network_info = self._allocate_network(context, instance,
            requested_networks, macs, security_groups,
            dhcp_options)
        instance = self._spawn(context, instance, image_meta,
                               network_info, block_device_info,
                               injected_files, admin_password,
                               set_access_ip=set_access_ip)
The spawn method is comparatively low level; it deals with the image and with actually creating the VM (it lives in the libvirt driver, nova/virt/libvirt/driver.py):
def spawn(self, context, instance, image_meta, injected_files,
          admin_password, network_info=None, block_device_info=None):
    disk_info = blockinfo.get_disk_info(CONF.libvirt.virt_type,
                                        instance,
                                        block_device_info,
                                        image_meta)
    # the file contents to inject, eventually written into injected_files
    # as strings
    if CONF.libvirt.inject_nifcfg_file:
        self._mk_inject_files(image_meta, network_info, injected_files)
    # create the disk image files (disk, disk.local, etc.)
    self._create_image(context, instance,
                       disk_info['mapping'],
                       network_info=network_info,
                       block_device_info=block_device_info,
                       files=injected_files,
                       admin_pass=admin_password)
    # generate the libvirt.xml file
    xml = self.to_xml(context, instance, network_info,
                      disk_info, image_meta,
                      block_device_info=block_device_info,
                      write_to_disk=True)
    # create the actual VM instance, i.e. the libvirt domain
    self._create_domain_and_network(context, xml, instance, network_info,
                                    block_device_info)
    LOG.debug(_("Instance is running"), instance=instance)

    # poll the state until the VM is running, then return
    def _wait_for_boot():
        """Called at an interval until the VM is running."""
        state = self.get_info(instance)['state']
        if state == power_state.RUNNING:
            LOG.info(_("Instance spawned successfully."),
                     instance=instance)
            raise loopingcall.LoopingCallDone()

    timer = loopingcall.FixedIntervalLoopingCall(_wait_for_boot)
    timer.start(interval=0.5).wait()
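For context, _create_domain_and_network ultimately goes through the libvirt Python bindings; stripped of Nova's network plugging, event handling and error handling, its core corresponds roughly to:

import libvirt

# minimal equivalent of defining and booting the domain from the XML above
conn = libvirt.open('qemu:///system')   # connect to the local hypervisor
domain = conn.defineXML(xml)            # persist the domain definition
domain.create()                         # power it on (like 'virsh start')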
That completes the whole flow. Many finer details have been left out; they will be covered in other chapters of this source code analysis.