Troubleshooting a SkyWalking PHP Agent That Fails to Register

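SkyWalking is a distributed tracing system; see the SkyWalking project documentation for an introduction.

Recently, one of our company's PHP applications stopped showing any data in the SkyWalking backend.

After logging in to one of the servers, I found that the agent could not register and reported an error as soon as it started.

First, let us review the overall flow of the SkyWalking PHP agent. When it starts, the PHP extension registers the application and the instance; it then intercepts the relevant calls on every request and records them. The registration code is in module_init in skywalking.c: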

static void module_init() {

    application_instance = -100000;
    application_id = -100000;

    int i = 0;

    do {
        application_id = serviceRegister(SKYWALKING_G(grpc), SKYWALKING_G(app_code));

        if (application_id == -100000) {
            sleep(1);
        }

        i++;
    } while (application_id == -100000 && i <= 1);

    if (application_id == -100000) {
        sky_close = 1;
        return;
    }

    char *ipv4s = _get_current_machine_ip();

    char hostname[100] = {0};
    if (gethostname(hostname, sizeof(hostname)) < 0) {
        strcpy(hostname, "");
    }

    char *l_millisecond = get_millisecond();
    long millisecond = zend_atol(l_millisecond, strlen(l_millisecond));
    efree(l_millisecond);

    i = 0;
    do {
        application_instance = serviceInstanceRegister(SKYWALKING_G(grpc), application_id, millisecond, SKY_OS_NAME, hostname, getpid(), ipv4s);

        if (application_instance == -100000) {
            sleep(2);
        }

        i++;
    } while (application_instance == -100000 && i <= 3);

    if (application_instance == -100000) {
        sky_close = 1;
        php_error(E_WARNING, "skywalking: register service error");
        return;
    }

    php_error(E_WARNING, "skywalking: register service success");
}
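To make the retry budget in module_init concrete, here is a minimal C++ sketch of the same do/while shape. It is an illustration, not the extension's code: retry_register, RetryResult and the injected sleeper are hypothetical names introduced here so that the loop can be run without a collector and without really sleeping.

```cpp
#include <cassert>
#include <functional>

// Sentinel the extension uses for "not registered yet".
const int kNotRegistered = -100000;

struct RetryResult {
    int id;             // value returned by the last attempt
    int attempts;       // how many times the register call was made
    int seconds_slept;  // total time handed to the sleeper
};

// Same loop shape as module_init: call, sleep on failure, stop once the
// counter exceeds max_index. try_register and sleeper are injected
// (hypothetical) so the loop can be exercised in isolation.
RetryResult retry_register(const std::function<int()>& try_register,
                           int max_index, int delay_seconds,
                           const std::function<void(int)>& sleeper) {
    RetryResult r{kNotRegistered, 0, 0};
    int i = 0;
    do {
        r.id = try_register();
        r.attempts++;
        if (r.id == kNotRegistered) {
            sleeper(delay_seconds);
            r.seconds_slept += delay_seconds;
        }
        i++;
    } while (r.id == kNotRegistered && i <= max_index);
    return r;
}
```

Called with max_index = 3 and delay_seconds = 2, the values used for instance registration, a collector that never returns an ID is tried four times with eight seconds of sleeping in total. With the application registration values (1 and 1) it is tried twice.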

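In module_init, the application is registered by calling serviceRegister, and the instance is then registered by calling serviceInstanceRegister. The latter calls GreeterClient::serviceInstanceRegister, shown below, to complete the registration: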

int serviceInstanceRegister(int applicationid, long registertime, char *osname, char *hostname, int processno,
                                char *ipv4s) {
        ServiceInstances request;
        ServiceInstance *s = request.add_instances();

        if (uuid == NULL) {
            std::string uuid_str = boost::uuids::to_string(boost_uuid);
            uuid = (char *) malloc(uuid_str.size() + 1);
            bzero(uuid, uuid_str.size() + 1);
            strncpy(uuid, uuid_str.c_str(), uuid_str.size() + 1);
        }

        s->set_serviceid(applicationid);
        s->set_instanceuuid(std::string(uuid));
        s->set_time(registertime);

        KeyStringValuePair *os = s->add_properties();
        KeyStringValuePair *host = s->add_properties();
        KeyStringValuePair *process = s->add_properties();
        KeyStringValuePair *ipv4 = s->add_properties();
        KeyStringValuePair *language = s->add_properties();

        os->set_key("os_name");
        os->set_value(osname);
        host->set_key("host_name");
        host->set_value(hostname);
        process->set_key("process_no");
        process->set_value(std::to_string(processno));
        ipv4->set_key("ipv4");
        ipv4->set_value(ipv4s);
        language->set_key("language");
        language->set_value("php");

        ServiceInstanceRegisterMapping reply;

        ClientContext context;

        Status status = stub_->doServiceInstanceRegister(&context, request, &reply);

        if (status.ok()) {
            for (int i = 0; i < reply.serviceinstances_size(); i++) {
                const KeyIntValuePair &kv = reply.serviceinstances(i);
//                std::cout << "Register Instance:"<< std::endl;
//                std::cout << kv.key() << ": " << kv.value() << std::endl;

                if (kv.key() == uuid) {
                    return kv.value();
                }
            }
        }

        return -100000;

    }
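The matching step at the end of this function can be looked at in isolation. The sketch below is a stand-in built on assumptions: Mapping replaces the protobuf reply type and find_instance_id is a hypothetical helper. It shows that an empty reply, or a reply that does not contain the client's UUID, yields -100000.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

const int kNotRegistered = -100000;

// Stand-in for the repeated key/value field of the reply (assumption).
using Mapping = std::vector<std::pair<std::string, int>>;

// Returns the ID mapped to uuid, or the sentinel when uuid is absent.
int find_instance_id(const Mapping& reply, const std::string& uuid) {
    for (const auto& kv : reply) {
        if (kv.first == uuid) {
            return kv.second;
        }
    }
    return kNotRegistered;
}
```

A successful gRPC status is therefore not enough for registration to succeed: the reply must also carry an entry for this UUID.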

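Using gdb breakpoints, I found that registering the application succeeded but registering the instance failed, so I added some logging to GreeterClient::serviceInstanceRegister: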

if (status.ok()) {
            std::cout << "size:" << reply.serviceinstances_size() << std::endl;
            for (int i = 0; i < reply.serviceinstances_size(); i++) {
                const KeyIntValuePair &kv = reply.serviceinstances(i);
                std::cout << "Register Instance:"<< std::endl;
                std::cout << kv.key() << ": " << kv.value() << std::endl;

                if (kv.key() == uuid) {
                    return kv.value();
                }
            }
        }else{
                printf("instance register error");
        }

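At this point the client offered no more clues, so I had to start from the server side. The server is implemented in Java and is not convenient to debug in place, so I set up a local environment for debugging. The server came up, but the PHP client simply would not compile: the SkyWalking PHP extension depends on components such as protobuf and grpc, these components have version dependencies on one another, and the official documentation does not state them. For a while I was stuck.

Since the colleague who used to maintain the server side had left, I had no choice but to read the code myself. The registration entry point is RegisterServiceHandler::doServiceInstanceRegister: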

@Override 
    public void doServiceInstanceRegister(ServiceInstances request,
        StreamObserver<ServiceInstanceRegisterMapping> responseObserver) {

        ServiceInstanceRegisterMapping.Builder builder = ServiceInstanceRegisterMapping.newBuilder();

        request.getInstancesList().forEach(instance -> {
            ServiceInventory serviceInventory = serviceInventoryCache.get(instance.getServiceId());

            JsonObject instanceProperties = new JsonObject();
            List<String> ipv4s = new ArrayList<>();

            for (KeyStringValuePair property : instance.getPropertiesList()) {
                String key = property.getKey();
                switch (key) {
                    case HOST_NAME:
                        instanceProperties.addProperty(HOST_NAME, property.getValue());
                        break;
                    case OS_NAME:
                        instanceProperties.addProperty(OS_NAME, property.getValue());
                        break;
                    case LANGUAGE:
                        instanceProperties.addProperty(LANGUAGE, property.getValue());
                        break;
                    case "ipv4":
                        ipv4s.add(property.getValue());
                        break;
                    case PROCESS_NO:
                        instanceProperties.addProperty(PROCESS_NO, property.getValue());
                        break;
                }
            }
            instanceProperties.addProperty(IPV4S, ServiceInstanceInventory.PropertyUtil.ipv4sSerialize(ipv4s));

            String instanceName = serviceInventory.getName();
            if (instanceProperties.has(PROCESS_NO)) {
                instanceName += "-pid:" + instanceProperties.get(PROCESS_NO).getAsString();
            }
            if (instanceProperties.has(HOST_NAME)) {
                instanceName += "@" + instanceProperties.get(HOST_NAME).getAsString();
            }

            int serviceInstanceId = serviceInstanceInventoryRegister.getOrCreate(instance.getServiceId(), instanceName, instance.getInstanceUUID(), instance.getTime(), instanceProperties);

            if (serviceInstanceId != Const.NONE) {
                logger.info("register service instance id={} [UUID:{}]", serviceInstanceId, instance.getInstanceUUID());
                builder.addServiceInstances(KeyIntValuePair.newBuilder().setKey(instance.getInstanceUUID()).setValue(serviceInstanceId));
            }
        });

        responseObserver.onNext(builder.build());
        responseObserver.onCompleted();
    }

The key line is the one that produces the instance ID:

int serviceInstanceId = serviceInstanceInventoryRegister.getOrCreate(instance.getServiceId(), instanceName, instance.getInstanceUUID(), instance.getTime(), instanceProperties);

Following it further:

@Override public int getOrCreate(int serviceId, String serviceInstanceName, String uuid, long registerTime,
        JsonObject properties) {
        if (logger.isDebugEnabled()) {
            logger.debug("Get or create service instance by service instance name, service id: {}, service instance name: {},uuid: {}, registerTime: {}", serviceId, serviceInstanceName, uuid, registerTime);
        }

        int serviceInstanceId = getServiceInstanceInventoryCache().getServiceInstanceId(serviceId, uuid);

        if (serviceInstanceId == Const.NONE) {
            ServiceInstanceInventory serviceInstanceInventory = new ServiceInstanceInventory();
            serviceInstanceInventory.setServiceId(serviceId);
            serviceInstanceInventory.setName(serviceInstanceName);
            serviceInstanceInventory.setInstanceUUID(uuid);
            serviceInstanceInventory.setIsAddress(BooleanUtils.FALSE);
            serviceInstanceInventory.setAddressId(Const.NONE);

            serviceInstanceInventory.setRegisterTime(registerTime);
            serviceInstanceInventory.setHeartbeatTime(registerTime);

            serviceInstanceInventory.setProperties(properties);

            InventoryStreamProcessor.getInstance().in(serviceInstanceInventory);
        }
        return serviceInstanceId;
    }

The logic here is fairly clear. It first tries to get the instance ID from the cache:

getServiceInstanceInventoryCache().getServiceInstanceId(serviceId, uuid);

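If the ID is not found, the new inventory record is handed to a background task that generates the ID. Note that getOrCreate still returns Const.NONE for that call, so the handler adds no entry to the reply for this request.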
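The consequence of this asynchronous design can be modeled in a few lines. The following is only a sketch under assumptions: InstanceRegistry and process_pending are hypothetical stand-ins for the server's cache and background stream processor, and kNone = 0 is assumed to correspond to Const.NONE.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <string>

const int kNone = 0;  // assumption: stands in for Const.NONE

class InstanceRegistry {
public:
    // Returns the known ID, or kNone after queueing the UUID for later.
    int get_or_create(const std::string& uuid) {
        auto it = ids_.find(uuid);
        if (it != ids_.end()) {
            return it->second;
        }
        pending_.push(uuid);
        return kNone;
    }

    // Stand-in for the background task: assigns IDs to queued UUIDs.
    void process_pending() {
        while (!pending_.empty()) {
            const std::string uuid = pending_.front();
            pending_.pop();
            if (ids_.find(uuid) == ids_.end()) {
                ids_[uuid] = next_id_++;
            }
        }
    }

private:
    std::map<std::string, int> ids_;
    std::queue<std::string> pending_;
    int next_id_ = 1;
};
```

In this model a new UUID always gets kNone on its first call, and keeps getting kNone until process_pending has run. A client that stops retrying before that point never sees its ID.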

Following the getServiceInstanceId method further:

if (Objects.isNull(serviceInstanceId) || serviceInstanceId == Const.NONE) {
            serviceInstanceId = getCacheDAO().getServiceInstanceId(serviceId, uuid);
            if (serviceId != Const.NONE) {
                serviceInstanceNameCache.put(ServiceInstanceInventory.buildId(serviceId, uuid), serviceInstanceId);
            }
        }

If the ID is not in the cache, it is fetched from the DAO:

GetResponse response = getClient().get(ServiceInstanceInventory.INDEX_NAME, id);
            if (response.isExists()) {
                return (int)response.getSource().getOrDefault(RegisterSource.SEQUENCE, 0);
            } else {
                return Const.NONE;
            }

The DAO in turn reads it from the ES (Elasticsearch) index serviceinstanceinventory.
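The cache-then-DAO lookup described above is a read-through cache. The sketch below illustrates the pattern under assumptions: InstanceIdCache and the injected Dao callback are hypothetical, and kNone = 0 is assumed to correspond to Const.NONE. As in the Java snippet, a cached "none" value does not count as a hit, so the store is asked again.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

const int kNone = 0;  // assumption: stands in for Const.NONE

class InstanceIdCache {
public:
    using Dao = std::function<int(const std::string&)>;

    explicit InstanceIdCache(Dao dao) : dao_(std::move(dao)) {}

    int get(const std::string& key) {
        auto it = cache_.find(key);
        if (it != cache_.end() && it->second != kNone) {
            return it->second;  // cache hit with a real ID
        }
        int id = dao_(key);     // miss, or a cached kNone: ask the store again
        cache_[key] = id;
        return id;
    }

private:
    Dao dao_;
    std::unordered_map<std::string, int> cache_;
};
```

Once the backing store contains the record, the next lookup returns the real ID and later lookups are served from the cache.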

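To confirm this reading of the code, I queried ES directly, and the instance IDs were indeed all registered there.

Next I confirmed it from the client side. Since instance IDs are written to ES, registering with a previously used UUID should succeed, so I modified the client code to hard-code the UUID and tried again: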

int serviceInstanceRegister(int applicationid, long registertime, char *osname, char *hostname, int processno,
                                char *ipv4s) {
        ServiceInstances request;
        ServiceInstance *s = request.add_instances();
        // Debugging only: force a previously used UUID, so the block below is skipped.
        uuid = (char *) "7e22c317-e2e2-4f81-a53d-fe011013e0a3";
        if (uuid == NULL) {
            std::string uuid_str = boost::uuids::to_string(boost_uuid);
            uuid = (char *) malloc(uuid_str.size() + 1);
            bzero(uuid, uuid_str.size() + 1);
            strncpy(uuid, uuid_str.c_str(), uuid_str.size() + 1);
        }
        // ... the rest of the function is unchanged

Registration succeeded immediately:

7e22c317-e2e2-4f81-a53d-fe011013e0a3
size:1
Register Instance:
7e22c317-e2e2-4f81-a53d-fe011013e0a3: 3386041
PHP Warning:  skywalking: register service success in Unknown on line 0
PHP Warning:  skywalking: hook redis handler success in Unknown on line 0
PHP Warning:  skywalking: hook session handler success in Unknown on line 0

  

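Back to the original problem. The cause is now clear: judging from the code above, the client registers with a UUID the server has not seen before, the server only queues the creation of the instance ID and returns nothing for that request, and module_init gives up after at most four attempts spaced two seconds apart. How can this be fixed? There are two options:

1. Increase the time the client waits during registration, for example to 100 seconds;

2. Record the UUID of the most recent successful registration, persist it, and reuse it on the next start.

Because option 2 involves code changes, I went with option 1 first.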
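For reference, option 2 could look roughly like the sketch below. It is not part of the extension: load_or_create_uuid, the file path and the injected generator are all hypothetical, and the sketch only shows the idea of reusing a saved UUID instead of producing a new one on every start.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <functional>
#include <string>

// Returns the UUID saved in the file at path. If no usable UUID is stored
// there, calls generate, saves the result to path and returns it.
// The path and the generate callback are assumptions for illustration.
std::string load_or_create_uuid(const std::string& path,
                                const std::function<std::string()>& generate) {
    std::ifstream in(path);
    std::string uuid;
    if (in && std::getline(in, uuid) && !uuid.empty()) {
        return uuid;  // reuse the UUID from the previous run
    }
    uuid = generate();
    std::ofstream out(path, std::ios::trunc);
    out << uuid << '\n';
    return uuid;
}
```

A real change would also have to decide where the file lives and how several PHP processes on one host share or separate their UUIDs; the code shown in this article does not answer that, so it is left open here.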
