Source code and analysis of PHP's parse_url function (scheme part)

阿新 • Published: 2018-12-09

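Preface

While reading write-ups by more experienced players, I noticed that parse_url comes up a lot, and that quite a few CTF challenges are built purely on parse_url parsing quirks, so here is a look at the source code. (I'm too much of a beginner to follow all of it, so the rest will be filled in later. Orz)

Source code

The code lives in ext/standard/url.c: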

PHPAPI php_url *php_url_parse_ex(char const *str, size_t length)
{
    char port_buf[6];
    php_url *ret = ecalloc(1, sizeof(php_url));
    char const *s, *e, *p, *pp, *ue;

    s = str;
    ue = s + length;

    /* parse scheme */
    if ((e = memchr(s, ':', length)) && e != s) {
        /* validate scheme */
        p = s;
        while (p < e) {
            /* scheme = 1*[ lowalpha | digit | "+" | "-" | "." ] */
            if (!isalpha(*p) && !isdigit(*p) && *p != '+' && *p != '.' && *p != '-') {
                if (e + 1 < ue && e < s + strcspn(s, "?#")) {
                    goto parse_port;
                } else if (s + 1 < ue && *s == '/' && *(s + 1) == '/') { /* relative-scheme URL */
                    s += 2;
                    e = 0;
                    goto parse_host;
                } else {
                    goto just_path;
                }
            }
            p++;
        }

        if (e + 1 == ue) { /* only scheme is available */
            ret->scheme = estrndup(s, (e - s));
            php_replace_controlchars_ex(ret->scheme, (e - s));
            return ret;
        }

        /*
         * certain schemas like mailto: and zlib: may not have any / after them
         * this check ensures we support those.
         */
        if (*(e+1) != '/') {
            /* check if the data we get is a port this allows us to
             * correctly parse things like a.com:80
             */
            p = e + 1;
            while (p < ue && isdigit(*p)) {
                p++;
            }

            if ((p == ue || *p == '/') && (p - e) < 7) {
                goto parse_port;
            }

            ret->scheme = estrndup(s, (e-s));
            php_replace_controlchars_ex(ret->scheme, (e - s));

            s = e + 1;
            goto just_path;
        } else {
            ret->scheme = estrndup(s, (e-s));
            php_replace_controlchars_ex(ret->scheme, (e - s));

            if (e + 2 < ue && *(e + 2) == '/') {
                s = e + 3;
                if (!strncasecmp("file", ret->scheme, sizeof("file"))) {
                    if (e + 3 < ue && *(e + 3) == '/') {
                        /* support windows drive letters as in:
                           file:///c:/somedir/file.txt
                        */
                        if (e + 5 < ue && *(e + 5) == ':') {
                            s = e + 4;
                        }
                        goto just_path;
                    }
                }
            } else {
                s = e + 1;
                goto just_path;
            }
        }
    } else if (e) { /* no scheme; starts with colon: look for port */
        parse_port:
        p = e + 1;
        pp = p;

        while (pp < ue && pp - p < 6 && isdigit(*pp)) {
            pp++;
        }

        if (pp - p > 0 && pp - p < 6 && (pp == ue || *pp == '/')) {
            zend_long port;
            memcpy(port_buf, p, (pp - p));
            port_buf[pp - p] = '\0';
            port = ZEND_STRTOL(port_buf, NULL, 10);
            if (port > 0 && port <= 65535) {
                ret->port = (unsigned short) port;
                if (s + 1 < ue && *s == '/' && *(s + 1) == '/') { /* relative-scheme URL */
                    s += 2;
                }
            } else {
                if (ret->scheme) efree(ret->scheme);
                efree(ret);
                return NULL;
            }
        } else if (p == pp && pp == ue) {
            if (ret->scheme) efree(ret->scheme);
            efree(ret);
            return NULL;
        } else if (s + 1 < ue && *s == '/' && *(s + 1) == '/') { /* relative-scheme URL */
            s += 2;
        } else {
            goto just_path;
        }
    } else if (s + 1 < ue && *s == '/' && *(s + 1) == '/') { /* relative-scheme URL */
        s += 2;
    } else {
        goto just_path;
    }

    parse_host:
    /* Binary-safe strcspn(s, "/?#") */
    e = ue;
    if ((p = memchr(s, '/', e - s))) {
        e = p;
    }
    if ((p = memchr(s, '?', e - s))) {
        e = p;
    }
    if ((p = memchr(s, '#', e - s))) {
        e = p;
    }

    /* check for login and password */
    if ((p = zend_memrchr(s, '@', (e-s)))) {
        if ((pp = memchr(s, ':', (p-s)))) {
            ret->user = estrndup(s, (pp-s));
            php_replace_controlchars_ex(ret->user, (pp - s));

            pp++;
            ret->pass = estrndup(pp, (p-pp));
            php_replace_controlchars_ex(ret->pass, (p-pp));
        } else {
            ret->user = estrndup(s, (p-s));
            php_replace_controlchars_ex(ret->user, (p-s));
        }

        s = p + 1;
    }

    /* check for port */
    if (s < ue && *s == '[' && *(e-1) == ']') {
        /* Short circuit portscan,
           we're dealing with an
           IPv6 embedded address */
        p = NULL;
    } else {
        p = zend_memrchr(s, ':', (e-s));
    }

    if (p) {
        if (!ret->port) {
            p++;
            if (e-p > 5) { /* port cannot be longer then 5 characters */
                if (ret->scheme) efree(ret->scheme);
                if (ret->user) efree(ret->user);
                if (ret->pass) efree(ret->pass);
                efree(ret);
                return NULL;
            } else if (e - p > 0) {
                zend_long port;
                memcpy(port_buf, p, (e - p));
                port_buf[e - p] = '\0';
                port = ZEND_STRTOL(port_buf, NULL, 10);
                if (port > 0 && port <= 65535) {
                    ret->port = (unsigned short)port;
                } else {
                    if (ret->scheme) efree(ret->scheme);
                    if (ret->user) efree(ret->user);
                    if (ret->pass) efree(ret->pass);
                    efree(ret);
                    return NULL;
                }
            }
            p--;
        }
    } else {
        p = e;
    }

    /* check if we have a valid host, if we don't reject the string as url */
    if ((p-s) < 1) {
        if (ret->scheme) efree(ret->scheme);
        if (ret->user) efree(ret->user);
        if (ret->pass) efree(ret->pass);
        efree(ret);
        return NULL;
    }

    ret->host = estrndup(s, (p-s));
    php_replace_controlchars_ex(ret->host, (p - s));

    if (e == ue) {
        return ret;
    }

    s = e;

    just_path:

    e = ue;
    p = memchr(s, '#', (e - s));
    if (p) {
        p++;
        if (p < e) {
            ret->fragment = estrndup(p, (e - p));
            php_replace_controlchars_ex(ret->fragment, (e - p));
        }
        e = p-1;
    }

    p = memchr(s, '?', (e - s));
    if (p) {
        p++;
        if (p < e) {
            ret->query = estrndup(p, (e - p));
            php_replace_controlchars_ex(ret->query, (e - p));
        }
        e = p-1;
    }

    if (s < e || s == ue) {
        ret->path = estrndup(s, (e - s));
        php_replace_controlchars_ex(ret->path, (e - s));
    }

    return ret;
}

/* {{{ proto mixed parse_url(string url, [int url_component])
   Parse a URL and return its components */
PHP_FUNCTION(parse_url)
{
    char *str;
    size_t str_len;
    php_url *resource;
    zend_long key = -1;

    if (zend_parse_parameters(ZEND_NUM_ARGS(), "s|l", &str, &str_len, &key) == FAILURE) {
        return;
    }

    resource = php_url_parse_ex(str, str_len);
    if (resource == NULL) {
        /* @todo Find a method to determine why php_url_parse_ex() failed */
        RETURN_FALSE;
    }

    if (key > -1) {
        switch (key) {
            case PHP_URL_SCHEME:
                if (resource->scheme != NULL) RETVAL_STRING(resource->scheme);
                break;
            case PHP_URL_HOST:
                if (resource->host != NULL) RETVAL_STRING(resource->host);
                break;
            case PHP_URL_PORT:
                if (resource->port != 0) RETVAL_LONG(resource->port);
                break;
            case PHP_URL_USER:
                if (resource->user != NULL) RETVAL_STRING(resource->user);
                break;
            case PHP_URL_PASS:
                if (resource->pass != NULL) RETVAL_STRING(resource->pass);
                break;
            case PHP_URL_PATH:
                if (resource->path != NULL) RETVAL_STRING(resource->path);
                break;
            case PHP_URL_QUERY:
                if (resource->query != NULL) RETVAL_STRING(resource->query);
                break;
            case PHP_URL_FRAGMENT:
                if (resource->fragment != NULL) RETVAL_STRING(resource->fragment);
                break;
            default:
                php_error_docref(NULL, E_WARNING, "Invalid URL component identifier " ZEND_LONG_FMT, key);
                RETVAL_FALSE;
        }
        goto done;
    }

    /* allocate an array for return */
    array_init(return_value);

    /* add the various elements to the array */
    if (resource->scheme != NULL)
        add_assoc_string(return_value, "scheme", resource->scheme);
    if (resource->host != NULL)
        add_assoc_string(return_value, "host", resource->host);
    if (resource->port != 0)
        add_assoc_long(return_value, "port", resource->port);
    if (resource->user != NULL)
        add_assoc_string(return_value, "user", resource->user);
    if (resource->pass != NULL)
        add_assoc_string(return_value, "pass", resource->pass);
    if (resource->path != NULL)
        add_assoc_string(return_value, "path", resource->path);
    if (resource->query != NULL)
        add_assoc_string(return_value, "query", resource->query);
    if (resource->fragment != NULL)
        add_assoc_string(return_value, "fragment", resource->fragment);
done:
    php_url_free(resource);
}

Working through questions raised by the code

The function definition

PHP_FUNCTION(parse_url)
{
    char *str;
    size_t str_len;
    php_url *resource;
    zend_long key = -1;

    if (zend_parse_parameters(ZEND_NUM_ARGS(), "s|l", &str, &str_len, &key) == FAILURE) {
        return;
    }

    resource = php_url_parse_ex(str, str_len);
    if (resource == NULL) {
        /* @todo Find a method to determine why php_url_parse_ex() failed */
        RETURN_FALSE;
    }

The type specifiers below are quoted from this article: http://www.nowamagic.net/librarys/veda/detail/1467

b   Boolean
l   Integer
d   Floating point
s   String
r   Resource
a   Array
o   Object instance
O   Object instance of a specified type
z   Non-specific zval (any type)
Z   zval** type
f   Function or method name

So "s|l" means parse_url takes one required string argument and, because everything after the | is optional, one optional integer argument.

The php_url type is declared in ext/standard/url.h:

typedef struct php_url {
    char *scheme;
    char *user;
    char *pass;
    char *host;
    unsigned short port;
    char *path;
    char *query;
    char *fragment;
} php_url;

Question

parse_url only takes two arguments, so where does the str_len argument come from? And how does it actually get its value?
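A likely answer, stated from general knowledge of the Zend API rather than from the quoted article: each "s" in the format string consumes two output pointers, a char ** and a size_t *, which is why the call above passes &str, &str_len and &key for only two specifiers. The engine fills str_len from the length stored with the PHP string, so it is never a user-visible argument. The sketch below imitates that convention in plain C. toy_string and toy_parse_s are hypothetical names invented for illustration and are not part of the Zend API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHP string value: bytes plus a stored length. */
typedef struct {
    const char *val;
    size_t len;
} toy_string;

/* Imitates what one "s" specifier does: a single argument fills TWO outputs. */
static int toy_parse_s(const toy_string *arg, const char **str, size_t *str_len)
{
    assert(str != NULL && str_len != NULL);
    if (arg == NULL) {
        return -1; /* plays the role of FAILURE */
    }
    *str = arg->val;
    *str_len = arg->len; /* the length travels with the value; strlen() is not used */
    return 0;
}
```

Because the length is carried separately, a string containing an embedded NUL byte keeps its full length, which fits with php_url_parse_ex searching with memchr and explicit lengths.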

Inside the implementation

The URL we pass in is handled by php_url_parse_ex. For now, assume str_len is simply the length of str.

    if ((e = memchr(s, ':', length)) && e != s) {
        /* validate scheme */
        p = s;
        while (p < e) {
            /* scheme = 1*[ lowalpha | digit | "+" | "-" | "." ] */
            if (!isalpha(*p) && !isdigit(*p) && *p != '+' && *p != '.' && *p != '-') {
                if (e + 1 < ue && e < s + strcspn(s, "?#")) {
                    goto parse_port;
                } else if (s + 1 < ue && *s == '/' && *(s + 1) == '/') { /* relative-scheme URL */
                    s += 2;
                    e = 0;
                    goto parse_host;
                } else {
                    goto just_path;
                }
            }
            p++;
        }

        if (e + 1 == ue) { /* only scheme is available */
            ret->scheme = estrndup(s, (e - s));
            php_replace_controlchars_ex(ret->scheme, (e - s));
            return ret;
        }

        /*
         * certain schemas like mailto: and zlib: may not have any / after them
         * this check ensures we support those.
         */
        if (*(e+1) != '/') {
            /* check if the data we get is a port this allows us to
             * correctly parse things like a.com:80
             */
            p = e + 1;
            while (p < ue && isdigit(*p)) {
                p++;
            }

            if ((p == ue || *p == '/') && (p - e) < 7) {
                goto parse_port;
            }

            ret->scheme = estrndup(s, (e-s));
            php_replace_controlchars_ex(ret->scheme, (e - s));

            s = e + 1;
            goto just_path;
        } else {
            ret->scheme = estrndup(s, (e-s));
            php_replace_controlchars_ex(ret->scheme, (e - s));

            if (e + 2 < ue && *(e + 2) == '/') {
                s = e + 3;
                if (!strncasecmp("file", ret->scheme, sizeof("file"))) {
                    if (e + 3 < ue && *(e + 3) == '/') {
                        /* support windows drive letters as in:
                           file:///c:/somedir/file.txt
                        */
                        if (e + 5 < ue && *(e + 5) == ':') {
                            s = e + 4;
                        }
                        goto just_path;
                    }
                }
            } else {
                s = e + 1;
                goto just_path;
            }
        }
    } else if (e) { /* no scheme; starts with colon: look for port */

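If s contains a colon, e points at the first one. If that colon is not at the very start of s, p is set to s.

While p has not yet reached the colon, the loop keeps running and p advances one character at a time.

If the character p points at is a letter, a digit, or one of +, -, ., the pointer simply moves on. In other words, the text before the colon is accepted as a scheme only if it consists entirely of letters, digits, +, - and .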
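The validation loop described above can be tried in isolation. This is a minimal stand-alone model, not the PHP source: toy_scheme_is_valid is a hypothetical name, and the cast to unsigned char is my own addition to keep isalpha and isdigit well defined for bytes above 127.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the bytes before the first ':' would pass the scheme check
 * (letters, digits, '+', '-', '.'), 0 otherwise. Hypothetical helper. */
static int toy_scheme_is_valid(const char *s, size_t length)
{
    assert(s != NULL);
    const char *e = memchr(s, ':', length);

    /* No colon, or colon at the very start: the scheme branch is not taken. */
    if (e == NULL || e == s) {
        return 0;
    }
    for (const char *p = s; p < e; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalpha(c) && !isdigit(c) && c != '+' && c != '.' && c != '-') {
            return 0; /* the real parser falls back to port/host/path here */
        }
    }
    return 1;
}
```

Note that a host name such as a.com also passes this check, which is why the parser needs the later digit lookahead to tell a.com:80 apart from a real scheme.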

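If an invalid character is found before the colon, the parser falls back as follows. If the colon is not the last character of str (e + 1 < ue) and it comes before any ? or # in the string, jump to port parsing.

Otherwise, if str is at least two characters long and its first two characters are //, s moves to the character after the //, e is set to 0, and control jumps to host parsing.

Otherwise, control jumps to path parsing.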
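The three-way fallback above, taken when the text before the colon is not a valid scheme, can be modelled as a small classifier. This is a sketch under assumptions: the enum and function names are hypothetical, the input must be NUL-terminated because strcspn is used just as in the original, and the caller is assumed to have already found an invalid character before the colon.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_FB_PORT, TOY_FB_HOST, TOY_FB_PATH } toy_fallback;

/* Decides where the parser goes after the scheme check fails.
 * s must be NUL-terminated and must contain a ':'. Hypothetical helper. */
static toy_fallback toy_invalid_scheme_fallback(const char *s, size_t length)
{
    const char *ue = s + length;
    const char *e = memchr(s, ':', length);
    assert(e != NULL);

    /* Colon is not the last byte and sits before any '?' or '#'. */
    if (e + 1 < ue && e < s + strcspn(s, "?#")) {
        return TOY_FB_PORT;
    }
    /* Relative-scheme URL such as //host/path. */
    if (s + 1 < ue && s[0] == '/' && s[1] == '/') {
        return TOY_FB_HOST;
    }
    return TOY_FB_PATH;
}
```

For example, /a?b:80 is classified as a path because its colon only appears after the ?, while /a:80 is sent to port parsing.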

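If every character before the colon is valid and the colon is the last character, everything before the colon is returned as the scheme.

If the character after the colon is not /, p points at the character after the colon and keeps advancing while it is still inside str and points at a digit. If p then stops at the end of str or at a /, and there are fewer than 6 digits after the colon ((p - e) < 7), control jumps to port parsing.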
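The digit lookahead just described can be sketched as a predicate. The function name is hypothetical, and the two early-return conditions stand in for checks the real parser has already made by the time it reaches this point (the colon is not the last byte, and the byte after it is not /).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the text after the first ':' looks like a port:
 * at most 5 digits, followed by end of input or '/'. Hypothetical helper. */
static int toy_colon_starts_port(const char *s, size_t length)
{
    assert(s != NULL);
    const char *ue = s + length;
    const char *e = memchr(s, ':', length);

    /* Conditions the real parser has already ruled out before this check. */
    if (e == NULL || e + 1 >= ue || *(e + 1) == '/') {
        return 0;
    }
    const char *p = e + 1;
    while (p < ue && isdigit((unsigned char)*p)) {
        p++;
    }
    /* (p - e) < 7 means fewer than 6 digits after the colon. */
    return (p == ue || *p == '/') && (p - e) < 7;
}
```

With this check, a.com:80 and a.com:80/x go to port parsing, whereas mailto:bob does not, so mailto is kept as the scheme.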

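Otherwise (what follows the colon is neither a run of digits reaching the end of str nor digits followed by /), the text before the colon is taken as the scheme and stored in ret->scheme, s points at the character after the colon, and control jumps to path parsing.

If the character after the colon is /, the text before the colon is likewise stored in ret->scheme. If the next character is also /, s points at the character after the //. If the scheme is file and the character after that is a third /, control jumps to path parsing. Before jumping, the code checks whether the second character after the /// is a colon (to handle file:///c:); if it is, s moves to the character right after the ///, otherwise s stays on the third / so that the path begins with /. In every other case with //, execution falls through to host parsing.

If the colon is followed by a single / rather than //, s points at the character after the colon and control jumps to path parsing.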
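To make the pointer arithmetic in the slash-handling branch easier to follow, here is a sketch that works with indexes instead of pointers. All names are hypothetical. As simplifications of my own, the input is assumed to be NUL-terminated, and the scheme comparison is written by hand with tolower instead of the strncasecmp call used in the original.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_NEXT_HOST, TOY_NEXT_PATH } toy_next;

/* Case-insensitive test for a scheme that is exactly "file". */
static int toy_scheme_is_file(const char *s, size_t colon)
{
    static const char file[] = "file";
    if (colon != 4) {
        return 0;
    }
    for (size_t i = 0; i < 4; i++) {
        if (tolower((unsigned char)s[i]) != file[i]) {
            return 0;
        }
    }
    return 1;
}

/* colon is the index of ':' and s[colon + 1] must be '/'.
 * Stores in *resume the index where parsing continues and
 * returns which stage handles it. Hypothetical helper. */
static toy_next toy_after_scheme_slash(const char *s, size_t colon, size_t *resume)
{
    size_t length = strlen(s);
    assert(colon + 1 < length && s[colon] == ':' && s[colon + 1] == '/');

    if (colon + 2 < length && s[colon + 2] == '/') {
        *resume = colon + 3; /* just past "//" */
        if (toy_scheme_is_file(s, colon) &&
            colon + 3 < length && s[colon + 3] == '/') {
            if (colon + 5 < length && s[colon + 5] == ':') {
                *resume = colon + 4; /* Windows drive letter: skip the third '/' */
            }
            return TOY_NEXT_PATH;
        }
        return TOY_NEXT_HOST; /* falls through to host parsing */
    }
    *resume = colon + 1; /* single '/': the path starts at that '/' */
    return TOY_NEXT_PATH;
}
```

For file:///etc/passwd the resume index is 7, so the path is /etc/passwd; for file:///c:/dir it is 8, so the path is c:/dir.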

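If the colon is at the very start of str, the else if (e) branch runs and port parsing is performed.

Trick

As long as the requested URL contains no colon (:) and does not start with //, the whole string is parsed as a path.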
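The trick follows from the outermost if / else if chain of php_url_parse_ex, which can be modelled as a four-way classifier. This is only a sketch of the first decision the parser makes, with hypothetical names; it says nothing about what the later stages do with the input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TOY_BRANCH_SCHEME, /* colon present and not at the start */
    TOY_BRANCH_PORT,   /* colon at the very start */
    TOY_BRANCH_HOST,   /* no colon, starts with "//" */
    TOY_BRANCH_PATH    /* no colon, anything else */
} toy_branch;

/* Mirrors the top-level branch selection only. Hypothetical helper. */
static toy_branch toy_first_branch(const char *s, size_t length)
{
    assert(s != NULL);
    const char *e = memchr(s, ':', length);

    if (e != NULL && e != s) {
        return TOY_BRANCH_SCHEME;
    }
    if (e != NULL) {
        return TOY_BRANCH_PORT;
    }
    if (length > 1 && s[0] == '/' && s[1] == '/') {
        return TOY_BRANCH_HOST;
    }
    return TOY_BRANCH_PATH;
}
```

So /index.php?a=1 lands in the path branch, while //evil.com/x, which also has no colon, is treated as a host.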
