An interesting quirk of PHP's base_convert function

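PHP's base_convert function converts a number between any two bases from 2 to 36; that much is common knowledge. So, without actually running it, use that knowledge to predict what this line of code prints: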

echo base_convert('http://demon.tw', 16, 10);

If your answer is 222, congratulations, you got it right. The line above is in fact equivalent to this one:

echo base_convert('de', 16, 10);
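
The equivalence can be checked by hand: of all the characters in 'http://demon.tw', only d and e are hexadecimal digits, and 0xde = 13 * 16 + 14 = 222. Here is a minimal C sketch of that reasoning, assuming ASCII input. The helper names keep_hex_digits and hex_value_ignoring_junk are made up for illustration and are not part of PHP:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of PHP): copy only the hexadecimal digits
 * of src into dst, which must have room for strlen(src) + 1 bytes. */
void keep_hex_digits(const char *src, char *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    for (; *src != '\0'; src++) {
        if (isxdigit((unsigned char)*src)) {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
}

/* Filter first, then parse whatever is left as a base-16 number.
 * scratch receives the filtered string. */
long hex_value_ignoring_junk(const char *src, char *scratch)
{
    keep_hex_digits(src, scratch);
    return strtol(scratch, NULL, 16);
}
```

Calling hex_value_ignoring_junk("http://demon.tw", buf) with a large enough buf leaves "de" in buf and returns 222.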

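In other words, base_convert ignores every character that is not a valid digit in the source base. Let's look at the C source of base_convert to see why. The function is defined in ext/standard/math.c in the PHP source tree: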

/* {{{ proto string base_convert(string number, int frombase, int tobase)
   Converts a number in a string from any base <= 36 to any base <= 36 */
PHP_FUNCTION(base_convert)
{
    zval **number, **frombase, **tobase, temp;
    char *result;

    if (ZEND_NUM_ARGS() != 3 || zend_get_parameters_ex(3, &number, &frombase, &tobase) == FAILURE) {
        WRONG_PARAM_COUNT;
    }
    convert_to_string_ex(number);
    convert_to_long_ex(frombase);
    convert_to_long_ex(tobase);
    if (Z_LVAL_PP(frombase) < 2 || Z_LVAL_PP(frombase) > 36) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Invalid `from base' (%ld)", Z_LVAL_PP(frombase));
        RETURN_FALSE;
    }
    if (Z_LVAL_PP(tobase) < 2 || Z_LVAL_PP(tobase) > 36) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Invalid `to base' (%ld)", Z_LVAL_PP(tobase));
        RETURN_FALSE;
    }

    if(_php_math_basetozval(*number, Z_LVAL_PP(frombase), &temp) != SUCCESS) {
        RETURN_FALSE;
    }
    result = _php_math_zvaltobase(&temp, Z_LVAL_PP(tobase) TSRMLS_CC);
    RETVAL_STRING(result, 0);
}

The first few lines parse and validate the arguments. The real work happens in _php_math_basetozval and _php_math_zvaltobase. _php_math_basetozval is defined as follows:

/* {{{ _php_math_basetozval */
/*
 * Convert a string representation of a base(2-36) number to a zval.
 */
PHPAPI int _php_math_basetozval(zval *arg, int base, zval *ret)
{
    long num = 0;
    double fnum = 0;
    int i;
    int mode = 0;
    char c, *s;
    long cutoff;
    int cutlim;

    if (Z_TYPE_P(arg) != IS_STRING || base < 2 || base > 36) {
        return FAILURE;
    }

    s = Z_STRVAL_P(arg);

    cutoff = LONG_MAX / base;
    cutlim = LONG_MAX % base;
    
    for (i = Z_STRLEN_P(arg); i > 0; i--) {
        c = *s++;

        /* might not work for EBCDIC */
        if (c >= '0' && c <= '9') 
            c -= '0';
        else if (c >= 'A' && c <= 'Z') 
            c -= 'A' - 10;
        else if (c >= 'a' && c <= 'z') 
            c -= 'a' - 10;
        else
            continue;

        if (c >= base)
            continue;
        
        switch (mode) {
        case 0: /* Integer */
            if (num < cutoff || (num == cutoff && c <= cutlim)) {
                num = num * base + c;
                break;
            } else {
                fnum = num;
                mode = 1;
            }
            /* fall-through */
        case 1: /* Float */
            fnum = fnum * base + c;
        }   
    }

    if (mode == 1) {
        ZVAL_DOUBLE(ret, fnum);
    } else {
        ZVAL_LONG(ret, num);
    }
    return SUCCESS;
}
/* }}} */
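
One detail of the listing worth isolating is the pair cutoff and cutlim. They guard against overflowing a long: a digit is appended in integer mode only while num * base + c still fits, and otherwise the function switches to accumulating in a double. The sketch below shows just that comparison. The function name fits_in_long is hypothetical, and the preconditions in the comment are assumptions of this sketch:

```c
#include <assert.h>
#include <limits.h>

/* Illustrative only: returns 1 when num * base + digit cannot exceed
 * LONG_MAX, using the same comparison as the listing above.
 * Assumes num >= 0, 2 <= base <= 36 and 0 <= digit < base. */
int fits_in_long(long num, int base, int digit)
{
    long cutoff = LONG_MAX / base;        /* largest num that may still be multiplied */
    int cutlim = (int)(LONG_MAX % base);  /* largest digit allowed when num == cutoff */

    assert(num >= 0 && base >= 2 && base <= 36);
    assert(digit >= 0 && digit < base);

    return num < cutoff || (num == cutoff && digit <= cutlim);
}
```

With base 10, for example, num = LONG_MAX / 10 and digit = LONG_MAX % 10 reconstructs LONG_MAX exactly and is still accepted, while any larger digit is rejected.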

The full listing is long and tedious to read; the part that matters is this:

    for (i = Z_STRLEN_P(arg); i > 0; i--) {
        c = *s++;

        /* might not work for EBCDIC */
        if (c >= '0' && c <= '9') 
            c -= '0';
        else if (c >= 'A' && c <= 'Z') 
            c -= 'A' - 10;
        else if (c >= 'a' && c <= 'z') 
            c -= 'a' - 10;
        else
            continue;

        if (c >= base)
            continue;

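The loop walks the string, and whenever it meets a character outside [0-9a-zA-Z] it simply uses continue to jump to the next iteration, so such characters have no effect on the conversion. The same happens when c is greater than or equal to base, so letters and digits that are not valid in the source base are skipped as well. Is this a bug in base_convert, or did the designers intend it?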
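
The two skipping rules can be reproduced outside PHP. What follows is a rough re-creation in plain C, not PHP's actual code: the name lenient_parse and the two counter parameters are invented for this sketch, overflow handling is left out, and the loop stops at the first NUL byte instead of using a stored length.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative re-creation of the skipping logic, not PHP's actual code.
 * Parses s in the given base (2..36) and silently ignores every character
 * that is not a digit of that base. Overflow is not handled, so the input
 * is assumed to be short enough to fit in a long.
 * If the counters are non-NULL they receive how many characters were
 * skipped by each of the two rules. */
long lenient_parse(const char *s, int base,
                   int *not_alnum, int *digit_too_large)
{
    long num = 0;
    int skipped_a = 0, skipped_b = 0;

    assert(s != NULL && base >= 2 && base <= 36);

    for (; *s != '\0'; s++) {
        int c = (unsigned char)*s;

        /* Rule 1: map [0-9A-Za-z] to a digit value, skip anything else.
         * Like the original, this assumes an ASCII-compatible charset. */
        if (c >= '0' && c <= '9') {
            c -= '0';
        } else if (c >= 'A' && c <= 'Z') {
            c -= 'A' - 10;
        } else if (c >= 'a' && c <= 'z') {
            c -= 'a' - 10;
        } else {
            skipped_a++;
            continue;
        }

        /* Rule 2: skip letters or digits whose value is not below the base. */
        if (c >= base) {
            skipped_b++;
            continue;
        }

        num = num * base + c;
    }

    if (not_alnum != NULL) *not_alnum = skipped_a;
    if (digit_too_large != NULL) *digit_too_large = skipped_b;
    return num;
}
```

For "http://demon.tw" in base 16 this returns 222. The four characters ':', '/', '/' and '.' are dropped by the first rule, the nine letters h, t, t, p, m, o, n, t, w by the second, and only d and e contribute to the result.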

Reposted from: http://demon.tw/programming/php-base_convert.html