Misusing the split method in lang3

阿新 • Published: 2019-12-31

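Apache Commons Lang 3 (lang3) is a third-party utility library we use all the time in development. If you don't know it well, though, it can produce baffling bugs. This post records one of them.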

Misuse example

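import java.io.IOException;

import org.apache.commons.lang3.StringUtils;
import org.junit.Assert;
import org.junit.Test;

public class TestDemo {

    @Test
    public void test() throws IOException {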
        String sendMsg = "{\"expiredTime\":\"20190726135831\",\"drives\":\"androidgetui\",\"msgBody\":\"{\\\"serialNumber\\\":\\\"wow22019072611349502\\\",\\\"push_key\\\":\\\"appactive#549110277\\\",\\\"title\\\":\\\"\\xe6\\x9c\\x89\\xe4\\xba\\xba@\\xe4\\xbd\\xa0\\\",\\\"message\\\":\\\"\\xe4\\xbb\\x8a\\xe5\\xa4\\xa9\\xe5\\x87\\xa0\\xe7\\x82\\xb9\\xe5\\x87\\xba\\xe5\\x8f\\x91\\xef\\xbc\\x9f\\\",\\\"link\\\":\\\"chelaile://homeTab/home?select=3\\\",\\\"open_type\\\":0,\\\"expireDays\\\":\\\"30\\\",\\\"type\\\":14}\",\"clients\":[\"13065ffa4e25c4a7c68\"]}CHELAILE_PUSH{\"cityId\":\"007\",\"gpsTime\":\"2019-07-24 21:33:06\",\"lat\":\"30.605916\",\"lng\":\"103.980439\",\"s\":\"android\",\"sourceUdid\":\"a4419b93-fb0e-43c7-98fa-5b7c18255660\",\"token\":\"13065ffa4e25c4a7c68\",\"tokenType\":\"3\",\"udid\":\"UDID2TOKEN#a4419b93-fb0e-43c7-98fa-5b7c18255660\",\"userCreateTime\":\"2018-04-20 08:13:32\",\"userLastActiveTime\":\"2019-07-24 21:33:06\",\"vc\":\"150\"}" 
;
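        // The intent is to split on the whole marker "CHELAILE_PUSH" and get 2 parts.
        // This assertion fails: split() treats each character of the second argument
        // as a separator, so the array has more than 2 elements.
        String[] dataArr = StringUtils.split(sendMsg, "CHELAILE_PUSH");
        Assert.assertEquals(2, dataArr.length);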
    }
}

Root cause analysis

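Inspecting the resulting array shows that the method does not split on the separator as a whole string. Instead, it treats every character of the separator argument as a separator in its own right. With a single-character separator the problem in the example above never shows up; with a multi-character separator it does.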
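To make the behavior described above concrete, here is a minimal JDK-only sketch. The class `CharSetSplitSketch` and the helper `splitOnAnyChar` are hypothetical names introduced for illustration; they are not part of lang3 and only imitate the "any character is a separator, empty tokens are dropped" behavior on a short input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CharSetSplitSketch {

    // Hypothetical helper (not lang3 code): splits on ANY character contained in
    // separatorChars and drops empty tokens, mimicking the behavior analyzed above.
    static String[] splitOnAnyChar(String str, String separatorChars) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                if (i > start) {
                    tokens.add(str.substring(start, i));
                }
                start = i + 1;
            }
        }
        if (start < str.length()) {
            tokens.add(str.substring(start));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // '_' is one of the characters in "CHELAILE_PUSH", so it also splits the payload.
        String[] parts = splitOnAnyChar("id_1CHELAILE_PUSHid_2", "CHELAILE_PUSH");
        System.out.println(Arrays.toString(parts)); // prints [id, 1, id, 2]
    }
}
```

The caller expected two parts, `id_1` and `id_2`, but gets four, because the underscore inside the payload is itself one of the separator characters.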

Looking at the StringUtils source

    /**
     * 
     * <pre>
     * StringUtils.split(null,*)         = null
     * StringUtils.split("",*)           = []
     * StringUtils.split("abc def",null) = ["abc","def"]
     * StringUtils.split("abc def"," ")  = ["abc","def"]
     * StringUtils.split("abc  def"," ") = ["abc","def"]
     * StringUtils.split("ab:cd:ef",":") = ["ab","cd","ef"]
     * </pre>
     *
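     * @param str  the String to parse, may be null
     * @param separatorChars  the characters used as separators (note: each character is a separator on its own, not the string as a whole); when separatorChars is null, whitespace is used as the separator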
     */
    public static String[] split(final String str,final String separatorChars) {
        return splitWorker(str,separatorChars,-1,false);
    }
    
    /**
     * Performs the logic for the {@code split} and
     * {@code splitPreserveAllTokens} methods that return a maximum array
     * length.
     *
     * @param str  the String to parse, may be {@code null}
     * @param separatorChars the separate character
     * @param max  the maximum number of elements to include in the
     *  array. A zero or negative value implies no limit.
     * @param preserveAllTokens if {@code true}, adjacent separators are
     * treated as empty token separators; if {@code false}, adjacent
     * separators are treated as one separator.
     * @return an array of parsed Strings, {@code null} if null String input
     */
    private static String[] splitWorker(final String str,final String separatorChars,final int max,final boolean preserveAllTokens) {
        // Performance tuned for 2.0 (JDK1.4)
        // Direct code is quicker than StringTokenizer.
        // Also,StringTokenizer uses isSpace() not isWhitespace()

        if (str == null) {
            return null;
        }
        final int len = str.length();
        if (len == 0) {
            return ArrayUtils.EMPTY_STRING_ARRAY;
        }
        final List<String> list = new ArrayList<>();
        int sizePlus1 = 1;
        int i = 0,start = 0;
        boolean match = false;
        boolean lastMatch = false;
        if (separatorChars == null) {
            
            // separatorChars is null: split the string on whitespace
            while (i < len) {
                if (Character.isWhitespace(str.charAt(i))) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start,i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else if (separatorChars.length() == 1) {
            // Splitting logic when the separator is exactly one character
            final char sep = separatorChars.charAt(0);
            while (i < len) {
                if (str.charAt(i) == sep) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start,i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else {
            // Splitting logic when separatorChars contains several characters
            // Example: with separatorChars "abc", each of 'a', 'b' and 'c' acts as a separator
            while (i < len) {
                if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start,i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        }
        if (match || preserveAllTokens && lastMatch) {
            list.add(str.substring(start,i));
        }
        return list.toArray(new String[list.size()]);
    }
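If the goal is to split on the whole marker string, as the test above intended, lang3 offers `StringUtils.splitByWholeSeparator(str, separator)` for that purpose. The JDK-only sketch below shows the intended semantics; `WholeSeparatorSplitSketch` and `splitOnWholeSeparator` are hypothetical names, and edge cases (adjacent separators, empty tokens) may be handled differently by lang3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WholeSeparatorSplitSketch {

    // Hypothetical helper (not lang3 code): splits only where the complete
    // separator string occurs. Empty tokens are kept in this sketch.
    static String[] splitOnWholeSeparator(String str, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> tokens = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = str.indexOf(separator, start)) >= 0) {
            tokens.add(str.substring(start, idx));
            start = idx + separator.length();
        }
        tokens.add(str.substring(start));
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The underscore inside each part is left alone; only the full marker splits.
        String[] parts = splitOnWholeSeparator("id_1CHELAILE_PUSHid_2", "CHELAILE_PUSH");
        System.out.println(Arrays.toString(parts)); // prints [id_1, id_2]
    }
}
```

With this approach the original assertion on an array length of 2 would hold, provided the marker string never appears inside either JSON payload.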