Study Notes on the String Source Code in JDK 1.8

阿新 • Published: 2019-01-15

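String is probably one of the first things anyone learns in Java. We use it constantly, yet often understand it only halfway, and sometimes still have to look up its API. So, as usual, let's dig into the source code.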

I. Javadoc

This time I start with the Javadoc of String. For a language this mature, I find the Java documentation to be very clearly written.

/*Strings are constant; their values cannot be changed after they
 * are created. String buffers support mutable strings.
 * Because String objects are immutable they can be shared.
 * Once a String's value is set it cannot change, so instances can be shared and are safe to use across threads
 */

 /**
  * The Java language provides special support for the string
 * concatenation operator (&nbsp;+&nbsp;), and for conversion of
 * other objects to strings. String concatenation is implemented
 * through the {@code StringBuilder}(or {@code StringBuffer})
 * class and its {@code append} method.
 * String gets special support for the + operator, but it is implemented through StringBuilder or StringBuffer; there is a lot going on behind +
 */
 
 /**
  * String conversions are implemented through the method
 * {@code toString}, defined by {@code Object} and
 * inherited by all classes in Java
 * This refers to the toString() method, which every class inherits from Object
  */

II. Definition of the String class

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence

    
// String is declared final and implements Comparable, so it must provide compareTo;
   // it also implements Serializable, so it can be serialized.
   // CharSequence, on the other hand, was new to me.
   Its Javadoc says:
    * A <tt>CharSequence</tt> is a readable sequence of <code>char</code> values. This
    * interface provides uniform, read-only access to many different kinds of
    * <code>char</code> sequences.

StringBuilder and StringBuffer implement this interface as well.

III. Important fields

/** The value is used for character storage. */
private final char value[];
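This is the most important field of all. A String stores its characters in this char array. The field is final, so the reference can never point at a different array; the array is also private, and no method of String modifies or exposes it, so the contents never change. That is why String is immutable. One thing nags at my inner perfectionist, though: why put the [] on the variable name instead of on the type? That is not what Thinking in Java recommends!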
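To make the point about final concrete, here is a minimal sketch. LeakyHolder is a hypothetical class invented for this illustration, not anything from the JDK. It shows that a final array field alone does not give immutability; the array must also stay private and untouched, which is what String does.

```java
public class FinalArrayDemo {
    // Hypothetical class for illustration only: the field is final, but the array leaks.
    static final class LeakyHolder {
        private final char[] value;

        LeakyHolder(char[] value) {
            this.value = value;          // no defensive copy
        }

        char[] raw() {
            return value;                // exposes the internal array
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    public static void main(String[] args) {
        LeakyHolder h = new LeakyHolder(new char[] {'a', 'b', 'c'});
        h.raw()[0] = 'X';                // legal: final fixes the reference, not the elements
        System.out.println(h);           // prints Xbc
    }
}
```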

IV. Constructors

(1) No-argument constructor

/**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = new char[0];
    }

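This creates a string that contains zero characters.
Because String is immutable, there is no real reason to call the no-argument constructor; the literal "" does the same job.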

(2) String argument

/**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

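The constructor only points value at the array already held by original (and copies the cached hash). Both fields are final and the array is never modified, so sharing it is safe, and the constructor itself is rarely needed. Writing String s1=new String("s1s1"); String s2=new String(s1); is pointless; simply assign the reference instead: s2=s1;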
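A small sketch of what the copy constructor actually gives you (the class name is made up for the example): a distinct object with equal contents, whereas plain assignment gives the same object.

```java
public class CopyConstructorDemo {
    // Returns {s1 == s2, s1.equals(s2), s1 == s3} for a copy s2 and an alias s3.
    static boolean[] compare() {
        String s1 = new String("s1s1");
        String s2 = new String(s1);   // new String object with the same characters
        String s3 = s1;               // no new object, just another reference
        return new boolean[] {s1 == s2, s1.equals(s2), s1 == s3};
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0]);     // false: different objects
        System.out.println(r[1]);     // true: same characters
        System.out.println(r[2]);     // true: same object
    }
}
```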

(3) char[] argument

public String(char value[]) {
        this.value = Arrays.copyOf(value, value.length);
    }

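When a String is built from a char array, the array is copied into value rather than merely referenced, so later changes to the caller's array cannot affect the string.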
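The effect of that copy can be checked with a short sketch (class and method names are just for illustration): mutating the source array after construction leaves the String unchanged.

```java
public class DefensiveCopyDemo {
    static String build() {
        char[] chars = {'j', 'a', 'v', 'a'};
        String s = new String(chars);
        chars[0] = 'l';               // mutate the caller's array after construction
        return s;                     // the String still holds its own copy
    }

    public static void main(String[] args) {
        System.out.println(build());  // prints java
    }
}
```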

public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

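Unlike the constructor above, this one builds the String from only part of the char array: offset is the starting index and count is the number of characters to take.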
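A quick sketch of the offset/count constructor, including the bounds check shown above (the class and helper names are invented for the example):

```java
public class CharRangeDemo {
    static String slice(char[] chars, int offset, int count) {
        return new String(chars, offset, count);
    }

    public static void main(String[] args) {
        char[] chars = "abcdef".toCharArray();
        System.out.println(slice(chars, 1, 3));   // prints bcd
        try {
            slice(chars, 4, 5);                   // offset + count runs past the end
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("out of bounds");
        }
    }
}
```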

(4) byte[] argument

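A module that is meant to be broadly usable always comes with piles of adapter code. Below is a family of constructors that build a String from a byte[]. They differ mainly in whether a specific charset is given for decoding, which matters a great deal in web and network programming.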

public String(byte bytes[], Charset charset) {
        this(bytes, 0, bytes.length, charset);
    }
public String(byte bytes[], int offset, int length) {
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(bytes, offset, length);
    }// Decodes length bytes starting at offset, using the platform default charset
public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
    }
public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value =  StringCoding.decode(charset, bytes, offset, length);
    }// Takes an explicit charset, a start offset, and a length
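Why the charset matters is easiest to see with the same bytes decoded two ways. This is a small sketch; the byte values are the UTF-8 encoding of U+4E2D, and the class name is made up.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // The three bytes of U+4E2D encoded in UTF-8.
    static final byte[] UTF8_BYTES = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD};

    public static void main(String[] args) {
        String right = new String(UTF8_BYTES, StandardCharsets.UTF_8);
        String wrong = new String(UTF8_BYTES, StandardCharsets.ISO_8859_1);
        System.out.println(right.length());  // 1: one character decoded from three bytes
        System.out.println(wrong.length());  // 3: each byte became its own character

        byte[] ascii = "hello world".getBytes(StandardCharsets.US_ASCII);
        // offset 6, length 5 selects the bytes of "world"
        System.out.println(new String(ascii, 6, 5, StandardCharsets.US_ASCII));
    }
}
```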

(5) StringBuilder and StringBuffer arguments

public String(StringBuffer buffer) {
        synchronized(buffer) {
            this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
        }
    }// Reading the buffer's array and its length is not one atomic step, so the copy is wrapped in a synchronized block

    public String(StringBuilder builder) {
        this.value = Arrays.copyOf(builder.getValue(), builder.length());
    }

The effect is the same as calling toString() on the builder or buffer. I am more used to toString().
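A brief sketch (names invented for the example) showing that both routes give an equal string, and that the result is a snapshot: appending to the builder afterwards does not change it.

```java
public class BuilderToStringDemo {
    static String[] snapshot() {
        StringBuilder sb = new StringBuilder("abc");
        String viaConstructor = new String(sb);
        String viaToString = sb.toString();
        sb.append("def");             // later changes do not affect the strings above
        return new String[] {viaConstructor, viaToString, sb.toString()};
    }

    public static void main(String[] args) {
        String[] r = snapshot();
        System.out.println(r[0]);     // abc
        System.out.println(r[1]);     // abc
        System.out.println(r[2]);     // abcdef
    }
}
```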

V. Important methods

(1)length()

public int length() {
        return value.length;
    }

Returns the number of char values (UTF-16 code units) in the string, that is, the length of the value array.

(2)isEmpty()

public boolean isEmpty() {
        return value.length == 0;
    }

Tests whether the string is empty, which only requires checking that the length of the value array is 0.

(3)charAt(int index)

public char charAt(int index) {
        if ((index < 0) || (index >= value.length)) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[index];
    }

Returns the character at position index (zero-based), which is just a bounds-checked lookup in the value array.

(4)getBytes()

public byte[] getBytes() {
        return StringCoding.encode(value, 0, value.length);
    }

    public byte[] getBytes(String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null) throw new NullPointerException();
        return StringCoding.encode(charsetName, value, 0, value.length);
    }// Encodes using the named charset

    public byte[] getBytes(Charset charset) {
        if (charset == null) throw new NullPointerException();
        return StringCoding.encode(charset, value, 0, value.length);
    }

Converts the String object to a byte array.
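As a rough illustration of how the chosen charset changes the result, the sketch below encodes the same two-character string with two charsets. The class name is made up; the charsets are the standard constants from java.nio.charset.StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class GetBytesDemo {
    public static void main(String[] args) {
        String s = "a\u4e2d";                                   // 'a' followed by U+4E2D
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length);                        // 4: one byte + three bytes
        System.out.println(latin1.length);                      // 2: U+4E2D is unmappable and replaced by '?'
        // Decoding with the same charset restores the original string.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s)); // true
    }
}
```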

(5)equals()

public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

equals is overridden: after the identity, type, and length checks, it compares the two strings character by character.

(6)compareTo(String anotherString)

public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

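Compares two strings lexicographically. Starting at index 0, it returns the difference between the first pair of characters that differ. If no characters differ within the shorter length, it returns the difference of the lengths: 0 when the two character sequences are equal, nonzero when the shorter string is a prefix of the longer one.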
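The two return paths can be seen with a few calls (the class name is made up for the example):

```java
public class CompareToDemo {
    public static void main(String[] args) {
        // First difference at index 4: 'e' (101) - 'y' (121)
        System.out.println("apple".compareTo("apply"));      // -20
        // No differing character within the shorter length: 4 - 10
        System.out.println("java".compareTo("javascript"));  // -6
        // Identical sequences
        System.out.println("java".compareTo("java"));        // 0
    }
}
```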

(7)regionMatches(int toffset,String other,int ooffset,int len)

/* @param   toffset   the starting offset of the subregion in this string.
     * @param   other     the string argument.
     * @param   ooffset   the starting offset of the subregion in the string
     *                    argument.
     * @param   len       the number of characters to compare.
     * @return  {@code true} if the specified subregion of this string
     *          exactly matches the specified subregion of the string argument;
     *          {@code false} otherwise.
     */
    public boolean regionMatches(int toffset, String other, int ooffset,
            int len) {
        char ta[] = value;
        int to = toffset;
        char pa[] = other.value;
        int po = ooffset;
        // Note: toffset, ooffset, or len might be near -1>>>1.
        if ((ooffset < 0) || (toffset < 0)
                || (toffset > (long)value.length - len)
                || (ooffset > (long)other.value.length - len)) {
            return false;
        }
        while (len-- > 0) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
        return true;
    }

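Tests whether a region of this string equals a region of another string; it is mainly used to compare a given range of characters.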
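A few calls that exercise the offset and length checks shown above (a sketch; the class name is invented):

```java
public class RegionMatchesDemo {
    public static void main(String[] args) {
        String s = "Hello World";
        System.out.println(s.regionMatches(6, "World", 0, 5));   // true: "World" vs "World"
        System.out.println(s.regionMatches(6, "world", 0, 5));   // false: this overload is case sensitive
        System.out.println(s.regionMatches(8, "World", 0, 5));   // false: region would run past the end
        System.out.println(s.regionMatches(-1, "World", 0, 5));  // false: negative offset, no exception
    }
}
```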

(8)equalsIgnoreCase(String anotherString)

public boolean equalsIgnoreCase(String anotherString) {
        return (this == anotherString) ? true
                : (anotherString != null)
                && (anotherString.value.length == value.length)
                && regionMatches(true, 0, anotherString, 0, value.length);
    }
    Tests whether two strings are equal when case is ignored; the work is done by the regionMatches overload below.

    public boolean regionMatches(boolean ignoreCase, int toffset,
            String other, int ooffset, int len) {
        char ta[] = value;
        int to = toffset;
        char pa[] = other.value;
        int po = ooffset;
        // Note: toffset, ooffset, or len might be near -1>>>1.
        if ((ooffset < 0) || (toffset < 0)
                || (toffset > (long)value.length - len)
                || (ooffset > (long)other.value.length - len)) {
            return false;
        }
        while (len-- > 0) {
            char c1 = ta[to++];
            char c2 = pa[po++];
            // Check for an exact match first; if the chars are equal, skip the case conversion below
            if (c1 == c2) {
                continue;
            }
            if (ignoreCase) {
                // If characters don't match but case may be ignored,
                // try converting both characters to uppercase.
                // If the results match, then the comparison scan should
                // continue.
                char u1 = Character.toUpperCase(c1);
                char u2 = Character.toUpperCase(c2);
                // Convert both to uppercase; if they match, move on to the next pair
                if (u1 == u2) {
                    continue;
                }
                // Unfortunately, conversion to uppercase does not work properly
                // for the Georgian alphabet, which has strange rules about case
                // conversion.  So we need to make one last check before
                // exiting.
                if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                    continue;
                }
            }
            return false;
        }
        return true;
    }

The check itself is not complicated, but every step is arranged with efficiency in mind.
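A short sketch of the case-insensitive path (the class name is made up). Note that a null argument returns false rather than throwing, because of the null check in equalsIgnoreCase.

```java
public class IgnoreCaseDemo {
    public static void main(String[] args) {
        System.out.println("Hello".equalsIgnoreCase("hELLO"));   // true
        System.out.println("Hello".equalsIgnoreCase("Hell"));    // false: lengths differ
        System.out.println("Hello".equalsIgnoreCase(null));      // false: null is handled
        // The five-argument overload can also be called directly on a sub-region.
        System.out.println("Hello World".regionMatches(true, 6, "WORLD", 0, 5)); // true
    }
}
```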

(9)startsWith(String prefix, int toffset)

public boolean startsWith(String prefix, int toffset) {
        char ta[] = value;
        int to = toffset;
        char pa[] = prefix.value;
        int po = 0;
        int pc = prefix.value.length;
        // Note: toffset might be near -1>>>1.
        if ((toffset < 0) || (toffset > value.length - pc)) {
            return false;
        }
        while (--pc >= 0) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
        return true;
    }

Tests whether the part of this string that starts at index toffset begins with prefix.

public boolean startsWith(String prefix) {
        return startsWith(prefix, 0);
    }

Tests whether the String starts with prefix.

(10)endsWith(String suffix)

public boolean endsWith(String suffix) {
        return startsWith(suffix, value.length - suffix.value.length);
    }

Tests whether the String ends with suffix. Note that it simply reuses startsWith.
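The reuse of startsWith also explains what happens when the suffix is longer than the string: the computed offset is negative, so the toffset < 0 check returns false. A small sketch (class name invented):

```java
public class StartsEndsDemo {
    public static void main(String[] args) {
        String s = "report.txt";
        System.out.println(s.startsWith("rep"));        // true
        System.out.println(s.startsWith("port", 2));    // true: "port" begins at index 2
        System.out.println(s.endsWith(".txt"));         // true: startsWith(".txt", 10 - 4)
        System.out.println("txt".endsWith(s));          // false: offset 3 - 10 is negative
        System.out.println(s.startsWith(""));           // true: the empty prefix always matches
    }
}
```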

(11)indexOf(int ch)

public int indexOf(int ch) {
        return indexOf(ch, 0);
    }// Finds the index of the first occurrence of ch
    It is implemented by delegating to indexOf(int ch,int fromIndex)

public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }
        
        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return indexOfSupplementary(ch, fromIndex);
        }
    }// Finds the first occurrence of ch in this string at or after fromIndex

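In practice, though, we usually call the method like this:
   String s="abcdefg";
   int idx=s.indexOf('f');//idx=5
   Note that we do not pass an int argument; we pass a char directly.
   This works because of automatic type promotion (a widening primitive conversion): when a value of a type with a smaller range is assigned to a variable of a type with a larger range, the conversion happens automatically. Here the char is widened to int.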
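The int parameter is not only a convenience for widening; it also lets the method accept code points above U+FFFF, which is the indexOfSupplementary branch in the source. A sketch, with a made-up class name:

```java
public class IndexOfCharDemo {
    public static void main(String[] args) {
        String s = "abcdefg";
        System.out.println(s.indexOf('f'));          // 5: the char is widened to int
        System.out.println(s.indexOf(102));          // 5: 102 is the code point of 'f'

        String t = "a\uD83D\uDE00b";                 // 'a', U+1F600 as a surrogate pair, 'b'
        System.out.println(t.indexOf(0x1F600));      // 1: index of the high surrogate
        System.out.println(t.indexOf('b'));          // 3: the pair occupies indices 1 and 2
    }
}
```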

(12)lastIndexOf(int ch)

public int lastIndexOf(int ch) {
        return lastIndexOf(ch, value.length - 1);
    }
// Finds the index of the last occurrence of ch in this string
public int lastIndexOf(int ch, int fromIndex) {
        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            int i = Math.min(fromIndex, value.length - 1);
            for (; i >= 0; i--) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return lastIndexOfSupplementary(ch, fromIndex);
        }
    }

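Return value: the index of the last occurrence of the character in the character sequence represented by this object that is less than or equal to fromIndex, or -1 if the character does not occur before that point.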
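A few calls on a small string show how fromIndex bounds the backward search (the class name is invented for the example):

```java
public class LastIndexOfCharDemo {
    public static void main(String[] args) {
        String s = "banana";                          // b0 a1 n2 a3 n4 a5
        System.out.println(s.lastIndexOf('a'));       // 5
        System.out.println(s.lastIndexOf('a', 4));    // 3: last 'a' at an index <= 4
        System.out.println(s.lastIndexOf('n', 1));    // -1: no 'n' at index 0 or 1
        System.out.println(s.lastIndexOf('a', 100));  // 5: fromIndex is clamped to length - 1
        System.out.println(s.lastIndexOf('a', -1));   // -1: nothing to search
    }
}
```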

(13)indexOf(String str)

public int indexOf(String str) {
            return indexOf(str, 0);
        }

 public int indexOf(String str, int fromIndex) {
            return indexOf(value, 0, value.length,
                    str.value, 0, str.value.length, fromIndex);
 }

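Finds the index of the first occurrence of the substring str in this string.

The code that ultimately runs is shown below. It may look a little messy at first, but once the parameters are clear, the whole flow falls into place.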

/* @param   source       the characters being searched. // here, the value array
         * @param   sourceOffset offset of the source string. // offset into the source string
         * @param   sourceCount  count of the source string.    // here, the length of the value array
         * @param   target       the characters being searched for.  // the target string to look for
         * @param   targetOffset offset of the target string.   // offset into the target string
         * @param   targetCount  count of the target string.   // length of the target string
         * @param   fromIndex    the index to begin searching from. // starting position
         */
        static int indexOf(char[] source, int sourceOffset, int sourceCount,
            char[] target, int targetOffset, int targetCount,
            int fromIndex) {
            if (fromIndex >= sourceCount) {// fromIndex is past the end of the source
                return (targetCount == 0 ? sourceCount : -1);
            }
            if (fromIndex < 0) {
                fromIndex = 0;
            }
            if (targetCount == 0) {
                return fromIndex;
            }

            char first = target[targetOffset];// first character of the target string
            int max = sourceOffset + (sourceCount - targetCount);// largest index at which the first character may match, because at least targetCount characters must remain after it

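            // This is the core search: match the first character, then compare the following characters one by one
            // until the whole target matches, or until max is reached without a match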
            for (int i = sourceOffset + fromIndex; i <= max; i++) {
                /* Look for first character. */
                if (source[i] != first) {
                    while (++i <= max && source[i] != first);
                }

                /* Found first character, now look at the rest of v2 */
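                // Note that i only tracks where the first character matched: after a partial match the search must resume from the next i,
                // so a separate index j is used to compare the rest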
                if (i <= max) {
                    int j = i + 1;
                    int end = j + targetCount - 1;
                    for (int k = targetOffset + 1; j < end && source[j]
                            == target[k]; j++, k++);

                    if (j == end) {
                        /* Found whole string. */
                        return i - sourceOffset;
                    }
                }
            }
            return -1;
        }// Returns -1 when no match is found

This search code is beautifully written, concise and clear. Reading this part alone made studying the String source worthwhile.
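To check my understanding of the scan-then-verify structure, here is a simplified sketch of the same idea without the offset parameters. It is my own reduction for illustration, not the JDK code, and the class name is made up; main compares it against String.indexOf on a few inputs.

```java
public class NaiveSearchDemo {
    // Simplified sketch: find target in source at or after fromIndex, or return -1.
    static int indexOf(char[] source, char[] target, int fromIndex) {
        if (fromIndex < 0) {
            fromIndex = 0;
        }
        if (target.length == 0) {
            return Math.min(fromIndex, source.length);  // the empty target matches immediately
        }
        int max = source.length - target.length;        // last index where a match can start
        for (int i = fromIndex; i <= max; i++) {
            if (source[i] != target[0]) {
                continue;                               // keep scanning for the first character
            }
            int j = 1;                                  // verify the remaining characters
            while (j < target.length && source[i + j] == target[j]) {
                j++;
            }
            if (j == target.length) {
                return i;                               // whole target matched at i
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "java java java";
        for (int from : new int[] {0, 2, 9, 11}) {
            int mine = indexOf(s.toCharArray(), "java".toCharArray(), from);
            System.out.println(mine + " " + s.indexOf("java", from));
        }
    }
}
```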

(14)lastIndexOf(String str)

public int lastIndexOf(String str) {
      return lastIndexOf(str, value.length);// fromIndex is the length of the value array: matching runs backward, so the search starts from the end of the string
}

Finds the index of the last occurrence of the substring str in this string. The code it calls is as follows:

/*Returns the index within this string of the last occurrence of the
  * specified substring, searching backward starting at the specified index.
 * <p>The returned index is the largest value <i>k</i> for which:
 * <blockquote><pre>
 * <i>k</i> {@code <=} fromIndex
 */
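// The wording is convoluted. The returned index must be less than or equal to fromIndex, and among all such match positions it is the largest one. In other words, fromIndex is the largest index at which a match may start, and it is the position from which the backward search begins.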
public int lastIndexOf(String str, int fromIndex) {
     return lastIndexOf(value, 0, value.length,str.value, 0, str.value.length, fromIndex);
   }

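The method that is ultimately called is shown below. It resembles the one above, except that it searches from the end, so the comparison also runs backward, starting with the last character of the target.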

static int lastIndexOf(char[] source, int sourceOffset, int sourceCount,
            char[] target, int targetOffset, int targetCount,
            int fromIndex) {
                /*
                 * Check arguments; return immediately where possible. For
                 * consistency, don't check for null str.
                 */
                // the largest index at which the first character can match, analogous to max above
                int rightIndex = sourceCount - targetCount;
                if (fromIndex < 0) {
                    return -1;
                }
                if (fromIndex > rightIndex) {
                    fromIndex = rightIndex;
                }
                /* Empty string always matches. */
                if (targetCount == 0) {
                    return fromIndex;
                }

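                int strLastIndex = targetOffset + targetCount - 1;// index of the last character of the target
                char strLastChar = target[strLastIndex];// the last character of the target
                int min = sourceOffset + targetCount - 1;// smallest source index at which the target's last character can match
                int i = min + fromIndex;// i is always the source index being tested against the target's last character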

            startSearchForLastChar:
                while (true) {
                    while (i >= min && source[i] != strLastChar) {
                        i--;
                    }
                    // once i drops below min, no match is possible any more
                    if (i < min) {
                        return -1;
                    }
                    int j = i - 1;
                    int start = j - (targetCount - 1);
                    int k = strLastIndex - 1;

                    while (j > start) {
                        if (source[j--] != target[k--]) {
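                            // on a partial match whose earlier characters differ, abandon this attempt and slide the window one position to the left
                            i--;
                            continue startSearchForLastChar;// jump straight back to the outer while loop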
                        }
                    }
                    return start - sourceOffset + 1;
                }
            }

This mirrors indexOf, only the search runs in the opposite direction.

Now is a good time to work through a few examples of substring matching.
            public static void main(String[] args){
                String s1="java java java";
                // the two indexOf overloads
                System.out.println(s1.indexOf("java"));// prints 0
                System.out.println(s1.indexOf("java",2));// prints 5: the first match starting at an index >= 2
                System.out.println(s1.indexOf("java",9));// prints 10: the first match starting at an index >= 9
            }

            public static void main(String[] args){
                String s1="java java java";
                // next, lastIndexOf
                System.out.println(s1.lastIndexOf("java"));// prints 10
                System.out.println(s1.lastIndexOf("java",2));// prints 0: the result must be <= 2, searching leftward from index 2
                System.out.println(s1.lastIndexOf("java",9));// prints 5: the result must be <= 9, searching leftward from index 9
            }

(15)substring(int beginIndex)

public String substring(int beginIndex) {
            if (beginIndex < 0) {
                throw new StringIndexOutOfBoundsException(beginIndex);
            }
            int subLen = value.length - beginIndex;
            if (subLen < 0) {
                throw new StringIndexOutOfBoundsException(subLen);
            }
            return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
  }

Note that the method is named substring, not subString;

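Returns the substring from beginIndex to the end of the string. Unless beginIndex is 0, in which case this is returned, the result is a newly created String object.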

/*
         * Returns a string that is a substring of this string. The
         * substring begins at the specified {@code beginIndex} and
         * extends to the character at index {@code endIndex - 1}.
         * Thus the length of the substring is {@code endIndex-beginIndex}.
         */
  public String substring(int beginIndex, int endIndex) {
            if (beginIndex < 0) {
                throw new StringIndexOutOfBoundsException(beginIndex);
            }
            if (endIndex > value.length) {
                throw new StringIndexOutOfBoundsException(endIndex);
            }
            int subLen = endIndex - beginIndex;
            if (subLen < 0) {
                throw new StringIndexOutOfBoundsException(subLen);
            }
            return ((beginIndex == 0) && (endIndex == value.length)) ? this
                    : new String(value, beginIndex, subLen);
 }

Returns the substring from beginIndex up to endIndex. The character at endIndex is not included, which is why the length is endIndex-beginIndex;
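The half-open range is easy to get wrong, so here is a small sketch with a few boundary cases (the class name is made up):

```java
public class SubstringDemo {
    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(s.substring(6));        // world: from index 6 to the end
        System.out.println(s.substring(0, 5));     // hello: indices 0..4, index 5 excluded
        System.out.println(s.substring(3, 3).isEmpty()); // true: equal indices give ""
        try {
            s.substring(5, 2);                     // endIndex < beginIndex gives a negative length
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("out of bounds");
        }
    }
}
```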

(16)concat(String str)

public String concat(String str) {
            int otherLen = str.length();
            if (otherLen == 0) {
                return this;
            }
            int len = value.length;
            char buf[] = Arrays.copyOf(value, len + otherLen);
            str.getChars(buf, len);
            return new String(buf, true);
    }

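Appends str to this String. For non-null arguments the result is the same as with the + operator. Note that a new String object is created (unless str is empty), so be careful with == on String objects and always compare contents with equals().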
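A short sketch of why == is risky after concat, plus one difference from + that is easy to overlook: concat dereferences its argument, so null throws, while + turns null into the text "null". The class name is invented for the example.

```java
public class ConcatDemo {
    public static void main(String[] args) {
        String literal = "ab";
        String joined = "a".concat("b");            // built at run time, a new object
        System.out.println(literal == joined);      // false: different objects
        System.out.println(literal.equals(joined)); // true: same characters

        String nothing = null;
        System.out.println("a" + nothing);          // anull
        try {
            "a".concat(nothing);                    // str.length() on null
        } catch (NullPointerException e) {
            System.out.println("NPE");
        }
    }
}
```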

This method relies mainly on getChars(buf,len), which is just a thin wrapper around an array copy;

/**
         * Copy characters from this string into dst starting at dstBegin.
         * This method doesn't perform any range checking.
         */
void getChars(char dst[], int dstBegin) {
            System.arraycopy(value, 0, dst, dstBegin, value.length);
    }

There is also a public overload intended for external callers:

/*
         * Copies characters from this string into the destination character
         * array.
         * <p>
         * The first character to be copied is at index {@code srcBegin};
         * the last character to be copied is at index {@code srcEnd-1}
         * (thus the total number of characters to be copied is
         * {@code srcEnd-srcBegin}). The characters are copied into the
         * subarray of {@code dst} starting at index {@code dstBegin}
         * and ending at index:
         * <blockquote><pre>
         *     dstbegin + (srcEnd-srcBegin) - 1
         * </pre></blockquote>
         *
         * @param      srcBegin   index of the first character in the string
         *                        to copy.
         * @param      srcEnd     index after the last character in the string
         *                        to copy.
         * @param      dst        the destination array.
         * @param      dstB