String Source Code Analysis
1. Class Definition
public final class String implements java.io.Serializable, Comparable<String>, CharSequence {...}
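- Declared final, so the class cannot be subclassed. Immutability is a separate property: it comes from the private final value array and from the absence of any method that modifies that array after construction.
- Implements Serializable, so instances can be serialized and deserialized.
- Implements Comparable<String>, so it must provide compareTo(String s), which defines the natural (lexicographic) ordering.
- Implements CharSequence, which declares length(): int, charAt(int): char, subSequence(int, int): CharSequence and toString(): String, plus the default methods chars(): IntStream and codePoints(): IntStream.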
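The interfaces listed above can be seen at work in a small sketch. The class name StringContractsDemo and its helper methods are made up for illustration and are not part of the JDK.

```java
import java.util.Arrays;

final class StringContractsDemo {
    // Accepts any CharSequence, so String and StringBuilder both work.
    static int countDigits(CharSequence text) {
        int digits = 0;
        for (int i = 0; i < text.length(); i++) {
            if (Character.isDigit(text.charAt(i))) {
                digits++;
            }
        }
        return digits;
    }

    // Sorts a copy using String's natural ordering (Comparable<String>).
    static String[] sortedCopy(String... words) {
        String[] copy = Arrays.copyOf(words, words.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(countDigits("a1b22"));                    // 3
        System.out.println(countDigits(new StringBuilder("2024")));  // 4
        // Upper-case letters sort before lower-case ones in the natural ordering.
        System.out.println(Arrays.toString(sortedCopy("pear", "Apple", "fig")));
    }
}
```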
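2. Member Variables
private final char value[];
private int hash;
private static final long serialVersionUID = -6849794470754667710L;
private static final ObjectStreamField[] serialPersistentFields = new ObjectStreamField[0];
public static final Comparator<String> CASE_INSENSITIVE_ORDER = new CaseInsensitiveComparator();
value: the backing storage of the String, a character array.
hash: caches the string's hash code. It stays 0 until hashCode() computes it for the first time.
serialVersionUID: the version identifier used to check class compatibility during serialization and deserialization.
serialPersistentFields: an ObjectStreamField array, which a class uses to declare its serializable fields. Here it is empty, and nothing else in the class uses it.
CASE_INSENSITIVE_ORDER: a Comparator for case-insensitive ordering, created from the nested class CaseInsensitiveComparator.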
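As a rough illustration of two of these fields, the sketch below sorts with CASE_INSENSITIVE_ORDER and recomputes the value that the hash field caches, using the formula documented for String.hashCode(). The class StringFieldsDemo and its methods are hypothetical helpers, not JDK code.

```java
import java.util.Arrays;

final class StringFieldsDemo {
    // Sorts a copy with the public case-insensitive comparator.
    static String[] sortIgnoringCase(String... words) {
        String[] copy = Arrays.copyOf(words, words.length);
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    // Recomputes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] with int overflow,
    // which is the value hashCode() stores in the hash field.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Natural ordering would put "Banana" first; this comparator does not.
        System.out.println(Arrays.toString(sortIgnoringCase("cherry", "Banana", "apple")));
        System.out.println(manualHash("abc") == "abc".hashCode()); // true
    }
}
```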
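3. Methods
3.1 Constructors
(1) String as the argument
public String() {
    this.value = "".value;
}
public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}
Initializes a new String from an existing String. The value and hash fields of the source are assigned directly to the new object, so both objects share the same backing array. This is safe because the array is never modified.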
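A minimal sketch of what this constructor means for callers: the copy is a distinct object with equal content. CopyConstructorDemo and copyOf are illustrative names only.

```java
final class CopyConstructorDemo {
    // Wraps the copy constructor so the behaviour is easy to check.
    static String copyOf(String original) {
        return new String(original);
    }

    public static void main(String[] args) {
        String original = "hello";
        String copy = copyOf(original);
        System.out.println(copy == original);                        // false: a new object
        System.out.println(copy.equals(original));                   // true: same characters
        System.out.println(copy.hashCode() == original.hashCode());  // true
    }
}
```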
(2) char array as the argument
public String(char value[]) {
    this.value = Arrays.copyOf(value, value.length);
}
public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= value.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset + count);
}
When a String is created from a char array, Arrays.copyOf or Arrays.copyOfRange copies the requested elements of the source array into a new array owned by the String, so later changes to the source array do not affect the String.
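The defensive copy can be observed from outside the class. The sketch below is an illustration only; CharArrayCopyDemo and its methods are hypothetical names.

```java
final class CharArrayCopyDemo {
    // Builds a String, then mutates the source array to show the String is unaffected.
    static String buildThenMutate(char[] chars) {
        String s = new String(chars);
        chars[0] = 'l';
        return s;
    }

    // Returns true if the (offset, count) pair is rejected by the constructor.
    static boolean rejectsRange(char[] chars, int offset, int count) {
        try {
            new String(chars, offset, count);
            return false;
        } catch (StringIndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        char[] chars = {'j', 'a', 'v', 'a'};
        System.out.println(buildThenMutate(chars));       // java
        System.out.println(new String(chars, 1, 2));      // av
        System.out.println(rejectsRange(chars, 3, 2));    // true: 3 + 2 exceeds the length 4
    }
}
```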
(3) int array as the argument
public String(int[] codePoints, int offset, int count) {
if (offset < 0) {
throw new StringIndexOutOfBoundsException(offset);
}
if (count <= 0) {
if (count < 0) {
throw new StringIndexOutOfBoundsException(count);
}
if (offset <= codePoints.length) {
this.value = "".value;
return;
}
}
// Note: offset or count might be near -1>>>1.
if (offset > codePoints.length - count) {
throw new StringIndexOutOfBoundsException(offset + count);
}
final int end = offset + count;
// Pass 1: Compute precise size of char[]
int n = count;
for (int i = offset; i < end; i++) {
int c = codePoints[i];
if (Character.isBmpCodePoint(c))
continue;
else if (Character.isValidCodePoint(c))
n++;
else throw new IllegalArgumentException(Integer.toString(c));
}
// Pass 2: Allocate and fill in char[]
final char[] v = new char[n];
for (int i = offset, j = 0; i < end; i++, j++) {
int c = codePoints[i];
if (Character.isBmpCodePoint(c))
v[j] = (char)c;
else
Character.toSurrogates(c, v, j++);
}
this.value = v;
}
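The two passes in the constructor above exist because a code point outside the Basic Multilingual Plane needs two char values (a surrogate pair). A small sketch of the visible effect follows; CodePointConstructorDemo and its methods are illustrative names.

```java
final class CodePointConstructorDemo {
    // Builds a String from all code points in the array.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    // Returns true if the constructor rejects the code point as invalid.
    static boolean rejects(int codePoint) {
        try {
            fromCodePoints(codePoint);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // U+0041 is in the BMP; U+1F600 is supplementary and needs a surrogate pair.
        String s = fromCodePoints(0x41, 0x1F600);
        System.out.println(s.length());                            // 3 chars
        System.out.println(s.codePointCount(0, s.length()));       // 2 code points
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
        System.out.println(rejects(0x110000));                     // true: above U+10FFFF
    }
}
```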
(4) byte array as the argument
public String(byte bytes[], int offset, int length, String charsetName)
throws UnsupportedEncodingException {
if (charsetName == null)
throw new NullPointerException("charsetName");
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(charsetName, bytes, offset, length);
}
public String(byte bytes[], int offset, int length, Charset charset) {
if (charset == null)
throw new NullPointerException("charset");
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(charset, bytes, offset, length);
}
public String(byte bytes[], String charsetName)
throws UnsupportedEncodingException {
this(bytes, 0, bytes.length, charsetName);
}
public String(byte bytes[], Charset charset) {
this(bytes, 0, bytes.length, charset);
}
public String(byte bytes[], int offset, int length) {
checkBounds(bytes, offset, length);
this.value = StringCoding.decode(bytes, offset, length);
}
public String(byte bytes[]) {
this(bytes, 0, bytes.length);
}
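Bytes are the serialized form used for network transmission and storage, so converting between byte[] and String always involves a character encoding. String(byte[] bytes, Charset charset) decodes the given byte array with charset into a Unicode char[] array and builds the new String from it. All of these constructors call a decode method of StringCoding. The overload that takes a charset name looks like this: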
static char[] decode(String charsetName, byte[] ba, int off, int len)
throws UnsupportedEncodingException
{
StringDecoder sd = deref(decoder);
String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
if ((sd == null) || !(csn.equals(sd.requestedCharsetName())
|| csn.equals(sd.charsetName()))) {
sd = null;
try {
Charset cs = lookupCharset(csn);
if (cs != null)
sd = new StringDecoder(cs, csn);
} catch (IllegalCharsetNameException x) {}
if (sd == null)
throw new UnsupportedEncodingException(csn);
set(decoder, sd);
}
return sd.decode(ba, off, len);
}
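In this decode method a null charsetName falls back to ISO-8859-1. The String constructors shown above never pass null, however, because they throw NullPointerException first. The constructors that take no charset, String(byte[]) and String(byte[], int, int), call a different decode overload, which uses the platform default charset (Charset.defaultCharset()) and falls back to ISO-8859-1 only if that charset is unsupported.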
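Why the charset matters when decoding is easiest to see with the same bytes decoded two ways. This is a sketch under the assumption that the bytes are the UTF-8 encoding of U+00E9; DecodeDemo and its methods are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Returns true if a null charset name is rejected with NullPointerException.
    static boolean rejectsNullCharsetName(byte[] bytes) {
        try {
            new String(bytes, (String) null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        } catch (UnsupportedEncodingException unexpected) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        System.out.println(decodeUtf8(bytes).length());   // 1: one decoded character
        System.out.println(decodeLatin1(bytes).length()); // 2: each byte becomes its own char
        System.out.println(rejectsNullCharsetName(bytes)); // true
    }
}
```

Passing an explicit Charset, as above, avoids depending on the platform default.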
(5) StringBuffer and StringBuilder as the argument
public String(StringBuffer buffer) {
synchronized(buffer) {
this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
}
}
public String(StringBuilder builder) {
this.value = Arrays.copyOf(builder.getValue(), builder.length());
}
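On efficiency: the Javadoc for String(StringBuilder) notes that obtaining the string through the builder's toString method is likely to run faster and is generally preferred. Separately, the methods of StringBuffer, including toString, are synchronized, which buys thread safety at some cost in speed, so StringBuilder is usually the faster choice when no synchronization is needed.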
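Both routes from a builder to a String produce an independent snapshot of the builder's content, as the following sketch suggests. BuilderToStringDemo and join are illustrative names, and the sketch makes no claim about relative speed.

```java
final class BuilderToStringDemo {
    // Concatenates parts with a StringBuilder and takes the result via toString().
    static String join(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("abc");
        String viaConstructor = new String(sb);
        String viaToString = sb.toString();
        sb.append("d"); // later changes to the builder do not leak into either String
        System.out.println(viaConstructor);       // abc
        System.out.println(viaToString);          // abc
        System.out.println(sb);                   // abcd
        System.out.println(join("a", "b", "c"));  // abc
    }
}
```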
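3.2 Common Methods
length() returns the length of the string, counted in char values
isEmpty() returns true if the length is 0
charAt(int index) returns the char at the zero-based position index
char[] toCharArray() returns the characters as a new char array
trim() removes leading and trailing whitespace (characters with a code of U+0020 or below)
toUpperCase() converts to upper case
toLowerCase() converts to lower case
String concat(String str) // appends str to the end of this string
String replace(char oldChar, char newChar) // replaces every occurrence of oldChar with newChar
// Both of the methods above use the constructor String(char[] value, boolean share);
boolean matches(String regex) // tests whether the whole string matches the regular expression regex
boolean contains(CharSequence s) // tests whether the string contains the character sequence s
String[] split(String regex, int limit) // splits the string around matches of regex; a positive limit caps the result at limit elements
String[] split(String regex) // same as split(regex, 0)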
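A few of the methods listed above have behaviour that is easy to get wrong, notably the limit argument of split and the whole-string semantics of matches. The sketch below shows them; CommonMethodsDemo and describeSplit are illustrative names.

```java
import java.util.Arrays;

final class CommonMethodsDemo {
    // Formats the result of split so that empty elements stay visible.
    static String describeSplit(String input, String regex, int limit) {
        return Arrays.toString(input.split(regex, limit));
    }

    public static void main(String[] args) {
        System.out.println(describeSplit("a,b,c,d", ",", 2)); // [a, b,c,d]
        System.out.println(describeSplit("a,b,,", ",", 0));   // [a, b]     trailing empties dropped
        System.out.println(describeSplit("a,b,,", ",", -1));  // [a, b, , ] trailing empties kept
        System.out.println("  hi \t".trim());                 // hi
        System.out.println("hello".replace('l', 'L'));        // heLLo
        System.out.println("123".matches("\\d+"));            // true
        System.out.println("123a".matches("\\d+"));           // false: the whole string must match
        System.out.println("hello".charAt(1));                // e
    }
}
```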
getBytes
public byte[] getBytes(String charsetName)throws UnsupportedEncodingException {
if (charsetName == null) throw new NullPointerException();
return StringCoding.encode(charsetName, value, 0, value.length);
}
public byte[] getBytes(Charset charset) {
if (charset == null) throw new NullPointerException();
return StringCoding.encode(charset, value, 0, value.length);
}
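The number of bytes produced by getBytes depends on the charset, and a charset that cannot represent a character substitutes its replacement byte. A minimal sketch, with GetBytesDemo and its helpers as hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class GetBytesDemo {
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    // Encodes and then decodes with the same charset.
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // two CJK characters
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));     // 6
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE));  // 4
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
        // ISO-8859-1 cannot represent these characters, so each becomes '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1));    // ??
    }
}
```

The Charset overload is used here because it does not declare the checked UnsupportedEncodingException that the charsetName overload does.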
Comparison methods
boolean equals(Object anObject);
boolean contentEquals(StringBuffer sb);
boolean contentEquals(CharSequence cs);
boolean equalsIgnoreCase(String anotherString);
int compareTo(String anotherString);
int compareToIgnoreCase(String str);
boolean regionMatches(int toffset, String other, int ooffset, int len) // compares a region of each string
boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len) // same, optionally ignoring case
The most interesting of these is equals:
public boolean equals(Object anObject) {
if (this == anObject) { // both references point to the same object
return true;
}
if (anObject instanceof String) { // otherwise compare the lengths, then the characters one by one
String anotherString = (String)anObject;
int n = value.length;
if (n == anotherString.value.length) {
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
while (n-- != 0) {
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}
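The regionMatches methods compare len characters of this string, starting at toffset, with len characters of other, starting at ooffset.
Checking how a string starts and ends
public boolean startsWith(String prefix, int toffset) { // prefix: the prefix to look for; toffset: the index at which the comparison starts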
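The comparison methods differ in what they accept and what they compare. The following sketch contrasts them; ComparisonDemo and sameRegion are illustrative names.

```java
final class ComparisonDemo {
    // Compares len chars of a (from aStart) with len chars of b (from bStart).
    static boolean sameRegion(String a, int aStart, String b, int bStart, int len, boolean ignoreCase) {
        return a.regionMatches(ignoreCase, aStart, b, bStart, len);
    }

    public static void main(String[] args) {
        String text = "Hello World";
        System.out.println(sameRegion(text, 6, "World", 0, 5, false)); // true
        System.out.println(sameRegion(text, 6, "WORLD", 0, 5, false)); // false
        System.out.println(sameRegion(text, 6, "WORLD", 0, 5, true));  // true

        StringBuilder sb = new StringBuilder("abc");
        System.out.println("abc".equals(sb));           // false: sb is not a String
        System.out.println("abc".contentEquals(sb));    // true: same character sequence
        System.out.println("abc".equalsIgnoreCase("ABC"));   // true
        System.out.println("apple".compareTo("banana") < 0); // true: 'a' sorts before 'b'
    }
}
```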
char ta[] = value;
int to = toffset;
char pa[] = prefix.value;
int po = 0;
int pc = prefix.value.length;
// Note: toffset might be near -1>>>1.
if ((toffset < 0) || (toffset > value.length - pc)) {
return false;
}
while (--pc >= 0) {
if (ta[to++] != pa[po++]) {
return false;
}
}
return true;
}
Similarly:
public boolean startsWith(String prefix) { return startsWith(prefix, 0); }
public boolean endsWith(String suffix) { return startsWith(suffix, value.length - suffix.value.length); }
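Because endsWith delegates to startsWith with a computed offset, a suffix longer than the string yields a negative offset and simply returns false. A short sketch follows; PrefixSuffixDemo and hasExtension are hypothetical names.

```java
final class PrefixSuffixDemo {
    // Case-sensitive check for a file extension such as ".txt".
    static boolean hasExtension(String fileName, String extension) {
        return fileName.endsWith(extension);
    }

    public static void main(String[] args) {
        String name = "filename.txt";
        System.out.println(name.startsWith("file"));        // true
        System.out.println(name.startsWith("name"));        // false
        System.out.println(name.startsWith("name", 4));     // true: comparison starts at index 4
        System.out.println(name.startsWith("file", -1));    // false: negative offset
        System.out.println(hasExtension(name, ".txt"));     // true
        System.out.println(hasExtension("a", "abc"));       // false: suffix longer than the string
    }
}
```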
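4. Summary
A String object is immutable. Assigning a new value to a String variable only changes which object the variable refers to; the original object and its contents stay unchanged.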
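The difference between rebinding a reference and changing an object can be sketched as follows. ImmutabilityDemo and shout are illustrative names.

```java
final class ImmutabilityDemo {
    // Returns a new String; the argument itself is never modified.
    static String shout(String s) {
        return s.toUpperCase().concat("!");
    }

    public static void main(String[] args) {
        String original = "abc";
        String alias = original;            // second reference to the same object
        original = original.concat("def");  // rebinds the variable to a new object
        System.out.println(original);       // abcdef
        System.out.println(alias);          // abc: the first object is unchanged

        String quiet = "hi";
        System.out.println(shout(quiet));   // HI!
        System.out.println(quiet);          // hi
    }
}
```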