Java Memory Allocation
Original article: JVM Topics 2: JVM Memory Structure - Milton - cnblogs (cnblogs.com)
JVM Memory Structure
The JVM is an abstract computing machine that enables a computer to run a Java program. The term "JVM" is used in three senses:
- specification: the document that defines how a JVM must work; implementations are provided by Sun (now Oracle) and other companies,
- implementation: a concrete program that meets the specification,
- instance: a running JVM, created each time the java command is used to run a Java class.
The JVM loads the code, verifies the code, executes the code, manages memory (this includes allocating memory from the Operating System (OS), managing Java allocation including heap compaction and removal of garbage objects) and finally provides the runtime environment.
JVM memory is divided into multiple parts: Heap Memory, Non-Heap Memory and Other.
Heap memory
Heap memory is the runtime data area from which memory for all Java class instances and arrays is allocated. The heap is created when the JVM starts and may grow or shrink while the application runs. The initial heap size is set with the -Xms VM option and the maximum heap size with -Xmx. The heap can be of fixed or variable size depending on the garbage collection strategy. Very old JVMs defaulted to a 64 MB maximum heap; current HotSpot releases derive the default from the machine's physical memory (typically one quarter of it).
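As a rough illustration, the limits that -Xms and -Xmx control can be read at run time through java.lang.Runtime. The class name HeapSizes and the helper toMiB are placeholders invented for this sketch, and the numbers printed depend on the JVM, its flags, and the machine.

```java
final class HeapSizes {
    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory():   the most heap the JVM will try to use (roughly -Xmx)
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms)
        // freeMemory():  the unused part of totalMemory()
        System.out.println("max   = " + toMiB(rt.maxMemory()) + " MiB");
        System.out.println("total = " + toMiB(rt.totalMemory()) + " MiB");
        System.out.println("free  = " + toMiB(rt.freeMemory()) + " MiB");
    }
}
```

Starting the JVM with, say, -Xms256m -Xmx256m should bring the max and total figures close together, although some collectors report slightly less than the configured value.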
Non-Heap memory
The JVM has memory other than the heap, referred to as non-heap memory. It is created at JVM startup and stores per-class structures such as the runtime constant pool, field and method data, and the code for methods and constructors, as well as interned Strings. On older JVMs that have a permanent generation (JDK 7 and earlier), the default maximum size of this area is 64 MB, which can be changed with the -XX:MaxPermSize VM option.
Other memory
JVM uses this space to store the JVM code itself, JVM internal structures, loaded profiler agent code and data, etc.
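JVM memory regions
Heap memory
This region stores the Java object instances, arrays, and similar data created at run time.
Non-heap memory
This region stores class structures, constants, the data and code of methods and constructors, the runtime stack memory, and strings. PermGen belongs to non-heap memory. The string pool moved to the heap in Java 7, and from Java 8 on class and method definitions and the runtime constant pool moved to Metaspace. Metaspace, however, is not part of JVM-managed memory; it lives in native memory.
Other memory
This region mainly stores the JVM's own code and the JVM's internal structures and data.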
Memory that a JVM with a 2 GB heap may occupy
= heap + (thread count * thread stack size) + permanent generation + compiled code + off-heap memory
= 2G + (1000 * 1M) + 256M + 48~240M + (about 2G)
= 5.xG
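- Heap: stores Java objects. The default initial size is 1/64 of physical memory; it is controlled by -Xms, -Xmx, -Xmn, and related options.
- Thread stack: stores local variables (primitives and references) and other per-call data. Since JDK 5 each thread stack is 1 MB by default (256 KB before that); adjust it to the memory the application's threads actually need. With the same physical memory, a smaller stack allows more threads, but the operating system still limits the number of threads per process, so they cannot be created without bound. An empirical ceiling is around 3000 to 5000.
- Permanent generation: stores class definitions and the constant pool. Up to JDK 7 it is sized with PermSize and MaxPermSize; from JDK 8 on, with MetaspaceSize and MaxMetaspaceSize.
- Compiled code: the default differs between JDK 7 and JDK 8 and depends on whether tiered compilation is enabled, ranging from 48 MB to 240 MB.
- Off-heap memory: used by, for example, Netty's off-heap buffers. The default maximum is roughly the heap size.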
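In other words, a JVM with a 2 GB heap needs roughly 4 GB of memory set aside. An instance with 1000 threads may take up to 5.5 GB, and with 3000 threads, plus heap and PermGen/Metaspace, it may need about 12 GB.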
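The estimate above is plain addition, so it can be checked with a few lines of Java. This is only a sketch: the class FootprintEstimate and its method are hypothetical names, every figure is in megabytes, and the inputs are the ones quoted in the text (using the 240 MB upper bound for compiled code).

```java
final class FootprintEstimate {
    /** Sums heap, thread stacks, class metadata, compiled code and off-heap memory (all in MB). */
    static long estimateMb(long heapMb, int threads, long stackMb,
                           long classMetaMb, long codeCacheMb, long offHeapMb) {
        return heapMb + (long) threads * stackMb + classMetaMb + codeCacheMb + offHeapMb;
    }

    public static void main(String[] args) {
        // 2 GB heap, 1000 threads with 1 MB stacks, 256 MB PermGen/Metaspace,
        // 240 MB compiled code, and off-heap memory about equal to the heap.
        long totalMb = estimateMb(2048, 1000, 1, 256, 240, 2048);
        System.out.println(totalMb + " MB"); // 5592 MB, i.e. about 5.5 GB
    }
}
```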
By default a Java 8 client JVM sets the initial heap size (-Xms) to about 1/64 of physical memory and the maximum heap size (-Xmx) to about 1/4 of physical memory, each subject to built-in lower and upper bounds. On large machines, however, the "1/4 of RAM" rule of thumb does not hold: on a 4-socket server with 64 GB per socket (256 GB RAM), -Xmx defaults to 32 GB. The 32 GB figure may be related to the CompressedOops limit, which lies at about that size.
If you want to use 32-bit references, your heap is limited to 32 GB. Note: you can still access direct memory and memory-mapped regions well above 32 GB even if you use 32-bit references in the heap.
However, if you are willing to use 64-bit references, the size is likely to be limited by your OS, just as it is with a 32-bit JVM; e.g. on 32-bit Windows this is 1.2 to 1.5 GB.
Note: you will want your JVM heap to fit into main memory, ideally inside one NUMA region. That's about 1 TB on the bigger machines. If your JVM spans NUMA regions, memory access, and GC in particular, will take much longer. If your JVM heap starts swapping, a GC might take hours, or the machine may even become unusable as it thrashes the swap drive.
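Constant pools, the string pool, and wrapper-class object pools in Java 8
Constant pools are divided into the static constant pool and the runtime constant pool.
The static constant pool lives in the .class file; the runtime constant pool lives in the method area. In JDK 1.8, the permanent generation that implemented the method area was replaced by Metaspace.
Since JDK 1.7 the string pool has lived in the heap.
String str = new String("Hello world") creates 2 objects: one resident in the string pool and one allocated on the Java heap. str refers to the instance on the heap.
String.intern() can add constants to the string pool at run time.
Some wrapper classes implement pooling: objects for values from -128 to 127 can be reused.
In JDK 1.6 and earlier, the string pool is in the Perm area (Permanent Generation). If a string is not yet in the pool, a new instance is created in the Perm area.
In JDK 1.7 and later, the string pool is in the Java heap. If a string is not yet in the pool, a new instance is created on the heap.
Roughly how String.intern() is implemented: Java calls the intern() method of StringTable, which is written in C++. StringTable's intern() works much like Java's HashMap, except that it cannot grow automatically. Its default size is 1009 (later releases use a larger default, adjustable with -XX:StringTableSize).
The string pool is in fact a hash table. HashMap and Hashtable in Java work on much the same principle, and thinking of the string pool as a hash table makes it easier to apply what we know from studying data structures. For example, HashMap and Hashtable resolve hash collisions with open hashing (also called separate chaining). What the string pool actually stores is references, and those references point to String instances.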
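The pooling behaviour described above can be observed with reference comparisons. The following is a minimal sketch; PoolDemo and sameObject are names made up for this example, and the 128 case depends on the size of the Integer cache, which can be enlarged with the -XX:AutoBoxCacheMax option.

```java
final class PoolDemo {
    /** True when both references point at the very same object. */
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Hello world";             // resolved from the string pool
        String onHeap = new String("Hello world");  // a separate instance on the heap

        System.out.println(sameObject(literal, onHeap));          // false
        System.out.println(sameObject(literal, onHeap.intern())); // true

        Integer a = 127, b = 127; // autoboxing goes through Integer.valueOf, which caches -128..127
        System.out.println(sameObject(a, b));                     // true

        Integer c = 128, d = 128; // outside the default cache range
        System.out.println(sameObject(c, d)); // usually false, unless the cache was enlarged
    }
}
```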
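What does the Java heap look like?
What does each region do?
By default the JVM grows or shrinks the heap after each garbage collection to keep the proportion of free space within a suitable range. For sizing the heap of a server application, the following guidelines apply:
- Give the JVM as much memory as possible; the default size is far from enough.
- Set -Xms and -Xmx to the same value so that the JVM does not have to make heap sizing decisions.
- Increase memory as you add processor cores, since memory allocation can be parallelized.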
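The heap is structured as follows:
- One old generation: give it what the program needs to run correctly plus 10~20% headroom, and allocate the rest to the young generation.
- One young generation, made up of:
  - One Eden space: at initialization the ratio of Eden to the two survivor spaces is generally 6:1:1. SurvivorRatio can be used to adjust the size of the survivor spaces, but the ratio changes as young GCs (YGC) run.
  - Two survivor spaces: these are used by YGC. At any given time one survivor space is empty. During a YGC the objects to be kept are copied into the empty survivor space, and the previously used one is then cleared. At each collection the JVM chooses a threshold, namely the number of collections an object must survive before it is moved to the old generation. The threshold depends on whether the survivor space can be kept 50% free. The option -XX:+PrintTenuringDistribution prints this threshold and the ages of the objects in the young generation, which is especially useful for understanding the lifetime distribution of an application's objects.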
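In most cases new objects are allocated in the young generation, which consists of Eden space and two equally sized survivor spaces. The latter two are mainly used for copying objects during a Minor GC.
Within Eden space the JVM sets aside small, thread-private TLAB (Thread Local Allocation Buffer) regions to make allocation more efficient. Allocating from the shared heap requires locking, whereas allocating from a TLAB does not, so the JVM tries to allocate objects in the TLAB whenever it can.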
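To see which regions a particular JVM actually exposes, the standard java.lang.management API can list its memory pools. This is a sketch only: HeapPools and poolNames are hypothetical names, and the pool names printed (for example an Eden, a survivor, and an old generation pool, plus Metaspace under non-heap) vary with the JDK version and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class HeapPools {
    /** Names of the memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.println(pool.getType() + "  " + pool.getName() + "  used=" + usedKb + " KB");
        }
    }
}
```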
What is the permanent generation (Perm Gen space) in the heap?
Permanent Generation or “Perm Gen” contains the application metadata required by the JVM to describe the classes and methods used in the application. Perm Gen is populated by JVM at runtime based on the classes used by the application. Perm Gen also contains Java SE library classes and methods. Perm Gen objects are garbage collected in a full garbage collection.
With Java 8 there is no Perm Gen, which means no more “java.lang.OutOfMemoryError: PermGen space” problems. Unlike Perm Gen, which resides in the Java heap, Metaspace is not part of the heap. Most allocations for class metadata now come out of native memory. By default Metaspace grows automatically (up to what the underlying OS provides), while Perm Gen always has a fixed maximum size. Two new flags set the size of the Metaspace: “-XX:MetaspaceSize” and “-XX:MaxMetaspaceSize”. The idea behind Metaspace is that the lifetime of classes and their metadata matches the lifetime of their classloaders. That is, as long as the classloader is alive, the metadata remains alive in the Metaspace and can’t be freed.
http://www.openkb.info/2014/07/garbage-collection-in-permgen.html
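PermGen stores class definitions and the constant pool. Up to JDK 7 it is sized with PermSize and MaxPermSize, and PermGen is part of the heap.
In Java SE 6 Update 3 or earlier it is not collected by default, although this can be configured.
After Java SE 6 Update 3, PermGen is collected by default during a Full GC.
From Java 8 on, PermGen is replaced by Metaspace, sized with MetaspaceSize and MaxMetaspaceSize. Metaspace is no longer part of the heap, but it is still garbage collected.
In JDK 1.6, permanent generation = string pool + method area, or equivalently permanent generation = method area (including the string pool). The permanent generation therefore held instance objects, which does not conform to the JVM specification.
The method area among the runtime data areas of the JVM specification is customarily called the permanent generation (Permanent Generation) in the HotSpot VM. It holds class information, constants, static variables, and similar data. When the application loads many classes, reflects on many classes, or calls many methods, the Permanent Generation may fill up, and if the JVM is not configured to use CMS GC this also triggers a Full GC. If even the Full GC cannot reclaim enough space, the JVM throws the following error: java.lang.OutOfMemoryError: PermGen space. To avoid Full GCs caused by a full Perm Gen, either enlarge the Perm Gen space or switch to CMS GC.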
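Can memory leaks occur in Java? Briefly explain.
- References to objects are not released after use, or large objects are created in static fields
- Interning large strings
- Resources such as streams and connections are not closed after use
- Adding objects that do not correctly implement hashCode and equals to a HashSet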
You cannot really "leak memory" in Java unless you:
- intern strings
- generate classes
- leak memory in the native code called by jni
- keep references to things that you do not want in some forgotten or obscure place.
I take it that you are interested in the last case. The common scenarios are listeners (especially those implemented as inner classes) and caches. A nice example would be to:
- build a Swing GUI that launches a potentially unlimited number of modal windows;
- have the modal window do something like this during its initialization:
StaticGuiHelper.getMainApplicationFrame().getOneOfTheButtons().addActionListener(new ActionListener() {
    public void actionPerformed(ActionEvent e) {
        // do nothing...
    }
});
The registered action does nothing, but it will cause the modal window to linger in memory forever, even after closing, causing a leak, since the listeners are never unregistered and each anonymous inner class object holds an (invisible) reference to its outer object. What's more, any object referenced from the modal windows has a chance of leaking too.
Another answer:
- Static Field Holding Onto the Object Reference
The first scenario that might cause a Java memory leak is referencing a heavy object with a static field.
private Random random = new Random();
public static final ArrayList<Double> list = new ArrayList<Double>(1000000);

@Test
public void givenStaticField_whenLotsOfOperations_thenMemoryLeak() throws InterruptedException {
    for (int i = 0; i < 1000000; i++) {
        list.add(random.nextDouble());
    }

    System.gc();
    Thread.sleep(10000); // to allow GC do its job
}
- Calling String.intern() on Long String
The second group of scenarios that frequently causes memory leaks involves String operations – specifically the String.intern() API.
@Test
public void givenLengthString_whenIntern_thenOutOfMemory() throws IOException, InterruptedException {
    Thread.sleep(15000);

    String str = new Scanner(new File("src/test/resources/large.txt"), "UTF-8")
        .useDelimiter("\\A").next();
    str.intern();

    System.gc();
    Thread.sleep(15000);
}
- Unclosed Streams
Forgetting to close a stream is a very common scenario, and certainly one that most developers can relate to. The problem was partially removed in Java 7, when the try-with-resources statement introduced the ability to close all types of streams automatically.
Why partially? Because the try-with-resources syntax is optional:
@Test(expected = OutOfMemoryError.class)
public void givenURL_whenUnclosedStream_thenOutOfMemory() throws IOException, URISyntaxException {
    String str = "";
    URLConnection conn = new URL("http://norvig.com/big.txt").openConnection();
    BufferedReader br = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));

    // accumulate every line in memory
    String line;
    while ((line = br.readLine()) != null) {
        str += line;
    }
    // br is never closed, which is the leak this test illustrates
}
- Unclosed Connections
This scenario is quite similar to the previous one, with the primary difference of dealing with unclosed connections (e.g. to a database, to an FTP server, etc.). Again, improper implementation can do a lot of harm, leading to memory problems.
@Test(expected = OutOfMemoryError.class)
public void givenConnection_whenUnclosed_thenOutOfMemory() throws IOException, URISyntaxException {
    URL url = new URL("ftp://speedtest.tele2.net");
    URLConnection urlc = url.openConnection();
    InputStream is = urlc.getInputStream(); // opened but never closed
    String str = "";
    //
}
- Adding Objects With no hashCode() and equals() Into a HashSet
A simple but very common example that can lead to a memory leak is to use a HashSet with objects that are missing their hashCode() or equals() implementations.
Specifically, when we start adding duplicate objects into a Set – this will only ever grow, instead of ignoring duplicates as it should. We also won’t be able to remove these objects, once added.
public class Key {
    public String key;

    public Key(String key) {
        this.key = key; // assign the instance field
    }
}
Now, let’s see the scenario:
@Test(expected = OutOfMemoryError.class)
public void givenMap_whenNoEqualsNoHashCodeMethods_thenOutOfMemory()
        throws IOException, URISyntaxException {
    Map<Object, Object> map = System.getProperties();
    while (true) {
        map.put(new Key("key"), "value");
    }
}
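The effect can also be shown without exhausting memory by comparing collection sizes. The sketch below uses two hypothetical key classes, BadKey and GoodKey, that differ only in whether equals() and hashCode() are overridden; neither name comes from the original article.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class KeyLeakDemo {
    /** No equals()/hashCode(): a HashSet treats every instance as distinct. */
    static final class BadKey {
        final String key;
        BadKey(String key) { this.key = key; }
    }

    /** Value-based equals()/hashCode(): logical duplicates collapse into one entry. */
    static final class GoodKey {
        final String key;
        GoodKey(String key) { this.key = key; }

        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && Objects.equals(key, ((GoodKey) o).key);
        }

        @Override public int hashCode() { return Objects.hashCode(key); }
    }

    /** Adds n logically identical BadKey objects and returns the set size. */
    static int sizeWithBadKeys(int n) {
        Set<BadKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) set.add(new BadKey("key"));
        return set.size();
    }

    /** Adds n logically identical GoodKey objects and returns the set size. */
    static int sizeWithGoodKeys(int n) {
        Set<GoodKey> set = new HashSet<>();
        for (int i = 0; i < n; i++) set.add(new GoodKey("key"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeWithBadKeys(1000));  // 1000: the set keeps growing
        System.out.println(sizeWithGoodKeys(1000)); // 1: duplicates are ignored
    }
}
```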
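Java stack structure
The Java stack is made up of stack frames, one frame per method invocation. A frame is pushed when a method is called and popped and discarded when the method returns.
The main job of the Java stack is to hold method parameters, local variables, and intermediate results, and to provide some of the data that other modules need. As mentioned earlier, the Java stack is private to each thread. This guarantees thread safety: programmers need not worry about synchronizing access to the stack, because only the thread itself can access its own local variable area.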
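A stack frame has three parts: the local variable area, the operand stack, and the frame data area.
- Local variable area
  The local variable area is an array measured in words. Here byte, short, and char values are converted to int for storage. long and double occupy two words; every other type occupies one. In particular, boolean values are converted to int or byte at compile time, and boolean arrays are treated as byte arrays. The local variable area also holds object references, including class references, interface references, and array references.
  The local variable area contains the method parameters and the local variables. In addition, an instance method has an implicit first local variable, this, which refers to the object the method was invoked on. For objects, the local variable area only ever holds references into the heap.
- Operand stack
  The operand stack is also an array measured in words but, as its name suggests, it supports only the basic push and pop operations. During a computation the operands are popped off the stack, and the result is pushed back once the computation completes.
- Frame data area
  The main tasks of the frame data area are:
  - Recording a pointer to the class's constant pool, for use in resolution.
  - Helping the method return normally, which includes restoring the caller's stack frame, setting the PC register to the instruction that follows the call in the calling method, and pushing the return value onto the operand stack of the caller's frame.
  - Recording the exception table. When an exception occurs, control passes to the matching catch clause; if no matching catch clause is found, the caller's stack frame is restored and the exception is rethrown.
The sizes of the local variable area and the operand stack are fixed at compile time for each method. When a method is called, the JVM finds the class's type information in the method area, obtains from it the sizes of the method's local variable area and operand stack, allocates the stack frame accordingly, and pushes it onto the Java stack.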
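Because every call pushes a frame, unbounded recursion eventually exhausts a thread's stack. The sketch below counts how many frames fit before a StackOverflowError is thrown. StackDepth and measure are names invented here, the stack size passed to the Thread constructor is only a hint that the JVM may round or ignore, and the counts differ between machines and runs.

```java
final class StackDepth {
    private int depth;

    private void recurse() {
        depth++;    // one more frame has been pushed
        recurse();  // never returns normally
    }

    /** Recurses on a new thread until its stack overflows; returns the frame count reached. */
    static int measure(long requestedStackBytes) {
        StackDepth probe = new StackDepth();
        Thread t = new Thread(null, () -> {
            try {
                probe.recurse();
            } catch (StackOverflowError expected) {
                // the thread's stack is exhausted; probe.depth holds the count
            }
        }, "stack-probe", requestedStackBytes);
        t.start();
        try {
            t.join(); // join() makes the probe thread's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        System.out.println("256 KB requested: " + measure(256 * 1024) + " frames");
        System.out.println("  4 MB requested: " + measure(4L * 1024 * 1024) + " frames");
    }
}
```

On most platforms the larger request allows noticeably deeper recursion, which is the same trade-off as the per-thread stack size discussed earlier: bigger stacks allow deeper calls but cost more memory per thread.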
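Deep copy and shallow copy
Besides the primitive data types, Java has reference types, namely instances of classes. When the = sign is used for assignment, a primitive's value is copied, but for an object only the reference is assigned: the reference to the original object is passed along, and both variables still point to the same object.
Shallow copy and deep copy are distinguished on this basis.
If copying an object copies only its primitive fields and merely passes along the references held in its reference-type fields, without actually creating new objects, the copy is a shallow copy. Conversely, if copying a reference-type field creates a new object and copies its member variables, the copy is a deep copy.
If an object contains only primitive fields, clone() returns a deep copy of it. If it also contains reference-type fields, clone() performs a shallow copy.
There are two commonly used ways to make a deep copy:
- Serialize the object and then deserialize it to obtain a new object. The only catch is that we have to write the serialization rules ourselves.
- Override clone() so that it also calls clone() on each reference-type field (and on the fields below those), ensuring that the objects inside the object are deep-copied as well.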
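The clone()-based approach can be sketched as follows. Person and Address are hypothetical classes made up for this illustration; shallowCopy() relies on Object.clone() alone, while deepCopy() additionally clones the nested Address.

```java
final class CopyDemo {
    static final class Address implements Cloneable {
        String city;
        Address(String city) { this.city = city; }

        @Override public Address clone() {
            try {
                return (Address) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Address is Cloneable
            }
        }
    }

    static final class Person implements Cloneable {
        String name;
        Address address;
        Person(String name, Address address) { this.name = name; this.address = address; }

        /** Shallow copy: the Address reference is shared with the original. */
        Person shallowCopy() {
            try {
                return (Person) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Person is Cloneable
            }
        }

        /** Deep copy: the nested Address is cloned too. */
        Person deepCopy() {
            Person copy = shallowCopy();
            copy.address = address.clone();
            return copy;
        }
    }

    public static void main(String[] args) {
        Person original = new Person("Ann", new Address("Taipei"));
        Person shallow = original.shallowCopy();
        Person deep = original.deepCopy();

        original.address.city = "Tainan";
        System.out.println(shallow.address.city); // Tainan: shares the original's Address
        System.out.println(deep.address.city);    // Taipei: has its own Address
    }
}
```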