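JVM First Look: Using Off-Heap Memory to Reduce Full GC
Tags: JVM
Problem: Most mainstream internet companies run their online server JVMs with the CMS collector (e.g. Taobao, LinkedIn, Vdian). Although CMS can collect concurrently with user threads to shorten STW time, it is not perfect: when a Concurrent Mode Failure occurs, the collector falls back from concurrent GC to a serial collection, which causes a very long Stop-The-World pause (for details, see "JVM First Look: Memory Allocation, GC Principles, and Garbage Collectors").
Solution: GCIH suggests a way out: move long-lived objects (such as a local cache) into off-heap memory (also known as direct memory). This reduces the number of objects CMS has to manage, makes Full GCs fewer and less frequent, and thereby improves system response time.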
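Introduction
The idea originally comes from GCIH, part of TaobaoJVM's customization of OpenJDK (see 撒迦's talk "JVM定製改進@淘寶", "JVM Customization and Improvement @ Taobao"). GCIH (GC Invisible Heap) carves out part of the CMS Old generation. This memory is still inside the heap, but it is no longer managed by the GC. Long-lived Java objects are moved into GCIH, where the GC can no longer see or manage them:
(Image source: [email protected] PPT)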
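This has two benefits:
- Less memory for the GC to manage:
Because GCIH is "carved out" of the Old generation, the GC has a smaller region to manage.
- GCIH content can be shared between processes:
Because this region is no longer part of the JVM runtime data areas, objects in GCIH can be shared by multiple JVM instances (for example, several MR jobs running on one server can share the same cache data), so a single server can run more VM instances.
(Actual test data and charts are available in 撒迦's slides.)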
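However, most internet companies, unlike Alibaba, do not have dedicated engineers who can customize the JVM for their own workloads, so we can only envy the performance gains of GCIH without being able to enjoy them. A standard JVM does, however, expose interfaces for requesting off-heap memory directly from the operating system (ByteBuffer, Unsafe), and this memory is also out of the GC's reach. We can therefore use JVM off-heap memory to emulate GCIH (the drawback compared with GCIH is the cost of serialization and deserialization).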
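JVM Off-Heap Memory
Off-heap memory does not appear among the Java runtime data areas described in "JVM First Look: JVM Memory Model", because it is neither part of the JVM runtime data areas nor a memory area defined by the Java Virtual Machine Specification. This memory is managed directly by the operating system.
Before JDK 1.4 there was no sanctioned way to access this memory: the only option was to obtain an Unsafe instance through reflection and call allocateMemory()/freeMemory() to allocate and release it. JDK 1.4 introduced NIO, a Channel- and Buffer-based I/O model that can allocate off-heap memory directly through native libraries and then operate on it through a DirectByteBuffer object, which lives in the Java heap and acts as a reference to that memory. ByteBuffer provides the following commonly used methods for working with off-heap memory: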
API | Description |
---|---|
static ByteBuffer allocateDirect(int capacity) | Allocates a new direct byte buffer. |
ByteBuffer put(byte b) | Relative put method (optional operation). |
ByteBuffer put(byte[] src) | Relative bulk put method (optional operation). |
ByteBuffer putXxx(Xxx value) | Relative put method for writing a Char/Double/Float/Int/Long/Short value (optional operation). |
ByteBuffer get(byte[] dst) | Relative bulk get method. |
Xxx getXxx() | Relative get method for reading a Char/Double/Float/Int/Long/Short value. |
XxxBuffer asXxxBuffer() | Creates a view of this byte buffer as a Char/Double/Float/Int/Long/Short buffer. |
ByteBuffer asReadOnlyBuffer() | Creates a new, read-only byte buffer that shares this buffer’s content. |
boolean isDirect() | Tells whether or not this byte buffer is direct. |
ByteBuffer duplicate() | Creates a new byte buffer that shares this buffer’s content. |
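To make the table concrete, here is a minimal sketch of the usual write, flip, read cycle on a direct buffer. The class name DirectBufferDemo, the length-prefix layout, and the sample string are illustrative assumptions and are not part of the original project.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class DirectBufferDemo {

    /** Copies the text into a newly allocated direct (off-heap) buffer. */
    static ByteBuffer writeOffHeap(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // 4 bytes for the int length prefix plus the payload itself
        ByteBuffer buffer = ByteBuffer.allocateDirect(4 + bytes.length);
        buffer.putInt(bytes.length); // relative put: advances the position by 4
        buffer.put(bytes);           // relative bulk put
        buffer.flip();               // limit = position, position = 0: ready for reading
        return buffer;
    }

    /** Reads the text back without moving the caller's position. */
    static String readOffHeap(ByteBuffer buffer) {
        // duplicate() shares the content but has its own position and limit
        ByteBuffer view = buffer.duplicate();
        int length = view.getInt();
        byte[] bytes = new byte[length];
        view.get(bytes);             // relative bulk get into an on-heap array
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer buffer = writeOffHeap("hello off-heap");
        System.out.println("direct: " + buffer.isDirect());
        System.out.println("content: " + readOffHeap(buffer));
        System.out.println("read-only view: " + buffer.asReadOnlyBuffer().isReadOnly());
    }
}
```

Note that every read copies the bytes back onto the heap, which is the same copy-in/copy-out pattern the larger examples in this article rely on.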
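Next, we use the plain JDK API to build a local cache on off-heap memory.
Example 1: An off-heap cache with the JDK API
Note: the main logic lives in the invoke() method. AbstractAppInvoker is a custom performance-testing framework that is described in detail later.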
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.junit.Test;

/**
 * @author jifang
 * @since 2016/12/31 6:05 PM.
 */
public class DirectByteBufferApp extends AbstractAppInvoker {

    @Test
    @Override
    public void invoke(Object... param) {
        Map<String, FeedDO> map = createInHeapMap(SIZE);

        // move in off-heap: serialize the whole map, then copy the bytes into direct memory
        byte[] bytes = serializer.serialize(map);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip();

        // for gc: drop the on-heap references
        map = null;
        bytes = null;
        System.out.println("write down");

        // move out from off-heap: copy the bytes back and deserialize the whole map
        byte[] offHeapBytes = new byte[buffer.limit()];
        buffer.get(offHeapBytes);
        Map<String, FeedDO> deserMap = serializer.deserialize(offHeapBytes);
        for (int i = 0; i < SIZE; ++i) {
            String key = "key-" + i;
            FeedDO feedDO = deserMap.get(key);
            checkValid(feedDO);
            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }

        // release the direct memory explicitly instead of waiting for GC
        free(buffer);
    }

    private Map<String, FeedDO> createInHeapMap(int size) {
        long createTime = System.currentTimeMillis();
        Map<String, FeedDO> map = new ConcurrentHashMap<>(size);
        for (int i = 0; i < size; ++i) {
            String key = "key-" + i;
            FeedDO value = createFeed(i, key, createTime);
            map.put(key, value);
        }
        return map;
    }
}
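The off-heap access API provided by the JDK can only allocate a ByteBuffer, which behaves like a one-dimensional array. The JDK does not provide practical data structures backed by off-heap memory (such as an off-heap Map or Set). To implement a cache, a write therefore has to put() the data into an on-heap HashMap first, then serialize the entire Map and move it into direct memory. Reading the cache is the reverse. Because the HashMap has to be allocated on the heap, this can trigger multiple Full GCs. This approach does use off-heap memory, but it performs poorly and does not exploit the advantages of off-heap memory.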
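One way around serializing the whole Map is to store each value off-heap individually and keep only a small index on the heap. The sketch below illustrates that idea under simplifying assumptions: values are plain strings, the store is append-only, it is not thread-safe, and the index itself still lives on the heap. The class OffHeapStringStore is hypothetical, and this is not a description of how Ehcache or MapDB are actually implemented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class OffHeapStringStore {
    // Off-heap payload area; values are appended one after another.
    private final ByteBuffer data;
    // On-heap index: key -> {offset, length} of the value inside the buffer.
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapStringStore(int capacityBytes) {
        this.data = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Appends the value off-heap; returns false when the buffer has no room left. */
    boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > data.remaining()) {
            return false;
        }
        int offset = data.position();
        data.put(bytes);
        // Overwriting a key leaves the old bytes behind; a real store would reclaim them.
        index.put(key, new int[]{offset, bytes.length});
        return true;
    }

    /** Copies one value back onto the heap, or returns null for an unknown key. */
    String get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] bytes = new byte[location[1]];
        ByteBuffer view = data.duplicate(); // independent position, shared content
        view.position(location[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OffHeapStringStore store = new OffHeapStringStore(1024);
        store.put("key-0", "feed-0");
        System.out.println(store.get("key-0"));
    }
}
```

With this layout a read deserializes a single entry instead of the whole map, which is the property that makes a dedicated off-heap library attractive.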
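Fortunately, the open-source community has produced excellent off-heap frameworks such as Ehcache, MapDB, and Chronicle Map, which let us access off-heap memory through a concise API without paying an extra performance penalty.
Ehcache is the most powerful of these: it offers four cache tiers (in-heap, off-heap, on-disk, and cluster), and BigMemory, implemented by the Ehcache enterprise products (BigMemory Max / BigMemory Go), was a pioneer in Java off-heap memory.
Example 2: An off-heap cache with the MapDB API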
import org.junit.Test;
import org.mapdb.DBMaker;
import org.mapdb.HTreeMap;

public class MapDBApp extends AbstractAppInvoker {

    private static HTreeMap<String, FeedDO> mapDBCache;

    static {
        // hash map whose entries are stored in direct (off-heap) memory
        mapDBCache = DBMaker.hashMapSegmentedMemoryDirect()
                .expireMaxSize(SIZE)
                .make();
    }

    @Test
    @Override
    public void invoke(Object... param) {
        // write
        for (int i = 0; i < SIZE; ++i) {
            String key = "key-" + i;
            FeedDO feed = createFeed(i, key, System.currentTimeMillis());
            mapDBCache.put(key, feed);
        }
        System.out.println("write down");

        // read
        for (int i = 0; i < SIZE; ++i) {
            String key = "key-" + i;
            FeedDO feedDO = mapDBCache.get(key);
            checkValid(feedDO);
            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }
    }
}
Results & Analysis
- DirectByteBufferApp
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 0.00 5.22 78.57 59.85 19 2.902 13 7.251 10.153
- MapDBApp (last jstat sample)
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 0.03 8.02 0.38 44.46 171 0.238 0 0.000 0.238
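Running DirectByteBufferApp.invoke() produces many Full GCs. This is because the HashMap needs a very large contiguous array, so the Old generation fills up quickly, which leads to frequent Full GCs.
Running MapDBApp.invoke(), on the other hand, shows direct memory growing steadily, but not a single Full GC.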
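The same trend can also be watched from inside the process. The sketch below reads direct-buffer usage and GC counts through the standard java.lang.management MXBeans. The class name GcAndDirectMemoryProbe is hypothetical, and the pool name "direct" is an assumption about what the JVM reports (HotSpot uses this name).

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

class GcAndDirectMemoryProbe {

    /** Bytes currently used by the "direct" buffer pool, or -1 if no such pool is reported. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    /** Sum of collection counts over all collectors (young and old together). */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        long after = directMemoryUsed();
        System.out.println("direct memory grew by " + (after - before) + " bytes");
        System.out.println("total GC count so far: " + totalGcCount());
        System.out.println("buffer capacity: " + buffer.capacity());
    }
}
```

Calling these two methods periodically from a cache benchmark gives roughly the same picture as jstat plus a direct-memory monitor, without an external tool.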
Experiment: Using Off-Heap Memory to Reduce Full GC
Environment
- java -version
java version "1.7.0_79"
Java(TM) SE Runtime Environment (build 1.7.0_79-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.79-b02, mixed mode)
- VM Options
-Xmx512M
-XX:MaxDirectMemorySize=512M
-XX:+PrintGC
-XX:+UseConcMarkSweepGC
-XX:+CMSClassUnloadingEnabled
-XX:CMSInitiatingOccupancyFraction=80
-XX:+UseCMSInitiatingOccupancyOnly
- Test data
1.7 million feed entries (FeedDO).
Experiment code
Group 1: in-heap, affected by GC, no serialization
- ConcurrentHashMapApp
public class ConcurrentHashMapApp extends AbstractAppInvoker {

    private static final Map<String, FeedDO> cache = new ConcurrentHashMap<>();

    @Test
    @Override
    public void invoke(Object... param) {
        // write
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            FeedDO feedDO = createFeed(i, key, System.currentTimeMillis());
            cache.put(key, feedDO);
        }
        System.out.println("write down");

        // read
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            FeedDO feedDO = cache.get(key);
            checkValid(feedDO);
            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }
    }
}
GuavaCacheApp is similar; see the complete project for the full code.
Group 2: off-heap, not affected by GC, needs serialization
- EhcacheApp
import org.ehcache.Cache;
import org.ehcache.config.CacheConfiguration;
import org.ehcache.config.ResourcePools;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.config.units.EntryUnit;
import org.ehcache.config.units.MemoryUnit;
import org.junit.Test;

public class EhcacheApp extends AbstractAppInvoker {

    private static Cache<String, FeedDO> cache;

    static {
        // small on-heap tier in front of a 480 MB off-heap tier
        ResourcePools resourcePools = ResourcePoolsBuilder.newResourcePoolsBuilder()
                .heap(1000, EntryUnit.ENTRIES)
                .offheap(480, MemoryUnit.MB)
                .build();
        CacheConfiguration<String, FeedDO> configuration = CacheConfigurationBuilder
                .newCacheConfigurationBuilder(String.class, FeedDO.class, resourcePools)
                .build();
        cache = CacheManagerBuilder.newCacheManagerBuilder()
                .withCache("cacher", configuration)
                .build(true)
                .getCache("cacher", String.class, FeedDO.class);
    }

    @Test
    @Override
    public void invoke(Object... param) {
        // write
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            FeedDO feedDO = createFeed(i, key, System.currentTimeMillis());
            cache.put(key, feedDO);
        }
        System.out.println("write down");

        // read
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            Object o = cache.get(key);
            checkValid(o);
            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }
    }
}
MapDBApp is the same as above.
Group 3: off-process, not affected by GC, needs serialization, affected by inter-process communication
- LocalRedisApp
import org.junit.Test;

import redis.clients.jedis.Jedis;

public class LocalRedisApp extends AbstractAppInvoker {

    private static final Jedis cache = new Jedis("localhost", 6379);
    private static final IObjectSerializer serializer = new Hessian2Serializer();

    @Test
    @Override
    public void invoke(Object... param) {
        // write: each entry is serialized and sent to the Redis process
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            FeedDO feedDO = createFeed(i, key, System.currentTimeMillis());
            byte[] value = serializer.serialize(feedDO);
            cache.set(key.getBytes(), value);
            if (i % 10000 == 0) {
                System.out.println("write " + i);
            }
        }
        System.out.println("write down");

        // read: each entry is fetched from Redis and deserialized
        for (int i = 0; i < SIZE; ++i) {
            String key = String.format("key_%s", i);
            byte[] value = cache.get(key.getBytes());
            FeedDO feedDO = serializer.deserialize(value);
            checkValid(feedDO);
            if (i % 10000 == 0) {
                System.out.println("read " + i);
            }
        }
    }
}
RemoteRedisApp is similar; see the complete project below for the full code.
Results
* | ConcurrentMap | Guava | MapDB | Ehcache | LocalRedis | NetworkRedis |
---|---|---|---|---|---|---|
TTC | 32166ms/32s | 47520ms/47s | 40272ms/40s | 30814ms/31s | 176382ms/176s | 1h+ |
Minor C/T | 31/1.522 | 29/1.312 | 511/0.557 | 297/0.430 | 421/0.415 | - |
Full C/T | 24/23.212 | 36/41.751 | 0/0.000 | 0/0.000 | 0/0.000 | - |
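Notes:
- TTC: Total Time Cost
- C/T: Count/Time (time in seconds)
Analysis
Comparing the groups above, we can draw the following conclusions:
- Moving long-lived large objects (such as a cache) out of the heap greatly reduces the number and duration of Full GCs;
- Storing objects off-heap costs serialization and deserialization;
- Putting the cache into a distributed cache costs inter-process or network communication (UNIX domain socket / TCP/IP).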
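The serialization cost in the second point can be estimated in isolation. The sketch below round-trips one object through the JDK's built-in serialization and times a batch of round trips. It deliberately swaps in java.io serialization for the Hessian serializer used in the experiment, and the Feed class is a hypothetical stand-in for FeedDO, so the absolute numbers are not comparable with the table above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationCostDemo {

    /** Hypothetical stand-in for FeedDO; the fields are illustrative only. */
    static class Feed implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String userId;
        final long createTime;

        Feed(long id, String userId, long createTime) {
            this.id = id;
            this.userId = userId;
            this.createTime = createTime;
        }
    }

    /** Object -> bytes: the price of every write into an off-heap store. */
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("serialize failed", e);
        }
        return bytes.toByteArray();
    }

    /** Bytes -> object: the price of every read from an off-heap store. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialize failed", e);
        }
    }

    public static void main(String[] args) {
        int rounds = 100_000;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            deserialize(serialize(new Feed(i, "key_" + i, System.currentTimeMillis())));
        }
        long costMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(rounds + " round trips took " + costMs + " ms");
    }
}
```

An in-heap cache skips both steps entirely, which is why the off-heap groups only come out ahead when the avoided GC time outweighs this overhead.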
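Addendum:
I did not expect off-heap Ehcache to beat in-heap HashMap/Guava O(∩_∩)O~. The reason is that this particular combination of data volume and heap size causes far too many Full GCs in the in-heap case. With a larger heap the result would certainly be different. So before turning to off-heap memory to reduce Full GCs, first consider whether the heap can simply be made larger.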
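Appendix: Performance-Testing Framework
When main starts, it scans the com.vdian.se.apps package for all classes that extend AbstractAppInvoker, then uses Javassist to generate a proxy object for each one. When invoke() is executed, the proxy first checks whether the method is annotated with @Test (here we borrow the annotation already defined by JUnit), records the elapsed time around the call, and finally reports the timing of each implementation for comparison.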
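The interception idea can be tried without any third-party library. The sketch below uses the JDK's own java.lang.reflect.Proxy in place of the Javassist-based proxy from commons-proxy, and a custom @Timed annotation in place of JUnit's @Test. All names in it (TimingProxyDemo, Timed, AppInvoker, SleepyApp) are hypothetical. A JDK dynamic proxy needs an interface, whereas the framework in this article proxies an abstract class, so treat this as an illustration of the technique rather than a drop-in replacement.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.LinkedHashMap;
import java.util.Map;

class TimingProxyDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Timed {
    }

    interface AppInvoker {
        void invoke(Object... param);
    }

    static class SleepyApp implements AppInvoker {
        @Timed
        @Override
        public void invoke(Object... param) {
            try {
                Thread.sleep(20); // stands in for real cache work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** "ClassName.method" -> elapsed milliseconds of the last timed call. */
    static final Map<String, Long> STATISTICS = new LinkedHashMap<>();

    /** Wraps the target so that calls to @Timed methods are measured. */
    static AppInvoker timed(final AppInvoker target) {
        InvocationHandler handler = new InvocationHandler() {
            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                // The annotation sits on the implementation, so look the method up there.
                Method impl = target.getClass().getMethod(method.getName(), method.getParameterTypes());
                boolean measure = impl.isAnnotationPresent(Timed.class);
                long start = System.nanoTime();
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the target's own exception
                } finally {
                    if (measure) {
                        long costMs = (System.nanoTime() - start) / 1_000_000;
                        STATISTICS.put(target.getClass().getSimpleName() + "." + method.getName(), costMs);
                    }
                }
            }
        };
        return (AppInvoker) Proxy.newProxyInstance(
                AppInvoker.class.getClassLoader(), new Class<?>[]{AppInvoker.class}, handler);
    }

    public static void main(String[] args) {
        timed(new SleepyApp()).invoke();
        for (Map.Entry<String, Long> entry : STATISTICS.entrySet()) {
            System.out.println("method [" + entry.getKey() + "] total cost [" + entry.getValue() + "]ms");
        }
    }
}
```

Unlike the interceptor in this article, the sketch still forwards calls to methods that are not annotated; it only skips the timing for them.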
- Dependencies
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-proxy</artifactId>
    <version>${commons.proxy.version}</version>
</dependency>
<dependency>
    <groupId>org.javassist</groupId>
    <artifactId>javassist</artifactId>
    <version>${javassist.version}</version>
</dependency>
<dependency>
    <groupId>com.caucho</groupId>
    <artifactId>hessian</artifactId>
    <version>${hessian.version}</version>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>${guava.version}</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>${junit.version}</version>
</dependency>
- Launcher class: OffHeapStarter
import java.io.IOException;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import org.apache.commons.proxy.Interceptor;
import org.apache.commons.proxy.Invocation;
import org.apache.commons.proxy.ProxyFactory;
import org.apache.commons.proxy.factory.javassist.JavassistProxyFactory;
import org.junit.Test;

/**
 * @author jifang
 * @since 2017/1/1 10:47 AM.
 */
public class OffHeapStarter {

    private static final Map<String, Long> STATISTICS_MAP = new HashMap<>();

    public static void main(String[] args) throws IOException, IllegalAccessException, InstantiationException {
        // run every AbstractAppInvoker subclass found in the package through a timing proxy
        Set<Class<?>> classes = PackageScanUtil.scanPackage("com.vdian.se.apps");
        for (Class<?> clazz : classes) {
            AbstractAppInvoker invoker = createProxyInvoker(clazz.newInstance());
            invoker.invoke();
            //System.gc();
        }

        System.out.println("********************* statistics **********************");
        for (Map.Entry<String, Long> entry : STATISTICS_MAP.entrySet()) {
            System.out.println("method [" + entry.getKey() + "] total cost [" + entry.getValue() + "]ms");
        }
    }

    private static AbstractAppInvoker createProxyInvoker(Object invoker) {
        ProxyFactory factory = new JavassistProxyFactory();
        Class<?> superclass = invoker.getClass().getSuperclass();
        Object proxy = factory
                .createInterceptorProxy(invoker, new ProfileInterceptor(), new Class[]{superclass});
        return (AbstractAppInvoker) proxy;
    }

    private static class ProfileInterceptor implements Interceptor {

        @Override
        public Object intercept(Invocation invocation) throws Throwable {
            Class<?> clazz = invocation.getProxy().getClass();
            Method method = clazz.getMethod(invocation.getMethod().getName(), Object[].class);

            Object result = null;
            // only invoke() methods annotated with @Test are executed and timed
            if (method.isAnnotationPresent(Test.class)
                    && method.getName().equals("invoke")) {
                String methodName = String.format("%s.%s", clazz.getSimpleName(), method.getName());
                System.out.println("method [" + methodName + "] start invoke");

                long start = System.currentTimeMillis();
                result = invocation.proceed();
                long cost = System.currentTimeMillis() - start;

                System.out.println("method [" + methodName + "] total cost [" + cost + "]ms");
                STATISTICS_MAP.put(methodName, cost);
            }
            return result;
        }
    }
}
- Package scanner: PackageScanUtil
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.net.URL;
import java.net.URLDecoder;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;

public class PackageScanUtil {

    private static final String CLASS_SUFFIX = ".class";
    private static final String FILE_PROTOCOL = "file";

    public static Set<Class<?>> scanPackage(String packageName) throws IOException {
        Set<Class<?>> classes = new HashSet<>();
        String packageDir = packageName.replace('.', '/');
        Enumeration<URL> packageResources = Thread.currentThread().getContextClassLoader().getResources(packageDir);
        while (packageResources.hasMoreElements()) {
            URL packageResource = packageResources.nextElement();
            String protocol = packageResource.getProtocol();
            // only scan classes inside the project (skip jars)
            if (FILE_PROTOCOL.equals(protocol)) {
                String packageDirPath = URLDecoder.decode(packageResource.getPath(), "UTF-8");
                scanProjectPackage(packageName, packageDirPath, classes);
            }
        }
        return classes;
    }

    private static void scanProjectPackage(String packageName, String packageDirPath, Set<Class<?>> classes) {
        File packageDirFile = new File(packageDirPath);
        if (packageDirFile.exists() && packageDirFile.isDirectory()) {
            File[] subFiles = packageDirFile.listFiles(new FileFilter() {
                @Override
                public boolean accept(File pathname) {
                    return pathname.isDirectory() || pathname.getName().endsWith(CLASS_SUFFIX);
                }
            });
            if (subFiles == null) {
                // listFiles() returns null on an I/O error
                return;
            }
            for (File subFile : subFiles) {
                if (!subFile.isDirectory()) {
                    String className = trimClassSuffix(subFile.getName());
                    String classNameWithPackage = packageName + "." + className;
                    Class<?> clazz;
                    try {
                        clazz = Class.forName(classNameWithPackage);
                    } catch (ClassNotFoundException e) {
                        // skip files that cannot be loaded as classes
                        continue;
                    }
                    // keep only direct subclasses of AbstractAppInvoker
                    if (clazz.getSuperclass() == AbstractAppInvoker.class) {
                        classes.add(clazz);
                    }
                }
            }
        }
    }

    // trim .class suffix
    private static String trimClassSuffix(String classNameWithSuffix) {
        int endIndex = classNameWithSuffix.length() - CLASS_SUFFIX.length();
        return classNameWithSuffix.substring(0, endIndex);
    }
}
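Note: this only scans class files in a single directory level under the project directory. For a more capable package scanner, see the Spring source code or the PackageScanUtil class in the Touch source code.
- AppInvoker base class: AbstractAppInvoker
Provides common test parameters and utility functions.
import java.nio.ByteBuffer;

import com.google.common.base.Strings;

import sun.nio.ch.DirectBuffer;

public abstract class AbstractAppInvoker {

    // 1.7 million entries
    protected static final int SIZE = 1_700_000;

    protected static final IObjectSerializer serializer = new Hessian2Serializer();

    protected static FeedDO createFeed(long id, String userId, long createTime) {
        return new FeedDO(id, userId, (int) id, userId + "_" + id, createTime);
    }

    // Releases direct memory eagerly. This relies on the JDK-internal
    // sun.nio.ch.DirectBuffer (fine on the Java 7 runtime used here);
    // it may not be accessible on newer JDKs.
    protected static void free(ByteBuffer byteBuffer) {
        if (byteBuffer.isDirect()) {
            ((DirectBuffer) byteBuffer).cleaner().clean();
        }
    }

    protected static void checkValid(Object obj) {
        if (obj == null) {
            throw new RuntimeException("cache invalid");
        }
    }

    protected static void sleep(int time, String beforeMsg) {
        if (!Strings.isNullOrEmpty(beforeMsg)) {
            System.out.println(beforeMsg);
        }

        try {
            Thread.sleep(time);
        } catch (InterruptedException ignored) {
            // no op
        }
    }

    /**
     * Implemented by subclasses and called from outside.
     *
     * @param param optional arguments for the test run
     */
    public abstract void invoke(Object... param);
}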
- Serialization/deserialization interface and implementation
// IObjectSerializer.java
public interface IObjectSerializer {

    <T> byte[] serialize(T obj);

    <T> T deserialize(byte[] bytes);
}

// Hessian2Serializer.java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import com.caucho.hessian.io.Hessian2Input;
import com.caucho.hessian.io.Hessian2Output;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Hessian2Serializer implements IObjectSerializer {

    private static final Logger LOGGER = LoggerFactory.getLogger(Hessian2Serializer.class);

    @Override
    public <T> byte[] serialize(T obj) {
        if (obj != null) {
            try (ByteArrayOutputStream os = new ByteArrayOutputStream()) {
                Hessian2Output out = new Hessian2Output(os);
                out.writeObject(obj);
                // close() flushes the Hessian buffer into os before the bytes are read
                out.close();
                return os.toByteArray();
            } catch (IOException e) {
                LOGGER.error("Hessian serialize error ", e);
                throw new CacherException(e);
            }
        }
        return null;
    }

    @SuppressWarnings("unchecked")
    @Override
    public <T> T deserialize(byte[] bytes) {
        if (bytes != null) {
            try (ByteArrayInputStream is = new ByteArrayInputStream(bytes)) {
                Hessian2Input in = new Hessian2Input(is);
                T obj = (T) in.readObject();
                in.close();
                return obj;
            } catch (IOException e) {
                LOGGER.error("Hessian deserialize error ", e);
                throw new CacherException(e);
            }
        }
        return null;
    }
}
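- GC statistics tool
#!/bin/bash
# Look up the PID of the JVM whose jps entry matches the first argument
pid=$(jps | grep "$1" | awk '{print $1}')
# Print GC utilization every 400 ms, for 10000 samples
jstat -gcutil "${pid}" 400 10000
- Usage
sh jstat-uti.sh ${u-main-class}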
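Addendum: why are there so few Minor GCs for the in-heap caches in the experiment?
I cannot give a definite answer yet. Some people say it is because a CMS Full GC also triggers a Minor GC, which jstat then counts as a Full GC, but the detailed GC logs did not show anything conclusive either. If you know the answer, please leave a comment below. Thanks in advance O(∩_∩)O~.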