A Small Puzzle About lazySet in the JUC Atomic Classes
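Recently I went through the Netty and Disruptor source code again and noticed that some places use AtomicXXX.lazySet()/unsafe.putOrderedXXX. I had never paid attention to lazySet before, and it turned out to be quite interesting once I looked into it. Let's compare set() and lazySet() in AtomicReferenceFieldUpdater; the other AtomicXXX classes work the same way.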
public void set(T obj, V newValue) {
    // ...
    unsafe.putObjectVolatile(obj, offset, newValue);
}

public void lazySet(T obj, V newValue) {
    // ...
    unsafe.putOrderedObject(obj, offset, newValue);
}
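For reference, this is how the two methods compared above are called. A minimal sketch: the Node class and its value field are hypothetical, while AtomicReferenceFieldUpdater.newUpdater, set, lazySet and get are the standard JDK API. Within a single thread both writes are visible to later reads in program order; the difference only concerns how other threads are ordered against the write.

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;

// Hypothetical node type. The updater requires the target field to be
// volatile and accessible from the class that creates the updater.
class Node {
    volatile String value;

    static final AtomicReferenceFieldUpdater<Node, String> VALUE =
            AtomicReferenceFieldUpdater.newUpdater(Node.class, String.class, "value");
}

class UpdaterDemo {
    static String[] run() {
        Node n = new Node();
        Node.VALUE.set(n, "a");        // full volatile write (ends up in putObjectVolatile)
        String afterSet = Node.VALUE.get(n);
        Node.VALUE.lazySet(n, "b");    // ordered write (ends up in putOrderedObject)
        String afterLazySet = Node.VALUE.get(n);
        return new String[] { afterSet, afterLazySet };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints "a b"
    }
}
```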
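1. First, set() is a write to a volatile variable. As we know, to guarantee visibility to other threads, a volatile write is guarded by two fences (memory barriers): a StoreStore before the store and a StoreLoad after it.
1) StoreStore // Intel CPUs never reorder a store with another store, so on x86 this one can simply be omitted
2) StoreLoad // this is the most expensive of all the memory barriers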
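The barrier placement just described can be written out by hand with the fence methods VarHandle gained in Java 9. This is a conceptual sketch, not how AtomicLong is implemented: the JMM defines volatile in terms of happens-before rather than fences, releaseFence() is somewhat stronger than a pure StoreStore (it also orders earlier loads), and a plain long store is not even guaranteed atomic on 32-bit JVMs. The class FenceSketch and its methods are hypothetical.

```java
import java.lang.invoke.VarHandle;

// Hypothetical illustration (Java 9+): the fences around a volatile store,
// spelled out explicitly on a NON-volatile field.
class FenceSketch {
    private long plain; // deliberately not volatile

    // Roughly what set() requires: StoreStore before the store, StoreLoad after it.
    void volatileLikeStore(long v) {
        VarHandle.releaseFence(); // earlier loads/stores stay before the store (covers StoreStore)
        plain = v;
        VarHandle.fullFence();    // covers StoreLoad: later loads cannot move above the store
    }

    // Roughly what lazySet() requires: only the leading fence, no trailing StoreLoad.
    void lazySetLikeStore(long v) {
        VarHandle.releaseFence();
        plain = v;
    }

    long read() {
        return plain;
    }

    public static void main(String[] args) {
        FenceSketch s = new FenceSketch();
        s.volatileLikeStore(1);
        s.lazySetLikeStore(2);
        System.out.println(s.read()); // same thread, program order: prints 2
    }
}
```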
Note: for memory barriers, see Doug Lea's JSR-133 Cookbook (http://g.oswego.edu/dl/jmm/cookbook.html)
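2. Doug Lea also explained that lazySet() drops the StoreLoad barrier and keeps only StoreStore,
here: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6275329
Removing StoreLoad, the most expensive barrier, should improve performance considerably. The trade-off is that store-load reordering is no longer prevented, so the new value is not guaranteed to be immediately visible to other threads; in exchange, other use cases get a cheaper option, such as the scenario Doug Lea describes in the link above.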
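One place where dropping StoreLoad is harmless is a single writer publishing its progress, in the spirit of the Disruptor's sequence counters. The sketch below is a hypothetical, simplified single-producer/single-consumer hand-off (class and variable names are made up): the producer writes a plain array slot and then calls lazySet on the sequence, and the consumer polls the sequence with get(). Because only one thread ever writes the sequence, no CAS is needed, and the StoreStore ordering keeps each slot write ahead of the sequence write that publishes it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical single-producer/single-consumer hand-off using lazySet().
class SequencePublish {
    static long run(int n) throws InterruptedException {
        long[] slots = new long[n];                 // plain, non-volatile payload
        AtomicLong published = new AtomicLong(-1);  // highest index made visible so far
        long[] sum = new long[1];                   // written only by the consumer

        Thread producer = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                slots[i] = 2L * i;      // 1. plain store of the payload
                published.lazySet(i);   // 2. ordered store of the sequence, no StoreLoad
            }
        });

        Thread consumer = new Thread(() -> {
            int next = 0;
            while (next < n) {
                long available = published.get(); // volatile read of the sequence
                while (next <= available) {
                    sum[0] += slots[next++];      // this slot was written before its publish
                }
                Thread.yield(); // back off while waiting for the producer
            }
        });

        consumer.start();
        producer.start();
        producer.join();
        consumer.join(); // join() makes sum[0] visible to the calling thread
        return sum[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(1000)); // prints 999000, i.e. 2 * (0 + 1 + ... + 999)
    }
}
```

The consumer may see the sequence a little late, which only costs a few extra spins; it can never see a sequence value whose slot has not been written yet.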
But, out of curiosity, I dug into the JDK source (unsafe.cpp):
// This is unsafe.putObjectVolatile()
UNSAFE_ENTRY(void, Unsafe_SetObjectVolatile(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jobject x_h))
  UnsafeWrapper("Unsafe_SetObjectVolatile");
  oop x = JNIHandles::resolve(x_h);
  oop p = JNIHandles::resolve(obj);
  void* addr = index_oop_from_field_offset_long(p, offset);
  OrderAccess::release();
  if (UseCompressedOops) {
    oop_store((narrowOop*)addr, x);
  } else {
    oop_store((oop*)addr, x);
  }
  OrderAccess::fence();
UNSAFE_END

// This is unsafe.putOrderedObject()
UNSAFE_ENTRY(void, Unsafe_SetOrderedObject(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jobject x_h))
  UnsafeWrapper("Unsafe_SetOrderedObject");
  oop x = JNIHandles::resolve(x_h);
  oop p = JNIHandles::resolve(obj);
  void* addr = index_oop_from_field_offset_long(p, offset);
  OrderAccess::release();
  if (UseCompressedOops) {
    oop_store((narrowOop*)addr, x);
  } else {
    oop_store((oop*)addr, x);
  }
  OrderAccess::fence();
UNSAFE_END
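Look closely at that code: doesn't it feel like we've been tricked? The two are exactly, freaking identical. Is the JIT doing something behind the scenes? Let's generate the assembly and see.
Generating assembly code requires the hsdis plugin.
To keep the test code simple, I'll use AtomicLong: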
// Version 1: set()
import java.util.concurrent.atomic.AtomicLong;

public class LazySetTest {
    private static final AtomicLong a = new AtomicLong();

    public static void main(String[] args) {
        for (int i = 0; i < 100000000; i++) {
            a.set(i);
        }
    }
}

// Version 2: lazySet() (same file, only the call in the loop changes)
import java.util.concurrent.atomic.AtomicLong;

public class LazySetTest {
    private static final AtomicLong a = new AtomicLong();

    public static void main(String[] args) {
        for (int i = 0; i < 100000000; i++) {
            a.lazySet(i);
        }
    }
}
Run the following commands for each version:
1. export LD_LIBRARY_PATH=~/path-to-hsdis-plugin/
2. javac LazySetTest.java && java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly LazySetTest
// ------------------------------------------------------
// Assembly snippet for set():
0x000000010ccbfeb3: mov    %r10,0x10(%r9)
0x000000010ccbfeb7: lock addl $0x0,(%rsp)   ;*putfield value
                                            ; - java.util.concurrent.atomic.AtomicLong::set (line 112)
                                            ; - LazySetTest::main (line 13)
0x000000010ccbfebc: inc    %ebp             ;*iinc
                                            ; - LazySetTest::main (line 12)
// ------------------------------------------------------
// Assembly snippet for lazySet():
0x0000000108766faf: mov    %r10,0x10(%rcx)  ;*invokevirtual putOrderedLong
                                            ; - java.util.concurrent.atomic.AtomicLong::lazySet (line 122)
                                            ; - LazySetTest::main (line 13)
0x0000000108766fb3: inc    %ebp             ;*iinc
                                            ; - LazySetTest::main (line 12)
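Well then: the assembly generated for set() has one extra lock-prefixed instruction.
According to the Intel IA-32 manual, lock addl $0x0,(%rsp) is effectively the StoreLoad barrier: adding 0 to the top of the stack changes nothing, but like any lock-prefixed instruction it acts as a full barrier on x86. lazySet(), on the other hand, really does generate no StoreLoad barrier.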
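Since JDK 9, the AtomicLong.lazySet Javadoc defines its memory effects as those of VarHandle.setRelease, so the same experiment can also be written against a VarHandle instead of Unsafe. A sketch assuming Java 9+; the class name LazySetVarHandleTest is hypothetical, and the comments about generated instructions describe what one would expect to look for in -XX:+PrintAssembly output, not output verified here.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical VarHandle version of LazySetTest (Java 9+).
class LazySetVarHandleTest {
    private static long value; // plain field, accessed only through VALUE below
    private static final VarHandle VALUE;

    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findStaticVarHandle(LazySetVarHandleTest.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static long run(boolean release, int iterations) {
        for (int i = 0; i < iterations; i++) {
            if (release) {
                VALUE.setRelease((long) i);  // like lazySet(): no trailing StoreLoad expected
            } else {
                VALUE.setVolatile((long) i); // like set(): expect a lock-prefixed instruction on x86
            }
        }
        return (long) VALUE.getVolatile();
    }

    public static void main(String[] args) {
        boolean release = args.length > 0 && args[0].equals("release");
        System.out.println(run(release, 100_000_000));
    }
}
```

Running it with and without the release argument under the same -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly flags gives the two variants to compare.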
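So, apart from inlining the methods, how does the JIT manage to generate different instructions from the same code?
Looking at vmSymbols.hpp, lines 812 and 868 contain the following: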
do_intrinsic(_putObjectVolatile, sun_misc_Unsafe, putObjectVolatile_name, putObject_signature, F_RN)
do_intrinsic(_putOrderedObject, sun_misc_Unsafe, putOrderedObject_name, putOrderedObject_signature, F_RN)
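Both putObjectVolatile and putOrderedObject are declared as intrinsics in the macro definitions of vmSymbols.hpp, and the JVM generates a specific instruction sequence for each intrinsic ID. This is probably where the different assembly for putObjectVolatile and putOrderedObject comes from. Let's keep digging in hotspot/src/share/vm/opto/library_call.cpp:
First, look at these two cases: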
case vmIntrinsics::_putObjectVolatile:
  return inline_unsafe_access(!is_native_ptr, is_store, T_OBJECT, is_volatile);
case vmIntrinsics::_putOrderedObject:
  return inline_unsafe_ordered_store(T_OBJECT);
Next, look at inline_unsafe_access() and inline_unsafe_ordered_store(). Rather than pasting everything, here are just the important parts:
bool LibraryCallKit::inline_unsafe_ordered_store(BasicType type) {
  // This is another variant of inline_unsafe_access, differing in
  // that it always issues store-store ("release") barrier and ensures
  // store-atomicity (which only matters for "long").
  // ...
  if (type == T_OBJECT) // reference stores need a store barrier.
    store = store_oop_to_unknown(control(), base, adr, adr_type, val, type);
  else {
    store = store_to_memory(control(), adr, val, type, adr_type, require_atomic_access);
  }
  insert_mem_bar(Op_MemBarCPUOrder);
  return true;
}

// ----------------------------------------------------------------------

bool LibraryCallKit::inline_unsafe_access(bool is_native_ptr, bool is_store, BasicType type, bool is_volatile) {
  // ....
  if (is_volatile) {
    if (!is_store)
      insert_mem_bar(Op_MemBarAcquire);
    else
      insert_mem_bar(Op_MemBarVolatile);
  }
  if (need_mem_bar) insert_mem_bar(Op_MemBarCPUOrder);
  return true;
}
In inline_unsafe_access(), when is_volatile is true and the operation is a store, the code calls insert_mem_bar(Op_MemBarVolatile); inline_unsafe_ordered_store() never inserts that barrier.
Next, look at membar_volatile in /hotspot/src/cpu/x86/vm/x86_64.ad:
instruct membar_volatile(rFlagsReg cr) %{
  match(MemBarVolatile);
  effect(KILL cr);
  ins_cost(400);

  format %{
    $$template
    if (os::is_MP()) {
      $$emit$$"lock addl [rsp + #0], 0\t! membar_volatile"
    } else {
      $$emit$$"MEMBAR-volatile ! (empty encoding)"
    }
  %}
  ins_encode %{
    __ membar(Assembler::StoreLoad);
  %}
  ins_pipe(pipe_slow);
%}
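So this is where the lock addl [rsp + #0], 0\t! membar_volatile instruction comes from.
Summary:
The C++ bodies of Unsafe_SetObjectVolatile and Unsafe_SetOrderedObject are identical, but once the C2 JIT (opto) compiles the code, the two calls are handled as different intrinsics: putObjectVolatile gets an Op_MemBarVolatile, which x86_64.ad turns into lock addl (the StoreLoad barrier), while putOrderedObject does not. I've glossed over some details, but I think I now understand the main flow a little better. Corrections are welcome if anything here is wrong.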
References:
1. http://g.oswego.edu/dl/jmm/cookbook.html
2. https://wikis.oracle.com/display/HotSpotInternals/PrintAssembly
3. http://www.quora.com/How-does-AtomicLong-lazySet-work
4. http://bad-concurrency.blogspot.ru/2012/10/talk-from-jax-london.html