This Is Why They Call It a Weakly-Ordered CPU

阿新 • Published: 2019-01-28

OCT 19, 2012

http://preshing.com/20121019/this-is-why-they-call-it-a-weakly-ordered-cpu/

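Note: This is a really good article for understanding memory reordering on weakly-ordered CPUs. Grab Xcode and an iPhone 4S, and you can run the author's examples directly. Nothing is more convincing than a live example.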

Also, besides testing on an iPhone 3GS, you can use CPU affinity settings to verify the single-core case.

----> Main text begins

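Previously, I've covered several lock-free programming topics, such as acquire and release semantics and weakly-ordered CPUs. I've tried to make these topics approachable and easy to understand, but nothing beats a real example.
(Translator's note: the acquire and release post will be translated later.)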

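If there's one thing that characterizes a weakly-ordered CPU, it's that one CPU core can see values in shared memory change in a different order than another core wrote them. That's what I'd like to demonstrate in this post using pure C++11.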

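For normal applications, the x86/64 processor families from Intel and AMD don't have this characteristic, so you can't demonstrate it on a typical desktop or notebook PC. What we really need is a weakly-ordered multicore device. Fortunately, I have one right here in my pocket: the iPhone 4S.
The iPhone 4S runs on a dual-core ARM processor, and the ARM architecture is weakly-ordered.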

The Experiment

Our experiment consists of a single integer variable, sharedValue, protected by a mutex. We spawn two threads, and each thread runs until it has incremented sharedValue 10,000,000 times.

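We won't let our threads block waiting on the mutex. Instead, each thread loops, doing busy work (just to waste CPU time) and trying to acquire the mutex. If the lock succeeds, the thread increments sharedValue, then unlocks. If the lock fails, it goes back to busy work. In pseudocode: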

count = 0
while count < 10000000:
    doRandomAmountOfBusyWork()
    if tryLockMutex():
        // The lock succeeded
        sharedValue++
        unlockMutex()
        count++
    endif
end while

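Each thread runs on its own CPU core, so the timeline should look roughly like this: each red segment represents a successful lock and increment, while each dark blue segment represents a failed lock attempt because the other thread was already holding the mutex.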

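It bears repeating that a mutex is just a concept, and there are many ways to implement one. We could simply use the std::mutex provided by C++11, and of course everything would work correctly, but then I'd have nothing to show you. Instead, let's implement our own mutex, then break it to demonstrate the consequences of weak hardware ordering. Intuitively, memory reordering is most likely at moments when there's a "close shave" between the threads: for example, in the diagram above, at the moment one thread acquires the lock just as the other thread releases it.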
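For comparison, here is a minimal sketch of the "just use std::mutex" baseline, assuming a C++11 toolchain. It calls std::mutex::try_lock() so threads never block, mirroring the pseudocode above. The busyWork helper is a hypothetical stand-in for the RandomDelay class in the article's sample project, and the function names are illustrative.

```cpp
#include <mutex>
#include <thread>

std::mutex sharedMutex;   // standard mutex: unlock() synchronizes with a later successful try_lock()
int sharedValue = 0;      // protected by sharedMutex

// Hypothetical stand-in for RandomDelay: spin a pseudo-random 0..15 iterations.
static void busyWork(unsigned& seed) {
    seed = seed * 1103515245u + 12345u;   // linear congruential step
    volatile unsigned sink = 0;           // volatile stores keep the loop from being optimized away
    for (unsigned i = 0; i < ((seed >> 16) & 15u); ++i) sink = i;
}

static void incrementWithStdMutex(int times, unsigned seed) {
    int count = 0;
    while (count < times) {
        busyWork(seed);
        if (sharedMutex.try_lock()) {     // never blocks; may fail and retry
            ++sharedValue;
            sharedMutex.unlock();
            ++count;
        }
    }
}

// Runs both threads to completion and returns the final count.
int runWithStdMutex(int times) {
    sharedValue = 0;
    std::thread a(incrementWithStdMutex, times, 1u);
    std::thread b(incrementWithStdMutex, times, 2u);
    a.join();
    b.join();
    return sharedValue;
}
```

On any conforming implementation, runWithStdMutex(10000000) should return exactly 20000000, which is why the article builds its own mutex instead.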

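The latest Xcode has good support for C++11 threads and atomic types, so that's what we'll use. All C++11 identifiers are defined in the std namespace; the snippets below assume using namespace std; appears somewhere earlier in the code.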

A Ridiculously Simple Mutex

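Our mutex consists of a single integer, flag, where 1 means the mutex is held and 0 means it isn't. To guarantee mutual exclusion, a thread may only set flag to 1 if its previous value was 0, and it must do so atomically. To achieve this, we define flag as a C++11 atomic type, atomic<int>, and use a read-modify-write operation: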

int expected = 0;
if (flag.compare_exchange_strong(expected, 1, memory_order_acquire)) {
    // The lock succeeded
}

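The memory_order_acquire argument is an ordering constraint. It places acquire semantics on this operation, which helps guarantee that we receive the latest shared values written by the previous thread that held the mutex.
Here's how to release the lock: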

flag.store(0, memory_order_release);

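This sets flag back to 0 using the memory_order_release ordering constraint, which applies release semantics. Acquire and release semantics must be used as a pair to ensure that shared values propagate completely from one thread to the next.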
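The two operations above can be packaged into a small class. This is a sketch assuming C++11; the class name SpinMutex is an illustrative choice, not taken from the article's repository. Note why `expected` is re-initialized on every attempt: when compare_exchange_strong fails, it overwrites `expected` with the value it actually found.

```cpp
#include <atomic>

// Minimal "ridiculously simple mutex": flag == 1 means held, 0 means free.
class SpinMutex {
public:
    // Try once to take the lock; never blocks.
    bool try_lock() {
        int expected = 0;  // must be reset each call: a failed CAS stores the observed value here
        return flag.compare_exchange_strong(expected, 1, std::memory_order_acquire);
    }

    // Release the lock; this release store pairs with the acquire in try_lock().
    void unlock() {
        flag.store(0, std::memory_order_release);
    }

private:
    std::atomic<int> flag{0};
};
```

A thread that gets true from try_lock() may modify sharedValue and must call unlock() when it's done. The class deliberately has no blocking lock() method, matching the busy-loop design of the experiment.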

If We Don’t Use Acquire and Release Semantics…

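Now let's run the experiment in C++11 without the correct ordering constraints, using memory_order_relaxed in both places. This means the compiler enforces no particular memory ordering, and any kind of reordering is permitted.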

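// Assumes atomic<int> flag; int sharedValue; and using namespace std;.
// RandomDelay is a helper class from the sample project on GitHub.
void IncrementSharedValue10000000Times(RandomDelay& randomDelay) {
    int count = 0;
    while (count < 10000000) {
        randomDelay.doBusyWork();
        int expected = 0;
        // memory_order_relaxed: no ordering is enforced (intentionally broken)
        if (flag.compare_exchange_strong(expected, 1, memory_order_relaxed)) {
            // Lock was successful
            sharedValue++;
            flag.store(0, memory_order_relaxed);   // may become visible before sharedValue++
            count++;
        }
    }
}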

At this point, it's informative to look at the ARM assembly code the compiler generates for a Release build, using Xcode's Disassembly view:

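If you aren't familiar with assembly language, don't worry. All we need to know is whether the compiler reordered any operations on shared variables. This includes the two operations on flag and the increment of sharedValue in between. I've annotated the corresponding sections of assembly code above. As you can see, we got lucky: the compiler didn't reorder those operations, even though, in all fairness, the memory_order_relaxed argument means it could have.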

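I've put together a simple program that repeats this experiment, printing the final value of sharedValue at the end of each run. The code is on GitHub: https://github.com/preshing/AcquireRelease
Here's the output when it runs from Xcode: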

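Look closely: the final value of sharedValue is consistently less than 20,000,000, even though each thread performs exactly 10,000,000 increments, and the order of the assembly instructions matches the order of operations in our program (that is, the compiler didn't reorder anything).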

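As you might have guessed, these results are entirely due to memory reordering on the CPU. To point out just one possible reordering (there are several): the memory interaction of str.w r0, [r11] (the store to sharedValue) can be reordered with that of str r5, [r6] (the store of 0 to flag). In other words, the mutex can effectively be released before we're finished with it! The other thread is then free to overwrite the change we made, leaving sharedValue with a lower value than expected, just as we saw in the experiment.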
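To make the lost update concrete, here is a deterministic, single-threaded replay of one interleaving the relaxed version allows: core A's store to flag becomes visible to core B before A's store to sharedValue does. It's a thought experiment written in C++, not a concurrency test; the function name replayLostUpdate and the "pending store" variable are illustrative modelling devices, not part of the article's code.

```cpp
// Plain ints model memory; pendingStoreFromA models a store that core A has
// executed but that core B cannot see yet.
int replayLostUpdate(int initial) {
    int sharedValue = initial;
    int flag = 1;                          // core A holds the lock

    // Core A: increments sharedValue, but the store is still pending.
    int pendingStoreFromA = sharedValue + 1;

    // Core A's later store flag = 0 becomes visible FIRST (reordered).
    flag = 0;

    // Core B: sees flag == 0, takes the lock, and reads the stale value.
    if (flag == 0) {
        flag = 1;
        sharedValue = sharedValue + 1;     // B writes initial + 1
        flag = 0;
    }

    // Core A's delayed store finally lands, overwriting B's result.
    sharedValue = pendingStoreFromA;       // also initial + 1
    return sharedValue;                    // two increments, net gain of one
}
```

Each time an interleaving like this occurs, one increment disappears, which is why the final total in the experiment comes up short.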

Using Acquire and Release Semantics Correctly

Fixing our program is simply a matter of putting the correct C++11 memory ordering constraints back in place:

void IncrementSharedValue10000000Times(RandomDelay& randomDelay) {
    int count = 0;
    while (count < 10000000) {
        randomDelay.doBusyWork();
        int expected = 0;
        // Acquire: see everything written before the previous release of flag
        if (flag.compare_exchange_strong(expected, 1, memory_order_acquire)) {
            // Lock was successful
            sharedValue++;
            // Release: sharedValue++ becomes visible before flag reads as 0
            flag.store(0, memory_order_release);
            count++;
        }
    }
}

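Note the two memory_order_xxx constraints above.
As a result, the compiler now inserts a couple of dmb ish instructions, which act as memory barriers in the ARMv7 instruction set. I'm not an ARM expert (comments are welcome), but it seems safe to assume that this instruction, much like lwsync on PowerPC, provides all the memory barrier types needed for acquire semantics on compare_exchange_strong and release semantics on store.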

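This time, our home-grown mutex really does protect sharedValue, ensuring that all modifications are passed safely from one thread to the next each time the mutex is locked.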
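Putting the pieces together, here is a sketch of one complete trial with the corrected orderings, assuming C++11 and a machine with at least two hardware threads. The globals flag and sharedValue follow the article; doBusyWork and runTrial are hypothetical stand-ins for the sample project's RandomDelay class and driver loop, not code from that project.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> flag{0};   // 1 = mutex held, 0 = free
int sharedValue = 0;        // protected by flag

// Hypothetical busy work: spin a pseudo-random 0..15 iterations.
static void doBusyWork(unsigned& seed) {
    seed = seed * 1103515245u + 12345u;   // linear congruential step
    volatile unsigned sink = 0;           // volatile stores keep the loop alive
    for (unsigned i = 0; i < ((seed >> 16) & 15u); ++i) sink = i;
}

static void incrementSharedValue(int times, unsigned seed) {
    int count = 0;
    while (count < times) {
        doBusyWork(seed);
        int expected = 0;
        if (flag.compare_exchange_strong(expected, 1, std::memory_order_acquire)) {
            sharedValue++;                              // inside the critical section
            flag.store(0, std::memory_order_release);   // publish, then unlock
            count++;
        }
    }
}

// Runs one trial with two threads and returns the final sharedValue.
int runTrial(int times) {
    sharedValue = 0;
    std::thread a(incrementSharedValue, times, 1u);
    std::thread b(incrementSharedValue, times, 2u);
    a.join();
    b.join();
    return sharedValue;
}
```

With acquire and release in place, runTrial(10000000) should return exactly 20000000 on any platform. Replacing both orderings with memory_order_relaxed turns it back into the broken version, although on x86/64 you will most likely still see the exact count.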

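If this experiment still doesn't feel intuitive, I suggest reading my post on the source control analogy. In terms of that analogy, you can imagine two computers, each with its own local copy of sharedValue and flag, and some extra effort is needed to keep those copies in sync. Personally, I find it helpful to visualize things this way.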

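It bears repeating that the memory reordering seen here can only be observed on multicore or multiprocessor devices. If you run the same code on an iPhone 3GS or a first-generation iPad, which use the same ARMv7 architecture but have only a single CPU core, you won't see any mismatch in the final value of sharedValue.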
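The translator's note at the top suggests using CPU affinity to check the single-core case on hardware you already have. On Linux, one way is to restrict the main thread to a single CPU before spawning the two worker threads, because new threads inherit their creator's affinity mask. This is a Linux/glibc-only sketch using sched_getaffinity and sched_setaffinity; the helper name pinToSingleCpu is hypothetical. Since x86/64 doesn't show this reordering anyway, the comparison is only interesting on a weakly-ordered multicore Linux machine, and even then it rules out only hardware reordering, not compiler reordering.

```cpp
#include <sched.h>   // Linux/glibc: sched_getaffinity, sched_setaffinity, CPU_* macros

// Restricts the calling thread (and threads it creates afterwards) to one CPU.
// Picks the first CPU in the current mask instead of assuming CPU 0 is allowed.
// Returns true on success.
bool pinToSingleCpu() {
    cpu_set_t current;
    CPU_ZERO(&current);
    if (sched_getaffinity(0, sizeof(current), &current) != 0) return false;

    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &current)) {
            cpu_set_t single;
            CPU_ZERO(&single);
            CPU_SET(cpu, &single);
            return sched_setaffinity(0, sizeof(single), &single) == 0;
        }
    }
    return false;  // empty mask: should not happen for a running thread
}
```

Call pinToSingleCpu() at the start of main, before creating the two threads, then compare the relaxed version's final count with an unpinned run.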

Interesting Notes

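You can run the same program on Windows, MacOS or Linux with an x86/64 CPU. Unless the compiler happens to reorder those specific instructions, you'll never see memory reordering at runtime, even on a multicore system. That's because x86/64 processors are fairly strongly-ordered: when one CPU core performs a sequence of writes, every other CPU core sees those values change in the same order they were written.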

This also shows how a program that misuses C++11 atomics can still behave correctly on x86/64, leaving you unaware of the mistake.

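Incidentally, the Release build of VS2012 generates rather poor x86 code for this example. It's nowhere near as efficient as the ARM code generated by Xcode, and performance is usually the main reason to use lock-free programming on multicore in the first place! [Update Feb 2013: As mentioned in the comments, the latest version of VS2012 Professional generates much better machine code.]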

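This post is a companion to my earlier post demonstrating StoreLoad reordering on x86/64 (the "caught in the act" post). In my experience, however, the #StoreLoad barrier isn't needed in practice nearly as often as the other ordering constraints.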

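Finally, I'm not the first person to demonstrate weak hardware ordering in practice, though I might be the first to do it using C++11. Pierre Lebeaupin and ridiculousfish have previously written posts demonstrating the same phenomenon with different examples: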
http://wanderingcoder.net/2011/04/01/arm-memory-ordering/
http://ridiculousfish.com/blog/posts/barrier.html
