Parallel Computing with OpenMP in Visual Studio C++

阿新 • Published: 2020-09-19

https://blog.csdn.net/dengm155/article/details/78836447?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param

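How can a parallel array sum be implemented with OpenMP? We start with a basic solution. The idea is to allocate an array sumArray whose length equals the number of threads that run in parallel (by default, this equals the number of CPU cores). Inside the for loop, each thread updates only the element of sumArray that belongs to it, and afterwards the elements of sumArray are added into sum. The code is as follows: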
 
  
#include <iostream>
#include <omp.h>

int main() {
    int sum = 0;
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int coreNum = omp_get_num_procs();  // number of available processors
    int* sumArray = new int[coreNum];   // one partial-sum slot per processor
    for (int i = 0; i < coreNum; i++)   // initialize every slot to 0
        sumArray[i] = 0;
    // num_threads(coreNum) caps the team size so that k always stays inside sumArray
#pragma omp parallel for num_threads(coreNum)
    for (int i = 0; i < 10; i++)
    {
        int k = omp_get_thread_num();   // ID of the thread running this iteration
        sumArray[k] = sumArray[k] + a[i];
    }
    for (int i = 0; i < coreNum; i++)   // combine the partial sums
        sum = sum + sumArray[i];
    std::cout << "sum: " << sum << std::endl;
    delete[] sumArray;
    return 0;
}
 
Note that the code above calls omp_get_num_procs() to get the number of processors and omp_get_thread_num() to get the ID of each thread. Both functions require #include <omp.h>.
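The same per-thread-slot idea can also be sketched with std::vector, which removes the manual memory management. This is only an illustration under a few assumptions: the function name partial_sum_array is hypothetical, the slot count comes from omp_get_max_threads() (an upper bound on the team size) instead of the processor count, and the _OPENMP guards are there only so the code still builds when OpenMP is switched off.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: sums `data` using one partial-sum slot per thread.
int partial_sum_array(const std::vector<int>& data) {
#ifdef _OPENMP
    const int maxThreads = omp_get_max_threads();  // upper bound on team size
#else
    const int maxThreads = 1;                      // serial build: one slot
#endif
    std::vector<int> partial(maxThreads, 0);
    const int n = static_cast<int>(data.size());   // signed index for OpenMP 2.0

#pragma omp parallel for
    for (int i = 0; i < n; i++) {
#ifdef _OPENMP
        const int k = omp_get_thread_num();
#else
        const int k = 0;
#endif
        partial[k] += data[i];                     // each thread touches only its slot
    }

    int sum = 0;
    for (int i = 0; i < maxThreads; i++)           // serial combine step
        sum += partial[i];
    return sum;
}
```

Called with the ten values 1 through 10 used above, the function should return 55.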
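Although the code above does the job, it adds a fair amount of extra work: the array sumArray has to be allocated first, and a final for loop has to add up its elements. Is there a simpler way? There is. OpenMP provides another tool, reduction, shown in the code below: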
 
#include <iostream>

int main() {
    int sum = 0;
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    // each thread adds into a private copy of sum; the copies are combined after the loop
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 10; i++)
        sum = sum + a[i];
    std::cout << "sum: " << sum << std::endl;
    return 0;
}
 
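In the code above, reduction(+:sum) is appended to #pragma omp parallel for. It tells the compiler: run the following for loop on multiple threads, but give each thread its own copy of the variable sum (initialized to 0 for the + operator), and once the loop ends, add all the threads' copies to the original sum to produce the final result.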
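The same clause works for other accumulations built on +. As a small sketch, here is a dot product of two vectors; the function name dot_product is hypothetical, and the code assumes both vectors have the same length.

```cpp
#include <vector>

// Hypothetical helper: dot product of two equally sized vectors.
double dot_product(const std::vector<double>& x, const std::vector<double>& y) {
    double result = 0.0;
    const int n = static_cast<int>(x.size());  // signed index for OpenMP 2.0

    // Every thread sums its share of the products into a private `result`;
    // the private copies are added together when the loop finishes.
#pragma omp parallel for reduction(+:result)
    for (int i = 0; i < n; i++)
        result += x[i] * y[i];

    return result;
}
```

For x = {1, 2, 3} and y = {4, 5, 6} this should give 32.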
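reduction is convenient, but it supports only a handful of basic operators, such as +, -, *, &, |, && and || (OpenMP 3.1 added min and max, but the OpenMP 2.0 support that Visual Studio enables with /openmp does not offer them). Sometimes we must avoid a race condition, yet the operation involved is beyond what reduction can express. What then? This is where another OpenMP tool, critical, comes in. In the following example we find the maximum of the array a and store the result in max.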
 
#include <iostream>

int main() {
    int a[10] = {11, 2, 33, 49, 113, 20, 321, 250, 689, 16};
    int max = a[0];  // start from a real element so negative inputs also work
#pragma omp parallel for
    for (int i = 0; i < 10; i++)
    {
        int temp = a[i];
        // only one thread at a time may compare against and update the shared max
#pragma omp critical
        {
            if (temp > max)
                max = temp;
        }
    }
    std::cout << "max: " << max << std::endl;
    return 0;
}
 
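In this example the for loop is still split automatically into N parts that run in parallel, but #pragma omp critical wraps the statement if (temp > max) max = temp. Its meaning is: the threads still execute the body of the for loop in parallel, but before entering the critical block each thread must check whether another thread is already inside, and if so, wait until that thread has left. This avoids the race condition, but it obviously lowers the execution speed, because threads may have to wait for one another.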
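One way to limit that waiting is to let every thread track its own maximum and enter the critical block only once, after its share of the loop. The following is a minimal sketch of this variation, not the method of the original example; the function name parallel_max is hypothetical, and the input is assumed to be non-empty.

```cpp
#include <vector>

// Hypothetical helper: returns the largest element of a non-empty vector.
int parallel_max(const std::vector<int>& data) {
    int globalMax = data[0];
    const int n = static_cast<int>(data.size());  // signed index for OpenMP 2.0

#pragma omp parallel
    {
        int localMax = data[0];          // private to each thread
#pragma omp for nowait
        for (int i = 1; i < n; i++) {
            if (data[i] > localMax)
                localMax = data[i];      // no locking needed for private data
        }
#pragma omp critical
        {
            if (localMax > globalMax)    // one guarded merge per thread
                globalMax = localMax;
        }
    }
    return globalMax;
}
```

With this layout the critical block is entered once per thread instead of once per element, so for the ten-element array above the result is still 689 while the threads contend far less often.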
The second part is reposted from: http://www.cnblogs.com/wzyj/p/4501348.html

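Using sections in OpenMP

The section directive is used inside a sections directive to divide the code of the sections block into several separate segments:
#pragma omp [parallel] sections [clauses]
{
   #pragma omp section
   {
            code block
   }
}
The section blocks are processed in parallel only when the optional parallel keyword is present (#pragma omp parallel sections); a bare #pragma omp sections that is not enclosed in a parallel region runs serially. See the code below: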

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <unistd.h>

int main()
{
    printf("parent threadid:%d\n", omp_get_thread_num());

    #pragma omp sections
    {
        #pragma omp section
        {
            printf("section 0,threadid=%d\n", omp_get_thread_num());
            sleep(1);
        }
        #pragma omp section
        {
            printf("section 1,threadid=%d\n", omp_get_thread_num());
            //sleep(1);
        }
        #pragma omp section
        {
            printf("section 2,threadid=%d\n", omp_get_thread_num());
            sleep(1);
        }
    }

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            printf("section 3,threadid=%d\n", omp_get_thread_num());
            sleep(1);
        }
        #pragma omp section
        {
            printf("section 4,threadid=%d\n", omp_get_thread_num());
            sleep(1);
        }
        #pragma omp section
        {
            printf("section 5,threadid=%d\n", omp_get_thread_num());
            sleep(1);
        }
    }

    return 0;
}
The output is:

parent threadid:0
section 0,threadid=0
section 1,threadid=0
section 2,threadid=0
section 3,threadid=0
section 4,threadid=2
section 5,threadid=1

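A few points about the code above should be made clear first:
   1. The sections constructs run one after another: the main thread finishes sections 0 to 2 before it starts the second sections construct.
   2. Without parallel, the section blocks inside a sections construct run serially. I added sleep(1) on purpose to check this, and the result shows that sections 0 to 2 were all executed by the main thread.
   3. The section blocks in the second sections construct run in parallel; the thread numbers are 0, 2 and 1.
This raises a question: is the thread 0 in the second part the main thread, or a thread newly started for the second part? To find out, we need to print the real kernel thread ID of each thread: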

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

int main()
{
    /* SYS_gettid returns the kernel thread ID of the calling thread (Linux only) */
    printf("pid:%d,tid=%ld\n", (int)getpid(), syscall(SYS_gettid));

    #pragma omp sections
    {
        #pragma omp section
        {
            printf("section 0,tid=%ld\n", syscall(SYS_gettid));
            sleep(1);
        }
        #pragma omp section
        {
            printf("section 1,tid=%ld\n", syscall(SYS_gettid));
            //sleep(1);
        }
        #pragma omp section
        {
            printf("section 2,tid=%ld\n", syscall(SYS_gettid));
            sleep(1);
        }
    }

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            printf("section 3,tid=%ld\n", syscall(SYS_gettid));
            sleep(1);
        }
        #pragma omp section
        {
            printf("section 4,tid=%ld\n", syscall(SYS_gettid));
            sleep(1);
        }
        #pragma omp section
        {
            printf("section 5,tid=%ld\n", syscall(SYS_gettid));
            sleep(1);
        }
    }

    return 0;
}
Output:
 
pid:7619,tid=7619
section 0,tid=7619
section 1,tid=7619
section 2,tid=7619
section 5,tid=7621
section 4,tid=7619
section 3,tid=7620
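The results show the following:
The OpenMP material says that when the program reaches the second sections construct, which is parallel, the main thread sleeps until all child threads have finished and is then woken up. Yet one thread ID in the second sections construct matches the ID of the main thread. Strictly speaking, the two values are different kinds of identifier: the thread ID is a long int while the process ID is an int, so equal numbers alone do not prove that they refer to the same thing. In addition, in the LinuxThreads era a thread was called a lightweight process (LWP), which means every thread was a process and every thread (process) had its own ID. From 2.4.10 onward the NPTL (Native POSIX Thread Library) threading library is used, in which threads are still created through a fork-like mechanism (the clone system call) and share the same parent process.
The main process ID is 7619, and one of its threads also has the ID 7619. On Linux the main thread's TID equals the PID of the process, which shows once again how little threads and processes differ on Linux.
My guess is that the main thread does not sleep. Instead it saves its previous context, takes part in the parallel work as one member of the team, and restores the previous context once the parallel work is finished. This agrees with the OpenMP execution model, in which the thread that encounters a parallel construct becomes thread 0 (the master) of the new team.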
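That guess can be checked without Linux-specific system calls. The sketch below compares the thread identity seen before the parallel region with the one seen by thread 0 inside it. It rests on assumptions: the function name master_is_calling_thread is hypothetical, and std::this_thread::get_id() is assumed to report the underlying native thread inside OpenMP threads, which common implementations do although the OpenMP specification does not promise it.

```cpp
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: returns true if thread 0 of a parallel region is the
// same native thread as the one that entered the region.
bool master_is_calling_thread() {
    const std::thread::id caller = std::this_thread::get_id();
    bool same = false;

#pragma omp parallel
    {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;               // serial build: only one thread exists
#endif
        if (tid == 0)                    // only thread 0 writes, so there is no race
            same = (std::this_thread::get_id() == caller);
    }
    return same;
}
```

If the function returns true, the thread that entered the region also did part of the parallel work, which is what the TID 7619 in the output above suggests.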
Below is a more complex example:

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

int main()
{
    #pragma omp parallel
    {
        /* outside any sections construct: executed once by every thread of the team */
        printf("pid:%d,tid=%ld\n", (int)getpid(), syscall(SYS_gettid));

        #pragma omp sections
        {
            #pragma omp section
            {
                printf("section 0,tid=%ld\n", syscall(SYS_gettid));
                //sleep(1);
            }
            #pragma omp section
            {
                printf("section 1,tid=%ld\n", syscall(SYS_gettid));
                sleep(1);
            }
            #pragma omp section
            {
                printf("section 2,tid=%ld\n", syscall(SYS_gettid));
                sleep(1);
            }
        }

        #pragma omp sections
        {
            #pragma omp section
            {
                printf("section 3,tid=%ld\n", syscall(SYS_gettid));
                sleep(1);
            }
            #pragma omp section
            {
                printf("section 4,tid=%ld\n", syscall(SYS_gettid));
                sleep(1);
            }
            #pragma omp section
            {
                printf("section 5,tid=%ld\n", syscall(SYS_gettid));
                sleep(1);
            }
        }
    }

    return 0;
}

Output:
 
pid:7660,tid=7660
section 0,tid=7660
section 1,tid=7660
pid:7660,tid=7662
section 2,tid=7662
pid:7660,tid=7663
pid:7660,tid=7661
section 3,tid=7660
section 5,tid=7661
section 4,tid=7662
 
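The code inside #pragma omp parallel is processed in parallel, but that does not mean every piece of it is executed N times (where N is the number of cores). Statements outside a sections construct, such as the first printf, are executed once by each thread, whereas each section block is executed exactly once by one thread. The sections constructs run one after another, and the actual parallelism lies among the section blocks inside each construct. While thread 7660 handles sections 0 and 1, section 1 sleeps for 1 s, so section 2 is picked up by another thread in the meantime. Only after the first sections construct has completely finished does the second one start its parallel processing.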
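The difference between "once per thread" and "once in total" can be made visible with counters. The sketch below is an illustration only; the function name count_section_runs and the layout of the returned vector are hypothetical.

```cpp
#include <vector>

// Hypothetical helper: runs three section blocks inside a parallel region.
// Returns {runs of section 0, runs of section 1, runs of section 2,
//          number of threads that executed the region body}.
std::vector<int> count_section_runs() {
    std::vector<int> runs(3, 0);
    int regionEntries = 0;

#pragma omp parallel
    {
        // Outside the sections construct: executed once by every thread.
#pragma omp atomic
        regionEntries++;

#pragma omp sections
        {
#pragma omp section
            { runs[0]++; }               // each block is run by exactly one thread
#pragma omp section
            { runs[1]++; }
#pragma omp section
            { runs[2]++; }
        }
    }

    runs.push_back(regionEntries);
    return runs;
}
```

The first three entries should each be 1 regardless of how many threads are in the team, while the last entry grows with the team size.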
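It is also worth noting that printf is not a parallel function. It writes its result to the console, and the console cannot be used by several threads at once: while one thread holds it, the others have to wait. Take the output above as an example. The code inside #pragma omp parallel runs in parallel, yet the threads still start in a certain order, and that order depends on when each thread is created. Thread 7660 already exists, so it gets to printf first, and by the time it executes the printf in section 0 the other threads have not finished being created. Next comes the printf in section 1, and even if other threads are ready by then, they can only wait. Section 1 then sleeps for 1 second, during which printf is used by the newly created threads, and the same pattern repeats from there on.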

