OpenMP Multithreaded Programming

阿新 • Published: 2017-05-16


OpenMP (Open Multi-Processing)

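Drawbacks of OpenMP:

1: As a high-level abstraction, OpenMP is not well suited to cases that need complex synchronization and mutual exclusion between threads;

2: It cannot be used on systems without shared memory (such as computer clusters); on such systems MPI is more commonly used.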

For how to implement critical sections and mutex locks with OpenMP, see reference 3.


==========================Using OpenMP on Windows==========================

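Basic usage:

Using OpenMP in Visual C++ 2010

1: In the project's Properties, under C/C++ > Language, turn on OpenMP Support (compiler switch /openmp);

2: In a program that uses OpenMP, first include the OpenMP header omp.h;

3: Put #pragma omp parallel for in front of the for loop you want to parallelize.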

A simple example:


// Without OpenMP
#include <stdio.h>
#include <stdlib.h>
void Test(int n) {
    for (int i = 0; i < 10000; ++i)
    {
        // do nothing, just waste time
    }
    printf("%d, ", n);
}
int main(int argc, char* argv[])
{
    for (int i = 0; i < 16; ++i)
        Test(i);
    system("pause");  // Windows-only: keeps the console window open
}

Output:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,


// With OpenMP
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
void Test(int n) {
    for (int i = 0; i < 10000; ++i) {
        // do nothing, just waste time
    }
    printf("%d, ", n);
}
int main(int argc, char* argv[])
{
    // Split the 16 iterations among the threads of a parallel team
    #pragma omp parallel for
    for (int i = 0; i < 16; ++i)
        Test(i);
    system("pause");  // Windows-only: keeps the console window open
}

(My laptop has 2 cores and 4 hardware threads.)

The output is:

0,12,4,8,1,13,5,9,2,14,6,10,3,15,7,11,

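OpenMP split iterations 0-15 into four chunks, 0-3, 4-7, 8-11 and 12-15, and ran each chunk on its own thread; the threads' printf calls interleave, which is why the numbers appear out of order.

When the compiler sees #pragma omp parallel for, it generates code that, at run time, divides the iterations of the following for loop into N parts and assigns each part to a thread; the threads then execute in parallel. By default N is the number of threads in the team, which on most implementations equals the number of logical processors (hardware threads) rather than physical cores. How the iterations are divided is determined by the loop schedule, whose default is implementation-defined.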

Getting the number of processors and the thread ID


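#include <iostream>
#include <omp.h>
int main(){
    int sum = 0;
    int a[10] = {1,2,3,4,5,6,7,8,9,10};
    // Number of logical processors (on my 2-core/4-thread laptop this returned 4,
    // i.e. it counts hardware threads, not cores)
    int coreNum = omp_get_num_procs();
    // Upper bound on the team size. Sizing the array by this instead of coreNum means
    // omp_get_thread_num() can never index past the end (e.g. if OMP_NUM_THREADS > coreNum)
    int maxThreads = omp_get_max_threads();
    int* sumArray = new int[maxThreads];
    for (int i = 0; i < maxThreads; i++)  // initialize every partial sum to 0
        sumArray[i] = 0;
    #pragma omp parallel for
    for (int i = 0; i < 10; i++)
    {
        int k = omp_get_thread_num();      // ID of the thread running this iteration
        sumArray[k] = sumArray[k] + a[i];  // each thread only touches its own slot
    }
    for (int i = 0; i < maxThreads; i++)
        sum = sum + sumArray[i];
    delete[] sumArray;
    std::cout << "processors: " << coreNum << std::endl;
    std::cout << "sum: " << sum << std::endl;
    return 0;
}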


=================Using OpenMP on Ubuntu=====================================

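Hands-on FAQ:

*How do I run OpenMP programs on Linux?
> All you need is a compiler that supports OpenMP, such as GCC 4.2 or later (some GCC 4.1 builds shipped with Fedora Core seem to support it too), or ICC (version 9.1, which I used, supports it; I haven't tried other versions).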

*How do I determine whether a compiler supports OpenMP?
> Check whether omp.h exists in the include directory under the compiler's installation path.

*How do I recognize an OpenMP program?
> Check whether the program contains the following:
> #include <omp.h>
> #pragma omp ...

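*How do I compile an OpenMP program?
> gcc -fopenmp [sourcefile] -o [destination file]
> g++ -fopenmp [sourcefile] -o [destination file]   (for C++ sources such as the examples below)
> icc -openmp [sourcefile] -o [destination file]   (newer Intel compilers spell the flag -qopenmp)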

*How do I run an OpenMP program?
> The compiled file is an ordinary executable and can be run directly.

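*How do I set the number of threads?
> Method 1: call omp_set_num_threads(n); in the program.
> Method 2: export OMP_NUM_THREADS=n
> Each method has its uses: the former affects only that program, while the latter lets you change the thread count without recompiling.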

Example 1: Time difference between parallel and sequential execution

Sequential Version:


#include<iostream>
#include<sys/time.h>
#include<unistd.h>
using namespace std;
void test(int n)
{
volatile int a=0; // volatile keeps an optimizing build (-O2) from deleting the busy loop
struct timeval tstart,tend;
double timeUsed;
gettimeofday(&tstart,NULL);
for(int i=0;i<1000000000;i++)
{
a=i+1;
}
gettimeofday(&tend,NULL);
timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
cout<<n<<" Time="<<timeUsed/1000<<" ms"<<endl;
}
int main()
{
struct timeval tstart,tend;
double timeUsed;
gettimeofday(&tstart,NULL);
int j=0;
for(j=0;j<4;j++)
{
test(j);
}
gettimeofday(&tend,NULL);
timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
cout<<" Total Time="<<timeUsed/1000<<" ms"<<endl;
return 0;
}

Parallel Version:


#include<iostream>
#include<sys/time.h>
#include<unistd.h>
#include<omp.h>
using namespace std;
void test(int n)
{
volatile int a=0; // volatile keeps an optimizing build (-O2) from deleting the busy loop
struct timeval tstart,tend;
double timeUsed;
gettimeofday(&tstart,NULL);
for(int i=0;i<1000000000;i++)
{
a=i+1;
}
gettimeofday(&tend,NULL);
timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
cout<<n<<" Time="<<timeUsed/1000<<" ms"<<endl;
}
int main()
{
struct timeval tstart,tend;
double timeUsed;
gettimeofday(&tstart,NULL);
int j=0;
#pragma omp parallel for
for(j=0;j<4;j++)
{
test(j);
}
gettimeofday(&tend,NULL);
timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
cout<<" Total Time="<<timeUsed/1000<<" ms"<<endl;
return 0;
}

Result:

Sequential version:


0 Time=2064.69 ms
1 Time=2061.11 ms
2 Time=2076.32 ms
3 Time=2077.93 ms
Total Time=8280.14 ms

Parallel version:


2 Time=2148.22 ms
3 Time=2151.72 ms
0 Time=2151.85 ms
1 Time=2151.77 ms
Total Time=2158.81 ms

------------------------------------------------------------------------------------------------------------------------------------------------------------

Example 2: Computing Pi with the rectangle (midpoint) rule
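Both versions below approximate π with the midpoint rule applied to an integral whose exact value is π (since 4·arctan 1 = π); in the code, h is step, N is num_steps, x_i is x, and the accumulated sum is sum:

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx
    \approx h \sum_{i=0}^{N-1} \frac{4}{1+x_i^2},
\qquad x_i = (i + 0.5)\,h, \quad h = \frac{1}{N}
```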

Sequential Version:


#include<iostream>
#include<sys/time.h>
#include<unistd.h>
//#include <omp.h>
using namespace std;
int main ()
{
struct timeval tstart,tend;
double timeUsed;
static long num_steps =1000000000;
double step;
int i;
double x, pi, sum = 0.0;
step = 1.0/(double) num_steps;
gettimeofday(&tstart,NULL);
//#pragma omp parallel for reduction(+:sum) private(x) /* the only line added; everything else is unchanged */
for (i=0;i < num_steps; i++)
{
x = (i+0.5)*step;
sum = sum + 4.0/(1.0+x*x);
}
pi = step * sum;
gettimeofday(&tend,NULL);
timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
timeUsed=timeUsed/1000;
cout<<"pi="<<pi<<" ("<<num_steps<<" ) "<<timeUsed<<" ms"<<endl;
return 0;
}

Parallel Version:


#include<iostream>
#include<sys/time.h>
#include<unistd.h>
#include <omp.h>
using namespace std;
int main ()
{
struct timeval tstart,tend;
double timeUsed;
static long num_steps = 1000000000;
double step;
int i;
double x, pi, sum = 0.0;
step = 1.0/(double) num_steps;
gettimeofday(&tstart,NULL);
#pragma omp parallel for reduction(+:sum) private(x) /* the only line added; everything else is unchanged */
for (i=0;i < num_steps; i++)
{
x = (i+0.5)*step;
sum = sum + 4.0/(1.0+x*x);
}
pi = step * sum;
gettimeofday(&tend,NULL);
timeUsed=1000000*(tend.tv_sec-tstart.tv_sec)+tend.tv_usec-tstart.tv_usec;
timeUsed=timeUsed/1000;
cout<<"pi="<<pi<<" ("<<num_steps<<" ) "<<timeUsed<<" ms"<<endl;
return 0;
}

Results:


$ ./parrPI2
pi=3.14159 (1000000000 ) 3729.68 ms
$ ./seqPI2
pi=3.14159 (1000000000 ) 13433.1 ms

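My computer has 2 cores and 4 threads, and the speedup is 13433.1/3729.68 ≈ 3.6. The program parallelizes well because its loop iterations have almost no data dependencies on one another; the only one is sum, and reduction(+:sum) removes that dependency too. (x is a temporary that every iteration overwrites, so private(x) gives each thread its own copy and avoids a race on it.)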

For details on reduction and private, see reference 7.

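One point needs special attention:

The timings above are measured with gettimeofday(), whereas the original blog post timed with time_t. With time_t you cannot reproduce the speedup the author reports; the parallel version even appears slower than the sequential one.

The main reason is the difference between the two timing methods; see my other blog post for details.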

References:

1: http://baike.baidu.com/view/1687659.htm

2: http://www.cnblogs.com/yangyangcv/archive/2012/03/23/2413335.html

3: http://www.ibm.com/developerworks/cn/aix/library/au-aix-openmp-framework/index.html

4: http://openmp.org/wp/openmp-compilers/ (official site)

5: http://blog.163.com/zl_dream1106/blog/static/84286020105210012295/ (OpenMP on Linux)

6: http://blog.163.com/zl_dream1106/blog/static/842860201052952352/?suggestedreading&wumii (OpenMP programming guide)

7: http://blog.163.com/zl_dream1106/blog/static/84286020105293213869/?suggestedreading&wumii (Getting started with OpenMP)
