
Parallel Programming in Fortran 95 using OpenMP

Parallel programming Open MP-Bell

CHAPTER 1

Basic directives

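Include a white space between the directive sentinel !$OMP and the OpenMP directive that follows it.

Conditional compilation sentinel: !$ (a line that starts with !$ is compiled only when OpenMP is enabled; otherwise it is an ordinary comment).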

parallel region constructor

!$OMP PARALLEL

!$OMP END PARALLEL

Before and after the parallel region, the code is executed by only one thread (serial regions). It is not allowed to jump into or out of a parallel region with a GOTO statement.

Master thread: when a thread executing a serial region encounters a parallel region, it creates a team of threads and becomes the master thread of that team.

Thread number: ranges from zero, for the master thread, up to N_p-1, where N_p is the number of threads in the team.
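As a minimal sketch of the points above (the program name hello_team and the variables my_id and n_p are chosen here for illustration), each thread of the team can report its own thread number. The !$ lines are compiled only when OpenMP is enabled, for example with gfortran -fopenmp, so the same source also builds as a serial program.

```fortran
program hello_team
  !$ use omp_lib                      ! only seen by an OpenMP-enabled compiler
  implicit none
  integer :: my_id, n_p

  !$OMP PARALLEL PRIVATE(my_id, n_p)
  my_id = 0                           ! serial fallback values
  n_p   = 1
  !$ my_id = omp_get_thread_num()     ! 0 for the master thread
  !$ n_p   = omp_get_num_threads()    ! size of the team
  write(*,*) "thread", my_id, "of", n_p
  !$OMP END PARALLEL
end program hello_team
```

The order of the printed lines is not defined, since the threads run concurrently.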

At the beginning of the parallel region it is possible to impose clauses which fix certain aspects of the way in which the parallel region is going to work: for example the scope of variables, the number of threads, special treatments of some variables, etc.

!$OMP PARALLEL clause 1, clause 2...

!$OMP END PARALLEL

Only the following clauses are allowed in the !$OMP PARALLEL directive:

PRIVATE(list)

SHARED(list)

DEFAULT(PRIVATE|SHARED|NONE)

FIRSTPRIVATE(list)

COPYIN(list)

REDUCTION(operator:list)

IF(scalar_logical_expression)

NUM_THREADS(scalar_integer_expression)

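Nested parallel regions: if nested parallelism is enabled and every team has N_p threads, the following code prints N_p^2+N_p messages in total (N_p "HELLO" lines and N_p^2 "HI" lines).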

!$OMP PARALLEL

WRITE(*,*) "HELLO"

!$OMP PARALLEL

WRITE(*,*) "HI"

!$OMP END PARALLEL

!$OMP END PARALLEL

CHAPTER 2 OpenMP constructs

2.1 Work-sharing constructs

restrictions

Work-sharing constructs must be encountered by all threads in a team or by none at all.

Work-sharing constructs must be encountered in the same order by all threads in a team.

2.1.1 !$OMP DO / !$OMP END DO (must be placed inside a parallel region)

!$OMP DO

do i =1,1000

...

end do

!$OMP END DO
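A minimal sketch of the skeleton above placed inside a parallel region, so that the iterations are actually shared among the threads of a team. The array a, its size n and the loop body are assumptions made for illustration.

```fortran
program do_example
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real :: a(n)

  !$OMP PARALLEL SHARED(a) PRIVATE(i)
  !$OMP DO
  do i = 1, n
     a(i) = 2.0 * real(i)      ! each iteration writes its own element
  end do
  !$OMP END DO
  !$OMP END PARALLEL

  write(*,*) "a(1) =", a(1), " a(n) =", a(n)
end program do_example
```

The iterations are independent of each other, which is what makes the loop safe to distribute.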

The way in which the work is distributed, and in general how the work-sharing construct behaves, can be controlled with clauses.

!$OMP DO clause 1, clause 2, ...

!$OMP END DO end_clause

Only the following clauses are allowed in the !$OMP DO directive:

PRIVATE(list)

FIRSTPRIVATE(list)

LASTPRIVATE(list)

REDUCTION(operator:list)

SCHEDULE(type, chunk)

ORDERED

Add the NOWAIT clause to the closing directive in order to avoid the implied synchronization.

If the variables modified inside the do-loop have to be used after it, it is necessary to update the shared variables, either through an implied synchronization or explicitly with the !$OMP FLUSH directive.

Using the !$OMP ORDERED / !$OMP END ORDERED directive-pair inside the do-loop:

!$OMP DO ORDERED

do i=1,1000

!$OMP ORDERED

A(i)=A(i-1)

!$OMP END ORDERED

end do

!$OMP END DO
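The following is a sketch of a complete ordered loop under some assumptions: the array is declared with a lower bound of 0 so that A(i-1) exists for i = 1, and the factor 2 is chosen only to make the result easy to check.

```fortran
program ordered_example
  implicit none
  integer, parameter :: n = 10
  integer :: i
  integer :: a(0:n)

  a(0) = 1
  !$OMP PARALLEL SHARED(a) PRIVATE(i)
  !$OMP DO ORDERED
  do i = 1, n
     !$OMP ORDERED
     a(i) = 2 * a(i-1)         ! depends on the previous iteration
     !$OMP END ORDERED
  end do
  !$OMP END DO
  !$OMP END PARALLEL

  write(*,*) "a(n) =", a(n)    ! 2**n = 1024
end program ordered_example
```

Since the whole loop body sits inside the ORDERED section, the iterations run one after another and nothing is gained over a serial loop; the construct pays off only when most of the body lies outside the ORDERED section.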

When several nested do-loops are present, it is usually convenient to parallelize the outermost one, since then the amount of work distributed over the different threads is maximal.

2.1.2 !$OMP SECTIONS / !$OMP END SECTIONS: assigns a completely different task to each thread, leading to an MPMD (Multiple Programs Multiple Data) execution model. Each section of code is executed once and only once by a thread in the team.

Syntax: each block of code to be executed by one of the threads starts with an !$OMP SECTION directive and extends until the same directive is found again or until the closing-directive !$OMP END SECTIONS is found.

!$OMP SECTIONS clause 1, clause 2

...

!$OMP SECTION

!$OMP SECTION

...

!$OMP END SECTIONS end_clause

!$OMP SECTIONS accepts the following clauses

PRIVATE(list)

FIRSTPRIVATE(list)

LASTPRIVATE(list)

REDUCTION(operator:list)

!$OMP END SECTIONS only accepts the NOWAIT clause.

Example

!$OMP SECTIONS

!$OMP SECTION

write(*,*) "hello"

!$OMP SECTION

write(*,*) "bye"

!$OMP END SECTIONS

2.1.3 !$OMP SINGLE / !$OMP END SINGLE: the code enclosed in this directive-pair is executed by only one of the threads in the team, namely the one that first arrives at the opening-directive !$OMP SINGLE.

All the remaining threads wait at the implied synchronization in the closing-directive !$OMP END SINGLE.

!$OMP SINGLE clause 1, clause 2, ...

...

!$OMP END SINGLE end_clause

end_clause can be the clause NOWAIT or COPYPRIVATE, but not both at the same time.

Only the following two clauses can be used in the opening-directive:

PRIVATE(list)

FIRSTPRIVATE(list)

2.1.4 !$OMP WORKSHARE / !$OMP END WORKSHARE: allows the parallelization of parallelizable Fortran 95 commands.

Parallelizable Fortran 95 commands, like array notation expressions, forall and where statements, cannot be treated with the work-sharing directives presented so far.

The following Fortran 95 transformational array intrinsic functions can also be parallelized with the aid of the !$OMP WORKSHARE / !$OMP END WORKSHARE directive-pair: matmul, dot_product, sum, product, maxval, minval, count, any, all, spread, pack, unpack, reshape, transpose, eoshift, cshift, minloc and maxloc.
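A small sketch of a WORKSHARE block, assuming three arrays a, b, c and a scalar total that exist only for this illustration. It combines an array assignment, a where statement and the sum intrinsic.

```fortran
program workshare_example
  implicit none
  integer, parameter :: n = 1000
  real :: a(n), b(n), c(n), total

  b = 1.0
  c = 2.0
  !$OMP PARALLEL SHARED(a, b, c, total)
  !$OMP WORKSHARE
  a = b + c                        ! array assignment, split among the threads
  where (a > 2.5) a = 0.5 * a      ! where statement
  total = sum(a)                   ! transformational intrinsic
  !$OMP END WORKSHARE
  !$OMP END PARALLEL

  write(*,*) "total =", total      ! every element is 1.5, so total = 1500
end program workshare_example
```

How the statements are actually divided into units of work is left to the compiler.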

2.2 Combined parallel work-sharing constructs: shortcuts for specifying a parallel region that contains only one work-sharing construct.

2.2.1 !$OMP PARALLEL DO / !$OMP END PARALLEL DO

!$OMP PARALLEL DO clause 1, clause 2, ...

...

!$OMP END PARALLEL DO

clause 1, clause 2, ... can be any of the clauses accepted by !$OMP PARALLEL or by !$OMP DO.

2.2.2 !$OMP PARALLEL SECTIONS / !$OMP END PARALLEL SECTIONS: specifies a parallel region that contains only a single !$OMP SECTIONS / !$OMP END SECTIONS directive-pair.

!$OMP PARALLEL SECTIONS clause 1, clause 2, ...

!$OMP END PARALLEL SECTIONS

clause 1, clause 2, ... can be any of the clauses accepted by !$OMP PARALLEL or by !$OMP SECTIONS.

2.3 Synchronization constructs

2.3.1 !$OMP MASTER / !$OMP END MASTER: the code enclosed inside this directive-pair is executed only by the master thread of the team. Meanwhile, all the other threads continue with their work: no implied synchronization exists!

!$OMP MASTER

...

!$OMP END MASTER

In essence, this directive-pair is similar to the !$OMP SINGLE / !$OMP END SINGLE directive-pair presented before when used together with the NOWAIT clause, except that the enclosed code is always executed by the master thread rather than by the first thread to arrive.

2.3.2 !$OMP CRITICAL / !$OMP END CRITICAL: this directive-pair restricts access to the enclosed code to only one thread at a time.

!$OMP CRITICAL (name)

...

!$OMP END CRITICAL (name)

The optional name argument identifies the critical section. It is strongly recommended to give a name to each critical section.

When a thread reaches the beginning of a critical section, it waits there until no other thread is executing the code in the critical section. Different critical sections using the same name are treated as one common critical section, which means that only one thread at a time is inside them.

All unnamed critical sections are considered as one common critical section.

!$OMP CRITICAL (write_file)

...

!$OMP END CRITICAL (write_file)
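A minimal sketch of a named critical section, assuming a shared counter that every thread increments once. The name update_counter is hypothetical and only serves to identify the section.

```fortran
program critical_example
  implicit none
  integer :: counter

  counter = 0
  !$OMP PARALLEL SHARED(counter)
  !$OMP CRITICAL (update_counter)
  counter = counter + 1            ! only one thread at a time executes this
  !$OMP END CRITICAL (update_counter)
  !$OMP END PARALLEL

  write(*,*) "threads counted:", counter
end program critical_example
```

After the parallel region the counter equals the number of threads in the team; without the critical section some increments could be lost.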

2.3.3 !$OMP BARRIER: this directive represents an explicit synchronization between the different threads in the team. When encountered, each thread waits until all the other threads have reached this point.

The !$OMP BARRIER directive must be encountered by all threads in a team or by none at all.

The following placements of !$OMP BARRIER must be avoided, since they lead to a deadlock:

!$OMP CRITICAL

!$OMP BARRIER

!$OMP END CRITICAL

!$OMP SINGLE

!$OMP BARRIER

!$OMP END SINGLE

!$OMP MASTER

!$OMP BARRIER

!$OMP END MASTER

!$OMP SECTIONS

!$OMP SECTION

!$OMP BARRIER

!$OMP SECTION

!$OMP END SECTIONS

2.3.4 !$OMP ATOMIC: when a variable in use can be modified by all threads in a team, it is necessary to ensure that only one thread at a time is writing/updating the memory location of that variable. This directive ensures that a specific memory location is updated atomically, rather than exposing it to multiple, simultaneously writing threads.

Only statements of the following forms can be used together with the !$OMP ATOMIC directive:

x = x operator expr

x = intrinsic_procedure(x, expr_list)

The variable x, affected by the !$OMP ATOMIC directive, must be of scalar nature and of intrinsic type.

The !$OMP ATOMIC directive only affects the immediately following statement.
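A short sketch of an atomic update, assuming a shared integer total that accumulates the loop index. The variable names and the loop bounds are illustrative only.

```fortran
program atomic_example
  implicit none
  integer, parameter :: n = 1000
  integer :: i, total

  total = 0
  !$OMP PARALLEL DO SHARED(total) PRIVATE(i)
  do i = 1, n
     !$OMP ATOMIC
     total = total + i             ! form: x = x operator expr
  end do
  !$OMP END PARALLEL DO

  write(*,*) "total =", total      ! n*(n+1)/2 = 500500
end program atomic_example
```

For a sum like this a REDUCTION clause is normally the cheaper choice; ATOMIC is shown here only because the update has the required form.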

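2.3.5 !$OMP FLUSH: this directive must appear at the precise point in the code at which the data synchronization is required. Without a list, it ensures the updating of all variables.

The !$OMP FLUSH directive can also include a list with the names of the variables to be flushed:

!$OMP FLUSH (variable1, variable2, ...)

Data synchronization is implied at !$OMP BARRIER, at !$OMP PARALLEL and !$OMP END PARALLEL, at the opening- and closing-directives of CRITICAL and ORDERED, and at the closing-directives of the work-sharing constructs (END DO, END SECTIONS, END SINGLE, END WORKSHARE).

No data synchronization is implied at the opening-directives of the work-sharing constructs or at !$OMP MASTER / !$OMP END MASTER. The implied data synchronization of a closing-directive can be switched off with NOWAIT.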

2.3.6 !$OMP ORDERED / !$OMP END ORDERED

No thread can enter the ORDERED section until it is guaranteed that all previous iterations have been completed.

The order of entrance is given by the sequential order of the loop iterations.

There is no implied synchronization of the whole team at the end of the ORDERED section.

Only one ORDERED section may be executed by each iteration of a parallelized do-loop.

2.4 Data environment constructs

There are two kinds of data environment constructs:

Those which are independent of other OpenMP constructs.

Those which are associated with an OpenMP construct and which affect only that construct and its lexical extent (data scope attribute clauses).

2.4.1 !$OMP THREADPRIVATE (list): the listed variables are private to each thread but global within it; their values are accessible from everywhere inside each thread and do not change from one parallel region to the next.

e.g. my_id

The !$OMP THREADPRIVATE directive needs to be placed just after the declarations of the variables and before the main part of the software unit.

THREADPRIVATE variables can only appear in the clauses COPYIN and COPYPRIVATE.

application
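A sketch of a typical use, assuming a module named thread_data (hypothetical) that stores each thread's number in my_id. Dynamic adjustment of the team size is switched off, since the values are only guaranteed to persist when consecutive parallel regions use the same number of threads.

```fortran
module thread_data
  implicit none
  integer, save :: my_id
  !$OMP THREADPRIVATE(my_id)
end module thread_data

program threadprivate_example
  use omp_lib
  use thread_data
  implicit none

  call omp_set_dynamic(.false.)    ! keep the team size fixed

  !$OMP PARALLEL
  my_id = omp_get_thread_num()     ! each thread stores its own number
  !$OMP END PARALLEL

  !$OMP PARALLEL
  ! the value set in the first region is still there
  write(*,*) "thread", omp_get_thread_num(), "kept my_id =", my_id
  !$OMP END PARALLEL
end program threadprivate_example
```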

CHAPTER 3 PRIVATE, SHARED & Co.

3.1 Data scope attribute clauses

3.1.1 PRIVATE(list): very resource-intensive, since every thread gets its own copy of each listed variable

!$OMP PARALLEL PRIVATE(a,b)

Variables that are used as counters for do-loops, forall commands or implicit do-loops, or that are specified to be THREADPRIVATE, automatically become private to each thread, even though they are not explicitly included inside a PRIVATE clause at the beginning of the scope of the directive-pair.

Variables declared as private have an undefined value at the beginning of the scope of the directive-pair, since they have just been created. Also, when the scope of the directive-pair finishes, the original variable will have an undefined value (which value from the available ones should it have!?).

3.1.2 SHARED(list)

!$OMP PARALLEL SHARED(c,d)

c and d are seen by all the threads inside the scope of the directive-pair.

A SHARED clause does not consume any additional resources.

It does not guarantee that the threads are immediately aware of changes made to the variable by another thread.

The update of the shared variables can be forced with the !$OMP FLUSH directive.

Race conditions must be avoided by the programmer, for example with !$OMP ATOMIC.

3.1.3 DEFAULT(PRIVATE | SHARED | NONE)

When most of the variables used inside the scope of a directive-pair are going to be private/shared, it is possible to specify a default setting.

If no DEFAULT clause is specified, the default behavior is the same as if DEFAULT(SHARED) were specified

!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(a)

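NONE: when the DEFAULT clause is set to NONE, every variable used inside the scope of the parallel directive must have its attribute declared explicitly at the opening-directive. The exceptions are counters of do-loops, forall commands and implicit do-loops, and variables with the THREADPRIVATE attribute.

3.1.4 FIRSTPRIVATE(list): for private variables that need an initial value

Variables with the PRIVATE attribute have an undefined value at the beginning of the scope of the directive-pair.

!$OMP PARALLEL PRIVATE(a) FIRSTPRIVATE(b)

a is private, so its initial value is undefined on entering the parallel region; b, in contrast, starts with the value it had in the serial region before the parallel region.

Very resource-intensive: the value has to be transferred from the serial region to every thread, which amounts to N copies of the data, where N is the number of threads.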
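The difference between the two clauses can be sketched as follows. The initial values 1 and 10 are arbitrary choices for this illustration.

```fortran
program firstprivate_example
  use omp_lib
  implicit none
  integer :: a, b

  a = 1
  b = 10
  !$OMP PARALLEL PRIVATE(a) FIRSTPRIVATE(b)
  a = omp_get_thread_num()   ! a is undefined on entry, so assign before use
  b = b + a                  ! b starts at 10 in every thread
  write(*,*) "thread", a, "b =", b
  !$OMP END PARALLEL
end program firstprivate_example
```

Each thread prints 10 plus its own thread number, which shows that every private copy of b was initialized from the serial value.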

3.1.5 LASTPRIVATE(list)

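With the LASTPRIVATE attribute, the original variable receives, at the end of the construct, the value it would have after a serial execution (that of the sequentially last iteration of a do-loop, or of the last SECTION).

Updating the original variable requires a synchronization between the threads at the end of the construct, either implied or explicit.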
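A minimal sketch, assuming a variable last_sq (a name made up here) that is overwritten in every iteration. After the loop it holds the value computed in the sequentially last iteration, regardless of which thread executed it.

```fortran
program lastprivate_example
  implicit none
  integer :: i, last_sq

  !$OMP PARALLEL
  !$OMP DO LASTPRIVATE(last_sq)
  do i = 1, 100
     last_sq = i * i               ! private copy inside the loop
  end do
  !$OMP END DO
  !$OMP END PARALLEL

  write(*,*) "last_sq =", last_sq  ! 100*100 = 10000
end program lastprivate_example
```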

3.1.6 COPYIN(list)

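With the COPYIN clause, variables with the THREADPRIVATE attribute are set to the value they have in the master thread.

Resource cost: the value in the master thread has to be transferred to every thread.

For example, if a is first set to the thread number in each thread and a later parallel region specifies COPYIN(a), a takes the value of the master thread, 0, in every thread.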
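The behaviour just described can be sketched like this, assuming the THREADPRIVATE variable a lives in a module called copyin_data (a hypothetical name).

```fortran
module copyin_data
  implicit none
  integer, save :: a
  !$OMP THREADPRIVATE(a)
end module copyin_data

program copyin_example
  use omp_lib
  use copyin_data
  implicit none

  !$OMP PARALLEL
  a = omp_get_thread_num()         ! a differs from thread to thread
  !$OMP END PARALLEL

  !$OMP PARALLEL COPYIN(a)
  ! every copy now holds the master thread's value, which is 0
  write(*,*) "thread", omp_get_thread_num(), "a =", a
  !$OMP END PARALLEL
end program copyin_example
```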

3.1.7 COPYPRIVATE(list)

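It can only be used on the closing-directive !$OMP END SINGLE.

After the !$OMP SINGLE / !$OMP END SINGLE block has been executed, it broadcasts the listed private variables from the thread that executed the block to all the other threads.

NOWAIT and COPYPRIVATE(list) cannot be used together on the same !$OMP END SINGLE directive.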
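A small sketch of the broadcast, assuming a private variable seed whose value 42 is an arbitrary placeholder for something read or computed by a single thread.

```fortran
program copyprivate_example
  use omp_lib
  implicit none
  integer :: seed

  !$OMP PARALLEL PRIVATE(seed)
  !$OMP SINGLE
  seed = 42                        ! set by whichever thread arrives first
  !$OMP END SINGLE COPYPRIVATE(seed)
  ! after the closing directive every private copy holds 42
  write(*,*) "thread", omp_get_thread_num(), "seed =", seed
  !$OMP END PARALLEL
end program copyprivate_example
```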

3.1.8 REDUCTION(operator:list)

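Ensures that only one thread at a time is writing/updating a variable with the SHARED attribute: each thread accumulates into its own copy, and the copies are combined at the end of the construct.

It only applies to statements of the following forms:

x = x operator expr

x = intrinsic_procedure (x, expr_list), where the variable x must be a scalar of intrinsic type

The allowed values are +, *, -, .AND., .OR., .EQV. and .NEQV. for operator, and MAX, MIN, IAND, IOR and IEOR for intrinsic_procedure.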
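Both statement forms can be sketched in one loop. The variables total and largest and the loop bounds are assumptions for illustration.

```fortran
program reduction_example
  implicit none
  integer, parameter :: n = 1000
  integer :: i, total, largest

  total = 0
  largest = 0
  !$OMP PARALLEL DO REDUCTION(+:total) REDUCTION(MAX:largest)
  do i = 1, n
     total   = total + i           ! x = x operator expr
     largest = max(largest, i)     ! x = intrinsic_procedure(x, expr_list)
  end do
  !$OMP END PARALLEL DO

  write(*,*) "total =", total, " largest =", largest   ! 500500 and 1000
end program reduction_example
```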

3.2 Other clauses

3.2.1 IF(scalar_logical_expression)

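Opens the parallel region only when a given condition holds (in some cases setting up a parallel region costs more time than running the code serially).

3.2.2 NUM_THREADS(scalar_integer_expression)

Uses the specified number of threads.

!$OMP PARALLEL NUM_THREADS(4): this parallel region uses 4 threads.

3.2.3 NOWAIT

Avoids the implied synchronization at the closing-directive.

Using NOWAIT also switches off the implied data synchronization.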

3.2.4 SCHEDULE(type, chunk)-chunk is optional

Specifies how the iterations of a do-loop are distributed over the threads (not necessarily evenly).

!$OMP DO SCHEDULE(type,chunk)

four different options for scheduling :

STATIC

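!$OMP DO SCHEDULE(STATIC, chunk)

Assuming three threads in total: when chunk is omitted, the iteration space is divided into equal pieces, one per thread (chunk = 200 in the example, which corresponds to a do-loop of 600 iterations). When chunk is given, pieces of chunk iterations are assigned to the threads in a round-robin fashion.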

DYNAMIC

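!$OMP DO SCHEDULE(DYNAMIC, chunk)

The iteration space is divided into pieces of size chunk; as soon as a thread finishes one piece, it takes the next available one. If chunk is omitted, it defaults to 1.

Compared with STATIC, this usually balances the load better when the iterations differ in cost, but handing out the pieces adds overhead, and the smaller the pieces, the higher that cost.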

GUIDED

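!$OMP DO SCHEDULE(GUIDED, chunk)

Similar to DYNAMIC: a thread still takes the next piece as soon as it finishes one, but the piece size decreases exponentially, so the threads work on smaller and smaller pieces.

chunk specifies the minimum piece size. Because of the exponential division the pieces are not all equal; once the minimum size is reached, the remaining pieces have equal size.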

RUNTIME

!$OMP DO SCHEDULE(RUNTIME)

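With the first three options the way the iterations are distributed is fixed when the program is compiled; RUNTIME allows it to be changed when the program is run, through the OMP_SCHEDULE environment variable.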
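To see a distribution concretely, the following sketch records which thread executes each iteration. The array owner, the loop length 12 and the request for three threads are assumptions for illustration; the run-time system may deliver fewer threads than requested.

```fortran
program schedule_example
  use omp_lib
  implicit none
  integer, parameter :: n = 12
  integer :: i, owner(n)

  !$OMP PARALLEL DO SCHEDULE(STATIC, 2) NUM_THREADS(3) SHARED(owner)
  do i = 1, n
     owner(i) = omp_get_thread_num()   ! remember who ran iteration i
  end do
  !$OMP END PARALLEL DO

  ! with three threads, STATIC with chunk 2 gives 0 0 1 1 2 2 0 0 1 1 2 2
  write(*,'(12i2)') owner
end program schedule_example
```

Replacing STATIC by DYNAMIC or GUIDED in the clause makes the printed pattern vary from run to run, since the pieces are then handed out on demand.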

3.2.5 ORDERED

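Used when (part of) a do-loop has to be executed in sequential order.

The ORDERED clause must be added to the opening-directive of the do-loop.

CHAPTER 4 The OpenMP run-time library: a set of external procedures, provided through the omp_lib module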

4.1 Execution environment routines

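4.1.1 OMP_set_num_threads: sets the number of threads to be used in parallel regions

call OMP_set_num_threads(number_of_threads)

It can only be called from outside a parallel region.

It takes precedence over the OMP_NUM_THREADS environment variable.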

4.1.2 OMP_get_num_threads: returns the number of threads currently in use

integer::a

a= OMP_get_num_threads()

It can be called from a serial region, a parallel region or a nested parallel region; in a serial region it returns 1.

4.1.3 OMP_get_max_threads

integer::a

a=OMP_get_max_threads()

It can be called from parallel regions or from serial regions.

Returns the maximum number of threads that the program can use.

4.1.4 OMP_get_thread_num

integer::a

a=OMP_get_thread_num()

Returns the identification number of the current thread.

4.1.5 OMP_get_num_procs

integer::a

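a=OMP_get_num_procs()

Returns the number of processors available to the program.

4.1.6 OMP_in_parallel: tells whether the calling code is running in parallel; it returns .TRUE. if at least one enclosing parallel region is executing in parallel, and .FALSE. otherwise.

logical::a

a=OMP_in_parallel()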

4.1.7 OMP_set_dynamic: if set to .TRUE., the number of threads in parallel regions can be adjusted automatically by the run-time environment

call OMP_set_dynamic(.TRUE.)

call OMP_set_dynamic(.FALSE.)

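4.1.8 OMP_get_dynamic: tells whether the dynamic adjustment of threads is enabled; it returns .TRUE. if so, and .FALSE. otherwise.

logical::a

a=OMP_get_dynamic()

4.1.9 OMP_set_nested: sets whether nested parallelism is allowed. The default is .FALSE., which means that nested parallel regions are executed serially by default. It takes precedence over the environment variable OMP_NESTED.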

call OMP_set_nested(.TRUE.)

call OMP_set_nested(.FALSE.)

4.1.10 OMP_get_nested: returns a logical value telling whether nested parallelism is allowed.
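Several of the execution environment routines can be tried out together in a short sketch. The request for 4 threads is an arbitrary choice, and the run-time system may deliver fewer.

```fortran
program runtime_example
  use omp_lib
  implicit none

  call omp_set_dynamic(.false.)
  call omp_set_num_threads(4)      ! called from the serial region
  write(*,*) "processors available:", omp_get_num_procs()
  write(*,*) "max threads:", omp_get_max_threads()
  write(*,*) "in parallel (serial region)?", omp_in_parallel()

  !$OMP PARALLEL
  !$OMP MASTER
  write(*,*) "team size:", omp_get_num_threads()
  write(*,*) "in parallel (parallel region)?", omp_in_parallel()
  !$OMP END MASTER
  !$OMP END PARALLEL
end program runtime_example
```

The MASTER block keeps the output to one line per query instead of one per thread.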

4.2 Lock routines

4.2.1 OMP_init_lock and OMP_init_nest_lock

4.2.2 OMP_set_lock and OMP_set_nest_lock

4.2.3 OMP_unset_lock and OMP_unset_nest_lock

4.2.4 OMP_test_lock and OMP_test_nest_lock

4.2.5 OMP_destroy_lock and OMP_destroy_nest_lock
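The life cycle of a simple lock (initialize, set, unset, destroy) can be sketched as follows. The lock variable lck and the shared counter are names chosen for this illustration; omp_lock_kind is the integer kind that omp_lib provides for lock variables.

```fortran
program lock_example
  use omp_lib
  implicit none
  integer(kind=omp_lock_kind) :: lck
  integer :: counter

  counter = 0
  call omp_init_lock(lck)          ! a lock must be initialized before use
  !$OMP PARALLEL SHARED(lck, counter)
  call omp_set_lock(lck)           ! blocks until the lock is available
  counter = counter + 1            ! protected update
  call omp_unset_lock(lck)         ! release the lock for the next thread
  !$OMP END PARALLEL
  call omp_destroy_lock(lck)

  write(*,*) "threads counted:", counter
end program lock_example
```

The effect here is the same as that of a critical section; locks become useful when the protected region cannot be expressed as one lexical block.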

4.3 Timing routines

4.3.1 OMP_get_wtime

4.3.2 OMP_get_wtick
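A minimal timing sketch: OMP_get_wtime returns the wall-clock time in seconds as a double precision value, and OMP_get_wtick returns the resolution of that timer. The summation loop is only a placeholder workload.

```fortran
program timing_example
  use omp_lib
  implicit none
  double precision :: t_start, t_end, s
  integer :: i

  t_start = omp_get_wtime()
  s = 0.0d0
  !$OMP PARALLEL DO REDUCTION(+:s)
  do i = 1, 10000000
     s = s + 1.0d0 / dble(i)       ! placeholder work
  end do
  !$OMP END PARALLEL DO
  t_end = omp_get_wtime()

  write(*,*) "sum =", s
  write(*,*) "elapsed (s):", t_end - t_start
  write(*,*) "timer resolution (s):", omp_get_wtick()
end program timing_example
```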

4.4 The Fortran 90 module omp_lib
