TRANSCRIPT
Fundamentals of High-Performance Programming Techniques (2)
Information Technology Center, The University of Tokyo
Supercomputer Programming (1), (I)
2018/5/8
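Lecture schedule
1. April 10: Guidance
2. April 17
  • Basic operations of parallel numerical processing (lecture)
3. April 24: Starting to use the supercomputer
  • Logging in, running a test program
4. May 1
  • Fundamentals of high-performance programming techniques (1) (loop unrolling)
5. May 8
  • Fundamentals of high-performance programming techniques (2) (cache blocking)
6. May 15
  • Parallelization of matrix-vector multiplication
7. May 22
  • Parallelization of the power method
8. June 5
  • Parallelization of matrix-matrix multiplication (1)
9. June 12
  • Parallelization of matrix-matrix multiplication (2)
10. June 19
  • LU decomposition (1)
  • Announcement of the contest assignment
11. July 3
  • LU decomposition (2)
12. July 10
  • LU decomposition (3)
13. July 17
  • RB-H …
  • Report and contest assignments (deadline: August 6, 2018 (Mon.), 24:00, strict)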
Running the sample program (matrix-matrix multiplication)
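UNIX memo (1)
• Starting Emacs: emacs <file name>
• ^x ^s (^ means Control): save the text
• ^x ^c: quit
  (Do not leave Emacs with ^z. It only suspends Emacs, and the remaining process adds load to the supercomputer. Never do this.)
• ^g: use this when you get confused (it cancels the current command)
• ^k: delete from the cursor to the end of the line
  The deleted line is kept temporarily.
• ^y: paste the line deleted with ^k at the current cursor position
• ^s <string>: move to the position of the string
• M-x goto-line: move to the specified line (M- means Meta; press Esc, then x)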
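UNIX memo (2)
• rm <file name>: delete the file
  • rm *~: delete backup files whose names end in ~, such as test.c~. Use it with care: if a space slips in between * and ~, all files are deleted.
• ls: show the contents of the current directory
• cd <directory name>: move to the directory
  • cd ..: move up one directory
  • cd ~: go to your home directory. Use this when you get lost.
• cat <file name>: show the contents of the file
• make: build the executable (works only in a directory that contains a Makefile)
• make clean: delete the executable (works only if clean is defined in the Makefile)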
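UNIX memo (3)
• less <file name>: show the contents of the file one screen at a time (use it when the output of cat does not fit on the screen)
  • Space key: scroll one screen
  • / <string>: move to the position of the string
  • q: quit (also use this when you get lost)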
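Notes on the matrix-matrix multiplication sample program
• File name (C and Fortran versions): Mat-Mat-noopt-ofp.tar.gz
• In the job script mat-mat-noopt.bash, change the queue name from lecture-flat to lecture5-flat and the group name from gt00 to gt05, then submit the job with pjsub.
  • lecture-flat: queue for use outside class hours
  • lecture5-flat: queue for use during class hours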
Running the matrix-matrix multiplication sample program (C and Fortran versions)
• Run the following commands:
  $ cd /work/gt05/t050xx
  $ cp /work/gt05/z30105/Mat-Mat-noopt-ofp.tar.gz ./
  $ tar xvfz Mat-Mat-noopt-ofp.tar.gz
  $ cd Mat-Mat-noopt
• Then run one of the following:
  $ cd C : if you use C
  $ cd F : if you use Fortran
• Common to both:
  $ make
• Submit the job script:
  $ pjsub mat-mat-noopt.bash
• When the job has finished, run:
  $ cat mat-mat-noopt.bash.oXXXX (XXXX is a number)
Execution result of the matrix-matrix multiplication code
• The run was successful if you see output like the following (C version).
Using DDR4:
  N = 512
  Mat-Mat time = 12.511196 [sec.]
  21.455619 [MFLOPS]
  OK!
Using MCDRAM:
  N = 512
  Mat-Mat time = 13.501827 [sec.]
  19.881417 [MFLOPS]
  OK!
Execution result of the matrix-matrix multiplication code
• The run was successful if you see output like the following (Fortran version).
Using DDR4:
  N = 512
  Mat-Mat time[sec.] = 24.4274609088898
  MFLOPS = 10.9890854527813
  OK!
Using MCDRAM:
  N = 512
  Mat-Mat time[sec.] = 27.0449259281158
  MFLOPS = 9.92553856630092
  OK!
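How to use MCDRAM
• Normal execution (uses DDR4):
  • mpiexec.hydra -n ${PJM_MPI_PROC} ./mat-mat-noopt
• Execution using MCDRAM:
  • mpiexec.hydra -n ${PJM_MPI_PROC} numactl -m 1 ./mat-mat-noopt
  (In flat mode, MCDRAM appears as NUMA node 1, so numactl -m 1 allocates the program's memory from MCDRAM.)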
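Description of the sample program
• #define N 512
  Changing this number changes the matrix size.
• #define DEBUG
  Setting this value to 1 enables verification of the computed matrix-matrix product.
• Specification of the MyMatMat function
  • It computes the product of the N×N double-precision matrices A and B and stores the result in the N×N double-precision matrix C.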
Notes on the Fortran sample program
• The matrix size is given by the variable NN:
  integer, parameter :: NN=512
Exercise
• Apply loop unrolling to the MyMatMat function (procedure).
• Try different unrolling depths for each loop and evaluate the performance.
• Compare the effect of your hand unrolling with the optimization, including automatic unrolling, performed by the compiler.
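Report assignments
1. [L10] In the matrix-matrix multiplication code, change the order of the i, j and k loops and evaluate the performance of each order.
2. [L15] In the matrix-matrix multiplication code, apply loop unrolling to the i, j and k loops with various unrolling depths, and find the fastest combination.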
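Levels of the problems:
• L00: Very easy problem.
• L10: Problem that can be solved with a little thought.
• L20: Standard problem.
• L30: Problem that takes several hours.
• L40: Problem that takes several weeks and requires a complex implementation.
• L50: Problem that takes several months; includes unsolved problems.
※ Problems at level L40 and above are worth publishing as a paper.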
Outline
1. …
2. …
3. Overview of OpenMP
4. Running the sample program (matrix-matrix multiplication, OpenMP version)
5. Exercises
6. Report assignments
OpenMP����� ������
�'��!*��
• ^�w|ozx}o�3�q}w{w|ozye�iKVQTILfKVQTDFF_��4�
• "� � ,
• ���#�\HMEJ8:9C5=:<9?;=>?<\HMEJ8:<C5B@A8=:<9?;=>?>\&��� ;9:>�>�;>�
• `��g$�a• F/0\GUWYWOTB9/0e.1• F/0\GUWYWOTB9/0g-�gq}w{w|ozyb���+�sm}|~t���
• �2)g��l�dnu~• NSTPUZX5LF!*�+6F[RZST�%7]rvp}ek!*�+]• ��h(]jd�w|ozx}ol�i� cg�3�
Overview of OpenMP
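Target machines of OpenMP
• OpenMP is used to parallelize programs for shared-memory parallel computers.
[Figure: several processing elements (PEs), each running an OpenMP thread, all connected to one shared memory that holds a shared array A[ ].]
• Every PE can access the shared array A[ ] directly ⇒ the data does not have to be distributed among the PEs.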
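What is OpenMP
• OpenMP (OpenMP C and C++ Application Program Interface) is a standard for writing parallel programs on shared-memory parallel computers. It specifies:
  1. directives,
  2. a library, and
  3. environment variables.
• The user gives the instructions for executing the program in parallel. It is not automatic parallelization by the compiler.
• Compared with parallelization for distributed-memory parallel computers (for example with MPI), there is little work for distributing data, so implementation is easier.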
OpenMP and multicore computers (1)
• OpenMP is a programming model for thread parallelism on multicore computers.
• Parallelization across nodes cannot be done with OpenMP.
• Parallelization across nodes is done with MPI.
• Automatic parallelizing compilers can also generate thread-parallel code.
• Compilers for languages such as HPF and XcalableMP can also parallelize across nodes.
OpenMP and multicore computers (2)
• 16 threads/node
  • T2K Open Supercomputer (AMD Quad Core Opteron (Barcelona))
  • FX10 supercomputer system (SPARC64 IXfx)
• 32-128 threads/node
  • HITACHI SR16000 (IBM Power7)
    • 32 physical cores, 64-128 logical cores (with SMT)
  • Reedbush (Intel Xeon E5-2695 v4, Broadwell-EP)
    • 36 cores
• 60-272 threads/node
  • Intel Xeon Phi (Intel MIC (Many Integrated Core), Knights Corner)
    • 60 physical cores, 120-240 logical cores (with HT)
  • Oakforest-PACS (Intel MIC, Knights Landing)
    • 68 physical cores, 272 logical cores (with HT)
• OpenMP execution with more than 100 threads is now common.
OpenMP directive syntax
• C:
  • Directives start with #pragma omp
• Fortran:
  • Directives start with !$omp
How to compile an OpenMP program
• Specify the OpenMP option of the compiler.
  • Example: Intel Fortran90 compiler
    ifort -O3 -qopenmp foo.f
  • Example: Intel C compiler
    icc -O3 -qopenmp foo.c
• Notes
  • Loops without OpenMP directives are executed sequentially unless the compiler's automatic parallelization is also enabled.
  • Automatic parallelization can be combined with OpenMP. Loops with OpenMP directives are parallelized as the directives specify.
    • Example: Intel Fortran90 compiler
      ifort -O3 -qparallel -qopenmp foo.f
Running an OpenMP program
• The number of threads of an OpenMP executable is specified at run time with the environment variable OMP_NUM_THREADS.
• Example: running the OpenMP executable a.out with 16 threads
  $ export OMP_NUM_THREADS=16
  $ ./a.out
  or
  $ env OMP_NUM_THREADS=16 ./a.out
• Notes
  • If OMP_NUM_THREADS is not set, an implementation-defined default number of threads is used (often the number of available hardware threads). This may not be what you intend.
  • Set OMP_NUM_THREADS explicitly, for example in the job script.
OpenMP execution model
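OpenMP execution model (C)
Block A
#pragma omp parallel
{
  Block B
}
Block C
[Figure: block A is executed sequentially by one thread. At #pragma omp parallel, thread 0 (master thread), thread 1, thread 2, …, thread p-1 all execute block B in parallel. When every thread has finished block B, block C is executed sequentially again.]
The number of threads p is specified with the environment variable OMP_NUM_THREADS.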
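The execution model above can be tried with a small C sketch. It assumes a compiler with OpenMP enabled (for example icc -qopenmp or gcc -fopenmp). The #ifdef _OPENMP guards are a convenience added here so that the same file also compiles without OpenMP; in that case the parallel region runs with a single thread. The function name run_parallel_region is illustrative and is not part of the sample programs.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Executes block A, a parallel region (block B) and block C;
   returns the number of threads that executed block B. */
int run_parallel_region(void)
{
    int team_size = 1;

    printf("block A (one thread)\n");

    #pragma omp parallel
    {
        int myid = 0;            /* declared inside the region: private */
#ifdef _OPENMP
        myid = omp_get_thread_num();
        if (myid == 0)           /* only thread 0 writes the shared variable */
            team_size = omp_get_num_threads();
#endif
        printf("block B (thread %d)\n", myid);
    }
    /* the end of the parallel region waits for all threads */

    printf("block C (one thread), team size = %d\n", team_size);
    return team_size;
}
```

With OMP_NUM_THREADS=4, "block B" should be printed four times. The order of those lines can change from run to run.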
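OpenMP execution model (Fortran)
Block A
!$omp parallel
  Block B
!$omp end parallel
Block C
[Figure: block A is executed sequentially by one thread. At !$omp parallel, thread 0 (master thread), thread 1, thread 2, …, thread p-1 all execute block B in parallel. When every thread has finished block B, block C is executed sequentially again.]
The number of threads p is specified with the environment variable OMP_NUM_THREADS.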
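Work sharing constructs
• The part enclosed by the parallel directive is executed in parallel by the threads. It is called a parallel region.
• The OpenMP constructs that divide the work among the threads inside a parallel region are called work sharing constructs.
• There are two kinds of work sharing constructs:
  1. Constructs written inside a parallel region
    • for construct (do construct)
    • sections construct
    • single construct (master construct), etc.
  2. Constructs combined with the parallel directive
    • parallel for construct (parallel do construct)
    • parallel sections construct, etc.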
Commonly used directives
For construct (do construct)
#pragma omp parallel for
for (i=0; i<100; i++) {
  a[i] = a[i] * b[i];
}

Executed in parallel with 4 threads:
Thread 0: for (i=0; i<25; i++) { a[i] = a[i] * b[i]; }
Thread 1: for (i=25; i<50; i++) { a[i] = a[i] * b[i]; }
Thread 2: for (i=50; i<75; i++) { a[i] = a[i] * b[i]; }
Thread 3: for (i=75; i<100; i++) { a[i] = a[i] * b[i]; }
※ The loop that immediately follows the directive is divided among the threads. Its loop variable i is private to each thread.
※ In Fortran: !$omp parallel do ... !$omp end parallel do
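Below is a compilable C version of the loop, together with a hypothetical helper block_range that reproduces the 4-thread split shown above. How schedule(static) without a chunk size distributes leftover iterations is implementation-defined. The helper simply gives the first n % p threads one extra iteration, which is one common choice.

```c
/* a[i] = a[i] * b[i]; the iterations are divided among the threads. */
void vec_mul(double *a, const double *b, int n)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++)
        a[i] = a[i] * b[i];
}

/* Hypothetical helper: the contiguous block [*start, *end) of thread t
   when n iterations are split as evenly as possible among p threads. */
void block_range(int t, int p, int n, int *start, int *end)
{
    int base = n / p, rem = n % p;
    *start = t * base + (t < rem ? t : rem);
    *end = *start + base + (t < rem ? 1 : 0);
}
```

For n = 100 and p = 4, block_range yields the ranges 0-25, 25-50, 50-75 and 75-100, which match the figure.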
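Points to watch with the for construct
for (i=1; i<99; i++) {  /* i runs from 1 to 98 so that a[i-1] and a[i+1] stay within a[0..99] */
  a[i] = a[i] + 1;
  b[i] = a[i-1] + a[i+1];
}
• This loop cannot be parallelized with the for construct as it is. Iteration i reads a[i-1], which iteration i-1 updates, so there is a dependence between iterations and the result depends on the order of execution.
for (i=0; i<100; i++) {
  a[i] = a[ ind[i] ];
}
• Whether this loop can be parallelized depends on the values of ind[i].
• If a[ind[i]] refers to an element that another iteration updates, there is a dependence between iterations and the loop cannot be parallelized.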
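One way to remove the dependence in the first loop is sketched below in C. It assumes that the intended meaning is the sequential one: b[i] needs the already updated a[i-1] and the not yet updated a[i+1]. The sketch keeps a copy of the old values and splits the work into two loops. Neither loop has a dependence between iterations, so each can carry its own for directive. The function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Reference: the original sequential loop (not parallelizable as is). */
void update_seq(double *a, double *b, int n)
{
    int i;
    for (i = 1; i < n - 1; i++) {
        a[i] = a[i] + 1;
        b[i] = a[i - 1] + a[i + 1];  /* a[i-1] already updated, a[i+1] not yet */
    }
}

/* Same result without a loop-carried dependence.
   Returns 0 on success, -1 if the work array cannot be allocated. */
int update_par(double *a, double *b, int n)
{
    int i;
    double *old = malloc((size_t)n * sizeof *old);
    if (old == NULL)
        return -1;
    memcpy(old, a, (size_t)n * sizeof *old);  /* values before the update */

    #pragma omp parallel for
    for (i = 1; i < n - 1; i++)
        a[i] = old[i] + 1;

    #pragma omp parallel for
    for (i = 1; i < n - 1; i++)
        b[i] = a[i - 1] + old[i + 1];  /* updated left, old right neighbour */

    free(old);
    return 0;
}
```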
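Sections construct
#pragma omp sections
{
  #pragma omp section
  sub1();
  #pragma omp section
  sub2();
  #pragma omp section
  sub3();
  #pragma omp section
  sub4();
}
(The sections construct must be placed inside a parallel region. The combined form #pragma omp parallel sections creates the region and the sections at once.)
• With 4 threads: sub1(), sub2(), sub3() and sub4() are executed at the same time, each by a different thread.
• With 3 threads: sub1(), sub2() and sub3() are executed at the same time, and one of the threads executes sub4() afterwards.
  (The implementation decides which thread executes which section.)
※ In Fortran: !$omp sections ... !$omp end sections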
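The following runnable C sketch of the sections construct uses the combined parallel sections form, so it does not need an enclosing parallel region. The four functions are hypothetical stand-ins for sub1() to sub4(). Each one writes its own element of a shared array, so the sections do not interfere with one another.

```c
/* Results written by the four independent tasks. */
int result[4];

/* Hypothetical work functions standing in for sub1() ... sub4(). */
void sub1(void) { result[0] = 1; }
void sub2(void) { result[1] = 2; }
void sub3(void) { result[2] = 3; }
void sub4(void) { result[3] = 4; }

/* Each section is executed exactly once, by some thread of the team. */
void run_sections(void)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        sub1();
        #pragma omp section
        sub2();
        #pragma omp section
        sub3();
        #pragma omp section
        sub4();
    }
}
```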
Critical construct
• The enclosed block is executed by only one thread at a time.
#pragma omp critical
{
  s = s + x;
}
[Figure: threads 0, 1, 3 and 2 each execute s = s + x one after another, never at the same time. The order among the threads is not fixed.]
※ In Fortran: !$omp critical ... !$omp end critical
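Entering a critical section costs time. A common pattern, sketched here in C with a hypothetical function name, is to let each thread accumulate into a private partial sum first. Each thread then enters the critical section only once, for the final s = s + local.

```c
/* Sum of x[0..n-1] using critical for the shared update. */
double sum_critical(const double *x, int n)
{
    double s = 0.0;                 /* shared result */

    #pragma omp parallel
    {
        double local = 0.0;         /* declared inside the region: private */
        int i;
        #pragma omp for
        for (i = 0; i < n; i++)
            local += x[i];

        #pragma omp critical
        {
            s = s + local;          /* one thread at a time */
        }
    }
    return s;
}
```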
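Private clause
#pragma omp parallel for private(c)
for (i=0; i<100; i++) {
  a[i] = a[i] + c * b[i];
}

Executed in parallel with 4 threads:
Thread 0: for (i=0; i<25; i++) { a[i] = a[i] + c0*b[i]; }
Thread 1: for (i=25; i<50; i++) { a[i] = a[i] + c1*b[i]; }
Thread 2: for (i=50; i<75; i++) { a[i] = a[i] + c2*b[i]; }
Thread 3: for (i=75; i<100; i++) { a[i] = a[i] + c3*b[i]; }
※ Each thread has its own copy of c (c0 to c3 above), so the threads do not interfere with one another through c.
※ A private copy is not initialized when the loop starts, so c must be assigned inside the loop before it is used. To start every copy with the value c had before the loop, use firstprivate(c) instead of private(c).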
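The C sketch below contrasts two correct ways to give each thread its own c; the function names are hypothetical. With firstprivate, every thread's copy starts from the value the variable had before the loop. With private, the loop body must assign c before using it. In the first function c is only read, so leaving it shared would also be correct. firstprivate matters when each thread modifies its copy.

```c
/* Each thread's copy of c starts with the caller's value (firstprivate). */
void add_scaled_firstprivate(double *a, const double *b, double c, int n)
{
    int i;
    #pragma omp parallel for firstprivate(c)
    for (i = 0; i < n; i++)
        a[i] = a[i] + c * b[i];
}

/* c is private and is assigned in every iteration before it is used. */
void add_scaled_private(double *a, const double *b, const double *w, int n)
{
    int i;
    double c;
    #pragma omp parallel for private(c)
    for (i = 0; i < n; i++) {
        c = 2.0 * w[i];             /* this thread's own c */
        a[i] = a[i] + c * b[i];
    }
}
```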
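Points to watch with the private clause (C)
#pragma omp parallel for private( j )
for (i=0; i<100; i++) {
  for (j=0; j<100; j++) {
    a[ i ] = a[ i ] + amat[ i ][ j ] * b[ j ];
  }
}
• The loop variable j of the inner loop must also be declared private. Only the loop variable i of the loop that follows the directive is made private automatically.
• Without private( j ), j is a single variable shared by all threads. The threads update j at the same time, so in each thread the j loop does not execute its 100 iterations correctly.
→ The result becomes wrong. This is a common bug when parallelizing.
• Alternatively, declare j inside the body of the i loop. A variable declared inside the construct is private.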
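The same loop in compilable C form follows, as a sketch with hypothetical function names and a fixed size M = 100. It shows both remedies: listing j in private, and declaring j inside the i loop.

```c
#define M 100

/* a[i] += sum over j of amat[i][j] * b[j]; j is listed in private. */
void matvec_add(double a[M], double amat[M][M], const double b[M])
{
    int i, j;
    #pragma omp parallel for private(j)
    for (i = 0; i < M; i++)
        for (j = 0; j < M; j++)
            a[i] = a[i] + amat[i][j] * b[j];
}

/* Same computation; j is declared inside the loop body, so it is private. */
void matvec_add_local(double a[M], double amat[M][M], const double b[M])
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < M; i++) {
        int j;
        for (j = 0; j < M; j++)
            a[i] = a[i] + amat[i][j] * b[j];
    }
}
```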
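Points to watch with the private clause (Fortran)
!$omp parallel do private( j )
do i=1, 100
  do j=1, 100
    a( i ) = a( i ) + amat( i , j ) * b( j )
  enddo
enddo
!$omp end parallel do
• Specify private( j ) for the loop variable j of the inner loop as well.
• In Fortran, the OpenMP specification makes the index of a sequential DO loop inside a parallel construct private automatically, so this particular loop is also correct without the clause. Writing private( j ) explicitly makes the intent clear and matches the rule that must be followed in C.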
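Reduction clause (C)
• A reduction combines values computed by all threads into a single variable, like the sum in a dot product.
• If all threads simply add into one shared variable, their updates conflict and the result is wrong.
• With the reduction clause, each thread accumulates into its own private copy of ddot. The copies are combined with the given operator (+ here) at the end of the loop.
#pragma omp parallel for reduction(+: ddot )
for (i=0; i<100; i++) {
  ddot += a[ i ] * b[ i ];
}
ddot: the variable that receives the sum (the reduction variable)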
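Here is the dot product as a complete C function, written as a minimal sketch with a hypothetical name. For the + operator, each thread's private copy of ddot starts at 0. The value ddot held before the loop is included once in the combined result.

```c
/* Dot product of a[0..n-1] and b[0..n-1] with the reduction clause. */
double ddot_reduction(const double *a, const double *b, int n)
{
    double ddot = 0.0;
    int i;
    #pragma omp parallel for reduction(+: ddot)
    for (i = 0; i < n; i++)
        ddot += a[i] * b[i];
    return ddot;
}
```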
Reduction clause (Fortran)
• As in C, each thread accumulates into its own private copy of ddot, and the copies are added together at the end of the loop.
!$omp parallel do reduction(+: ddot )
do i=1, 100
  ddot = ddot + a(i) * b(i)
enddo
!$omp end parallel do
ddot: the variable that receives the sum (the reduction variable)
Notes on reduction
• The same result can be obtained without the reduction clause. Each thread computes a partial sum, and the partial sums are added up afterwards.
!$omp parallel do private ( i )
do j=0, p-1
  do i=istart( j ), iend( j )
    ddot_t( j ) = ddot_t( j ) + a(i) * b(i)
  enddo
enddo
!$omp end parallel do
ddot = 0.0d0
do j=0, p-1
  ddot = ddot + ddot_t( j )
enddo
※ p: number of threads. Thread j handles i = istart( j ) to iend( j ).
※ ddot_t( j ): partial sum of thread j. Every element must be initialized to zero beforehand.
※ After the parallel loop, the partial sums in ddot_t() are added into ddot sequentially.
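A C sketch of the same idea follows; the function name is hypothetical. Each thread computes its block [istart, iend) from its thread number, as in the Fortran code. One deliberate difference: the thread accumulates into a local variable and stores into ddot_t only once. Neighbouring elements of ddot_t usually share a cache line, and repeated writes to them from different threads (false sharing) can slow the loop down.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Dot product without the reduction clause.
   Stores the result in *result; returns 0, or -1 if memory runs out. */
int ddot_partial(const double *a, const double *b, int n, double *result)
{
    int maxp = 1, j;
    double *ddot_t, ddot = 0.0;

#ifdef _OPENMP
    maxp = omp_get_max_threads();       /* upper bound on the team size */
#endif
    ddot_t = calloc((size_t)maxp, sizeof *ddot_t);  /* partial sums = 0 */
    if (ddot_t == NULL)
        return -1;

    #pragma omp parallel
    {
        int myid = 0, p = 1, i, istart, iend;
        double s = 0.0;                 /* local partial sum */
#ifdef _OPENMP
        myid = omp_get_thread_num();
        p = omp_get_num_threads();
#endif
        istart = (int)((long long)n * myid / p);
        iend   = (int)((long long)n * (myid + 1) / p);
        for (i = istart; i < iend; i++)
            s += a[i] * b[i];
        ddot_t[myid] = s;               /* one write per thread */
    }

    /* add the partial sums sequentially (unused entries are still 0) */
    for (j = 0; j < maxp; j++)
        ddot += ddot_t[j];
    free(ddot_t);
    *result = ddot;
    return 0;
}
```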
Frequently used OpenMP functions
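Getting the number of threads
• Use the function omp_get_num_threads(). It returns the number of threads in the team executing the current parallel region (1 if it is called outside a parallel region).
• Type: integer (Fortran), int (C)
• Fortran90 example
  use omp_lib
  integer :: nthreads
  nthreads = omp_get_num_threads()
• C example
  #include <omp.h>
  int nthreads;
  nthreads = omp_get_num_threads();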
Getting the thread number
• Use the function omp_get_thread_num(). It returns the number of the calling thread, from 0 to (number of threads - 1).
• Type: integer (Fortran), int (C)
• Fortran90 example
  use omp_lib
  integer :: myid
  myid = omp_get_thread_num()
• C example
  #include <omp.h>
  int myid;
  myid = omp_get_thread_num();
Measuring time
• Use the function omp_get_wtime(). It returns the elapsed wall-clock time in seconds.
• Type: double precision (Fortran), double (C)
• Fortran90 example
  use omp_lib
  real(8) :: dts, dte
  dts = omp_get_wtime()
  (code to be measured)
  dte = omp_get_wtime()
  print *, "Elapsed time [sec.] =", dte-dts
• C example
  #include <omp.h>
  double dts, dte;
  dts = omp_get_wtime();
  (code to be measured)
  dte = omp_get_wtime();
  printf("Elapsed time [sec.] = %lf \n", dte-dts);
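The C sketch below combines the three functions; its function name is hypothetical. Inside the region, each thread records its thread number and one thread records the team size. Outside a parallel region, omp_get_num_threads() would return 1. omp_get_wtime() times the region. The #ifdef _OPENMP guards are an addition so that the file also builds without OpenMP.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sets ids[t] = t for every thread t of the team (at most max_ids entries)
   and returns the team size. */
int record_thread_ids(int *ids, int max_ids)
{
    int nthreads = 1;
#ifdef _OPENMP
    double dts = omp_get_wtime();       /* start time [sec.] */
#endif

    #pragma omp parallel
    {
        int myid = 0;
#ifdef _OPENMP
        myid = omp_get_thread_num();
        #pragma omp single
        nthreads = omp_get_num_threads();   /* team size, read inside the region */
#endif
        if (myid < max_ids)
            ids[myid] = myid;
    }

#ifdef _OPENMP
    double dte = omp_get_wtime();
    printf("Elapsed time [sec.] = %f\n", dte - dts);
#endif
    return nthreads;
}
```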
Other constructs
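Single construct
• The block specified with the single construct is executed by only one thread. Which thread executes it is not specified.
• The other threads wait at the end of the block until it has finished (implicit barrier), unless the nowait clause is specified.
#pragma omp parallel
{
  Block A
  #pragma omp single
  {
    Block B
  }
  …
}
[Figure: every thread executes block A in parallel. Block B is executed by one thread while the other threads wait at the end of the single block. Then all threads continue.]
※ In Fortran: !$omp single ... !$omp end single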
Master construct
• It works like the single construct, except that the block is executed by the master thread (thread 0).
• Unlike single, there is no synchronization at the end of the block. The other threads do not wait and continue immediately.
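Flush construct
• Makes the values of variables consistent with physical memory (memory consistency).
• Only the variables specified in the flush construct are made consistent at that point. For other shared variables, consistency with the values in memory is not ensured there.
  (Results of computation may be kept only in registers and not written back to memory.)
• In other words, without flush it is not guaranteed that all threads see the same value of a shared variable.
• A flush is performed implicitly at a barrier, at entry to and exit from critical, at entry to and exit from a parallel region, and at exit from for, sections and single (unless nowait is specified).
• Flush degrades performance, so use it as little as possible.
#pragma omp flush (list of variables)
※ If the list of variables is omitted, all variables are flushed.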
Threadprivate directive
• Declares that a global variable is private to each thread: every thread keeps its own copy.
• It is used when each thread needs its own value of a global variable that is referenced from several functions.
#include <omp.h>
int myid, nthreads, istart, iend;
#pragma omp threadprivate(istart, iend)
…
void kernel() {
  int i, j;
  for (i=istart; i<iend; i++) {
    for (j=0; j<n; j++) {
      a[ i ] = a[ i ] + amat[ i ][ j ] * b[ j ];
    }
  }
}
…
int main() {
  …
  #pragma omp parallel private (myid, nthreads)
  {
    nthreads = omp_get_num_threads();
    myid = omp_get_thread_num();
    istart = myid * (n/nthreads);
    iend = (myid+1)*(n/nthreads);
    if (myid == (nthreads-1)) {
      iend = n;
    }
    kernel();
  }
  …
}
※ istart and iend have a different value in each thread. Each thread sets them inside the parallel region, and kernel() uses them. A threadprivate variable must not appear in a private clause.
Loop scheduling
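Loop scheduling (1)
• With the parallel do construct, the range of the target loop (for example 1 to n) is divided into blocks of equal length, one per thread, and the blocks are processed in parallel.
[Figure: the loop range 1 to n divided into equal blocks for threads 0 to 4.]
• If the computational load differs from iteration to iteration, the threads get different amounts of work even though the blocks have equal length (load imbalance), and parallel performance drops.
[Figure: the same equal blocks for threads 0 to 4, with the computational load varying along the loop.]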
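Loop scheduling (2)
• To reduce load imbalance, the loop can be divided into smaller pieces (chunks) that are assigned to the threads in turn.
[Figure: the loop range 1 to n divided into small chunks assigned to the threads in turn.]
• In OpenMP, the way iterations are assigned to threads is specified with the schedule clause.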
Loop scheduling: static
• schedule (static, n)
• The loop is divided into chunks of length n, and the chunks are assigned to the threads in turn: thread 0, thread 1, thread 2, …, and then thread 0 again.
• If no schedule clause is specified, static scheduling is typically used, with chunk length = loop length / number of threads.
[Figure: chunks of length n assigned in turn to threads 0, 1, 2 and 3.]
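For schedule(static, n), the OpenMP specification fixes the assignment: chunks of n iterations go to the threads round-robin, in thread-number order. The hypothetical helper static_owner computes the owner of iteration i (counting from 0) under that rule. record_static_owners, also hypothetical, records which thread actually ran each iteration so that the two can be compared.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Thread that executes iteration i under schedule(static, chunk)
   with p threads: chunk number i / chunk, dealt out round-robin. */
int static_owner(int i, int chunk, int p)
{
    return (i / chunk) % p;
}

/* Records in owner[i] the thread that executed iteration i;
   returns the team size. */
int record_static_owners(int *owner, int n, int chunk)
{
    int p = 1;
    #pragma omp parallel
    {
        int i;
#ifdef _OPENMP
        #pragma omp single
        p = omp_get_num_threads();
#endif
        #pragma omp for schedule(static, chunk)
        for (i = 0; i < n; i++) {
            int id = 0;
#ifdef _OPENMP
            id = omp_get_thread_num();
#endif
            owner[i] = id;
        }
    }
    return p;
}
```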
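Loop scheduling: dynamic
• schedule(dynamic, n)
• The loop is divided into chunks of length n. Each chunk is assigned to whichever thread finishes its current work first (first come, first served).
[Figure: chunks of length n taken by threads 0, 1, 2 and 3 in the order in which the threads become free.]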
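Loop scheduling: guided
• schedule(guided, n)
• Chunks are large at first and become smaller as the loop proceeds. The size of each chunk is proportional to the number of unassigned iterations divided by the number of threads.
• The chunk size decreases down to the specified size:
  • With chunk size 1, the chunks decrease down to 1.
  • With a chunk size k greater than 1, no chunk is smaller than k, except that the last chunk may contain fewer than k iterations.
  • If the chunk size is omitted, it is 1.
[Figure: chunks of decreasing size assigned to threads 0, 1, 2 and 3.]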
Example of loop scheduling (sparse matrix-vector multiplication)
• Fortran90 example
!$omp parallel do private( j, k ) schedule(dynamic,10)
do i=1, n
  do j=indj(i), indj(i+1)-1
    y( i ) = y( i ) + amat( j ) * x( indx( j ) )
  enddo
enddo
!$omp end parallel do
• C example
#pragma omp parallel for private( j, k ) schedule(dynamic,10)
for (i=0; i<n; i++) {
  for ( j=indj[i]; j<indj[i+1]; j++) {
    y[ i ] = y[ i ] + amat[ j ] * x[ indx[ j ] ];
  }
}
※ The length of the j loop (the number of nonzeros in row i) differs from row to row. Dividing the i loop into equal blocks therefore causes load imbalance, and dynamic scheduling with chunk size 10 balances the load. (y must be set to zero before the loop.)
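A C sketch of the same kernel with 0-based CRS arrays follows. The array names follow the C code above, and the function name is hypothetical. Accumulating into a local s also removes the need to zero y beforehand.

```c
/* y = A x for a sparse n×n matrix in 0-based CRS format: row i has the
   nonzeros amat[indj[i] .. indj[i+1]-1] in columns indx[...]. */
void spmv_crs(int n, const int *indj, const int *indx,
              const double *amat, const double *x, double *y)
{
    int i, j;
    /* rows have different lengths: hand out 10 rows at a time */
    #pragma omp parallel for private(j) schedule(dynamic, 10)
    for (i = 0; i < n; i++) {
        double s = 0.0;
        for (j = indj[i]; j < indj[i + 1]; j++)
            s += amat[j] * x[indx[j]];
        y[i] = s;
    }
}
```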
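Notes on programming with loop scheduling
• With dynamic and guided, the chunk size matters:
  • A smaller chunk size improves load balance but increases the overhead.
  • Conversely, a larger chunk size reduces the overhead but worsens load balance.
• There is a trade-off between the two, so the chunk size must be tuned, and this tuning has its own cost.
• static has the smallest overhead. Dynamic scheduling such as dynamic incurs runtime-system overhead that static does not have.
• If the load of each iteration can be determined in advance, balancing the load yourself with static scheduling can give better performance.
• However, this increases the programming cost.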
Load balancing with static scheduling
• Example: sparse matrix-vector multiplication, with the rows divided among the threads in advance
!$omp parallel do private(S,J_PTR,I)
DO K=1,NUM_SMP
  DO I=KBORDER(K-1)+1,KBORDER(K)
    S=0.0D0
    DO J_PTR=IRP(I),IRP(I+1)-1
      S=S+VAL(J_PTR)*X(ICOL(J_PTR))
    END DO
    Y(I)=S
  END DO
END DO
!$omp end parallel do
※ NUM_SMP is the number of threads. Each iteration K of the outer loop is executed by one thread, which processes rows KBORDER(K-1)+1 to KBORDER(K).
※ KBORDER is computed before the loop so that every thread is assigned about the same number of nonzero elements. The load is then balanced without the overhead of dynamic scheduling.
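The slide does not show how KBORDER is computed. The following is one possible sketch in C, under stated assumptions: balance_rows is a hypothetical helper, and irp is a 0-based CRS row pointer (irp[0] = 0, irp[n] = number of nonzeros). For each thread border, the helper picks the row boundary whose nonzero count is closest to an equal share. The values kborder[0..p] have the same meaning as KBORDER(0:NUM_SMP) in the Fortran code.

```c
/* Hypothetical helper: thread k owns rows kborder[k] .. kborder[k+1]-1,
   with kborder[0] = 0 and kborder[p] = n, so that the numbers of
   nonzeros per thread are roughly equal. */
void balance_rows(int n, const int *irp, int p, int *kborder)
{
    long long nnz = irp[n];
    int row = 0, k;

    kborder[0] = 0;
    for (k = 1; k < p; k++) {
        long long target = nnz * k / p;   /* ideal nonzeros before block k */
        while (row < n && irp[row] < target)
            row++;                        /* first boundary at or after target */
        /* step back if the previous boundary is closer to the target */
        if (row > kborder[k - 1] && target - irp[row - 1] < irp[row] - target)
            row--;
        kborder[k] = row;
    }
    kborder[p] = n;
}
```

For example, with rows of 1, 1, 1 and 6 nonzeros and two threads, the border is placed before the last row, giving 3 and 6 nonzeros.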
Notes on OpenMP programming
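Points to watch in OpenMP programming
• OpenMP programs are basically written by combining the parallel construct with work sharing constructs such as for.
• Inside a parallel region, variables that are not listed in a private clause are shared by all threads. A variable that should be private but is left shared gives wrong results.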
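Notes on the private clause (1)
• In OpenMP, variables used inside a parallel loop are shared unless they are made private. The loop variables of inner loops and temporary variables must therefore be listed in the private clause.
!$omp parallel do
do i=1, 100
  do j=1, 100
    tmp = b(i) + c(i)
    a( i ) = a( i ) + tmp
  enddo
enddo
!$omp end parallel do
• Example of a loop with missing private declarations:
  • The loop variable i of the parallelized loop is private automatically.
  • The loop variable j should be listed in private ← in C it would otherwise be shared by all threads (in Fortran, a sequential DO index inside a parallel construct is private automatically).
  • The temporary variable tmp must be listed in private ← otherwise all threads write the same tmp at the same time ← the result becomes wrong.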
Notes on the private clause (2)
• Private variables are allocated separately for every thread, so making many variables private adds overhead.
!$omp parallel do
do i=1, 100
  call foo(i,arg1,arg2,arg3,arg4,arg5, ….., arg100)
enddo
!$omp end parallel do
• Example: calling a procedure with many arguments inside a parallel loop
  • The local variables of foo are allocated as private variables of each thread ← this adds overhead when there are many of them.
Notes on shared variables
• In OpenMP, a variable that is not specified otherwise is a shared variable.
• Global variables in C, and common and module variables in Fortran90, are shared variables by default.
  • To give each thread its own copy of such a variable, use the threadprivate directive.
• Local variables of procedures called inside a parallel region are allocated separately for each thread (private).
  • The exceptions are static variables in C and variables with the SAVE attribute in Fortran, which are shared.
Efficient use of the parallel construct
• The parallel construct can be written separately from the do construct.
• The threads are then created (forked) once at !$omp parallel, and !$omp do only divides the loop iterations among the existing threads.
!$omp parallel
!$omp do private(j,tmp)
do i=1, 100
  do j=1, 100
    tmp = b(i) + c(i)
    a( i ) = a( i ) + tmp
  enddo
enddo
!$omp end do
!$omp end parallel
has the same effect as
!$omp parallel do private(j,tmp)
do i=1, 100
  do j=1, 100
    tmp = b(i) + c(i)
    a( i ) = a( i ) + tmp
  enddo
enddo
!$omp end parallel do
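Efficient use of the parallel construct (2)
• Separating the parallel construct from the do construct is especially useful when only the inner loop of a loop nest can be parallelized.
• If !$omp parallel do is placed on the inner loop, the threads are forked and joined every time the outer loop runs, that is, n times. Forking and joining threads has a large overhead.
• If the parallel construct is placed outside the outer loop, the threads are forked only once.
do i=1, n
!$omp parallel do
  do j=1, n
    <loop body to be parallelized>
  enddo
!$omp end parallel do
enddo
→
!$omp parallel
do i=1, n
!$omp do
  do j=1, n
    <loop body to be parallelized>
  enddo
!$omp end do
enddo
!$omp end parallel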
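Bugs caused by data races
• In loops with indirect indexing such as a( indx( i ) ), several iterations may update the same element. If such a loop is parallelized, several threads may write to the same element at the same time (a data race), and the result becomes wrong.
• OpenMP does not detect data races. Avoiding them is the programmer's responsibility.
• A data race can be avoided with the critical construct, but critical is slow.
• Program with a bug:
!$omp parallel do private( j )
do i=1, n
  j = indx( i )
  a( j ) = a( j ) + 1
enddo
!$omp end parallel do
• Corrected with critical:
!$omp parallel do private( j )
do i=1, n
  j = indx( i )
!$omp critical
  a( j ) = a( j ) + 1
!$omp end critical
enddo
!$omp end parallel do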
Notes on the critical construct (1/2)
• Only one thread at a time may execute a critical section, so critical is implemented with a lock, roughly as follows:
  1. A thread that finds the lock free acquires it and enters the critical section.
  2. When the thread leaves the critical section, it releases the lock.
  3. A thread that finds the lock held waits, for example by spinning in a loop, until it is released.
• Acquiring, releasing and waiting for the lock involve memory accesses and synchronization among threads, so critical sections are slow.
Notes on the critical construct (2/2)
• For a simple update, use the omp atomic construct instead.
• It can be implemented with the hardware's atomic instructions, so it is faster than critical.
• The target statement has the form x = x op expr.
• op: +, -, *, /, etc.
!$omp parallel do private( j )
do i=1, n
  j = indx( i )
!$omp atomic
  a( j ) = a( j ) + 1
enddo
!$omp end parallel do
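Here is the indirect update from the previous slides in C, as a sketch with a hypothetical function name. Each increment of a[indx[i]] is atomic, so iterations that hit the same element cannot lose updates. The sketch assumes that every indx[i] lies in 0..nbins-1.

```c
/* a[k] = number of i with indx[i] == k, for k = 0..nbins-1. */
void histogram_atomic(const int *indx, int n, int *a, int nbins)
{
    int i;
    for (i = 0; i < nbins; i++)
        a[i] = 0;

    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        int j = indx[i];            /* declared inside the loop: private */
        #pragma omp atomic
        a[j] = a[j] + 1;            /* several threads may hit the same j */
    }
}
```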
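Points to watch when parallelizing with OpenMP (1)
• Parallelizing an existing application with OpenMP basically means adding directives to its loops. A common problem is:
1. Missing private declarations
  • A variable that should be private is easily left shared by mistake when a loop is parallelized with OpenMP.
  • The compiler does not report a missing private declaration as an error.
  • The result then depends on the timing of the threads, so the bug may not be reproducible: some runs give correct results and others do not.
  • Check carefully which variables must be private.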
OpenMPZ+BI���P&)zHP}{2. >cvgh�1 O�/ENB��Pfryix`E�=
• �0O?8cvgh#(KQ�/EWE?8cvgh��K�/E �GW@
1. 9�Pkyh\][Qpqt[_ecP�/E�B2. uymHPSPO���ENBzuym;E,B{
• 5'GWOQ?[ubtdoP�!?�2P�!?E�4ONV?OpenMPP�)KAW��Nmw`snx`Z�NC
3. 3<Ncvghmw`snx`OQ�DNB
• �.N��6-P^yjuuymZ?parallel for$�K78GW�:K�%E�UXJBWzL�YXW{
• 3<N*Q?PthreadNMPnativeNcvghAPIK"FRCETVTGB
Running the sample program (matrix-matrix multiplication, OpenMP version)
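Notes on the matrix-matrix multiplication sample program (OpenMP version)
• File name (C and Fortran versions): Mat-Mat-openmp-ofp.tar.gz
• In the job script mat-mat-openmp.bash, change the queue name from lecture-flat to lecture5-flat (class hours only) and the group name from gt00 to gt05, then submit the job with pjsub.
  • lecture-flat: queue for use outside class hours
  • lecture5-flat: queue for use during class hours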
Running the matrix-matrix multiplication sample program
• Run the following commands:
  $ cdw
  $ cp /work/gt05/z30105/Mat-Mat-openmp-ofp.tar.gz ./
  $ tar xvfz Mat-Mat-openmp-ofp.tar.gz
  $ cd Mat-Mat-openmp
• Then run one of the following:
  $ cd C : if you use C
  $ cd F : if you use Fortran
• Common to both:
  $ make
• Submit the job script:
  $ pjsub mat-mat-openmp.bash
• When the job has finished, run:
  $ cat mat-mat-openmp.bash.oXXXXX
Execution result of the matrix-matrix multiplication code (C version, OpenMP)
• The run was successful if you see output like the following.
  N = 2000
  Mat-Mat time = 1.386665 [sec.]
  11538.476510 [MFLOPS]
  OK!
  N = 2000
  Mat-Mat time = 1.386445 [sec.]
  11540.305945 [MFLOPS]
  OK!
Execution result of the matrix-matrix multiplication code (Fortran version, OpenMP)
• The run was successful if you see output like the following.
  N = 2000
  Mat-Mat time[sec.] = 9.86477398872375
  MFLOPS = 1621.93274553408
  OK!
  N = 2000
  Mat-Mat time[sec.] = 7.95836710929871
  MFLOPS = 2010.46266650720
  OK!
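Exercises
1. Parallelize the MyMatMat function with OpenMP.
  • Start from the sequential sample program (Mat-Mat-noopt-ofp.tar.gz) and add OpenMP directives.
  • Compare the result with what the compiler does automatically, for example the compiler's automatic loop unrolling.
2. Speed up the OpenMP-parallelized MyMatMat function further.
  • Use the OpenMP sample program (Mat-Mat-openmp-ofp.tar.gz).
  • Apply tuning such as loop unrolling and cache blocking, and evaluate the performance.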
OpenMP parallelization of matrix-matrix multiplication
Example of OpenMP parallelization of matrix-matrix multiplication (C)
• The code looks like this:
#pragma omp parallel for private (j, k)
for(i=0; i<n; i++) {
  for(j=0; j<n; j++) {
    for(k=0; k<n; k++) {
      C[i][j] += A[i][k] * B[k][j];
    }
  }
}
Example of OpenMP parallelization of matrix-matrix multiplication (Fortran)
• The code looks like this:
!$omp parallel do private (j, k)
do i=1, n
  do j=1, n
    do k=1, n
      C(i, j) = C(i, j) + A(i, k) * B(k, j)
    enddo
  enddo
enddo
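Report assignments
1. [L10] Apply loop unrolling to the matrix-matrix multiplication code parallelized with OpenMP, and evaluate the performance.
2. [L10] For the OpenMP version of the matrix-matrix multiplication code, compare the performance of different unrolling choices and find the fastest one.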
������-�����