TRANSCRIPT: engelen/researchseminar2009.pdf · 2009-11-08

Formal Methods for Program Analysis and Generation
Robert van Engelen
Research Seminar, 10/1/09
First, a little story…

Step 0: School
We learned to program in school…

Step 1: College
…then told to forget what we learned and start over…
// Assignment 1: cupsof.java
// Submitted by: *bucks
// Shows good Java coding style and commenting

import java.lang.*;

public class cupsof {
  public static void main(String[] arg) {
    // print 500 times something to cheer about
    for (int count = 0; count < 500; count++)
      System.out.println(count + " cups of java on the wall");
  }
}
Step 2: Graduation
…all the while doing our best to impress professors…

Step 3: Business
…to find our dream job!
The Experts Told Us…

Carefully design your programs! "Controlling complexity is the essence of computer programming." (Brian Kernighan)

Don't hack! "If debugging is the process of removing bugs, then programming must be the process of putting them in." (Edsger W. Dijkstra)

But don't feel too bad about mistakes? "The best thing about a boolean is even if you are wrong, you are only off by a bit." (Anonymous)

Other programming languages may offer salvation, but we don't use them: "There are only two kinds of programming languages: those people always bitch about and those nobody uses." (Bjarne Stroustrup)
Programming = ?

= Solving problems?
– Specify the problem and find a (software) tool to solve it…
– …if only we had tools that powerful to solve anything!
  • For certain domains: Excel, R, Mathematica, Maple, MATLAB, TurboTax, Garmin/TomTom, Blackboard, etc.
– …otherwise, if we can't use a tool or library, we design an algorithm
Programming = ?

= Writing code?
– No one really writes code…
– …we write abstract specifications that we usually call programs
– …only the compiler/interpreter writes code by translating our program into machine instructions
– The compiler complains when your specification has syntactic errors or static semantic errors
Programming = ?

= Documenting, testing, and debugging?
– Run-time errors and logical errors are not caught by compilers
– We must specify unit and regression tests
– Unless we use Dilbert's agile programming ;-)
Programming = Specifying?

1. Specify an algorithm (design)
2. Specify a program (implementation)
3. Specify tests (testing)

Lather, rinse, repeat…
Questions

• How can well-designed programming languages prevent programming mistakes?
• How can we use static analysis and formal methods to analyze programs for errors?
• How can we formally specify an algorithm and generate efficient code for it?
Some Comments on Programming Language Design

• Principle of least surprise (POLS)
• Uniform Access Principle (UAP) for OOP
• No pointer arithmetic, references are OK
• Static typing
• Orthogonality of programming constructs
• Exception handling
• Support for assertions and invariants (as in Eiffel)
• And also… (this depends on the target apps)
  – Referential transparency (functional programming)
  – Immutable objects (functional programming)
There are Lots of Tools out There to Check Your Code

Most tools target C, C++, C#, and/or Java.

Static analysis of source code (bug sniffing):
– Lint and Splint — GNU tool for bug-sniffing C
– PC-lint by Gimpel — for bug-sniffing C++

Model checking (steps through every possible execution path):
– Klocwork — C/C++, C#, and Java analysis
– Coverity — C/C++ analysis

Dynamic analysis (detecting memory leaks and/or race conditions):
– Valgrind, Dmalloc, Insure++, TotalView

…and many more
[Diagram: the static analysis landscape — formal verification (model checking, logical inference, theorem proving, abstract interpretation; axiomatic, denotational, and operational semantics) used by program verifiers, and data flow analysis / static semantic analysis used by compilers, arranged along an axis of semi-automation.]
Better Programming with Tools?

A programmer once wrote

  for (int i = 0; i < 5; i++)
    p++;

He probably meant to increase the pointer p by 5.
Better Programming with Tools?

  for (int i = 0; i < 5; i++)
    p++;

Fortunately, a compiler with good static analysis will optimize this to p += 5.

Can it do the same for the following loop? If so, how?

  for (int i = 0; i < n; i++)
    p++;
Better Programming with Tools? Let's rewrite

  for (int i = 0; i < n; i++)
    p++;

into the de-sugared form

  int i = 0;
  while (i < n) {
    p = p + 1;
    i = i + 1;
  }

Is this the same as p = p + n?
Formal Proof: Axiomatic Semantics

  {0 ≤ n ∧ p = q}                    weakest precondition
  {0 ≤ n ∧ p − q = 0}
  i = 0;                             apply assignment rule
  {i ≤ n ∧ p − q = i}                loop invariant
  while (i < n) {
    {i < n ∧ p − q = i}              i < n ∧ loop invariant
    {i+1 ≤ n ∧ p+1 − q = i+1}
    p = p + 1;                       apply assignment rule
    {i+1 ≤ n ∧ p − q = i+1}
    i = i + 1;                       apply assignment rule
    {i ≤ n ∧ p − q = i}              loop invariant
  }
  {i ≥ n ∧ i ≤ n ∧ p − q = i}        ¬(i < n) ∧ loop invariant
  {p = q + n}                        postcondition
Formal Proof: Axiomatic Semantics

Assignment rule:
  {Q[V\E]}    weakest precondition
  V := E
  {Q}         postcondition

Conditional rule:
  {(!C ∨ P1) ∧ (C ∨ P2)}    weakest precondition
  if (C) {
    {P1}    precondition of S1
    S1
    {Q}     postcondition
  } else {
    {P2}    precondition of S2
    S2
    {Q}     postcondition
  }
  {Q}    postcondition

Sequence rule:
  {P1}    precondition
  S1;
  {Q1}    postcondition s.t. Q1 implies P2
  {P2}    precondition
  S2;
  {Q2}    postcondition

While rule:
  {Inv}    weakest precondition (Inv = the loop invariant)
  while (C) {
    {C ∧ Inv}    precondition of S
    S
    {Inv}        postcondition of S
  }
  {!C ∧ Inv}     postcondition
The Good, the Bad, and the Ugly

  for (int i = 0; i < 5; i++)
    p++;

is optimized by a good compiler to p += 5;

When we prefer elegant code, we should not have to optimize it by hand into ugly fast code: the compiler does this for you in most (but not all) cases.

  int gcd(int a, int b)
  {
    if (0 == b)
      return a;
    return gcd(b, a % b);
  }

  int gcd(int a, int b)
  {
    while (b != 0)
    {
      register int t = b;
      b = a % b;
      a = t;
    }
    return a;
  }
The Good, the Bad, and the Ugly

Many inefficiencies can be optimized away by compilers.

But compilers optimize without regard to parallel execution!

  x = 1;  // compiler removes this dead code
  x = 0;
The Good, the Bad, and the Ugly

Many inefficiencies can be optimized away by compilers.

But compilers optimize without regard to parallel execution!

  Process 0           Process 1
  x = 1; // removed   if (x == 1)
  x = 0;                exit(0);
Syntactic Mistakes are Easy to Detect by Compilers/Static Checking Tools

  int a[10000];

  void f()
  {
    int i;

    for (i = 0; i < 10000; i++);
      a[i] = i;
  }
Syntactic Mistakes are Easy to Detect by Compilers/Static Checking Tools

  if (x != 0)
    if (p) *p = *p / x;
  else
    if (p) *p = 0;
Compilers/Static Checking Tools Warn About Data Type Usage Mistakes

  unsigned a[100] = {0};

  int main()
  {
    char buf[200];
    unsigned n = 0;

    while (fgets(buf, 200, stdin))
    {
      if (n < 100) a[n++] = strlen(buf);
    }
    while (--n >= 0)
    {
      printf("%d\n", a[n]);
    }
    return 0;
  }
Static Checking Tools Warn About Data Compatibility Mistakes

  x = 4;
  if (x >= 0)
    x = (x > 0);
  else
    x = -1;
  x = x % 2;
Static Checking Tools Warn About Execution Order Mistakes

  void out(int n)
  {
    cout << n << "\n";
  }

  void show(int a, int b, int c)
  {
    out(a); out(b); out(c);
  }

  int main()
  {
    int i = 1;
    show(i++, i++, i++);
    return 0;
  }
Static Checking Tools Warn About Arithmetic/Logic Mistakes

  void print_mod(int i, int n)
  {
    if (n == 0 && i == 0) return;
    printf("%d mod %d == %d\n", i, n, i % n);
  }

  int main()
  {
    for (int i = 0; i < 10; i++)
      for (int j = 0; j < 10; j++)
        print_mod(i, j);
    return 0;
  }
Static Checking Tools Warn About Loop Index Mistakes

  int a[10];

  i = 0;
  for (i = 0; i < 10; i++)
    sum = sum + a[i];
  weighted = a[i] * sum;
Static Checking Tools Warn About Data Flow Mistakes

  int shamrock_count(int leaves, double leavesPerShamrock)
  {
    double shamrocks = leaves;
    shamrocks /= leavesPerShamrock;
    return leaves;
  }

  int main()
  {
    printf("%d\n", shamrock_count(314159, 3.14159));
    return 0;
  }
More Difficult: Incorrect API Logic

  int main()
  {
    FILE *fd = fopen("data", "r");
    char buf[100] = "";
    if (getline(fd, buf))
      fclose(fd);
    printf("%s\n", buf);
  }

  int getline(FILE *fd, char *buf)
  {
    if (fd)
    {
      fgets(buf, 100, fd);
      return 0;
    }
    return 1;
  }
More Difficult: Dynamic Typing/Data Flow Mistakes

  class BankAccount

    def accountName
      @accountName = "John Smith"
    end

    def deposit
      @deposit
    end

    def deposit=(dollars)
      @deposit = dollars
    end

    def initialize()
      @deposet = 100.00   # misspelled @deposit: silently creates a new variable
    end

    def test_method
      puts "The class is working"
      puts accountName
    end
  end
Abstract Interpretation

  int[] a = new int[10];
  i = 0;
  while (i < 10) {
    … a[i] …
    i = i + 1;
  }

What is the range of i at each program point?
Is the range of i safe to index a[i]?
Abstract Interpretation

  p0: int[] a = new int[10]; i = 0;
  p1: while (i < 10) {
  p2:   … a[i] …
        i = i + 1;
      }
  p3:

Define a lattice:
  [a,b] ⊔ [a',b'] = [min(a,a'), max(b,b')]
  [a,b] ⊓ [a',b'] = [max(a,a'), min(b,b')]

After i = 0:  i = [0,0]

At p1 (loop entry):
  i = [0,0] ⊓ [−∞,9] = [0,0]
  i = ([0,0] ⊔ [1,1]) ⊓ [−∞,9] = [0,1]
  … use acceleration to determine finite convergence: i = [0,9]

At p2 (after the increment):
  i = [1,1] ⊓ [−∞,9] = [1,1]
  i = ([1,1] ⊔ [2,2]) ⊓ [−∞,9] = [1,2]
  … use acceleration to determine finite convergence: i = [1,10]

At p3 (loop exit):
  i = [1,10] ⊓ [10,+∞] = [10,10]
Model Checking

Process 1:                  Process 2:
  p0: while (x > 0) {         q0: x = 0;
  p1:   use resource;         q1: use resource forever
        x = x + 1;
      }
  p2: sleep;

Q: starting with x > 1, will p1 ever concurrently execute with q1?
Q: will execution reach a state where x stays 0?

Model (using abstract traces where x is positive or 0) — reachable states:

  p0,q0,x>0   p1,q0,x>0   p1,q1,x>0   p0,q1,x>0   p0,q1,x=0   p2,q1,x=0
Related Examples

C with assertion:

  for (i = 1; i < N/2; i++) {
    k = 2*i - 1;
    assert(k >= 0 && k < N);
    a[k] = …

Programmer moved the assertion:

  for (i = 1; i < N/2; i++) {
    assert(i > 0 && 2*i < N + 1);
    k = 2*i - 1;
    a[k] = …

Can we move the assertion before the loop? If so, how?
Related Examples

Eiffel "design by contract":

  indexing
    ...
  class COUNTER
  feature
    ...
    decrement is
        -- Decrease counter by one.
      require
        item > 0
      do
        item := item - 1
      ensure
        item = old item - 1
      end
  invariant
    item >= 0
  end

Methods must obey the invariant.
Generating Correct and Efficient Code from High-Level Specifications

Ctadel → parallel HPF code
(self-)commuting operators. Note that some of the CSEs require an index substitution, denoted as [i ← a], where the i-index is to be replaced with expression a. The notation ⊕_{i=i+c}^{b} (where e.g. ⊕ = Σ) denotes the use of two distinct i-indexes: a local i-index for the aggregate operation (e.g. summation) running from the current value of the global i-index in the outer context of the operation increased by c, ranging up to b. That is, ⊕_{i=i+c}^{b} represents an exclusive scan or reversed prefix operation. By introducing this notational convention, reduction and scan operations can be easily distinguished and optimized accordingly.
The presented list of CSEs is not exhaustive. In addition to the CSEs shown in Table 1, other CSEs can be found by using any combination of the listed 'primitive' CSEs. For example, let E1 = fft_t(fft_t(u_{i,j,k}, i = 1..n), j = 1..m) and E2 = fft_t(u_{i,j,k+1}, j = 1..m) be two (sub)expressions. Assume that the fft-operator is declared as an instance of the self-commuting operator class. Then, with DICE we obtain E1 = fft_t(E2[k ← k−1], i = 1..n), thereby saving an FFT operation. Upon removing CSEs, storage is allocated for the CSEs in the form of temporary variables.
3. Reduction and Scan Optimization
The optimization of reduction and scan operations for
efficient serial and parallel computing is an example of a
combined use of symbolic algebra techniques for commu-
nication optimization. The optimization techniques are il-
lustrated by means of a problem example. The examples
are simplified problems derived from the documentation of
the HIRLAM weather forecast model [5] and are illustrative
for the type of problems solved by CTADEL. Consider
p = ∫₀¹ ∫_y¹ ∂u/∂x dy dz   ∀(x, y) ∈ Ω_{x,y}   (1)

where p(x, y) and u(x, y, z) are dependent variables, x, y, z are the independent variables on the domain Ω = [0, 1]³. Discretization of Eq. (1) using finite differences yields
p_{i,j} = Σ_{k=1}^{ℓ} Σ_{j=j+1}^{m} (1/h)(u_{i+1,j,k} − u_{i,j,k})   ∀(i, j) ∈ D(p)   (2)

where D(p) is the discretized domain or grid of p and the u and p fields have effectively become functions of the discrete (i, j, k)-grid with domain [1, n] × [1, m] × [1, ℓ] ⊆ ℤ³.
Here, the double integration is replaced with a double summation using the midpoint quadrature formula, and the partial derivative is replaced by a difference quotient assuming grid-point distance h in the x-direction. Eq. (2) can be simplified using an algebraic simplifier.
For algebraic simplification in CTADEL, we use the built-in
GPAS algebraic simplifier, giving
p_{i,j} = (1/h) Σ_{k=1}^{ℓ} Σ_{j=j+1}^{m} (u_{i+1,j,k} − u_{i,j,k})   (3)
Note that the result is just slightly different from Eq. (2)
while at least ℓ multiplications are saved. Although the expression is algebraically simpler, the RHS of Eq. (3) is still not optimal for generating parallel code. Assume that the (i, j, k)-grid domain of the problem is block-wise distributed in the j-direction. Then, excessive communication results because the parallel scan operation is executed before the serial reduction. When the summations are interchanged, which is an algebraically valid transformation, the data volume communicated in the parallel scan will be significantly reduced; see Fig. 1.
[Figure 1 diagram: before — step 1: Σ_{j=j+1}^{m}, step 2: Σ_{k=1}^{ℓ}; after interchange — step 1: Σ_{k=1}^{ℓ}, step 2: Σ_{j=j+1}^{m}]

Figure 1. Reducing communication between two processors P0 and P1 by interchanging the order of the summations.
The interchange of the summations is automatically per-
formed by GPAS while in other SACs this can only be ac-
complished by writing ad-hoc procedures. To this end,
GPAS uses the abstract notion of commutativity defined by
the class of commuting operators as briefly mentioned in
Section 2.1. More specifically, two operator instances of
this class commute if and only if an explicit commutativity
relationship between the operators is defined by the system
or (re)defined by the user. The commutativity relationships
induce a default functional composition order on the oper-
ator instances of the class of commuting operators. This
way, the application order can be controlled while still al-
lowing the functional composition of the operators to be
interchanged which is necessary for application of rewrite
      PROGRAM P1
      REAL u(0:n+1,m,l),g(0:n+1),h,p(0:n+1,m),q(0:n+1)
      REAL s(0:n+1,0:m,l),t(0:n+1,m)
      ...
      DO 2330 j = m,1,-1
        FORALL(i=0:n+1,k=1:l) s(i,j-1,k)=s(i,j,k)+u(i,j,k)
 2330 CONTINUE
CMIC$ PARALLEL ...
CMIC$ CASE
      DO 2340 k = 1,l
        FORALL(i=1:n+1,j=1:m) t(i,j)=s(i,j,k)+t(i,j)
 2340 CONTINUE
CMIC$ CASE
      FORALL(i=1:n) q(i)=g(i)*(s(i,0,l)-s(i-1,0,l))/h
CMIC$ END CASE
CMIC$ END PARALLEL
      FORALL(i=1:n+1,j=1:m) t(i,j)=t(i,j)/h
      FORALL(i=1:n,j=1:m) p(i,j)=t(i+1,j)-t(i,j)

      PROGRAM P2
      REAL u(0:n+1,m,l),g(0:n+1),h,p(0:n+1,m),q(0:n+1)
      REAL s(0:n+1),t(0:n+1,m),T1(1:n+1,m)
      ...
CMIC$ PARALLEL ...
CMIC$ CASE
      DO 2270 j = 1,m
        FORALL(i=0:n+1) s(i)=s(i)+u(i,j,l)
 2270 CONTINUE
CMIC$ CASE
      DO 2280 k = 1,l
        FORALL(i=1:n+1) T1(i,m)=0
        DO 2290 j = m,2,-1
          FORALL(i=1:n+1) T1(i,j-1)=T1(i,j)+u(i,j,k)
 2290   CONTINUE
        FORALL(i=1:n+1,j=1:m) t(i,j)=t(i,j)+T1(i,j)
 2280 CONTINUE
CMIC$ END CASE
CMIC$ END PARALLEL
CMIC$ PARALLEL ...
CMIC$ CASE
      FORALL(i=1:n) q(i)=g(i)*(s(i)-s(i-1))/h
CMIC$ CASE
      FORALL(i=1:n+1,j=1:m) t(i,j)=t(i,j)/h
CMIC$ END CASE
CMIC$ END PARALLEL
      FORALL(i=1:n,j=1:m) p(i,j)=t(i+1,j)-t(i,j)

      PROGRAM P3
      REAL u(0:n+1,m,l),g(0:n+1),h,p(0:n+1,m),q(0:n+1)
      REAL s(0:n+1),t(0:n+1,m)
      ...
CMIC$ PARALLEL ...
CMIC$ CASE
      DO 2300 j = 1,m
        FORALL(i=0:n+1) s(i)=s(i)+u(i,j,l)
 2300 CONTINUE
CMIC$ CASE
      DO 2310 j = m,2,-1
        FORALL(i=1:n+1) t(i,j-1)=t(i,j)
        DO 2320 k = 1,l
          FORALL(i=1:n+1) t(i,j-1)=t(i,j-1)+u(i,j,k)
 2320   CONTINUE
 2310 CONTINUE
CMIC$ END CASE
CMIC$ END PARALLEL
CMIC$ PARALLEL ...
CMIC$ CASE
      FORALL(i=1:n) q(i)=g(i)*(s(i)-s(i-1))/h
CMIC$ CASE
      FORALL(i=1:n+1,j=1:m) t(i,j)=t(i,j)/h
CMIC$ END CASE
CMIC$ END PARALLEL
      FORALL(i=1:n,j=1:m) p(i,j)=t(i+1,j)-t(i,j)
Figure 2. Three alternative programs, P1, P2,
and P3, for solving the example problem.
Program P1 corresponds to Eq. (7), P2 to Eq. (8), and P3 to Eq. (9). Note that performing analogous source-code level transformations between the programs shown in Fig. 2 requires a very powerful restructuring compiler, capable of recognizing reduction and scan operations and capable of data-structure transformations for reshaping array variables (array s in Fig. 2).
With n = 254, m = 255, and ℓ = 20, performance results on a HP 712/60, SGI Indy, CRAY-C98, and a MasPar-MP1 (SIMD, 1024 PEs) are shown in Table 2.
On the HP 712, page swapping significantly degrades the performance of program P1; P1 requires the largest amount of memory. On the CRAY-C98 with 2 CPUs, the parallel sections are executed in parallel, but unfortunately, overhead prohibits a performance gain with respect to 1 CPU. For the MasPar code with the j-direction distributed, excessive communication between the front-end and the DPU processor mesh results because the j-reduction/scan operations are not parallelized. The performance is disappointingly poor, especially for P1 and P2, in which the scan-over-reduction optimization is prohibited by scan-reduction reuse (P1) or not applied (P2). Program P3 has the best performance of the three programs on all platforms investigated; P3 is also the program generated by CTADEL.

                          P1      P2      P3
  HP 712/60               2.30    0.502   0.361
  SGI Indy                0.622   0.553   0.362
  CRAY-C98, 1 CPU         0.011   0.012   0.006
            2 CPUs        0.013   0.014   0.008
  MasPar-MP1, i-distrib.  0.523   0.340   0.179
              j-distrib.  237.    243.    164.

  Table 2. Total elapsed time (sec).
4. Conclusions
In this paper we have briefly described methods for gen-
erating efficient codes for PDE-based problems. The pre-
sented techniques have been used for the generation of effi-
cient codes for the HIRLAM weather forecast system using
a prototype version of the CTADEL application driver. The
optimization of multiple reductions and scans, however, is
a new approach. Since parallel reductions and scans re-
quire expensive collective communications, the presented
approach may yield a significant speedup of the generated
code. This is mainly due to the use of a high-level prob-
lem description in which all of the high-level information
is readily available for exploitation by the algebraic simpli-
fier and common-subexpression eliminator within CTADEL.
This information may be difficult to retrieve or may even
be lost in the low-level architecture-specific code, thereby
hampering the effective use of a restructuring compiler to
obtain similar results.
References

[1] G.O. Cook, Jr. and J.F. Painter. ALPAL: A Tool to Generate Simulation Codes from Natural Descriptions, Expert Systems for Scientific Computing, E.N. Houstis, J.R. Rice, and R. Vichnevetsky (eds), Elsevier, 1992.
[2] R. van Engelen and L. Wolters. A Comparison of Parallel Programming Paradigms and Data Distributions for a Limited Area Numerical Weather Forecast Routine, proc. of the 9th ACM Int'l Conf. on Supercomp.: 357-364, ACM Press, New York, July 1995.
[3] R. van Engelen, L. Wolters, and G. Cats. Ctadel: A Generator of Multi-Platform High Performance Codes for PDE-based Scientific Applications, proc. of the 10th ACM Int'l Conf. on Supercomp.: 86-93, ACM Press, New York, May 1996.
Generating Correct and Efficient Code from High-Level Specifications

FLAME → library code
Some Concluding Remarks

• Static checking requires all compiler stages except the back-end
• Static checkers find deviations from "best practices"
• Static checkers may find many false alarms
• There are very few checkers for scripting languages (Perl, Python, Ruby, …)
• Functional languages (Haskell, ML, …) are generally safer, but do not prevent logical programming errors