
Schema-based Program Synthesis and the AutoBayes System

Part II

Johann Schumann

SGT, NASA Ames


Example

• Generate a program that finds the maximum value of a function f(x): max f(x) wrt x

[Figure panels: univariate, multivariate]

Note: the function might be given as a formula or a vector of data


Schemas for univariate optimization

schema(max F wrt X, C) :- ...   % as before

schema(max F wrt X, C) :-
    length(X, 1),
    % F is a vector of data points F(0..n)
    C = let(sequence([
              assign(mymax, select(F, 0)),
              for(idx(I, 1, n),
                  if(select(F, I) > mymax,
                     assign(mymax, select(F, I)),
                     skip)) ...
            ]),
            comment(['The maximum is found by iterating...']),
            mymax).

schema(max F wrt X, C) :-
    length(X, 1),
    % instantiate numeric solution algorithm,
    % e.g., golden section search
    C = ...

schema(max F wrt X, C) :- ...

...


Schema for univariate optimization

1. build the derivative: df/dx

2. set it to 0: 0 = df/dx

3. solve that equation for x

4. the solution is the desired maximum (provided the second derivative is negative there)

schema(max F wrt X, C) :-      % INPUT (Problem), OUTPUT (Code fragment)
    % guards
    length(X, 1),
    % calculate the first derivative
    simplify(deriv(F, X), DF),
    % solve the equation
    solve(true, x, 0 = DF, S),
    % possibly more checks: is that really a maximum?
    simplify(deriv(DF, X), DDF),
    (   solve(true, x, 0 > DDF, _)
    ->  true
    ;   writeln('Proof obligation not solved automatically')
    ),
    XP = ['The maximum for', expr(F), 'is calculated ...'],
    V = pv_fresh,
    C = let(assign(V, S, [comment(XP)]), V).

...


Demo

• Generation of multiple programs
  – -maxprog
  – -maxprog N -fastest (coarse approximation)
• Control for numeric solvers
  – pragma schema_control_arbitrary_init_values
  – pragma schema_control_use_generic_optimize
• Tracing pragmas
• The necessity of constraints


Multivariate Optimization

• Task: minimize function F(X) wrt X

• Algorithm (pseudocode; x0, x1 and step_dir are vectors):

    double* minimize(F) {
        double* x0 = pick_start();
        double* x1;
        int converging = 1;
        while (converging) {
            double step_length = 0.1;
            double* step_dir = -gradient(F, x0);
            x1 = x0 + step_length * step_dir;
            if (fabs(F(x1) - F(x0)) < 0.001)
                converging = 0;
            else
                x0 = x1;
        }
        return x1;
    }

• start somewhere
• go down along the steepest slope
• when you come to a flat area, return that (local) minimum
• many design decisions:
  – where to start?
  – how to move?
  – when to stop?


Multivariate Optimization

schema(max F wrt X, C) :-      % IN, OUT
    % guard: X has more than one component
    length(X, Y), Y > 1,
    % divide and solve subproblems (recursive schema calls)
    schema(getStartValue(F, X), C_Start),
    schema(getStepDirection(F, X), C_Dir),
    schema(getStepSize(F, X), C_Size),
    % assemble code segment
    X0 = pvar_new(X),          % get a new PROGRAM variable
    C = block([local(X0, double)],
              series([ assign(X0, C_Start),
                       while_converging(X0,
                           assign(X0, +([X0, *([C_Dir, C_Size])])))
                     ])).


Multivariate optimization II

• The schemas generate code in an intermediate language

• procedural elements

• local variables, lambda blocks

• sum(..), while_converging(..) --> loops

X0 = pvar_new(X),
C = block([local(X0, double)],
          series([ assign(X0, C_Start),
                   while_converging(X0,
                       assign(X0, +([X0, *([C_Dir, C_Size])])))
                 ])).

double v_0;
double y;
double E;
v_0 = -99;
E = 1e10;
while (E > 0.001) {
    y = sin(v_0);
    v_0 = v_0 + cos(v_0) * 0.01;
    E = fabs(y - sin(v_0));
}

generated code for max sin(v) wrt v

Important: variables in specification or program are NOT Prolog variables


Why schema-based synthesis?

Multiple algorithm variants can be automatically constructedThe “best” one is chosen by the user or selected via constraints

some possibilities for getStepDir


AB Schema Hierarchies

• Schemas to break down statistical problems
  – Bayesian independence theorems: work on Bayesian graphs
• Schemas to solve complex statistical problems
  – instantiate (iterative) clustering algorithms
  – handling of time series problems
• Schemas to solve atomic problems
  – instantiate PDF and maximize (symbolically)
  – instantiate numerical solvers (see previous slides)
• Auxiliary schemas
  – initialization of clustering algorithms
  – data pre-processing (e.g., [0..1] normalization)


AB Schema Hierarchy

• Static tree structure
• AB uses two kinds of schemas:
  – schemas for probabilistic problems
  – schemas for formulas


Schemas and AB Model

• The AB schemas have to use all information from the input specification, which is stored in the Prolog database (AB model)
• Problem: schemas can modify the model, and these modifications must be undone during backtracking
  – add new statistical variables
  – remove dependencies for subproblems
• Solutions:
  – add the model as parameters, schema(Prob, C, M_in, M_out), and everywhere else
  – keep a model stack (similar to the dynamic calling environments in procedural languages) and use backtrackable asserts/retracts


Backtrackable Global Stuff

• Global data in Prolog are handled using assert/retract or flags. All other data are local to each clause:
    p(X) :- q(X, Z), r(Z).   % X, Z local to clause
• Asserts are not backtrackable:
    p(X) :- assert(keep(X)), ..., fail.
  The fact keep(X) stays in the database even after backtracking.
• Work-around: add global variables as parameters to all predicates (impractical):
    p(X, GL_in, GL_out) :- GL_out = [keep(X)|GL_in], ...
• Backtrackable bassert/bretract require some additional low-level C code (but have clean semantics)


Schema Control

• Schema applicability is controlled via guards
• Order of application: order in the Prolog file
• How to enforce/avoid certain schemas?
  – AutoBayes pragmas, but that’s not really fun
  – doesn’t work for nested applications:
      • inner loop: symbolic solutions only
      • outer loop: enable numeric loop
  – generate them all and decide later, or pick the “fastest”
• Schema control language is a research topic
  – extend the declarative AB language
  – how to talk about the selection of an iterative algorithm in a purely declarative language?


The AB Infrastructure

• term utilities
• rewriting engine
• symbolic system:
  – simplifier
  – abstraction (range, sign, definedness)
  – solver
• pretty printer (code, intermediate language)
• comment generation


Term utilities

• Implemented on top of Prolog: a lot of functional-programming style predicates for
  – lists, sets, bags, relations
  – terms, AC-terms
• Operations
  – term_substitute, subsumption, differences between term sets
• ...


Rewriting Engine

• A lot of stuff in AB is done using rewriting (but not all)
• Small rewriting engine implemented in Prolog
  – rewriting rules are Prolog clauses
  – conditional rewriting, AC-style rewriting
  – evaluation:
      • eager: apply first top-down
      • lazy: apply bottom-up
  – continuation: pure bottom-up or dove-tailing
  – handle for attachment of prover/constraint solver
  – compilation of rewriting rules for higher efficiency


Rewriting Rules

• Can combine pure rewriting with Prolog programming in the body of the rewrite rule

% NAME, STRATEGY, PROVER, ASSUMPTIONS, IN, OUT
trig_simplify('sin-of-0', [eval=lazy|_], _, _, sin(0), 0) :- !.
trig_simplify('sin-of-pi-over-6', [eval=lazy|_], _, _, sin(*([1/6, pi])), 1/2) :- !.
trig_simplify('cos^2+sin^2', [eval=eager|_], _, _, +(Args), +([1|Args3])) :-
    select(cos(X)**2, Args, Args2),
    select(sin(X)**2, Args2, Args3),
    !.


Compilation and Rewriting

• Group and compile rewrite rules (statically):
    ?- rwr_compile(my_simplifications, [trig_simplify, remove_const_rules]).
• Call the rewriting engine:
    rwr_cond(my_simplifications, true, S, T).
• Calling with time-out


Symbolic System

• Symbolic system implemented on top of the rewriting engine + Prolog code for solvers, etc.
• Assumption-based rewriting
  – X/Y -- (not(Y = 0)) --> X
• Simplification (lots of rules)
• Calculation of derivatives (deriv(F,X) as operator)
• Taylor-series expansion, ...
• Equation solver
  – polynomial solver
  – Gauss elimination for sets of linear equations
  – sequentialization of equation systems


The AB Intermediate Language

• Strict separation between synthesis and code generation
• Small procedural intermediate language with some extensions
  – sum(..), prod(..), simul_assign(..), while_converging(...)
  – annotations for comments and pre/post/inv formulas
• Code generator for different languages/targets
  – C++/Octave
  – C/Matlab, C/standalone
  – ADA/SparkADA, Java (both “unsupported/in work/bad shape”)
• Pretty-printer to ASCII, HTML, LaTeX


Extending AutoBayes

• Some extensions are straightforward: add textbook formulas
• Additional symbolic simplification rules might be required
• Adding schemas requires substantial work
  – “hard-coded” schema as first step
  – applicability constraints and control
  – functional mechanisms to handle scalar/vector/matrix cases are available
  – support for documentation generation
  – no schema language, Prolog syntax used


Non-Gaussian PDFs

• Data characteristics are modeled using probability density functions (PDFs)
• Examples: Gaussian, exponential, ...
• AB contains a number of built-in PDFs, which can be extended (hands-on demo)
• Having multiple PDFs available adds a lot of power compared with libraries


Example

• For clustering, a Gaussian distribution of the data is often used
• How about angles, where 0 == 360?
• You get 5 clusters
• A different distribution (von Mises-Fisher) automatically solves this problem
• In AutoBayes, just replace “gauss” by “vonmises1”; no programming required
• Multiple PDFs in one spec


Sample Generation

• We have used:
  – MODEL ---> P ---(data)--> parameters
• The model can be read the other way round: “generate random data that are consistent with the model”
  – MODEL ---> P ---(parameters)--> data
• Very useful for
  – model debugging/development
  – debugging and assessment of synthesized algorithms


AutoBayes and Correctness

• Practical synthesis: forget about correct-by-construction, but
  – detailed math derivations, which can be checked externally (e.g., by Mathematica)
  – literature references in documentation/comments
  – generation of test harness and sample data
  – checking of safety properties (“AutoCert”)

[Cade2002 slide set]


AutoBayes as a Prolog Program

• AutoBayes is a pretty large program
  – ~180 Prolog files, 100,000 LoC (with AutoFilter)
• Heavy use of
  – meta-programming (call, etc.)
  – rewriting (using an engine implemented in Prolog)
  – functional programming elements for all sorts of list/vector/array handling
  – backtracking and backtrackable global data structures
  – procedural (non-logical) elements, e.g., file I/O, flags, etc.
• No use of modules, but naming conventions
• Everything in SWI-Prolog + a few C extensions to handle backtrackable global counters and flags


AutoBayes Weak Points

• The input parser is very inflexible (uses Prolog operators)
• Very bad error messages: often just “no”
• No “schema language”: AutoBayes can only be extended by someone who is both a Prolog expert and a domain specialist
• Only primitive control of schema selection: need for a schema-selection mechanism
• Not all schemas are fully documented
• Large code base, which needs to be maintained


Summary

• AutoBayes is suitable for a wide range of data analysis tasks
• AutoBayes generates customized algorithms
• AutoBayes: schema-based program synthesis + symbolic computation
• Logic + functional + procedural elements used
• AutoBayes extension: easy to very hard
• AutoBayes debugging: a pain, but explanations and LaTeX output are very helpful
• AutoBayes is NASA Open Source: bugfixes/extensions always welcome
• AutoBayes has a 160+ page user manual
• AutoBayes is useful for everything from classroom projects to PhD projects