machine learning for automated reasoning: an overview


Machine Learning for Automated Reasoning: An Overview

Vincenzo Lomonaco

Alma Mater Studiorum - University of Bologna


January 27, 2015



Index

1 Introduction

2 Background
    ITPs and ATPs
    Machine learning

3 Different approaches
    ML for premises selection
    ML for heuristics selection

4 Latest successful projects
    ML4PG
    MaSh
    MaLARea
    MaLeCoP
    MaLeS

5 Conclusion




Summary

In recent years, the development of interactive and automated theorem provers has led to the creation of large data sets of formal mathematical libraries and varied infrastructures for proofs and software/hardware verification.

At the same time, machine learning techniques have been shown to perform well on a large number of tasks in artificial intelligence and automated reasoning.

In this talk we cover a number of successful approaches that aim to exploit this increasing amount of data by learning inductively from previous proofs.




Introduction I

In Principia Mathematica [18], Whitehead and Russell set out to show by example that all of mathematics can be derived from a small set of axioms using an appropriate logical calculus.

Even though Gödel later showed that no effectively generated consistent axiom system strong enough to express arithmetic can capture all mathematical truth [6], Principia Mathematica showed that most of ordinary mathematics can indeed be captured by a formal system.

With the advent of computers, formal mathematics became a more realistic proposal.




Introduction II

In the last few decades, the exponential rise in computing power and the spread of commodity computers have led to increasing interest in, and hope for, interactive and automated theorem proving (ITP and ATP) software, summed up in this strong quote by Art Quaife [16] from 1992:

The time will come when such crushers as Riemann’s hypothesis and Goldbach’s conjecture will be fair game for automated reasoning programs. For those of us who arrange to stick around, endless fun awaits us in the automated development and eventual enrichment of the corpus of mathematics.




Introduction III

Before Josef Urban's pioneering work in 2003 [22], which applied first-order ATP methods to a large corpus of formal mathematical proofs (the Mizar Mathematical Library, also known as MML), the field was slowing down.

Since then, an increasing number of projects linking ITP libraries to ATPs have emerged and brought new hope.

Recent advances in Artificial Intelligence (AI) and Machine Learning (ML) are now shaping the way we think about theorem proving and automated reasoning in general.




Introduction IV

The novel idea

The novel idea is to draw statistical inferences from previous proofs and to merge this kind of inductive reasoning with the classical deductive reasoning used in ATPs and ITPs.




Background

In this section we provide brief background on both machine learning and theorem proving.




ITPs and ATPs

ITPs

Interactive theorem provers (ITPs), or proof assistants, are computer programs that support the creation of formal proofs. Proofs are written in the input language of the ITP, which can be thought of as being at the intersection between a programming language, a logic, and a mathematical typesetting system.

ACL2 [10], Coq [3], HOL4 [21], HOL Light [8], Isabelle [13], Mizar [7], PVS [15] and Matita [2] are perhaps the most widely used ITPs.




ITPs and ATPs

ATPs

In contrast to interactive theorem provers, automated theorem provers (ATPs) work without human interaction. They take a problem as input, consisting of a set of axioms and a conjecture, and attempt to deduce the conjecture from the axioms.

E [19], SPASS [25], Vampire [17], and Z3 [5] are well-known ATPs for classical first-order logic.




Machine learning

Machine Learning I

Machine learning is concerned with extracting information from data [1]. The result of a learning algorithm is a prediction function that takes a new datapoint and returns a target value.

Features are the input of the prediction function and should describe the relevant attributes of the datapoint. A datapoint can have several possible feature representations. Feature engineering is concerned with identifying relevant features [12].




Machine learning

Machine Learning II

From a mathematical point of view, most machine learning problems can be reduced to an optimization problem:

Let D ⊆ X × T be a training dataset consisting of datapoints and their corresponding target values.

Let ϕ : X → Ω be a feature function that maps a datapoint to its feature representation in the feature space Ω (usually a subset of ℝⁿ for some n ∈ ℕ).

Furthermore, let F ⊆ (Ω → T) be a set of functions that map features to the target space, and let s : D × F → ℝ be a (convex) score function.




Machine learning

Machine Learning III

One possible goal is to find the function f ∈ F that maximizes the average score over the training set D:

$$ f^{*} = \arg\max_{f \in F} \frac{1}{|D|} \sum_{d \in D} s(d, f) $$

The main differences between various learning algorithms are the function space F and the score function s they use.
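When F is finite, maximizing the average score reduces to evaluating every candidate and keeping the best one. The sketch below assumes a toy setting chosen for illustration only: ϕ is the identity, F contains three linear maps, and s is the negative squared error. Real learners search much larger (often infinite) function spaces with numerical optimization rather than enumeration.

```python
from typing import Callable, List, Sequence, Tuple

Datapoint = Tuple[float, float]        # an (x, t) pair from D ⊆ X × T
Predictor = Callable[[float], float]   # a candidate f: Ω → T


def phi(x: float) -> float:
    """Feature function ϕ: X → Ω (the identity here, for illustration)."""
    return float(x)


def score(d: Datapoint, f: Predictor) -> float:
    """Score s: D × F → R; the negative squared error is an assumed choice."""
    x, t = d
    return -(f(phi(x)) - t) ** 2


def average_score(f: Predictor, data: Sequence[Datapoint]) -> float:
    return sum(score(d, f) for d in data) / len(data)


def best_function(candidates: Sequence[Predictor], data: Sequence[Datapoint]) -> Predictor:
    """Return the f in a finite F with the highest average score over D."""
    return max(candidates, key=lambda f: average_score(f, data))


D: List[Datapoint] = [(1, 2), (2, 4), (3, 6)]
# F: the linear maps z -> w * z for three candidate slopes (w is bound per lambda).
F: List[Predictor] = [lambda z, w=w: w * z for w in (1.0, 2.0, 3.0)]
best = best_function(F, D)
print(best(10.0))  # 20.0
```

The slope 2 fits the data exactly, so its average score is 0, the maximum possible for this score function.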




Different approaches I

The AI fields of deductive reasoning and inductive reasoning (represented by machine learning, data mining, knowledge discovery in databases, etc.) have so far benefited relatively little from each other’s progress.

This is an obvious deficiency in comparison with the human mind, which can both inductively suggest new ideas and problem solutions based on analogy, memory, statistical evidence, etc., and also confirm, adjust, and even significantly modify these ideas and problem solutions by deductive reasoning and explanation, based on the understanding of the world.




Different approaches II

In recent years, a number of approaches have been pursued in this direction. We can group them into two main branches:

ML for premises selection

ML for heuristics selection






ML for premises selection

Premise selection can be useful as a standalone service for ITPs (suggesting relevant lemmas), or in conjunction with ATP methods, which can then attempt to find a proof from the relevant premises.





Guideline

In the training phase, the learning algorithm is allowed to learn from the proofs of all previously proved theorems. For every theorem in the training set, its dependencies should be ranked as high as possible; i.e., the score function should optimize the ranks of the premises that were used in the proof.

To do this, all learning algorithms require a set of features as input data, encoded as a real vector. Therefore, a method is needed to translate formula trees into real vectors that characterize the formula.





Dependency graph and formula tree examples





Features to use

The symbols that appear in a formula can be seen as its basic characterization, and hence a simple approach is to take the set of symbols of a formula as its feature set.

The symbols correspond to the node labels in the formula tree. In addition to the symbols, one can also include as features the subterms and subformulas of the formula to prove.

Since the formalisms supported by the vast majority of ITP systems are typed (or sorted), it is reasonable to add the types that appear in the formula tree as additional features.

Adding the feature vectors of some of the most recently proved theorems to the feature vector of the conjecture, in a weighted fashion, is a way to add information about the context.
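A minimal sketch of these feature choices, assuming a simple formula-tree type (the `Node` class is hypothetical) and a sparse dictionary standing in for the real vector: symbols, compound subterms, and types become binary features, and the context is added as the feature vectors of recently proved theorems with geometrically decaying weights. The decay factor of 0.5 is an assumption, not a value from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

FeatureVector = Dict[str, float]   # sparse: feature name -> weight


@dataclass
class Node:
    label: str                        # symbol at this node, e.g. "+", "x"
    typ: Optional[str] = None         # type annotation, if the ITP provides one
    children: List["Node"] = field(default_factory=list)


def render(n: Node) -> str:
    """Print a subtree as a string so it can serve as a feature name."""
    if not n.children:
        return n.label
    return f"{n.label}({','.join(render(c) for c in n.children)})"


def formula_features(root: Node) -> FeatureVector:
    """Binary features: every symbol, every compound subterm, every type."""
    feats: FeatureVector = {}
    stack = [root]
    while stack:
        n = stack.pop()
        feats["sym:" + n.label] = 1.0
        if n.children:
            feats["sub:" + render(n)] = 1.0
        if n.typ is not None:
            feats["type:" + n.typ] = 1.0
        stack.extend(n.children)
    return feats


def with_context(conj: FeatureVector, recent: List[FeatureVector],
                 decay: float = 0.5) -> FeatureVector:
    """Add the vectors of recently proved theorems (most recent first),
    each scaled by a geometrically decaying weight."""
    out = dict(conj)
    weight = decay
    for prev in recent:
        for name, value in prev.items():
            out[name] = out.get(name, 0.0) + weight * value
        weight *= decay
    return out


def var_x() -> Node:
    return Node("x", "nat")


# Conjecture "x + 0 = x" over natural numbers.
conj = Node("=", "bool", [Node("+", "nat", [var_x(), Node("0", "nat")]), var_x()])
f = formula_features(conj)
print(sorted(f))
```

A dictionary keyed by feature name is a common way to represent such very sparse vectors, since the space of possible symbols and subterms is large but each formula uses only a few.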





Math point of view

The problem can be seen as a classification problem in which, for each premise p ∈ Γ, we learn a real-valued classifier function

$$ C_p : \Gamma \to \mathbb{R} \tag{1} $$

which, given a conjecture c, estimates how useful p is for proving c. The premises for a conjecture c ∈ Γ are then ranked by the values of C_p(c).
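The classifier C_p is left open here. One simple choice, swapped in purely for illustration and not claimed to be what any of the cited systems use, is k-nearest neighbours: C_p(c) is the summed similarity between c and those of its k most similar proved theorems whose proofs used p. The sketch assumes features are sets of symbols and similarity is the Jaccard index; the premise names are made up.

```python
from typing import Dict, List, Set, Tuple

Proof = Tuple[Set[str], Set[str]]   # (features of a proved theorem, premises its proof used)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two feature sets: |a ∩ b| / |a ∪ b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_premises(conj: Set[str], proved: List[Proof], k: int = 2) -> List[Tuple[str, float]]:
    """Estimate C_p(c) for every premise p by similarity-weighted votes of the k
    proved theorems most similar to c; premises never voted for score 0 and are omitted."""
    nearest = sorted(proved, key=lambda th: jaccard(conj, th[0]), reverse=True)[:k]
    scores: Dict[str, float] = {}
    for feats, deps in nearest:
        sim = jaccard(conj, feats)
        for p in deps:
            scores[p] = scores.get(p, 0.0) + sim
    # Highest C_p(c) first; ties broken by name for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


proved = [
    ({"+", "0", "nat"}, {"add_0", "nat_induct"}),
    ({"*", "1", "nat"}, {"mult_1"}),
    ({"+", "succ", "nat"}, {"add_succ", "nat_induct"}),
]
print(rank_premises({"+", "0", "succ", "nat"}, proved))
# [('nat_induct', 1.5), ('add_0', 0.75), ('add_succ', 0.75)]
```

Here `nat_induct` ranks first because both nearest neighbours used it, which is exactly the kind of signal learned from previous proofs that purely syntactic filters miss.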






ML for heuristics selection

Automated theorem proving is a search problem. Many different approaches exist, and most of them have parameters that can be tuned. Examples of such parameterizations are clause weighting and selection schemes, term orderings, and the sets of inference and reduction rules used.

A specific choice of parameters defines a search strategy. The choice of a strategy can often make the difference between finding a proof in a few milliseconds or not at all.





Guideline

The strategy selection problem consists of three subproblems:

Finding a good set S of preselected strategies.

Defining features Ω (computed via a feature function ϕ) that are easy to compute but also expressive enough to distinguish different types of problems.

Determining a method that, given the features of a problem, creates a strategy schedule.





Math point of view

Machine learning is applied here to predict the runtime of an ATP on a specific class of problems, in order to automatically choose the most suitable strategy for a given unseen problem. For each strategy s in the set S of preselected strategies, we are searching for a function

$$ \rho_s : P \to \mathbb{R} \tag{2} $$

such that for all problems p ∈ P the predicted values are close to the actual runtimes, ρ_s(p) ≈ τ(p, s), where τ(p, s) denotes the runtime of the ATP with strategy s on problem p.
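Given runtime predictors ρ_s, a schedule can be built in many ways. As a minimal sketch, assuming a greedy rule that is not taken from the source: run the strategies in increasing order of predicted runtime, give each a slice of `slack` times its prediction, skip strategies predicted to exceed the budget, and stop when the total budget is spent. The predictors are toy lambdas over a two-element feature vector.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Predictor = Callable[[Features], float]   # ρ_s: predicted runtime of strategy s


def make_schedule(predictors: Dict[str, Predictor], feats: Features,
                  budget: float, slack: float = 1.5) -> List[Tuple[str, float]]:
    """Run strategies in order of predicted runtime, giving each `slack` times
    its prediction, until the total time budget is used up."""
    predicted = sorted((rho(feats), name) for name, rho in predictors.items())
    schedule: List[Tuple[str, float]] = []
    used = 0.0
    for t, name in predicted:
        if t > budget or used >= budget:
            break                      # predicted to time out, or no time left
        slice_ = min(slack * t, budget - used)
        schedule.append((name, slice_))
        used += slice_
    return schedule


# Hypothetical runtime predictors for three strategies over 2 problem features.
rhos = {
    "s1": lambda f: 2.0 + f[0],
    "s2": lambda f: 0.5 * f[1],
    "s3": lambda f: 100.0,
}
print(make_schedule(rhos, [1.0, 4.0], budget=10.0))  # [('s2', 3.0), ('s1', 4.5)]
```

The slack factor hedges against underestimated runtimes; a smaller value fits more strategies into the budget at the cost of cutting each one off sooner.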




Latest successful projects I

ML4PG (machine learning extension for Proof General) [9] is an interactive tool that provides statistical proof hints during Coq/SSReflect proof development.

MaSh (Machine Learning for Sledgehammer) [11], now part of the default Isabelle installation, offers an alternative to MePo (the default relevance filter in Sledgehammer) by learning from successful proofs.

MaLARea (Machine Learner for Automated Reasoning) [23] is a metasystem which has so far shown the best performance on large-theory benchmarks such as the MPTP Challenge and MPTP2078.




Latest successful projects II

MaLeCoP (Machine Learning Connection Prover) [24] is an evolution of MaLARea in which the learned knowledge is used to guide the proof search inside a modified version of leanCoP [14].

MaLeS (Machine Learning of Strategies) [11] is a framework that develops strategies for ATPs and creates suitable schedules of strategies for individual problems.




ML4PG

ML4PG

ML4PG is an extension to Proof General (an Emacs-based generic interface for theorem provers) that uses state-of-the-art machine learning techniques to interactively find proof patterns in Coq and SSReflect proofs.




ML4PG

How it works

It works in the background of Proof General and extracts some simple, low-level features from interactive proofs in Coq/SSReflect;

On the user’s request, it sends the gathered statistics to a chosen machine-learning interface and triggers execution of a clustering algorithm of the user’s choice;

It does some gentle post-processing of the results given by the machine-learning tool and displays families of related proofs to the user.




ML4PG

Extracted Features: An example I




ML4PG

Extracted Features: An example II

Every machine learning engine has its own concrete format for representing feature vectors; therefore, translators are needed to adapt ML4PG’s internal encoding of feature vectors to the concrete representation of each machine learning engine.
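As an illustration of such a translator, assuming the internal encoding is a list of equal-length numeric vectors (the actual ML4PG encoding is not shown here), the sketch below renders the vectors in Weka's ARFF format and as plain comma-separated rows of the kind Matlab can read. The function, relation, and attribute names are hypothetical.

```python
from typing import List, Sequence


def to_arff(vectors: Sequence[Sequence[float]], relation: str = "proofs") -> str:
    """Render equal-length numeric feature vectors as a Weka ARFF document."""
    if not vectors:
        raise ValueError("no feature vectors to export")
    n = len(vectors[0])
    if any(len(v) != n for v in vectors):
        raise ValueError("all feature vectors must have the same length")
    lines: List[str] = [f"@relation {relation}", ""]
    lines += [f"@attribute f{i} numeric" for i in range(1, n + 1)]
    lines += ["", "@data"]
    lines += [",".join(f"{x:g}" for x in v) for v in vectors]
    return "\n".join(lines) + "\n"


def to_csv(vectors: Sequence[Sequence[float]]) -> str:
    """Plain comma-separated rows, one vector per line."""
    return "".join(",".join(f"{x:g}" for x in v) + "\n" for v in vectors)


print(to_arff([[1, 0, 3], [2, 1.5, 0]]))
```

Keeping one small translator per engine isolates the rest of the tool from each engine's file format.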




ML4PG

ML engine

ML4PG is flexible enough to use all sorts of learning algorithms. So far, ML4PG has been connected to a variety of clustering algorithms, a family of unsupervised learning methods. Clustering techniques divide data into n groups of similar objects (called clusters), where the value of n is provided by the user.

The ML4PG user can interactively select different clustering algorithms available in Matlab and Weka.
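ML4PG delegates clustering to Matlab or Weka. As a self-contained stand-in, here is Lloyd's k-means, one member of the clustering family, with the number of clusters n supplied by the user. The proof feature vectors are made up, and this is not code from ML4PG.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def dist2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(data: Sequence[Vector], n: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's k-means: assign each vector to one of n clusters (n chosen by the user)."""
    rng = random.Random(seed)
    centres = [list(c) for c in rng.sample(list(data), n)]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(n), key=lambda k: dist2(v, centres[k])) for v in data]
        if new_labels == labels:
            break                                   # converged
        labels = new_labels
        for k in range(n):
            members = [v for v, lab in zip(data, labels) if lab == k]
            if members:                             # an empty cluster keeps its old centre
                centres[k] = centroid(members)
    return labels


# Hypothetical feature vectors of four proofs: two short ones, two long ones.
proofs = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans(proofs, n=2))
```

Cluster indices are arbitrary; what matters to the user is which proofs end up in the same family.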




MaSh

MaSh

MaSh offers an alternative to MePo by learning from successful proofs, rather than ranking relevant premises only by syntactic similarity.




MaSh

MaSh’s heart

MaSh’s heart is a Python program that implements a custom version of a weighted sparse naive Bayes algorithm that is faster than the naive Bayes algorithm implemented in SNoW [4]. This Python program is used within a Standard ML module that integrates machine learning with Isabelle. MaSh follows the "four zeros" philosophy:

"Zero-configuration"

"Zero-click"

"Zero-maintenance"

"Zero-overhead"
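To make the naive Bayes idea concrete, here is a minimal sketch of a sparse naive Bayes premise ranker. It assumes a fact's score for a goal is log P(fact) plus a weighted sum of log P(feature | fact) over the goal's features, with a fixed penalty (the hypothetical `unseen` parameter) for features never seen with the fact. This is not MaSh's code; its custom weighting and smoothing are not described here. "Sparse" means each score touches only the features the goal actually has.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set


class SparseNaiveBayes:
    """Rank facts for a goal by  log P(fact) + sum_f w_f * log P(f | fact)."""

    def __init__(self, unseen: float = -15.0) -> None:
        self.unseen = unseen   # log-score for a goal feature never seen with the fact (assumed)
        self.uses: DefaultDict[str, int] = defaultdict(int)   # proofs that used each fact
        self.cooc: DefaultDict[str, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
        self.total = 0

    def learn(self, goal_features: Set[str], used_facts: Iterable[str]) -> None:
        """Record one proof: the facts it used and the features of its goal."""
        for fact in used_facts:
            self.uses[fact] += 1
            self.total += 1
            for feat in goal_features:
                self.cooc[fact][feat] += 1

    def score(self, fact: str, goal: Dict[str, float]) -> float:
        t = self.uses.get(fact, 0)
        if t == 0:
            return float("-inf")          # never used in any proof
        s = math.log(t / self.total)
        seen = self.cooc.get(fact, {})
        for feat, weight in goal.items():
            c = seen.get(feat, 0)
            s += weight * (math.log(c / t) if c else self.unseen)
        return s

    def rank(self, goal: Dict[str, float], k: int = 10) -> List[str]:
        return sorted(self.uses, key=lambda fact: self.score(fact, goal), reverse=True)[:k]


nb = SparseNaiveBayes()
nb.learn({"+", "0"}, ["add_0"])
nb.learn({"+", "succ"}, ["add_succ", "add_0"])
nb.learn({"*", "1"}, ["mult_1"])
print(nb.rank({"+": 1.0, "0": 1.0}))  # ['add_0', 'add_succ', 'mult_1']
```

Learning a new proof only updates a few counters, which is in the spirit of the "zero-overhead" goal: the model can be kept up to date incrementally as the user proves new facts.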




MaSh

features used I

For each term in the formula, excluding the outer quantifiers, connectives, and equality, the features are derived from the nontrivial first-order patterns up to a given depth. Variables are replaced by the wildcard _ (underscore). Given a maximum depth of 2, the term g (h x a), where the constants g, h, a originate from the theories T, U, V, yields the patterns:

T.g(_)   T.g(U.h(_, _))   U.h(_, _)   U.h(_, V.a)   V.a

which are simplified and encoded respectively into the features:

T.g   T.g(U.h)   U.h   U.h(V.a)   V.a
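The pattern extraction can be reproduced for the example term. This sketch assumes terms are built only from theory-qualified constants applied to arguments and from variables (the quantifiers, connectives, and equality that the scheme skips are not modeled). It truncates every subterm at depths 1 to 2, keeps the nontrivial patterns, and drops wildcards when encoding features. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    head: str                 # theory-qualified constant, e.g. "T.g"
    args: Tuple["Term", ...] = ()


Term = Union[Var, App]
WILDCARD = "_"
Pattern = Union[str, Tuple[str, tuple]]   # "_" or (head, argument patterns)


def truncate(t: Term, depth: int) -> Pattern:
    """Keep the top `depth` levels of t; variables and deeper parts become "_"."""
    if isinstance(t, Var) or depth == 0:
        return WILDCARD
    return (t.head, tuple(truncate(a, depth - 1) for a in t.args))


def subterms(t: Term):
    yield t
    if isinstance(t, App):
        for a in t.args:
            yield from subterms(a)


def patterns(t: Term, max_depth: int = 2) -> List[Pattern]:
    """Nontrivial (not just "_") patterns of every subterm, depths 1..max_depth."""
    out: List[Pattern] = []
    for s in subterms(t):
        for d in range(1, max_depth + 1):
            p = truncate(s, d)
            if p != WILDCARD and p not in out:
                out.append(p)
    return out


def show(p: Pattern) -> str:
    if p == WILDCARD:
        return WILDCARD
    head, args = p
    return f"{head}({', '.join(show(a) for a in args)})" if args else head


def encode(p: Pattern) -> str:
    """Feature name: drop wildcards, e.g. U.h(_, V.a) -> U.h(V.a), T.g(_) -> T.g."""
    head, args = p
    kept = [encode(a) for a in args if a != WILDCARD]
    return f"{head}({','.join(kept)})" if kept else head


# g (h x a) with g, h, a from theories T, U, V and x a variable.
term = App("T.g", (App("U.h", (Var("x"), App("V.a"))),))
ps = patterns(term)
print([show(p) for p in ps])    # ['T.g(_)', 'T.g(U.h(_, _))', 'U.h(_, _)', 'U.h(_, V.a)', 'V.a']
print([encode(p) for p in ps])  # ['T.g', 'T.g(U.h)', 'U.h', 'U.h(V.a)', 'V.a']
```

The depth limit is what keeps the feature set small: T.g(U.h(_, V.a)) would be a depth-3 pattern and is therefore not generated.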




MaSh

features used II

Types, excluding those of propositions, Booleans, and functions, are encoded using an analogous scheme.

Type variables constrained by type classes give rise to features corresponding to the specified type classes and their superclasses.

Finally, various pieces of metainformation are encoded as features: the theory to which the fact belongs; the kind of rule (e.g., introduction, simplification); whether the fact is local; whether the formula contains any existential quantifiers or λ-abstractions.




MaSh

Results

MaSh was found to outperform MePo on different datasets, and combining the two (as an ensemble model) increases the number of solved problems in the Judgement Day benchmark by 4.2% [11].




MaLARea

MaLARea

The closed loop between using deductive methods to find proofs, and using inductive methods to learn from the existing proofs and suggest new proof directions, is the main idea behind the MaLARea metasystem.




MaLARea




MaLARea

ML in MaLARea

There are many kinds of information that such an autonomous metasystem can try to use and learn. The second version of MaLARea also uses structural and semantic features of formulas to characterize them and to improve axiom selection.

Successful runs provide additional data for learning (useful for solving related problems), while unsuccessful runs can yield countermodels, which can be reused for semantic pre-selection and as additional input features for learning.




MaLARea

high-level approach

The communication between learning and the ATP systems is high-level: the learned relevance is used to try to solve problems with varied, limited numbers of the most relevant axioms.

Pro:

MaLARea provides a generic inductive (learning)/deductive (ATP) metasystem into which any ATP can easily be plugged as a black box (E and SPASS by default).

Con:

It does not attempt to use the learned knowledge to guide the ATP search process once the axioms are selected.
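The closed loop can be sketched as follows, under simplifying assumptions: `rank` stands for the learned relevance (retrained from all proofs found so far), `prove` is any ATP treated as a black box that returns the axioms it used or None, and each conjecture is tried with increasing numbers of top-ranked axioms. Countermodels and semantic features are left out, and the toy ranker, prover, and axiom names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence

Proofs = Dict[str, List[str]]                              # conjecture -> axioms used
Ranker = Callable[[str, Proofs], List[str]]                # learned relevance ordering
Prover = Callable[[str, List[str]], Optional[List[str]]]   # ATP as a black box


def learn_and_prove(conjectures: Sequence[str], rank: Ranker, prove: Prover,
                    limits: Sequence[int] = (4, 16, 64), max_rounds: int = 10) -> Proofs:
    """Alternate deduction and induction until a round proves nothing new."""
    proofs: Proofs = {}
    for _ in range(max_rounds):
        progress = False
        for c in conjectures:
            if c in proofs:
                continue
            ranked = rank(c, proofs)        # relevance learned from all proofs so far
            for k in limits:                # small premise sets first, then larger
                used = prove(c, ranked[:k])
                if used is not None:
                    proofs[c] = used
                    progress = True
                    break
        if not progress:
            break
    return proofs


# Toy setting: six axioms; each conjecture needs a fixed set of them.
AXIOMS = [f"a{i}" for i in range(6)]
NEEDS = {"c1": {"a0"}, "c2": {"a0", "a1"}, "c3": {"a5"}}


def toy_rank(c: str, proofs: Proofs) -> List[str]:
    """Axioms used most often in earlier proofs come first."""
    counts = Counter(a for used in proofs.values() for a in used)
    return sorted(AXIOMS, key=lambda a: (-counts[a], a))


def toy_prove(c: str, premises: List[str]) -> Optional[List[str]]:
    return sorted(NEEDS[c]) if NEEDS[c] <= set(premises) else None


result = learn_and_prove(["c1", "c2", "c3"], toy_rank, toy_prove, limits=(1, 2))
print(result)
```

Because `prove` is only called through its inputs and outputs, any ATP can be plugged in, which mirrors the "Pro" above; the same design is also why the learned knowledge never reaches the prover's internal search, the "Con".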




MaLeCoP

MaLeCoP

While in MaLARea learning-based axiom selection is done outside unmodified theorem provers, in MaLeCoP the learning-based selection is done inside the prover, and the interaction between learning knowledge and applying it is much finer-grained.




MaLeCoP

General architecture




MaLeCoP

ML in MaLeCoP I

The basic learning in MaLARea is used to associate conjecture symbols with the premises used in the conjecture’s proof. This learning mode can easily be reproduced by MaLeCoP.

For learning clause selection on branches, however, another kind of information supplied by the prover can be used: the successful clause choices made for particular paths in the proof.




MaLeCoP

ML in MaLeCoP II

The information extracted from subtrees also contains the cost (again in terms of the number of inferences) of finishing the subtree.

In the original project the authors did not yet use this information in learning; however, they plan to use learning on this data to gradually overcome the most costly bad clause choices.




MaLeS

MaLeS

MaLeS is a framework that develops strategies for automated theorem provers (ATPs) and creates suitable schedules of strategies for individual problems. The framework can be used in a push-button way to develop such strategies and schedules for an arbitrary ATP.




MaLeS

MaLeS Solutions

With respect to the three main subproblems inherent in the strategy selection problem, MaLeS:

Performs a stochastic local search, taking previously human-defined strategies as starting points, to find a set of good preselected strategies.

Uses the well-known set of features designed by Schulz for clause-normal-form and first-order problems to describe each problem.

Uses kernels to learn the runtime prediction functions and schedules the strategies accordingly.
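A minimal sketch of the first step, stochastic local search over strategy parameters. It assumes a strategy is a mapping from parameter names to integer values and that a hypothetical `evaluate` function scores a strategy (in practice, by running the ATP on training problems). The search starts from a human-defined strategy and accepts random one-parameter changes that do not lower the score; MaLeS's actual search procedure is more elaborate and not detailed here.

```python
import random
from typing import Callable, Dict, List

Strategy = Dict[str, int]    # parameter name -> chosen value


def local_search(start: Strategy, domains: Dict[str, List[int]],
                 evaluate: Callable[[Strategy], float],
                 steps: int = 500, seed: int = 0) -> Strategy:
    """Stochastic local search: change one random parameter at a time and keep
    the change if the strategy scores at least as well as before."""
    rng = random.Random(seed)
    best, best_score = dict(start), evaluate(start)
    params = sorted(domains)
    for _ in range(steps):
        candidate = dict(best)
        p = rng.choice(params)
        candidate[p] = rng.choice(domains[p])
        s = evaluate(candidate)
        if s >= best_score:
            best, best_score = candidate, s
    return best


# Hypothetical stand-in for "run the ATP on training problems and measure":
# the score peaks at weight=3, ordering=7.
def toy_evaluate(st: Strategy) -> float:
    return -(abs(st["weight"] - 3) + abs(st["ordering"] - 7))


human_defined = {"weight": 0, "ordering": 0}
domains = {"weight": list(range(10)), "ordering": list(range(10))}
print(local_search(human_defined, domains, toy_evaluate))
```

Accepting equal-score moves lets the search drift across plateaus, which matters when many parameter settings solve the same training problems in similar time.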




MaLeS

Features used




MaLeS

ML in MaLeS I

Kernels are a very popular machine learning method that has been successfully applied in many domains [20]. A kernel can be seen as a similarity function between feature vectors.

The kernel used in this project is the well-known Gaussian kernel. With parameter σ, the Gaussian kernel K of two problems p, q ∈ P with feature vectors ϕ(p), ϕ(q) ∈ Ω ⊆ ℝⁿ for some n ∈ ℕ is defined as:

$$ K(p, q) := \exp\left( -\frac{\phi(p)^{T}\phi(p) - 2\,\phi(p)^{T}\phi(q) + \phi(q)^{T}\phi(q)}{\sigma^{2}} \right) \tag{3} $$
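The numerator in equation (3) is the expanded form of a squared Euclidean distance, which makes the "similarity" reading concrete: identical feature vectors give K = 1, and K decays toward 0 as the vectors move apart. A short worked check, with ϕ(p) = (1, 0), ϕ(q) = (0, 1), and σ = 1 chosen purely for illustration:

```latex
\phi(p)^{T}\phi(p) - 2\,\phi(p)^{T}\phi(q) + \phi(q)^{T}\phi(q)
  = \lVert \phi(p) - \phi(q) \rVert^{2}
\quad\Longrightarrow\quad
K(p, q) = \exp\left( -\frac{\lVert \phi(p) - \phi(q) \rVert^{2}}{\sigma^{2}} \right)

% Example: \phi(p) = (1, 0), \phi(q) = (0, 1), \sigma = 1
\lVert \phi(p) - \phi(q) \rVert^{2} = 1 - 2 \cdot 0 + 1 = 2,
\qquad K(p, q) = e^{-2} \approx 0.135
```

A larger σ flattens the kernel, so more distant problems still count as similar.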




MaLeS

ML in MaLeS II

Let t ∈ ℝ be a time limit. For each preselected strategy s ∈ S, the ATP is run with strategy s and time limit t on each problem in P_train. For each strategy s, P^s_train ⊆ P_train is the set of problems that the ATP can solve within the time limit t using strategy s. In kernel-based machine learning, the prediction function ρ_s has the form

$$ \rho_s(p) = \sum_{q \in P^{s}_{\mathrm{train}}} \alpha^{s}_{q} \, K(p, q) \tag{4} $$

Then, having defined the prediction functions, for each new problem MaLeS uses them to select the strategy and runtime that are most likely to solve the problem. If the predicted strategy does not solve the problem, MaLeS updates all prediction functions with this new information.
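How the coefficients α^s_q in equation (4) are obtained is not stated. The sketch below assumes kernel ridge regression, a standard choice: α solves (K + λI)α = y, where K is the kernel matrix over the problems strategy s solved, y their measured runtimes, and λ a small regularizer. It then picks the strategy with the smallest predicted runtime and, as a hypothetical update rule, refits a strategy's predictor with the time limit as the observed runtime after a failure; MaLeS's actual update may differ. Pure Python with a small Gaussian elimination solver keeps it dependency-free.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]   # a feature vector ϕ(p)


def gaussian_kernel(x: Vec, y: Vec, sigma: float) -> float:
    """Equation (3): exp(-||ϕ(p) - ϕ(q)||² / σ²)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


class RuntimePredictor:
    """ρ_s(p) = sum_q α_q K(p, q) over the training problems strategy s solved;
    α is fitted by kernel ridge regression (an assumption)."""

    def __init__(self, sigma: float = 1.0, lam: float = 1e-3) -> None:
        self.sigma, self.lam = sigma, lam
        self.X: List[Vec] = []
        self.y: List[float] = []
        self.alpha: List[float] = []

    def fit(self, X: List[Vec], y: List[float]) -> None:
        self.X, self.y = list(X), list(y)
        K = [[gaussian_kernel(a, b, self.sigma) + (self.lam if i == j else 0.0)
              for j, b in enumerate(self.X)] for i, a in enumerate(self.X)]
        self.alpha = solve(K, self.y)

    def predict(self, p: Vec) -> float:
        return sum(a * gaussian_kernel(p, q, self.sigma) for a, q in zip(self.alpha, self.X))

    def update(self, p: Vec, runtime: float) -> None:
        """Refit with one more observation (e.g. the time limit after a failure)."""
        self.fit(self.X + [p], self.y + [runtime])


def pick_strategy(predictors: Dict[str, RuntimePredictor], p: Vec) -> Tuple[str, float]:
    """Choose the strategy with the smallest predicted runtime for problem p."""
    return min(((name, r.predict(p)) for name, r in predictors.items()), key=lambda nt: nt[1])


fast, slow = RuntimePredictor(), RuntimePredictor()
fast.fit([[0.0], [1.0]], [0.5, 0.5])
slow.fit([[0.0], [1.0]], [5.0, 5.0])
name, t = pick_strategy({"fast": fast, "slow": slow}, [0.5])
print(name)  # fast
# If "fast" then failed within a 10-second limit, record that as a penalty (hypothetical rule).
fast.update([0.5], 10.0)
```

The small λ keeps the linear system well conditioned while letting the predictor nearly reproduce the training runtimes.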




Conclusion I

In this talk, we have discussed a rapidly emerging research trend that aims to bring machine learning to theorem proving and, more generally, to automated reasoning.

Early results are promising, considering that very few people are working in this direction.

We have also presented the different approaches taken in this context and a few successful projects as use cases.




Conclusion II

Looking at future directions, the natural next step is to try more advanced ML algorithms, along with unsupervised feature extraction methods, bringing in more expertise from the AI/ML community.

In the long run, heuristic and machine learning methods, and combined AI metasystems, have a very long way to go. This is no longer only about mathematics: all kinds of more or less formal large knowledge bases are becoming available in other sciences, and automated reasoning could become one of the strongest methods for general reasoning in the sciences once a sufficient amount of formal knowledge exists.




References

[1] Ethem Alpaydin. Introduction to Machine Learning. MIT Press, 2004.

[2] Andrea Asperti, Wilmer Ricciotti, Claudio Sacerdoti Coen, and Enrico Tassi. The Matita interactive theorem prover. In Automated Deduction – CADE-23, pages 64–69. Springer, 2011.

[3] Yves Bertot and Pierre Castéran. Interactive Theorem Proving and Program Development: Coq’Art: The Calculus of Inductive Constructions. Springer, 2004.

[4] Andrew Carlson, Chad Cumby, Jeff Rosen, and Dan Roth. The SNoW learning architecture. Technical report UIUCDCS, 1999.

[5] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008.

[6] Kurt Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38(1):173–198, 1931.

[7] Adam Grabowski, Artur Korniłowicz, and Adam Naumowicz. Mizar in a nutshell. Journal of Formalized Reasoning, 3(2):153–245, 2010.

[8] John Harrison. HOL Light: A tutorial introduction. In Formal Methods in Computer-Aided Design, pages 265–269. Springer, 1996.

[9] Jonathan Heras and Ekaterina Komendantskaya. ML4PG: Proof-mining in Coq. CoRR, 2013.

[10] Matt Kaufmann, J Strother Moore, and Panagiotis Manolios. Computer-Aided Reasoning: An Approach. Kluwer Academic Publishers, 2000.

[11] Daniel A. Kühlwein. Machine Learning for Automated Reasoning. 2013.

[12] Huan Liu and Hiroshi Motoda. Feature Selection for Knowledge Discovery and Data Mining. Springer, 1998.

[13] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic, volume 2283. Springer, 2002.

[14] Jens Otten and Wolfgang Bibel. leanCoP: Lean connection-based theorem proving. Journal of Symbolic Computation, 36(1):139–161, 2003.

[15] Sam Owre and Natarajan Shankar. A brief overview of PVS. In Theorem Proving in Higher Order Logics, pages 22–27. Springer, 2008.

[16] Arthur William Quaife et al. Automated Development of Fundamental Mathematical Theories. 1990.

[17] Alexandre Riazanov and Andrei Voronkov. The design and implementation of Vampire. AI Communications, 15(2):91–110, 2002.

[18] Bertrand Russell and Alfred North Whitehead. Principia Mathematica vol. 1925.

[19] Stephan Schulz. E – a brainiac theorem prover. AI Communications, 15(2):111–126, 2002.

[20] John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

[21] Konrad Slind and Michael Norrish. A brief overview of HOL4. In Theorem Proving in Higher Order Logics, pages 28–32. Springer, 2008.

[22] Josef Urban. Translating Mizar for first order theorem provers. In Mathematical Knowledge Management, pages 203–215. Springer, 2003.

[23] Josef Urban. MaLARea: a metasystem for automated reasoning in large theories. ESARLT, 257, 2007.

[24] Josef Urban, Jiří Vyskočil, and Petr Štěpánek. MaLeCoP: Machine learning connection prover. In Automated Reasoning with Analytic Tableaux and Related Methods, pages 263–277. Springer, 2011.

[25] Christoph Weidenbach, Dilyana Dimova, Arnaud Fietzke, Rohit Kumar, Martin Suda, and Patrick Wischnewski. SPASS version 3.5. In Automated Deduction – CADE-22, pages 140–145. Springer, 2009.


