machine learning for automated reasoning: an overview


Page 1: Machine Learning for Automated Reasoning: An Overview

Introduction Background Different approaches Latest successful projects Conclusion

Machine Learning for Automated Reasoning: An Overview

Vincenzo Lomonaco

Alma Mater Studiorum - University of Bologna

[email protected]

January 27, 2015

Vincenzo Lomonaco Alma Mater Studiorum - University of Bologna

Machine Learning for Automated Reasoning: An Overview


Index

1 Introduction

2 Background
   ITPs and ATPs
   Machine learning

3 Different approaches
   ML for premises selection
   ML for heuristics selection

4 Latest successful projects
   ML4PG
   MaSh
   MaLARea
   MaLeCoP
   MaLeS

5 Conclusion


Summary

In recent years, the development of interactive and automated theorem provers has led to the creation of large datasets of formal mathematical libraries and varied infrastructures for proofs and software/hardware verification.

At the same time, machine learning techniques have been shown to perform well on a large number of tasks in artificial intelligence and automated reasoning.

In this talk we cover a number of successful approaches that aim to exploit this increasing amount of data by learning inductively from previous proofs.


Introduction I

In Principia Mathematica [18], Whitehead and Russell set out to show by example that all of mathematics can be derived from a small set of axioms using an appropriate logical calculus.

Even though Gödel later showed that no effectively generated consistent axiom system can capture all mathematical truth [6], Principia Mathematica showed that most of normal mathematics can indeed be catered for by a formal system.

With the advent of computers, formal mathematics became a more realistic proposal.


Introduction II

In the last few decades, the exponential rise in computing power and the spread of commodity computers have led to increasing interest and hope in interactive and automated theorem proving (ITP and ATP) software, well summarized by this strong quote from Art Quaife [16] in 1992:

The time will come when such crushers as Riemann’s hypothesis and Goldbach’s conjecture will be fair game for automated reasoning programs. For those of us who arrange to stick around, endless fun awaits us in the automated development and eventual enrichment of the corpus of mathematics.


Introduction III

Before the pioneering work of Josef Urban in 2003 [22], which applied first-order logic ATP methods to a large corpus of formal mathematical proofs (the Mizar Mathematical Library, also known as MML), the field was slowing down.

Since then, a growing number of projects linking ITP libraries to ATPs have emerged and brought new hope.

Recent advances in Artificial Intelligence (AI) and Machine Learning (ML) are now shaping the way we think about theorem proving and automated reasoning in general.


Introduction IV

The novel idea

The novel idea is to take statistical inferences about previous proofs into consideration and merge this kind of inductive reasoning with the classical deductive reasoning used in ATP and ITP.


Background

In this section we provide a brief background covering both machine learning and theorem proving.


ITPs and ATPs

ITPs

Interactive theorem provers (ITPs), or proof assistants, are computer programs that support the creation of formal proofs. Proofs are written in the input language of the ITP, which can be thought of as being at the intersection between a programming language, a logic, and a mathematical typesetting system.

ACL2 [10], Coq [3], HOL4 [21], HOL Light [8], Isabelle [13], Mizar [7], PVS [15] and Matita [2] are perhaps the most widely used ITPs.


ATPs

In contrast to interactive theorem provers, automated theorem provers (ATPs) work without human interaction. They take a problem as input, consisting of a set of axioms and a conjecture, and attempt to deduce the conjecture from the axioms.

E [19], SPASS [25], Vampire [17], and Z3 [5] are well-known ATPs for classical first-order logic.


Machine learning

Machine Learning I

Machine learning concerns itself with extracting information from data [1]. The result of a learning algorithm is a prediction function that takes a new datapoint and returns a target value.

Features are the input of the prediction function and should describe the relevant attributes of the datapoint. A datapoint can have several possible feature representations. Feature engineering concerns itself with identifying relevant features [12].


Machine Learning II

From a mathematical point of view, most machine learning problems can be reduced to an optimization problem:

Let D ⊆ X × T be a training dataset consisting of datapoints and their corresponding target values.

Let ϕ : X → Ω be a feature function that maps a datapoint to its feature representation in the feature space Ω (usually a subset of ℝⁿ for some n ∈ ℕ).

Furthermore, let F ⊆ (Ω → T) be a set of functions that map features to the target space, and let s : D × F → ℝ be a (convex) score function.


Machine Learning III

One possible goal is to find the function f ∈ F that maximizes the average score over the training set D.

The main differences between various learning algorithms are the function space F and the score function s they use.
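The optimization view above can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions: the feature function `phi`, the three-element function space `F` and the negative-squared-error `score` are all hypothetical choices made for this example, not anything taken from the talk.

```python
from statistics import mean

def phi(x):
    # Hypothetical feature function: describe a number by itself and its square.
    return (x, x * x)

# Hypothetical (tiny, finite) function space F: maps from features to targets.
F = {
    "identity": lambda w: w[0],
    "square": lambda w: w[1],
    "sum": lambda w: w[0] + w[1],
}

def score(datapoint, f):
    # Hypothetical score function s: negative squared error, higher is better.
    x, t = datapoint
    return -(f(phi(x)) - t) ** 2

def best_function(D, F):
    # Return the name of the f in F with the highest average score over D.
    return max(F, key=lambda name: mean(score(d, F[name]) for d in D))

D = [(1, 1), (2, 4), (3, 9)]  # toy training set: each target is the square of x
print(best_function(D, F))
```

Real learning algorithms search far larger function spaces, but the structure is the same: fix F and s, then optimize over D.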


Different approaches I

The AI fields of deductive reasoning and inductive reasoning (represented by machine learning, data mining, knowledge discovery in databases, etc.) have so far benefited relatively little from each other’s progress.

This is an obvious deficiency in comparison with the human mind, which can both inductively suggest new ideas and problem solutions based on analogy, memory, statistical evidence, etc., and also confirm, adjust, and even significantly modify these ideas and problem solutions by deductive reasoning and explanation, based on the understanding of the world.


Different approaches II

In recent years, a number of different approaches have been taken in this direction. We can group them into two main branches:

ML for premises selection

ML for heuristics selection


ML for premises selection

Premise selection can be useful as a standalone service for ITPs (suggesting relevant lemmas), or in conjunction with ATP methods that can attempt to find a proof from the relevant premises.


Guideline

In the training phase, the learning algorithm is allowed to learn from the proofs of all previously proved theorems. For every theorem in the training set, its dependencies should be ranked as high as possible; that is, the score function should optimize the ranks of the premises that were used in the proof.

To do this, all learning algorithms require a set of features as input data, encoded as a real vector. Therefore a method is needed to translate formula trees into real vectors that characterize the formula.


[Figure: dependency graph and formula tree examples]


Features to use

The symbols that appear in a formula can be seen as its basic characterization, and hence a simple approach is to take the set of symbols of a formula as its feature set.

The symbols correspond to the node labels in the formula tree. In addition to the symbols, one can also include as features the subterms and subformulas of the formula to prove.

Since the formalisms supported by the vast majority of ITP systems are typed (or sorted), adding the types that appear in the formula tree as additional features is reasonable.

Adding the feature vectors of some of the most recently proved theorems to the feature vector of the conjecture, in a weighted fashion, is a way to add information about the context.
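The feature choices listed above can be sketched on a toy formula tree. This is a minimal illustration, assuming a hypothetical tree encoding `(label, [children])`; the `type:` prefix and the context weight of 0.5 are likewise assumptions made only for the example.

```python
def symbols(tree):
    # Node labels of the formula tree, i.e. the symbols of the formula.
    label, children = tree
    out = {label}
    for child in children:
        out |= symbols(child)
    return out

def render(tree):
    # Print a (sub)tree back as a term string such as plus(a,b).
    label, children = tree
    if not children:
        return label
    return f"{label}({','.join(render(c) for c in children)})"

def subterms(tree):
    # All subterms and subformulas, rendered as strings.
    out = {render(tree)}
    for child in tree[1]:
        out |= subterms(child)
    return out

def features(tree, types=()):
    # Symbols, subterms and (hypothetically tagged) types as one feature set.
    return symbols(tree) | subterms(tree) | {f"type:{t}" for t in types}

def with_context(conjecture_feats, recent_feats, weight=0.5):
    # Weighted sum of the conjecture's features and those of recent theorems.
    vec = {f: 1.0 for f in conjecture_feats}
    for feats in recent_feats:
        for f in feats:
            vec[f] = vec.get(f, 0.0) + weight
    return vec

# leq(plus(a,b), c) as a formula tree
tree = ("leq", [("plus", [("a", []), ("b", [])]), ("c", [])])
print(sorted(features(tree, types=["nat"])))
```

A sparse dictionary is used for the weighted vector because most features are absent from most formulas.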


Math point of view

The problem can be seen as a classification problem where, for each premise p ∈ Γ, we learn a real-valued classifier function:

\[ C_p(\cdot) : \Gamma \to \mathbb{R} \tag{1} \]

which, given a conjecture c, estimates how useful p is for proving c. The premises for a conjecture c ∈ Γ are then ranked by the values of C_p(c).
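Ranking by the values of C_p(c) can be sketched as follows. The classifiers here are hypothetical stand-ins (a simple feature-overlap count), chosen only to make the ranking step runnable; the premise names and feature sets are invented for the example.

```python
def make_classifier(seen_features):
    # Hypothetical C_p: count how many features of the conjecture were seen
    # in earlier proofs that used premise p.
    return lambda conjecture_features: float(len(seen_features & conjecture_features))

def rank_premises(conjecture_features, classifiers):
    # Sort premises by C_p(c), most useful first.
    return sorted(classifiers,
                  key=lambda p: classifiers[p](conjecture_features),
                  reverse=True)

classifiers = {
    "add_comm": make_classifier({"plus", "nat"}),
    "mul_comm": make_classifier({"times", "nat"}),
    "list_rev": make_classifier({"rev", "list"}),
}

ranking = rank_premises({"plus", "nat", "zero"}, classifiers)
print(ranking)
```

Only the top of such a ranking would typically be handed to an ATP or shown to the ITP user.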


ML for heuristics selection

Automated theorem proving is a search problem. Many different approaches exist, and most of them have parameters that can be tuned. Examples of such parameterizations are clause weighting and selection schemes, term orderings, and the sets of inference and reduction rules used.

A specific choice of parameters defines a search strategy. The choice of a strategy can often make the difference between finding a proof in a few milliseconds or not at all.


Guideline

The strategy selection problem consists of three subproblems:

Finding a good set of preselected strategies S.

Defining features Ω which are easy to compute (via a feature function ϕ), but also expressive enough to distinguish different types of problems.

Determining a method which, given the features of a problem, creates a strategy schedule.


Math point of view

Machine learning in this case is applied to predict the runtime of an ATP over a specific class of problems in order to automatically choose the most suitable strategy for a given unknown problem. For each strategy s in the preselected strategies S, we are searching for a function:

\[ \rho_s : P \to \mathbb{R} \tag{2} \]

such that for all problems p ∈ P the predicted values are close to the actual runtimes: ρ_s(p) ∼ τ(p, s).
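Once a runtime predictor exists for every strategy, choosing a strategy reduces to comparing predictions. The sketch below assumes two hypothetical hand-written predictors over a single made-up feature (`clauses`); in practice the predictors would be learned from recorded runtimes.

```python
def choose_strategy(problem_features, predictors):
    # Pick the strategy s whose predicted runtime rho_s(p) is smallest.
    return min(predictors, key=lambda s: predictors[s](problem_features))

def schedule(problem_features, predictors):
    # Order all strategies by predicted runtime, fastest first.
    return sorted(predictors, key=lambda s: predictors[s](problem_features))

# Hypothetical predictors: s1 is cheap on small problems, s2 scales better.
predictors = {
    "s1": lambda f: 0.5 * f["clauses"],
    "s2": lambda f: 20.0 + 0.01 * f["clauses"],
}

print(choose_strategy({"clauses": 10}, predictors))
print(choose_strategy({"clauses": 1000}, predictors))
```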


Latest successful projects I

ML4PG (machine learning extension for Proof General) [9] is an interactive tool that provides statistical proof hints during the process of Coq/SSReflect proof development.

MaSh (Machine Learning for Sledgehammer) [11], now part of the default Isabelle installation, offers an alternative to MePo (the default relevance filter in Sledgehammer) by learning from successful proofs.

MaLARea (Machine Learner for Automated Reasoning) [23] is a metasystem, which turns out to have so far the best performance on large theory benchmarks like the MPTP Challenge and MPTP2078.


Latest successful projects II

MaLeCoP (Machine Learning Connection Prover) [24] is an evolution of MaLARea where the learned knowledge is used for guiding the proof search mechanisms inside a modified version of leanCoP [14].

MaLeS (Machine Learning of Strategies) [11] is a framework that develops strategies for ATPs and creates suitable schedules of strategies for individual problems.


ML4PG

ML4PG is an extension to Proof General (an Emacs-based generic interface for theorem provers) that uses state-of-the-art machine learning techniques to interactively find proof patterns in Coq and SSReflect proofs.


How it works

It works in the background of Proof General and extracts some simple, low-level features from interactive proofs in Coq/SSReflect;

On the user’s request, it sends the gathered statistics to a chosen machine-learning interface and triggers execution of a clustering algorithm of the user’s choice;

It does some gentle post-processing of the results given by the machine-learning tool, and displays families of related proofs to the user.


[Figure: extracted features, example I]


Extracted Features: An example II

Every machine learning engine has its own concrete format to represent feature vectors; therefore, it is necessary to define translators to adapt ML4PG’s internal encoding of feature vectors to the concrete representation of the machine learning engine.


ML engine

The ML4PG engine is flexible enough to use all sorts of learning algorithms. So far, ML4PG has been connected to a variety of clustering algorithms, a family of unsupervised learning methods. Clustering techniques divide data into n groups of similar objects (called clusters), where the value of n is provided by the user.

The ML4PG user can interactively select among the clustering algorithms available in Matlab and Weka.
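To illustrate what "dividing data into n groups" means, here is a minimal k-means sketch in plain Python. It is a stand-in, not ML4PG's code: ML4PG delegates clustering to Matlab or Weka, and the two-dimensional "proof feature vectors" below are invented for the example. Taking the first n points as initial centroids is also just a simplification to keep the run deterministic.

```python
def kmeans(points, n, iterations=10):
    # Start from the first n points as centroids (deterministic, for the demo).
    centroids = [list(p) for p in points[:n]]
    clusters = [[] for _ in range(n)]
    for _ in range(iterations):
        clusters = [[] for _ in range(n)]
        # Assignment step: attach each point to its nearest centroid.
        for p in points:
            nearest = min(
                range(n),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return clusters

# Hypothetical feature vectors of five proofs; n = 2 is the user's choice.
proofs = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0)]
clusters = kmeans(proofs, n=2)
print(clusters)
```

Each resulting cluster plays the role of a "family of related proofs" shown to the user.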


MaSh

MaSh offers an alternative to MePo by learning from successful proofs, rather than only ranking relevant premises by syntactic similarity.


MaSh’s heart

MaSh’s heart is a Python program that implements a custom version of a weighted sparse naive Bayes algorithm that is faster than the naive Bayes algorithm implemented in SNoW [4]. This Python program is used within a Standard ML module that integrates machine learning with Isabelle. MaSh follows the "four zeros" philosophy, meaning:

"Zero-configuration"

"Zero-click"

"Zero-maintenance"

"Zero-overhead".
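For readers unfamiliar with naive Bayes as a premise ranker, the following is a deliberately simplified sketch. It is not MaSh's implementation: the class name, the Laplace smoothing, the absence of feature weights and the toy facts are all assumptions made for illustration. It is "sparse" only in the sense that scoring touches just the features present in the query.

```python
import math
from collections import Counter, defaultdict

class SparseNaiveBayes:
    def __init__(self):
        self.label_count = Counter()               # how often each premise was used
        self.feature_count = defaultdict(Counter)  # premise -> feature -> count
        self.total = 0                             # number of learned proofs

    def learn(self, features, premises):
        # Record one successful proof: its conjecture features and used premises.
        self.total += 1
        for p in premises:
            self.label_count[p] += 1
            for f in features:
                self.feature_count[p][f] += 1

    def rank(self, features):
        # Score = log P(p) + sum of log P(f | p) over the query's features only.
        scores = {}
        for p, n in self.label_count.items():
            s = math.log(n / self.total)
            for f in features:
                s += math.log((self.feature_count[p][f] + 1) / (n + 2))
            scores[p] = s
        return sorted(scores, key=scores.get, reverse=True)

nb = SparseNaiveBayes()
nb.learn({"plus", "nat"}, ["add_comm"])
nb.learn({"plus", "zero"}, ["add_zero", "add_comm"])
nb.learn({"rev", "list"}, ["rev_rev"])
ranking = nb.rank({"plus", "nat"})
print(ranking)
```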


features used I

For each term in the formula, excluding the outer quantifiers, connectives, and equality, the features are derived from the nontrivial first-order patterns up to a given depth. Variables are replaced by the wildcard _ (underscore). Given a maximum depth of 2, the term g (h x a), where constants g, h, a originate from theories T, U, V, yields the patterns:

T.g(_)   T.g(U.h(_;_))   U.h(_;_)   U.h(_;V.a)   V.a

which are simplified and encoded respectively into the features:

T.g   T.g(U.h)   U.h   U.h(V.a)   V.a
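The simplified features of the example can be reproduced with a short script. This is one possible reading of the scheme, not MaSh's code: the term encoding (a string for a variable, a `(name, [args])` pair for an application) and the function names are hypothetical.

```python
def render(term, depth):
    # Render a term down to the given depth, dropping variables (wildcards).
    if isinstance(term, str):          # a variable such as "x"
        return None
    name, args = term
    if depth == 1:
        return name
    inner = [r for r in (render(a, depth - 1) for a in args) if r is not None]
    return f"{name}({';'.join(inner)})" if inner else name

def subterms(term):
    # Enumerate the term and all of its subterms.
    yield term
    if not isinstance(term, str):
        for arg in term[1]:
            yield from subterms(arg)

def pattern_features(term, max_depth=2):
    # Collect the simplified patterns of every subterm up to max_depth.
    feats = set()
    for t in subterms(term):
        for d in range(1, max_depth + 1):
            r = render(t, d)
            if r is not None:
                feats.add(r)
    return feats

# g (h x a) with g, h, a from theories T, U, V and x a variable
term = ("T.g", [("U.h", ["x", ("V.a", [])])])
print(sorted(pattern_features(term)))
```

With a maximum depth of 2 the script yields exactly the five features listed in the example.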


features used II

Types, excluding those of propositions, Booleans, and functions, are encoded using an analogous scheme.

Type variables constrained by type classes give rise to features corresponding to the specified type classes and their superclasses.

Finally, various pieces of metainformation are encoded as features: the theory to which the fact belongs; the kind of rule (e.g., introduction, simplification); whether the fact is local; whether the formula contains any existential quantifiers or λ-abstractions.


Results

It was found that MaSh outperforms MePo on different datasets, and that their combination (as an ensemble model) increases the number of solved problems in the Judgement Day benchmark by 4.2% [11].


MaLARea

The closed loop between using deductive methods to find proofs, and using inductive methods to learn from the existing proofs and suggest new proof directions, is the main idea behind the MaLARea metasystem.


ML in MaLARea

There are many kinds of information that such an autonomous metasystem can try to use and learn from. The second version of MaLARea also uses structural and semantic features of formulas to characterize them and to improve axiom selection.

Successful runs provide additional data for learning (useful for solving related problems), while unsuccessful runs can yield countermodels, which can be reused for semantic pre-selection and as additional input features for learning.


high-level approach

The communication between learning and the ATP systems is high-level: the learned relevance is used to try to solve problems with varied limited numbers of the most relevant axioms.

Pro:

MaLARea gives a generic inductive (learning)/deductive (ATP) metasystem into which any ATP can easily be plugged as a black box (E and SPASS by default).

Con:

It does not attempt to use the learned knowledge for guiding the ATP search process once the axioms are selected.
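The high-level coupling described above can be sketched as a loop around a black-box prover. Everything concrete here is an assumption for illustration: the function name, the axiom limits and the toy prover are hypothetical and do not reflect MaLARea's actual interface.

```python
def prove_with_top_axioms(conjecture, ranked_axioms, prover, limits=(4, 16, 64)):
    # Try the black-box prover with growing prefixes of the relevance ranking.
    for n in limits:
        selected = ranked_axioms[:n]
        if prover(conjecture, selected):
            return selected   # a successful run: new data for the learner
    return None               # all attempts failed within the given limits

# Toy black-box ATP: succeeds once all needed axioms are among those supplied.
needed = {"a1", "a3"}
toy_prover = lambda conjecture, axioms: needed <= set(axioms)

ranked = ["a1", "a2", "a3", "a4", "a5"]  # hypothetical learned ranking
used = prove_with_top_axioms("c", ranked, toy_prover, limits=(1, 2, 4))
print(used)
```

Because the prover is only called through `prover(conjecture, axioms)`, any ATP can be substituted, which is the "pro" noted above; the "con" is equally visible, since nothing learned reaches the prover's internal search.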


MaLeCoP

While in MaLARea learning-based axiom selection is done outside unmodified theorem provers, in MaLeCoP the learning-based selection is done inside the prover, and the interaction between learning of knowledge and its application is much finer.


[Figure: MaLeCoP general architecture]


ML in MaLeCoP I

The basic learning in MaLARea is used to associate conjecture symbols with premises used in the conjecture’s proof. This learning mode can be easily reproduced by MaLeCoP.

For learning clause selection on branches, other information supplied by the prover can be used instead: successful clause choices made for particular paths in the proof.


ML in MaLeCoP II

The information extracted from subtrees also contains the cost (again in terms of inference numbers) of finishing the subtree.

In the original project the authors did not yet use this information in learning; however, they plan to use learning on this data to gradually overcome the most costly bad clause choices.


MaLeS

MaLeS is a framework that develops strategies for automated theorem provers (ATPs) and creates suitable schedules of strategies for individual problems. The framework can be used in a push-button way to develop such strategies and schedules for an arbitrary ATP.


MaLeS Solutions

With respect to the three main subproblems inherent in the strategy selection problem, MaLeS:

Performs a stochastic local search, taking previously human-defined strategies as starting points, to find a set of good preselected strategies.

Uses the well-known set of features designed by Schulz for clause-normal-form and first-order problems to describe each problem.

Uses kernels to learn the runtime prediction function and schedules the strategies accordingly.


[Figure: features used by MaLeS]


ML in MaLeS I

Kernels are a very popular machine learning method that has successfully been applied in many domains [20]. A kernel can be seen as a similarity function between feature vectors.

The kernel used in this project is the well-known Gaussian kernel. For two problems p, q ∈ P with feature vectors ϕ(p), ϕ(q) ∈ Ω ⊆ ℝⁿ (for some n ∈ ℕ), the Gaussian kernel K with parameter σ is defined as:

\[ K(p, q) := \exp\left( -\frac{\varphi(p)^T \varphi(p) - 2\,\varphi(p)^T \varphi(q) + \varphi(q)^T \varphi(q)}{\sigma^2} \right) \tag{3} \]
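Equation (3) translates almost literally into code. The sketch below takes the feature vectors directly as lists of numbers (the feature function ϕ is assumed to have been applied already); the function name is hypothetical. Note that the numerator is just the squared Euclidean distance between the two feature vectors, written out in terms of inner products.

```python
import math

def gaussian_kernel(u, v, sigma):
    # u and v are the feature vectors phi(p) and phi(q).
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    # Expanded form of ||u - v||^2, term by term as in equation (3).
    sq_dist = dot(u, u) - 2 * dot(u, v) + dot(v, v)
    return math.exp(-sq_dist / sigma ** 2)

print(gaussian_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.0))  # identical vectors
print(gaussian_kernel([0.0, 0.0], [1.0, 0.0], sigma=1.0))  # unit distance apart
```

Identical feature vectors give similarity 1, and the value decays towards 0 as the vectors move apart; σ controls how fast.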


ML in MaLeS II

Let t ∈ ℝ be a time limit. For each preselected strategy s ∈ S, the ATP is run with strategy s and time limit t on each problem in P_train. For each strategy s, P^s_train ⊆ P_train is the set of problems that the ATP can solve within the time limit t with strategy s. In kernel-based machine learning, the prediction function ρ_s has the form:

\[ \rho_s(p) = \sum_{q \in P^s_{\mathrm{train}}} \alpha^s_q \, K(p, q) \tag{4} \]

Then, having defined the prediction functions, for each new problem MaLeS uses them to select the strategy and runtime that are most likely to solve the problem. If the predicted strategy does not solve the problem, MaLeS updates all prediction functions with this new information.
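The form of the prediction function in equation (4) can be sketched as a kernel-weighted sum over the problems a strategy has solved. The weights α are set by hand here purely for illustration; how MaLeS actually fits them is not covered in this talk, and the one-dimensional feature vectors are invented.

```python
import math

def kernel(u, v, sigma=1.0):
    # Gaussian similarity between two feature vectors.
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / sigma ** 2)

def predict_runtime(p, solved, alphas):
    # rho_s(p): sum over solved training problems q of alpha_q * K(p, q).
    return sum(alphas[q] * kernel(p, q) for q in solved)

# Hypothetical training problems solved by one strategy s, as feature vectors.
solved = [(0.0,), (2.0,)]
# Hypothetical weights alpha^s_q (here roughly the observed runtimes).
alphas = {(0.0,): 1.0, (2.0,): 5.0}

print(predict_runtime((0.0,), solved, alphas))
print(predict_runtime((2.0,), solved, alphas))
```

A new problem that resembles quickly solved training problems gets a low predicted runtime, and one that resembles slow ones gets a high prediction.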


Conclusion I

In this talk, we have discussed a rapidly emerging research trend that aims to bring machine learning to theorem proving and, more generally, to automated reasoning.

Early results are promising, considering that very few people are working in this direction.

We have also presented different approaches taken in this context and a few successful projects as use cases.


Conclusion II

Talking about future directions, the next step could be, of course,to try more advanced ML algorithms along with unsupervised fea-ture extraction methods bringing more expertise from the AI/MLcommunity.

In the long run, heuristic and machine learning methods, and combined AI metasystems, still have a very long way to go. This is no longer only about mathematics: all kinds of more or less formal large knowledge bases are becoming available in other sciences, and automated reasoning could become one of the strongest methods for general reasoning in the sciences once a sufficient amount of formal knowledge exists.

