diagnosing type errors with class - penn state …dbz5017/pub/pldi15-slides.pdfdiagnosing type...
TRANSCRIPT
Diagnosing Type Errors with Class
PLDI 2015
Distinguished Paper Award
Danfeng ZhangAndrew C. Myers
Cornell University
Dimitrios VytiniotisSimon Peyton-Jones
MSR Cambridge
2
“It is a truism that most bugs are detected
only at a great distance from their source.”Mitchell Wand
Finding the source of type errors, POPL’86
Error localization is difficult for ML type systems
Even worse in sophisticated type systems
The Glasgow Haskell Compiler (GHC) Type classes Type families GADTs Type signatures
3
GHC: Bool is not a numerical type
Actual mistake: ‘==’ should be ‘−’
Error messages are sometimes confusing
Inference Engine
SHErrLoc: Static Holistic Error Locator
4
Most likely error cause
A general, expressive and accurate error localization method, which handles the highly expressive type system of GHC
General Error Localization [Zhang&Myers’14]
5
Programs
JifOCaml Others
Based on Bayesian
interpretation
Cannot diagnose Haskell errors
Constraints Analysis
Constraints
The error cause is likely to be• Simple• Able to explain all errors• Not used often on correct paths
General Diagnosis Heuristics
Cause
Haskell Program
Constraints Analysis
ConstraintsA highly expressive
constraint language
A decidable and
efficient constraint
analysis algorithm
Key Contributions
Bayesian reasoning
A Bayesian model that
accounts for the richer
graph representation
fact n = if n == 0 then 1
else n * fac (n == 1)
Cause
Cause
Roadmap
7
Haskell Program
ConstraintsA highly expressive
constraint language
fact n = if n == 0 then 1
else n * fac (n == 1)
Type Checking as Constraint Solving
• ML type system
– Constraint elements: types
– Constraints: type equalities
8
Constructors: Int, Bool, ListVariables:𝛼, 𝛽, 𝛾
𝐸 ∷= 𝛼 𝑐𝑜𝑛 𝐸1, … , 𝐸𝑛𝐼 ∷= 𝐸1 = 𝐸2 𝐶 ∷= 𝑖 𝐼𝑖
Syntax of Constraints
Constraint
Element
Type Classes
9
Instances of a type class, called Num
Intuitively, a set of types
10
𝐸 ∷= 𝛼 𝑐𝑜𝑛 𝐸1, … , 𝐸𝑛𝐼 ∷= 𝐸1 = 𝐸2 𝑐𝑙𝑎 𝐸1, … , 𝐸𝑛 𝐶 ∷= 𝑖 𝐼𝑖
Syntax of Constraints
Modeling Type Class Constraints
𝐸 ∷= 𝛼 𝑐𝑜𝑛 𝐸1, … , 𝐸𝑛𝐼 ∷= 𝐸1 ≤ 𝐸2 𝐶 ∷= 𝑖 𝐼𝑖𝐼 ∷= 𝐸1 ≤ 𝐸2
A type class is a set of its instances
𝐍𝐮𝐦 𝛼 ≔ 𝛼 ≤ 𝐍𝐮𝐦
𝐸1 = 𝐸2 ≔ 𝐸1 ≤ 𝐸2 ∧ 𝐸2 ≤ 𝐸1
Our constraint language
11
𝐸 ∷= 𝛼 𝑐𝑜𝑛 𝐸1, … , 𝐸𝑛𝐼 ∷= 𝐸1 = 𝐸2 𝑐𝑙𝑎 𝐸1, … , 𝐸𝑛 𝐶 ∷= 𝑖 𝐼𝑖
Syntax of Constraints
Modeling Type Class Constraints
𝐸 ∷= 𝛼 𝑐𝑜𝑛 𝐸1, … , 𝐸𝑛𝐼 ∷= 𝐸1 ≤ 𝐸2 𝐶 ∷= 𝑖 𝐼𝑖𝐼 ∷= 𝐸1 ≤ 𝐸2
Concise; inequalities directly map to edges in a graph
Our constraint languageA type class is a
set of its instances
Types are Checked Under Hypotheses
• Type signatures and GADTs introduce hypotheses
12
Constraints
a≤ Num ⊢ a≤ Numdouble :: Num a => a -> a
double n = n * 2
Haskell Program
Hypothesis: a is an instance of Num
Constraint hypothesis
Constraints are checked under hypotheses
Types are Checked Under Axioms
• Instance declaration may introduce (global) axioms
13
instance Eq a => Eq [a]
where {...}
Haskell Program
For all a, a is an instance of Eq implies
list of a is an instance of Eq
Int ≤ Eq ∧ ∀𝑎. 𝑎 ≤ Eq ⇒ 𝑎 ≤ Eq ⊢ Int ≤ Eq
Hypothesis Axiom
Constraint example with axioms:
Int ≤ Eq ⇒ Int ≤ Eq
Modeling Hypothesis and Axioms
• Constraints (𝐶): inequalities under quantified axioms
• Quantified axioms (𝑄): implication rules
• Hypotheses (e.g., Int ≤ Num): degenerate axioms
14
𝐸 ∷= 𝛼 𝑐𝑜𝑛 𝐸1, … , 𝐸𝑛 𝑄 ∷= ∀𝑎. 𝑖 𝐼𝑖 ⇒ 𝐼𝐼 ∷= 𝐸1 ≤ 𝐸2 𝐶 ∷= 𝑗 ( 𝑖𝑄𝑖 ⊢ 𝐼𝑗)
Syntax of SHErrLoc Constraints
The Full Constraint Language
• Also supports
– Functions on constraint elements
– Nested universally and existentially quantified variables
• Is expressive enough to model
– type classes, type families, GADTs, type signatures
15
(refer to the paper for more details)
Haskell Program
Constraints Analysis
ConstraintsA highly expressive
constraint language
A decidable and
efficient constraint
analysis algorithm
Roadmap
fact n = if n == 0 then 1
else n * fac (n == 1)
Cause
Constraint Graph in a Nutshell
• Graph construction (simple case)
– Node: constraint element
– Directed edge: partial ordering
17
18
Constraint Analysis in a Nutshell
Bool is not an instance of Num
fact n = if n == 0 then 1
else n * fac (n == 1)n == 1
n 0
Num
Limitations of Previous Algorithms[Barrett et al.’00, Melski&Reps’00,Zhang&Myers’10]
19
Int ≤ C ⊢ 𝛼 = Bool ∧ [𝛼] ≤ C
Previous algorithms under-saturates the graph
Satisfiable:
𝛼 = Int
Satisfiable:
𝛼 = Bool
Previous algs only addedges ([Bool] is not in the graph!)
Bool C≰a type class
C
[𝛼]
𝛼 Bool
New Algorithm
20
Key idea: add new edges and nodes during saturation
New algorithm adds new nodes
Key challenge: naive algorithms either fail to terminate, or under-saturate the graph
Int ≤ C ⊢ 𝛼 = Bool ∧ [𝛼] ≤ C
C
[𝛼]
𝛼 Bool
[Bool]
New Algorithm in a Nutshell
21
Lemma: the algorithm always terminates
Black node: node before saturationWhite node: added during saturation
Nodes added based on patterns 1. one edge with two black nodes 2. a black/white node
Recursion check: if white node, not added based on the edge in pattern
C
[𝛼]
𝛼 Bool
[Bool]
Constraint Analysis
• The analysis also handles
– Functions on constraint elements
– Hypotheses
– Quantified axioms
• Performance
– Empirically: quadratic in graph size
22
(refer to the paper for more details)
Roadmap
23
Haskell Program
Constraints Analysis
ConstraintsA highly expressive
constraint language
A decidable and
efficient constraint
analysis algorithm
Bayesian reasoning
A Bayesian model that
accounts for the richer
graph representation
fact n = if n == 0 then 1
else n * fac (n == 1)
Cause
Cause
Likelihood Estimation [Zhang&Myers’14]
A ranking metric based on Bayesian reasoning
(𝑃1, 𝑃2 are tunable parameters)
• Simplifying assumption
– Satisfiability of paths are independent
𝑃1 𝐸 𝑃2
1 − 𝑃2
𝑘𝐸 # sat paths using locations in E
Explanation: a set of locations
White nodes break this assumption
• Observation: some paths using white nodes provide neither positive nor negative evidence
25
Satisfiability depends on edges between 𝛼 and Bool
Lemma: the satisfiability of any redundant path depends on non-redundant paths
Redundant Paths (definition in paper)
C
[𝛼]
𝛼 Bool
[Bool]
New Ranking Metric
• Intuitively,
• Top candidates returned by an efficient A* algorithm [Zhang&Myers’14]
26
The error cause is likely to be• Simple• Able to explain all errors• Not used often on correct
non-redundant paths
General Diagnosis Heuristics
non-redundant
𝑃1 𝐸 𝑃2
1 − 𝑃2
𝑘𝐸# sat paths use constraints in E
Evaluation
• Implementation
– From Haskell programs to constraints
– SHErrLoc
27
Modified GHC
GHC Constraints
little effort
Constraint Graph
Error Diagnosis
Reports
SHErrLoc
Constraint Translator
SHErrLocConstraints
50 atop 20K+ LOC
~400 LOC~7500 LOC
Evaluation Setup
• Benchmarks
– CE Benchmark: analyzed 77 Haskell programs collected from papers about type-error diagnosis, used in [Chen&Erwig’14]
– Helium benchmark: analyzed 228 programs with type-checking errors, logged by the Helium tool [Hage’14]
• Ground truth
– CE Benchmark: already well-marked
– Helium benchmark: user’s actual fix
• Correctness
– only when the programmer mistake is returned by tools
28
Accuracy on the CE Benchmark
29
Comparison with GHC
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Comparison with the Helium tool[Heeren et al.’03]
Other tool finds the correct errorSHErrLoc misses the error
Both find the correct error
Both miss the correct error
SHErrLoc finds the correct errorOther tool misses the error
SHErrLoc uses no Haskell-specific heuristics!
Accuracy on the Helium Benchmark
30
Comparison with GHC
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Other tool finds the correct errorSHErrLoc misses the error
Both find the correct error
Both miss the correct error
SHErrLoc finds the correct errorOther tool misses the error
Comparison with the Helium tool
Related Work
• General error localization [Zhang&Myers’14]
– Cannot handle the type system of GHC
– Simpler constraints and constraint analysis algorithm
• Program analyses as constraint solving [e.g., Aiken’99, Foster et al.’06]
– No support for hypotheses and axioms
• Diagnosing Haskell error [e.g., Heeren et al’03,Hage&Heeren’07,Chen&Erwig’14]
– Haskell-specific heuristics
– Unable to handle all of the sophisticated features of GHC
31
SHErrLoc
General, expressive and accurate error localization
– Applies to the highly expressive type system of GHC
– A novel graph-based constraint analysis algorithm
– Bayesian reasoning => more accurate error locations than with existing tools
32
Type classesType familiesGADTsType signatures
GHC
SHErrLoc