Symbolic Bounds Analysis of Pointers, Array Indices, and
Accessed Memory Regions
Radu Rugina and Martin Rinard
Laboratory for Computer Science
Massachusetts Institute of Technology
Outline
• Examples
• Key Problem: Extracting Symbolic Bounds for Accessed Memory Regions
• Key Technology: Formulating and Solving Systems of Symbolic Inequality Constraints
• Results
• Conclusion
Example - Divide and Conquer Sort
Input:    7 4 6 1 3 5 8 2
Divide:   7 4 | 6 1 | 3 5 | 8 2
Conquer:  4 7 | 1 6 | 3 5 | 2 8
Combine:  1 4 6 7 | 2 3 5 8
Result:   1 2 3 4 5 6 7 8
“Sort n Items in d, Using t as Temporary Storage”
void sort(int *d, int *t, int n) {
  if (n > CUTOFF) {
    sort(d, t, n/4);
    sort(d+n/4, t+n/4, n/4);
    sort(d+n/2, t+n/2, n/4);
    sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
    merge(d, d+n/4, d+n/2, t);
    merge(d+n/2, d+3*(n/4), d+n, t+n/2);
    merge(t, t+n/2, t+n, d);
  } else insertionSort(d, d+n);
}
Motivating Problem
Exploit parallelism in this code
“Recursively Sort Four Quarters of d”
Divide array into subarrays and recursively sort subarrays
Subproblems Identified Using Pointers Into Middle of Array:
d, d+n/4, d+n/2, d+3*(n/4)
Before:  7 4 | 6 1 | 3 5 | 8 2
After:   4 7 | 1 6 | 3 5 | 2 8
Sorted Results Written Back Into Input Array
“Merge Sorted Quarters of d Into Halves of t”
d:          4 7 | 1 6 | 3 5 | 2 8
t, t+n/2:   1 4 6 7 | 2 3 5 8
“Merge Sorted Halves of t Back Into d”
t, t+n/2:   1 4 6 7 | 2 3 5 8
d:          1 2 3 4 5 6 7 8
“Use a Simple Sort for Small Problem Sizes”
[Figure: d and d+n delimit the subarray being sorted; 7 4 6 1 3 5 8 2 becomes 7 4 1 6 3 5 8 2]
Parallel Sort
void sort(int *d, int *t, int n) {
  if (n > CUTOFF) {
    spawn sort(d, t, n/4);
    spawn sort(d+n/4, t+n/4, n/4);
    spawn sort(d+n/2, t+n/2, n/4);
    spawn sort(d+3*(n/4), t+3*(n/4), n-3*(n/4));
    sync;
    spawn merge(d, d+n/4, d+n/2, t);
    spawn merge(d+n/2, d+3*(n/4), d+n, t+n/2);
    sync;
    merge(t, t+n/2, t+n, d);
  } else insertionSort(d, d+n);
}
What Do You Need To Know To Exploit This Form of Parallelism?
Symbolic Information About Accessed Memory Regions
Information Needed To Exploit Parallelism
Calls to sort access disjoint parts of d and t
Together, calls access [d,d+n-1] and [t,t+n-1]
sort(d,t,n/4);
sort(d+n/4,t+n/4,n/4);
sort(d+n/2,t+n/2,n/4);
sort(d+3*(n/4),t+3*(n/4), n-3*(n/4));
[Figure: for each call, the part of d..d+n-1 and of t..t+n-1 that it accesses]
Information Needed To Exploit Parallelism
First two calls to merge access disjoint parts of d,t
Together, calls access [d,d+n-1] and [t,t+n-1]
merge(d,d+n/4,d+n/2,t);
merge(d+n/2,d+3*(n/4), d+n,t+n/2);
merge(t,t+n/2,t+n,d);
[Figure: for each call, the part of d..d+n-1 and of t..t+n-1 that it accesses]
Information Needed To Exploit Parallelism
Calls to insertionSort access [d,d+n-1]
insertionSort(d,d+n);
What Do You Need To Know To Exploit This Form of Parallelism?
Symbolic Information About Accessed Memory Regions:
sort(d,t,n) accesses [d,d+n-1] and [t,t+n-1]
insertionSort(l,h) accesses [l,h-1]
merge(l,m,h,d) accesses [l,h-1], [d,d+(h-l)-1]
How Hard Is It To Figure These Things Out?
Challenging
void insertionSort(int *l, int *h) {
  int *p, *q, k;
  for (p = l+1; p < h; p++) {
    for (k = *p, q = p-1; l <= q && k < *q; q--)
      *(q+1) = *q;
    *(q+1) = k;
  }
}
Not immediately obvious that insertionSort(l,h) accesses [l,h-1]
void merge(int *l1, int *m, int *h2, int *d) {
  int *h1 = m; int *l2 = m;
  while ((l1 < h1) && (l2 < h2))
    if (*l1 < *l2) *d++ = *l1++;
    else *d++ = *l2++;
  while (l1 < h1) *d++ = *l1++;
  while (l2 < h2) *d++ = *l2++;
}
Not immediately obvious that merge(l,m,h,d) accesses [l,h-1] and [d,d+(h-l)-1]
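The three procedures can be put together into one compilable sketch. It makes a few assumptions that are not in the slides: CUTOFF is set to 2, n is a multiple of 4 at every recursive level above the cutoff, the third recursive call starts at d+n/2 (as in the later slide that lists the four calls), and the inner loop of insertionSort is written in an equivalent form that never forms a pointer before l. The helper is_sorted is hypothetical and only used for checking.

```c
#include <assert.h>

#define CUTOFF 2  /* assumed value; the slides do not give one */

/* Sorts [l, h-1] in place; reads and writes only that range. */
void insertionSort(int *l, int *h) {
  int *p, *q, k;
  for (p = l + 1; p < h; p++) {
    k = *p;
    /* Shift larger items right; q never moves below l. */
    for (q = p; q > l && k < *(q - 1); q--)
      *q = *(q - 1);
    *q = k;
  }
}

/* Merges sorted [l1, m-1] and [m, h2-1] into [d, d+(h2-l1)-1]. */
void merge(int *l1, int *m, int *h2, int *d) {
  int *h1 = m, *l2 = m;
  while (l1 < h1 && l2 < h2) {
    if (*l1 < *l2) *d++ = *l1++;
    else *d++ = *l2++;
  }
  while (l1 < h1) *d++ = *l1++;
  while (l2 < h2) *d++ = *l2++;
}

/* Sorts n items in d, using t as temporary storage. */
void sort(int *d, int *t, int n) {
  assert(n >= 1);
  if (n > CUTOFF) {
    sort(d, t, n / 4);
    sort(d + n / 4, t + n / 4, n / 4);
    sort(d + n / 2, t + n / 2, n / 4);
    sort(d + 3 * (n / 4), t + 3 * (n / 4), n - 3 * (n / 4));
    merge(d, d + n / 4, d + n / 2, t);
    merge(d + n / 2, d + 3 * (n / 4), d + n, t + n / 2);
    merge(t, t + n / 2, t + n, d);
  } else {
    insertionSort(d, d + n);
  }
}

/* Hypothetical checking helper: 1 when a[0..n-1] is nondecreasing. */
int is_sorted(const int *a, int n) {
  for (int i = 1; i < n; i++)
    if (a[i - 1] > a[i]) return 0;
  return 1;
}
```

Calling sort(a, tmp, 8) on the example array 7 4 6 1 3 5 8 2, with tmp an 8-element scratch array, follows the divide, conquer, and combine steps shown earlier.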
Issues
• Heavy Use of Pointers
  • Pointers into Middle of Arrays
  • Pointer Arithmetic
  • Pointer Comparison
• Multiple Procedures
  • sort(int *d, int *t, int n)
  • insertionSort(int *l, int *h)
  • merge(int *l, int *m, int *h, int *t)
• Recursion
How the Compiler Does It
Compiler Structure
• Pointer Analysis: Disambiguate References at Granularity of Allocation Blocks
• Bounds Analysis: Symbolic Upper and Lower Bounds for Each Memory Access in Each Procedure
• Region Analysis: Symbolic Regions Accessed By Execution of Each Procedure
• Parallelization: Independent Procedure Calls That Can Execute in Parallel
Example – Array Increment
void f(char *p, int n) {
  if (n > CUTOFF) {
    f(p, n/2);       /* increment first half */
    f(p+n/2, n/2);   /* increment second half */
  } else {
    /* base case: increment small array */
    int i = 0;
    while (i < n) { *(p+i) += 1; i++; }
  }
}
Intra-procedural Bounds Analysis
• For each integer variable at each program point, derive lower and upper bounds
• Bounds are symbolic expressions
  • variables represent initial values of parameters of enclosing procedure
  • bounds are linear combinations of variables
• Example expression for f(p,n): p+n-1
Bounds Analysis
What are upper and lower bounds for region accessed by while loop in base case?
int i = 0;
while (i < n) { *(p+i) += 1; i++; }
Bounds Analysis, Step 1
Build control flow graph
  i = 0
  i < n
  *(p+i) += 1
  i = i+1
Bounds Analysis, Step 2
Set up bounds at beginning of basic blocks
  l1 ≤ i ≤ u1    i = 0
  l2 ≤ i ≤ u2    i < n
  l3 ≤ i ≤ u3    *(p+i) += 1
                 i = i+1
Bounds Analysis, Step 3
Compute transfer functions
  l1 ≤ i ≤ u1    i = 0          after: 0 ≤ i ≤ 0
  l2 ≤ i ≤ u2    i < n          true branch: l2 ≤ i ≤ n-1; false branch: l2 ≤ i ≤ u2
  l3 ≤ i ≤ u3    *(p+i) += 1    after: l3 ≤ i ≤ u3
                 i = i+1        after: l3+1 ≤ i ≤ u3+1
Bounds Analysis, Step 4
Set up constraints for bounds
  -∞ ≤ i ≤ +∞    i = 0          after: 0 ≤ i ≤ 0
  l2 ≤ i ≤ u2    i < n          true branch: l2 ≤ i ≤ n-1; false branch: l2 ≤ i ≤ u2
  l3 ≤ i ≤ u3    *(p+i) += 1    after: l3 ≤ i ≤ u3
                 i = i+1        after: l3+1 ≤ i ≤ u3+1
Lower bounds:  l2 ≤ 0    l2 ≤ l3+1    l3 ≤ l2
Upper bounds:  0 ≤ u2    u3+1 ≤ u2    n-1 ≤ u3
Bounds Analysis, Step 5
Generate symbolic expressions for bounds
Goal: express bounds in terms of parameters
  l2 = c1p + c2n + c3      u2 = c7p + c8n + c9
  l3 = c4p + c5n + c6      u3 = c10p + c11n + c12
Bounds Analysis, Step 6
Substitute expressions into constraints
  c1p + c2n + c3 ≤ 0
  c1p + c2n + c3 ≤ c4p + c5n + c6 + 1
  c4p + c5n + c6 ≤ c1p + c2n + c3
  0 ≤ c7p + c8n + c9
  c10p + c11n + c12 + 1 ≤ c7p + c8n + c9
  n - 1 ≤ c10p + c11n + c12
Goal
Solve Symbolic Constraint System
find values for constraint variables c1, ..., c12 that satisfy the inequality constraints
Maximize Lower Bounds
Minimize Upper Bounds
Bounds Analysis, Step 7
Reduce symbolic inequalities to linear inequalities
  c1p + c2n + c3 ≤ c4p + c5n + c6
if
  c1 ≤ c4, c2 ≤ c5, and c3 ≤ c6
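The reduction from a symbolic inequality to coefficient-wise inequalities can be justified in one line, assuming the variables p and n are nonnegative (the property that the positivity analysis listed under Details is said to verify). The derivation below is a reconstruction under that assumption, not a formula taken from the slides:

```latex
\begin{align*}
(c_4 p + c_5 n + c_6) - (c_1 p + c_2 n + c_3)
  &= (c_4 - c_1)\,p + (c_5 - c_2)\,n + (c_6 - c_3) \\
  &\ge 0
  \quad \text{whenever } p \ge 0,\ n \ge 0,\
  c_1 \le c_4,\ c_2 \le c_5,\ c_3 \le c_6 .
\end{align*}
```

The condition is sufficient but not necessary, so the resulting linear program may be more conservative than the symbolic system it replaces.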
Bounds Analysis, Step 7
Apply reduction and generate a linear program
  lower bounds                        upper bounds
  c1 ≤ 0    c2 ≤ 0    c3 ≤ 0          0 ≤ c7     0 ≤ c8     0 ≤ c9
  c1 ≤ c4   c2 ≤ c5   c3 ≤ c6+1       c10 ≤ c7   c11 ≤ c8   c12+1 ≤ c9
  c4 ≤ c1   c5 ≤ c2   c6 ≤ c3         0 ≤ c10    1 ≤ c11    -1 ≤ c12
Objective Function:
  max: (c1 + ••• + c6) - (c7 + ••• + c12)
• This is a linear program (LP), not an integer linear program (ILP)
• The coefficients in the symbolic expressions are rational numbers
• Rational coefficients are needed for expressions like middle of an array: low+(high - low)/2
Bounds Analysis, Step 8
Solve linear program to extract bounds
  c1=0   c2=0   c3=0     c4=0    c5=0    c6=0
  c7=0   c8=1   c9=0     c10=0   c11=1   c12=-1
  l2 = 0    u2 = n
  l3 = 0    u3 = n-1
Bounds in the control flow graph:
  -∞ ≤ i ≤ +∞    i = 0          after: 0 ≤ i ≤ 0
  0 ≤ i ≤ n      i < n          true branch: 0 ≤ i ≤ n-1; false branch: 0 ≤ i ≤ n
  0 ≤ i ≤ n-1    *(p+i) += 1    after: 0 ≤ i ≤ n-1
                 i = i+1        after: 1 ≤ i ≤ n
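The extracted bounds can be sanity-checked at run time by placing an assertion at each program point of the base-case loop. This is only a sketch: the function name increment_with_bound_checks is hypothetical, and it assumes n ≥ 0, since the header bound 0 ≤ i ≤ n already requires that when i = 0.

```c
#include <assert.h>

/* Runs the base-case loop on p[0..n-1], asserting the bounds on i at the
   loop header, at body entry, and after the increment.
   Returns the number of loop iterations executed. */
int increment_with_bound_checks(char *p, int n) {
  int iterations = 0;
  int i;

  assert(n >= 0);                   /* assumption: n is nonnegative */
  i = 0;
  assert(0 <= i && i <= 0);         /* after i = 0 */
  for (;;) {
    assert(0 <= i && i <= n);       /* loop header: l2 = 0, u2 = n */
    if (!(i < n)) break;
    assert(0 <= i && i <= n - 1);   /* body entry: l3 = 0, u3 = n-1 */
    *(p + i) += 1;                  /* so the access stays in [p, p+n-1] */
    i = i + 1;
    assert(1 <= i && i <= n);       /* after i = i+1 */
    iterations++;
  }
  return iterations;
}
```

A run-time check like this only tests particular inputs; the analysis in the slides establishes the same bounds for all inputs at compile time.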
Region Analysis
Goal: Compute Accessed Regions of Memory
• Intra-Procedural
  • Use bounds at each load or store
  • Compute accessed region
• Inter-Procedural
  • Use intra-procedural results
  • Set up another symbolic constraint system
  • Solve to find regions accessed by entire execution of the procedure
Basic Principle of Inter-Procedural Region Analysis
• For each procedure
  • Generate symbolic expressions for upper and lower bounds of accessed regions
• Constraint System
  • Accessed regions include regions accessed by statements in procedure
  • Accessed regions include regions accessed by invoked procedures
Inter-Procedural Constraints in Example
void f(char *p, int n) {
  if (n > CUTOFF) {
    f(p, n/2);
    f(p+n/2, n/2);
  } else {
    int i = 0;
    while (i < n) { *(p+i) += 1; i++; }
  }
}
Accesses [ l(f,p,n), u(f,p,n) ]
Call f(p, n/2):        l(f,p,n) ≤ l(f,p,n/2)        u(f,p,n) ≥ u(f,p,n/2)
Call f(p+n/2, n/2):    l(f,p,n) ≤ l(f,p+n/2,n/2)    u(f,p,n) ≥ u(f,p+n/2,n/2)
While loop:            l(f,p,n) ≤ p                 u(f,p,n) ≥ p+n-1
Derive Constraint System
• Generate symbolic expressions
  l(f,p,n) = C1p + C2n + C3
  u(f,p,n) = C4p + C5n + C6
• Build constraint system
  C1p + C2n + C3 ≤ p
  C4p + C5n + C6 ≥ p + n - 1
  C1p + C2n + C3 ≤ C1p + C2(n/2) + C3
  C4p + C5n + C6 ≥ C4p + C5(n/2) + C6
  C1p + C2n + C3 ≤ C1(p+n/2) + C2(n/2) + C3
  C4p + C5n + C6 ≥ C4(p+n/2) + C5(n/2) + C6
Solve Constraint System
• Simplify Constraint System
  C1p + C2n + C3 ≤ p
  C4p + C5n + C6 ≥ p + n - 1
  C2n ≤ C2(n/2)
  C5n ≥ C5(n/2)
  C2(n/2) ≤ C1(n/2)
  C5(n/2) ≥ C4(n/2)
• Generate and Solve Linear Program
  l(f,p,n) = p
  u(f,p,n) = p+n-1
• Access region: [p, p+n-1]
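The computed region [p, p+n-1] can be cross-checked by instrumenting f to record the smallest and largest offset it touches. The sketch below is illustrative only: the Region struct, record, f_traced, and region_within_bounds are hypothetical names, CUTOFF is assumed to be 4, and the buffer size of 64 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define CUTOFF 4  /* assumed value; the slides do not give one */

/* Smallest and largest offsets from the outermost p touched so far. */
typedef struct { ptrdiff_t lo, hi; int touched; } Region;

static void record(Region *r, ptrdiff_t off) {
  if (!r->touched) { r->lo = r->hi = off; r->touched = 1; }
  else {
    if (off < r->lo) r->lo = off;
    if (off > r->hi) r->hi = off;
  }
}

/* f from the slides, with every access reported to the tracker. */
void f_traced(char *base, char *p, int n, Region *r) {
  if (n > CUTOFF) {
    f_traced(base, p, n / 2, r);
    f_traced(base, p + n / 2, n / 2, r);
  } else {
    int i = 0;
    while (i < n) {
      record(r, (p + i) - base);
      *(p + i) += 1;
      i++;
    }
  }
}

/* Returns 1 when all accesses of f(p, n) fall inside [p, p+n-1]. */
int region_within_bounds(int n) {
  char buf[64] = {0};
  Region r = {0, 0, 0};
  assert(n >= 1 && n <= 64);
  f_traced(buf, buf, n, &r);
  return r.touched && r.lo >= 0 && r.hi <= n - 1;
}
```

The check confirms containment, which is what the constraint system requires: the region must include every access, though it may be larger than the set of locations actually touched.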
Parallelization
• Dependence Testing of Two Calls
  • Do accessed regions intersect?
  • Based on comparing upper and lower bounds of accessed regions
• Parallelization
  • Find sequences of independent calls
  • Execute independent calls in parallel
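The intersection test on region bounds can be sketched for the four recursive calls in sort. The compiler described here compares symbolic bounds; this sketch instead evaluates them for one concrete n, which is a simplification. The names Region, regions_intersect, sort_call_region, and sort_calls_independent are hypothetical, and n is assumed to be a multiple of 4 so that each quarter is non-empty and the quarter boundaries line up.

```c
#include <assert.h>

/* A region [lo, hi] of element offsets relative to d. */
typedef struct { long lo, hi; } Region;

/* Two non-empty regions intersect unless one ends before the other starts. */
int regions_intersect(Region a, Region b) {
  return a.lo <= b.hi && b.lo <= a.hi;
}

/* Region of d accessed by the k-th recursive call (k = 0..3), using the
   fact that a call sorting n items at p accesses [p, p+n-1]. */
Region sort_call_region(int k, long n) {
  long start[4];
  long len;
  Region r;
  assert(k >= 0 && k < 4 && n >= 4);
  start[0] = 0;
  start[1] = n / 4;
  start[2] = n / 2;
  start[3] = 3 * (n / 4);
  len = (k == 3) ? n - 3 * (n / 4) : n / 4;
  r.lo = start[k];
  r.hi = start[k] + len - 1;
  return r;
}

/* Returns 1 when the four calls are pairwise independent for this n. */
int sort_calls_independent(long n) {
  for (int a = 0; a < 4; a++)
    for (int b = a + 1; b < 4; b++)
      if (regions_intersect(sort_call_region(a, n), sort_call_region(b, n)))
        return 0;
  return 1;
}
```

The same comparison applies to the regions of t, since each call uses the same offsets into both arrays.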
Details
• Inter-procedural positivity analysis
  • Verify that variables are positive
  • Required for correctness of reduction
• Correlation analysis
• Integer division
  • Basic idea: (n-1)/2 ≤ ⌊n/2⌋ ≤ n/2
  • Generalized: (n-m+1)/m ≤ ⌊n/m⌋ ≤ n/m
• Linear system decomposition
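The integer-division bounds can be checked exhaustively over a small range. Multiplying (n-m+1)/m ≤ ⌊n/m⌋ ≤ n/m through by m > 0 gives an integer-only test, which is what the sketch below uses. The function name division_bounds_hold is hypothetical, and the check assumes n ≥ 0 and m ≥ 1, where C integer division agrees with the floor.

```c
#include <assert.h>

/* Returns 1 when n-m+1 <= m*floor(n/m) <= n, which is the bound
   (n-m+1)/m <= floor(n/m) <= n/m with both sides multiplied by m. */
int division_bounds_hold(int n, int m) {
  int q;
  assert(n >= 0 && m >= 1);
  q = n / m;  /* truncating division; equals the floor for n >= 0 */
  return (n - m + 1 <= m * q) && (m * q <= n);
}
```

With m = 2 this is the basic case: n-1 ≤ 2⌊n/2⌋ ≤ n.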
Comparison to Dataflow Analysis
• Dataflow analysis:
  • Uses iterative algorithms
  • Cannot handle lattices with infinite ascending chains, because termination is not guaranteed
• Our framework
  • Reduces the analysis to a linear program
  • Works for lattices with infinite ascending chains like integers, rational numbers or polynomials
  • No possibility of non-termination
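To see why iteration is a problem, consider a hypothetical iterative computation of the upper bound of i at the header of the loop while (i < n) { ...; i++; }, for a concrete n. The update rule below (join the entry value 0 with the back-edge value min(u2, n-1) + 1) is an assumption made for illustration, not an algorithm from the slides, and header_upper_bound is a hypothetical name.

```c
#include <assert.h>

/* Iterates the upper bound u2 at the loop header until it stops changing.
   Returns the fixed point and stores the number of updates in *steps. */
int header_upper_bound(int n, int *steps) {
  int u2 = 0;  /* bound contributed by the entry edge, after i = 0 */
  assert(n >= 0);
  *steps = 0;
  for (;;) {
    int u3 = (u2 < n - 1) ? u2 : n - 1;      /* bound inside the body */
    int next = (u3 + 1 > u2) ? u3 + 1 : u2;  /* join with the back edge */
    if (next == u2) break;
    u2 = next;
    (*steps)++;
  }
  return u2;
}
```

For a concrete n the iteration reaches the bound n after n updates, so the length of the ascending chain grows with n. When n is symbolic there is no fixed number of updates that suffices, whereas the linear program yields u2 = n directly.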
Uses of Symbolic Bounds Information
Transformations
• Automatic Parallelization Of Sequential Programs
• Bounds Checks Elimination For Safe Programs
Verifications
• Data Race Detection For Parallel Programs
• Array Bounds Checking For Unsafe Programs
Application of Analysis Framework
• Bitwidth Analysis:
  • Computes minimum number of bits to represent computed values
  • Important for hardware synthesis from high level languages
• For our framework:
  • Bitwidth analysis is a special case: Compute precise numeric bounds
  • Constraint system = linear program
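Once numeric bounds are known, turning them into a bit count is straightforward. The sketch below shows one way to do that last step; the function names are hypothetical, and the signed variant assumes a two's-complement representation. It does not reproduce how the compiler described here computes bitwidths.

```c
#include <assert.h>

/* Minimum number of bits for an unsigned value in [0, hi]. */
int bits_for_unsigned_max(unsigned long hi) {
  int bits = 1;  /* the range [0, 0] is still given one bit */
  while (hi >>= 1) bits++;
  return bits;
}

/* Minimum number of two's-complement bits for a value in [lo, hi]. */
int bits_for_signed_range(long lo, long hi) {
  int bits = 1;
  long min = -1, max = 0;  /* range of a 1-bit two's-complement value */
  assert(lo <= hi);
  while (lo < min || hi > max) {
    min *= 2;              /* widen to the next bit count */
    max = 2 * max + 1;
    bits++;
  }
  return bits;
}
```

For example, a variable proven to stay in [0, 255] needs 8 unsigned bits instead of a full machine word.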
Experimental Results
• Implementation - SUIF, lp_solve, Cilk
• Parallelization speedups:

Application    Number of Processors
               1      2      4      6      8
Fibonacci      0.76   1.52   3.03   4.55   6.04
Quicksort      1.00   1.99   3.89   5.68   7.36
Mergesort      1.00   2.00   3.90   5.70   7.41
Heat           1.03   2.02   3.89   5.53   6.83
BlockMul       0.97   1.86   3.84   5.70   7.54
NoTempMul      1.02   2.01   4.03   6.02   8.02
LU             0.98   1.95   3.89   5.66   7.39

  • Close to linear speedups
  • Most of parallelism detected
• Compiler also verified that:
  • Parallel versions were free of data races
  • Benchmarks do not violate the array bounds
Experimental Results
• Implementation - SUIF, lp_solve
• Bitwidth reduction:
[Figure: bar chart, scale 0 to 100, showing percentage of eliminated register bits and percentage of eliminated memory bits]
Context
• Mainstream parallelizing compilers
  • Loop nests, dense matrices
  • Affine access functions
• Our framework focuses on:
  • Recursion, dynamically allocated arrays
  • Pointers, pointer arithmetic
  • Key problems: pointer analysis, symbolic region analysis, solving linear programs
Conclusion
• Novel framework for symbolic bounds analysis
  • Uses symbolic constraint systems
  • Reduces problem to linear programs
  • More powerful than iterative approaches
• Analysis uses:
  • Parallelization, data race detection
  • Detecting array bounds violations
  • Array bounds check elimination
  • Bitwidth analysis