2011 ecoop
DESCRIPTION
Code profilers are used to identify execution bottlenecks and understand the cause of a slowdown. Execution sampling is a monitoring technique commonly employed by code profilers because of its low impact on execution. Regularly sampling the execution of an application yields an estimate of the time the interpreter, whether hardware or software, spends in each method. Nevertheless, this execution time estimation is highly sensitive to the execution environment, making it non-reproducible, non-deterministic and not comparable across platforms. On our platform, we have observed that the number of messages sent per second remains within tight (±7%) bounds across a basket of 16 applications. Using principally the Pharo platform for experimentation, we show that such a proxy is stable and reproducible over multiple executions, and that profiles are comparable even when obtained in different execution contexts. We have produced Compteur, a new code profiler that does not suffer from execution sampling limitations, and have used it to extend the SUnit testing framework for execution comparison.
TRANSCRIPT
Counting Messages as a Proxy for Average Execution Time in Pharo
ECOOP 2011 - Lancaster
Alexandre Bergel
Pleiad lab, DCC, University of Chile
http://bergel.eu
2
www.pharo-project.org
The Mondrian Visualization Engine
3
“I like the cool new features of Mondrian, but in my setting, drawing a canvas takes 10 seconds, whereas it took only 7 yesterday. Please do something!”
-- A Mondrian user, 2009 --
4
“I like the cool new features of Mondrian, but in my setting, drawing my visualization takes 10 seconds, whereas it took only 7 yesterday. Please do something!”
-- A Mondrian user, 2009 --
54.8% {11501ms} MOCanvas>>drawOn:
  54.8% {11501ms} MORoot(MONode)>>displayOn:
    30.9% {6485ms} MONode>>displayOn:
      | 18.1% {3799ms} MOEdge>>displayOn:
      ...
      | 8.4% {1763ms} MOEdge>>displayOn:
      | | 8.0% {1679ms} MOStraightLineShape>>display:on:
      | | 2.6% {546ms} FormCanvas>>line:to:width:color:
      ...
    23.4% {4911ms} MOEdge>>displayOn:
    ...
Result of Pharo profiler
5
32.9% {6303ms} MOCanvas>>drawOn:
  32.9% {6303ms} MORoot(MONode)>>displayOn:
    24.4% {4485ms} MONode>>displayOn:
      | 12.5% {1899ms} MOEdge>>displayOn:
      ...
      | 4.2% {1033ms} MOEdge>>displayOn:
      | | 6.0% {1679ms} MOStraightLineShape>>display:on:
      | | 2.4% {546ms} FormCanvas>>line:to:width:color:
      ...
    8.5% {2112ms} MOEdge>>displayOn:
    ...
Yesterday's version
6
7
On my machine I find 11 and 6 seconds. What’s going on?
How profilers work
Sampling the method call stack every 10 ms
A counter is associated with each frame
A counter is incremented each time its frame is sampled
8
Method call stack at time = t:
Canvas drawOn: (1)
MORoot displayOn: (1)
MONode displayOn: (1)
9
Method call stack at time = t + 10 ms:
Canvas drawOn: (2)
MORoot displayOn: (2)
MONode displayOn: (2)
MOEdge displayOn: (1)
10
Method call stack at time = t + 20 ms:
Canvas drawOn: (3)
MORoot displayOn: (3)
MONode setCache (1)
11
12
How profilers work
The counter is used to estimate the amount of time spent
MONode setCache (1) => 10 ms
MOEdge displayOn: (1) => 10 ms
MONode displayOn: (2) => 20 ms
MORoot displayOn: (3) => 30 ms
Canvas drawOn: (3) => 30 ms
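The counting scheme walked through on these slides can be replayed in a few lines. What follows is a minimal sketch in Python rather than Pharo, assuming the three stack snapshots shown earlier and the 10 ms sampling period; the name `estimate_times` is illustrative only and does not belong to any profiler.

```python
from collections import Counter

SAMPLING_PERIOD_MS = 10

# The three stack snapshots from the slides, taken at t, t + 10 ms and t + 20 ms.
samples = [
    ["Canvas drawOn:", "MORoot displayOn:", "MONode displayOn:"],
    ["Canvas drawOn:", "MORoot displayOn:", "MONode displayOn:", "MOEdge displayOn:"],
    ["Canvas drawOn:", "MORoot displayOn:", "MONode setCache"],
]


def estimate_times(stack_samples, period_ms=SAMPLING_PERIOD_MS):
    """Increment one counter per frame seen in each sample, then scale by the period."""
    counters = Counter()
    for stack in stack_samples:
        counters.update(stack)
    return {frame: count * period_ms for frame, count in counters.items()}


estimates = estimate_times(samples)
for frame, milliseconds in sorted(estimates.items(), key=lambda item: item[1]):
    print(f"{frame} => {milliseconds} ms")
```

Each estimate is only a multiple of the sampling period, which is why anything that shifts when a sample lands can change the profile.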
13
Problem with execution sampling #1
Strongly dependent on the execution environment
CPU, memory management, threads, virtual machine, processes
Listening to an MP3 may perturb your profile
14
Problem with execution sampling #2
Non-determinism
Even using the same environment does not help
“30000 factorial” takes between 3 803 and 3 869 ms
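The run-to-run variation is easy to observe on any machine. The sketch below is a Python stand-in, not the Pharo measurement quoted on the slide: it times the same factorial several times and reports the spread. The helper name `time_factorial` and the choice of five runs are assumptions made for illustration, and the absolute numbers will differ from the slide's.

```python
import math
import time


def time_factorial(n, runs=5):
    """Return the wall-clock duration, in ms, of each of `runs` evaluations of n!."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        math.factorial(n)
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


durations = time_factorial(30000)
print(f"min {min(durations):.3f} ms, max {max(durations):.3f} ms")
```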
15
Problem with execution sampling #3
Lack of portability
Profiles are not reusable across platforms
Buying a new laptop will invalidate the profile you made yesterday
16
Counting messages to the rescue
Pharo is a Smalltalk dialect
Heavily based on message sending
Almost “Optimization-free compiler”
Why not count messages instead of measuring execution time?
17
Counting messages
Wallet >> increaseByOne
    money := money + 1

Wallet >> addBonus
    self increaseByOne; increaseByOne; increaseByOne.

aWallet addBonus
=> 6 messages sent
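One way to arrive at the figure of 6 is to count the three `increaseByOne` sends plus the three `+` sends they trigger. The sketch below reproduces that reading in Python. It is an assumption about how the count is obtained, not the Compteur implementation: `MessageCounter` and its `send` method are hypothetical helpers that route every message through one counting point.

```python
class MessageCounter:
    """Hypothetical stand-in for a VM-level counter: every send goes through here."""

    def __init__(self):
        self.count = 0

    def send(self, receiver, selector, *args):
        self.count += 1
        return getattr(receiver, selector)(*args)


class Wallet:
    def __init__(self, counter):
        self.counter = counter
        self.money = 0

    def increase_by_one(self):
        # money := money + 1, where `+` is itself a message send
        self.money = self.counter.send(self.money, "__add__", 1)

    def add_bonus(self):
        # self increaseByOne; increaseByOne; increaseByOne
        for _ in range(3):
            self.counter.send(self, "increase_by_one")


counter = MessageCounter()
wallet = Wallet(counter)
wallet.add_bonus()  # the top-level call itself is not counted in this sketch
print(counter.count, "messages sent")
```

The count depends only on the code path taken, not on how long each send takes.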
18
Does this really work?
What about the program?
MyClass >> main
    self waitForUserClick
We took scenarios from unit tests, which do not rely on user input
19
Experiment A
The number of sent messages is related to the average execution time over multiple executions
[Figure: scatter plot with one point per application. x-axis: time (ms), 0 to 40000. y-axis: message sends, 0 to 400 x 10^6.]
20
Experiment B
The number of sent messages is more stable than the execution time over multiple executions
Application    time taken (ms)    # sent messages    ctime%    cmessages%
Collections         32 317          334 359 691       16.67       1.05
Mondrian            33 719          292 140 717        5.54       1.44
Nile                29 264          236 817 521        7.24       0.22
Moose               25 021          210 384 157       24.56       2.47
SmallDude           13 942          150 301 007       23.93       0.99
Glamour             10 216           94 604 363        3.77       0.14
Magritte             2 485           37 979 149        2.08       0.85
PetitParser          1 642           31 574 383       46.99       0.52
Famix                1 014            6 385 091       18.30       0.06
DSM                  4 012            5 954 759       25.71       0.17
ProfStef               247            3 381 429        0.77       0.10
Network                128            2 340 805        6.06       0.44
AST                     37              677 439        1.26       0.46
XMLParser               36              675 205       32.94       0.46
Arki                    30              609 633        1.44       0.35
ShoutTests              19              282 313        5.98       0.11
Average                                               13.95       0.61

Table 2. Applications considered in our experiment (second and third columns are average over 10 runs)
Estimating the sample regression line. For sake of completeness and providing easy-to-reproduce results, we provide the necessary statistical material. Complementary information may be easily obtained from standard statistical books [11].

For the least squares regression line $y = a + b\,x$, we have the following formulas for estimating a sample regression line:

\[ b = \frac{SS_{xy}}{SS_{xx}} \qquad a = \bar{y} - b\,\bar{x} \]

where $\bar{y}$ and $\bar{x}$ are the average of all $y$ values and $x$ values, respectively. The $y$ variable corresponds to the # sent messages column and $x$ to time taken (ms) in the table given above.

\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \qquad SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \]

where $n$ is number of samples (i.e., 16, the number of applications we have profiled). SS stands for “sum of squares.” The standard deviation of error for the sample data is obtained from:

\[ s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} \quad \text{where} \quad SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} \]

In the above formula, $n - 2$ represent the degrees of freedom for the regression model. Finally, the standard deviation of $b$ is obtained with $s_b = \frac{s_e}{\sqrt{SS_{xx}}}$.
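The formulas above translate directly into code. The following Python sketch applies them to the two averaged columns of Table 2, with x as time taken (ms) and y as the number of sent messages. The function name `fit_line` is an assumption made for illustration; the script prints whatever the formulas yield and is not a restatement of figures from the paper.

```python
import math


def fit_line(xs, ys):
    """Least squares line y = a + b x, with the standard deviations s_e and s_b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * sum_x / n
    # max() guards against a tiny negative value caused by rounding on a perfect fit
    s_e = math.sqrt(max(ss_yy - b * ss_xy, 0.0) / (n - 2))
    s_b = s_e / math.sqrt(ss_xx)
    return a, b, s_e, s_b


# Columns "time taken (ms)" and "# sent messages" of Table 2, in row order.
times_ms = [32317, 33719, 29264, 25021, 13942, 10216, 2485, 1642,
            1014, 4012, 247, 128, 37, 36, 30, 19]
messages = [334359691, 292140717, 236817521, 210384157, 150301007, 94604363,
            37979149, 31574383, 6385091, 5954759, 3381429, 2340805,
            677439, 675205, 609633, 282313]

a, b, s_e, s_b = fit_line(times_ms, messages)
print(f"a = {a:.1f}, b = {b:.1f}, s_e = {s_e:.1f}, s_b = {s_b:.1f}")
```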
21
Experiment C
The number of sent messages is as useful as the execution time to identify an execution bottleneck
[Figure: scatter plot with one point per method. x-axis: time (ms), 0 to 300. y-axis: number of method invocations, 0 to 10.0 x 10^6.]
22
Compteur
23
CompteurMethod >> run: methodName with: args in: receiver
    | oldNumberOfCalls v |
    oldNumberOfCalls := self getNumberOfCalls.
    v := originalMethod valueWithReceiver: receiver arguments: args.
    numberOfCalls := (self getNumberOfCalls - oldNumberOfCalls) + numberOfCalls - 5.
    ^ v
New primitive in the VM
24
Cost of the instrumentation
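The wrapper pattern in `CompteurMethod>>run:with:in:` reads a global send counter before and after running the original method, then subtracts a constant for the sends made by the instrumentation itself. The sketch below imitates that shape in Python under loud assumptions: `FakeVM` is a hypothetical stand-in for the VM primitive, and the overhead here is 1 (the wrapper's own dispatch) rather than the 5 used in the Pharo code.

```python
class FakeVM:
    """Hypothetical stand-in for a VM that counts every message send."""

    def __init__(self):
        self.message_count = 0

    def send(self, receiver, selector, *args):
        self.message_count += 1
        return getattr(receiver, selector)(*args)


class CountingMethod:
    OVERHEAD = 1  # sends caused by the wrapper itself (assumption of this sketch)

    def __init__(self, vm, selector):
        self.vm = vm
        self.selector = selector
        self.number_of_calls = 0

    def run(self, receiver, *args):
        before = self.vm.message_count
        value = self.vm.send(receiver, self.selector, *args)
        # Accumulate the sends made by the method, minus the instrumentation cost.
        self.number_of_calls += self.vm.message_count - before - self.OVERHEAD
        return value


class Wallet:
    def __init__(self, vm):
        self.vm = vm
        self.money = 0

    def increase_by_one(self):
        self.money = self.vm.send(self.money, "__add__", 1)

    def add_bonus(self):
        for _ in range(3):
            self.vm.send(self, "increase_by_one")
        return self.money


vm = FakeVM()
wallet = Wallet(vm)
probe = CountingMethod(vm, "add_bonus")
result = probe.run(wallet)
print(probe.number_of_calls, "messages attributed to add_bonus")
```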
25
[Figure: instrumentation overhead (%) plotted against execution time (ms), 0 to 40000, on (a) a linear scale, 0 to 3000, and (b) a logarithmic scale, 1 to 10000.]
Contrasting Execution Sampling with Message Counting
No need for sampling
Independent from the execution environment
Stable measurements
26
Application #1
Counting messages in unit testing
CollectionTest >> testInsertion
    self
        assert: [ Set new add: 1 ]
        fasterThan: [ Set new add: 1; add: 2 ]
27
MondrianSpeedTest >> testLayout2
    | view1 view2 |
    view1 := MOViewRenderer new.
    view1 nodes: (Collection allSubclasses).
    view1 edgesFrom: #superclass.
    view1 treeLayout.

    view2 := MOViewRenderer new.
    view2 nodes: (Collection withAllSubclasses).
    view2 edgesFrom: #superclass.
    view2 treeLayout.

    self
        assertIs: [ view1 root applyLayout ]
        fasterThan: [ view2 root applyLayout ]
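A comparable assertion can be sketched outside Pharo by comparing call counts instead of timings. The Python example below is an analogy only, not the SUnit extension: it counts function-call events reported by `sys.setprofile` as a stand-in for message sends, and `count_calls`, `assert_faster_than`, `insert` and `add_items` are hypothetical names. Uninstalling the hook adds the same small constant to both measurements, so the comparison is unaffected.

```python
import sys


def count_calls(block):
    """Run `block` and return how many call events the profiling hook reported."""
    count = 0

    def hook(frame, event, arg):
        nonlocal count
        if event in ("call", "c_call"):
            count += 1

    previous = sys.getprofile()
    sys.setprofile(hook)
    try:
        block()
    finally:
        sys.setprofile(previous)
    return count


def assert_faster_than(block, slower_block):
    """Fail unless `block` triggers strictly fewer calls than `slower_block`."""
    fast, slow = count_calls(block), count_calls(slower_block)
    assert fast < slow, f"{fast} calls is not fewer than {slow}"


def insert(items, value):
    items.add(value)


def add_items(n):
    items = set()
    for value in range(n):
        insert(items, value)


assert_faster_than(lambda: add_items(1), lambda: add_items(2))
```

Because the comparison uses counts rather than wall-clock time, the outcome does not depend on machine load.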
28
29
Application #2
Differencing profiling
Comparison of two successive versions of a software system
(not in the paper)
30
Application #2
Differencing profiling
Comparison of two successive versions of Mondrian
(not in the paper)
More in the paper
Linear regression model
We replay some optimizations from our previous work
A methodology to evaluate profiler stability over multiple runs
All the material to reproduce the experiments
31
Summary
Counting method invocations is a more advantageous profiling technique in Pharo
Stable correlation between message sending and average execution time
32
Closing words
The same abstractions are used to profile applications written in C and in Java
Which object is responsible for a slowdown?
Which arguments make a method call slow?
...
33
34
Counting messages as a proxy for average execution time
Alexandre Bergel
http://bergel.eu