Measurement & Performance · wsumner/teaching/745/03-measurement.pdf
TRANSCRIPT
Performance & Measurement
● Real development must manage resources
  – Time
  – Memory
  – Open connections
  – VM instances
  – Energy consumption
  – ...
● Resource usage is one form of performance
  – Performance: a measure of nonfunctional behavior of a program
● We often need to assess performance or a change in performance
  – Data Structure A vs. Data Structure B
How would you approach this in a data structures course?
Performance & Measurement
● Performance assessment is deceptively hard
[Demo/Exercise]
Performance & Measurement
● Performance assessment is deceptively hard
  – Modern systems involve complex actors
  – Theoretical models may be too approximate
  – Even with the best intentions we can deceive ourselves
● Good performance evaluation should be rigorous & scientific
  – The same process applies in development as in good research
    1) Clear claims
    2) Clear evidence
    3) Correct reasoning from evidence to claims
  – And yet this is challenging to get right!
Performance & Measurement [Blackburn et al.]
[Diagram labels: Scope of Evaluation, Scope of Claim/Conclusion, Validity]
Performance & Measurement [Blackburn et al.]
● Inscrutability
  – Lack of clarity on actors or relationships
  – Omission, Ambiguity, Distortion
● Irreproducibility
  – Lack of clarity in steps taken or data
  – Causes:
    ● Omission of steps
    ● Incomplete understanding of factors
    ● Confidentiality & omission of data
Example ...
Performance & Measurement [Blackburn et al.]
static int i = 0, j = 0, k = 0;
int main() {
  int g = 0, inc = 1;
  for (; g<65536; g++) {
    i += inc;
    j += inc;
    k += inc;
  }
  return 0;
}
Compare gcc -O2 vs -O3
One person may see a deterministic improvement.
Another may see a deterministic degradation.
Both are right.
Performance & Measurement [Blackburn et al.]
● Ignorance: disregarding data or evidence against a claim
  – Ignoring data points
  – Ignoring distributions
[Figure: Gmail latency]
If we reason about average latency, why is it misleading?
What is better?
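One way to explore the question is a minimal Python sketch using made-up latency numbers (these are not Gmail measurements). Most requests are fast and a few are very slow, so the mean matches almost no real request, while the median and a high percentile expose both groups. The sample values and the nearest-rank `percentile` helper are assumptions for illustration only.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 95 fast requests and 5 slow ones.
latencies_ms = [20] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)      # pulled far above the typical request
median_ms = statistics.median(latencies_ms)  # what a typical request sees
p99_ms = percentile(latencies_ms, 99)        # what the unlucky tail sees

print(f"mean={mean_ms} ms, median={median_ms} ms, p99={p99_ms} ms")
```

Here the mean (119 ms) is about six times the median (20 ms) and still far below the tail (2000 ms), so reporting it alone hides the shape of the distribution.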
Performance & Measurement [Blackburn et al.]
● Inappropriateness: claim is derived from facts not present
  – Bad metrics (e.g. execution time vs. power)
  – Biased samples
  – ...
● Inconsistency: comparing apples to oranges
  – Workload variation (e.g. learner effects, time of day)
  – Incompatible measures (e.g. performance counters across platforms)
Assessing Performance
Benchmarking
● We must reason rigorously about performance during assessment, investigation, & improvement
● Assessing performance is done through benchmarking
  – Microbenchmarks
    ● Focus on cost of an operation in isolation
    ● Can help identify core performance details & explain causes
  – Macrobenchmarks
    ● Real world system performance
  – Workloads (inputs) must be chosen carefully either way.
    ● representative, pathological, scenario driven, ...
Let’s dig into a common approach to consider issues
Benchmarking
● Suppose we want to run a microbenchmark
    startTime = getCurrentTimeInSeconds();
    doWorkloadOfInterest();
    endTime = getCurrentTimeInSeconds();
    reportResult(endTime - startTime);
What possible issues do you observe?
– Granularity of measurement
– Warm up effects
– Nondeterminism
– Size of workload
– System interference
– Frequency scaling?
– Interference of other workloads?
– Alignment?
Benchmarking
● Granularity & Units
  – Why is granularity a problem?
  – What are alternatives to getCurrentTimeInSeconds()?
  – What if I want to predict performance on a different machine?
● Using cycles instead of wall clock time can be useful, but has its own limitations
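To make the granularity problem concrete, here is a small Python sketch, offered as an illustration rather than a recommended harness. It times the same short workload with a whole-second clock and with a nanosecond clock, and probes the smallest tick the fine clock shows. `whole_seconds`, `short_workload`, `elapsed`, and `smallest_observed_tick` are hypothetical helpers; `time.perf_counter_ns` and `time.get_clock_info` come from the Python standard library. Cycle counters are not shown because standard Python does not expose them.

```python
import time

def whole_seconds():
    """Hypothetical stand-in for getCurrentTimeInSeconds(): truncates to whole seconds."""
    return int(time.time())

def short_workload():
    """Hypothetical workload that finishes far faster than one second."""
    return sum(range(10_000))

def elapsed(clock, work):
    """Time one call of work() using the given clock function."""
    start = clock()
    work()
    return clock() - start

def smallest_observed_tick(clock, attempts=100):
    """Smallest nonzero difference seen between consecutive reads of the clock."""
    smallest = None
    for _ in range(attempts):
        start = clock()
        now = clock()
        while now == start:          # spin until the clock visibly advances
            now = clock()
        delta = now - start
        if smallest is None or delta < smallest:
            smallest = delta
    return smallest

coarse = elapsed(whole_seconds, short_workload)           # 0 unless a second boundary is crossed
fine_ns = elapsed(time.perf_counter_ns, short_workload)   # nanosecond-unit reading
tick_ns = smallest_observed_tick(time.perf_counter_ns)
reported = time.get_clock_info("perf_counter").resolution  # resolution the platform reports, in seconds

print(f"coarse={coarse} s, fine={fine_ns} ns, observed tick={tick_ns} ns, reported={reported} s")
```

With the coarse clock the workload appears to cost nothing (or, rarely, a full second), which says more about the clock than about the code being measured.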
Benchmarking
● Warm up time
  – Why is warm up time necessary in general?
  – Why is it especially problematic for systems like Java?
  – How can we modify our example to facilitate this?
    for (…) doWorkloadOfInterest();
    startTime = getCurrentTimeInSeconds();
    doWorkloadOfInterest();
    endTime = getCurrentTimeInSeconds();
    reportResult(endTime - startTime);
Benchmarking
● Nondeterministic behavior
  – Will getCurrentTimeInSeconds() always return the same number? Why or why not?
  – So what reflects a meaningful result?
    ● Hint: The Law of Large Numbers!
    ● By running the same test many times, the arithmetic mean will converge on the expected value
Is this always what you want?
![Page 65: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/65.jpg)
Benchmarking
● A revised (informal) approach:
for (…) doWorkloadOfInterest();
startTime = getCurrentTimeInNanos();
for (…) doWorkloadOfInterest();
endTime = getCurrentTimeInNanos();
reportResult(endTime - startTime);
● This still does not solve everything
  – Frequency scaling?
  – Interference of other workloads?
  – Alignment?
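A minimal Python sketch of the revised approach, assuming `time.perf_counter_ns` as a stand-in for getCurrentTimeInNanos(). The names `run_benchmark` and `workload` and the run counts are hypothetical, chosen only for illustration. It warms up, times a whole batch of calls, and reports the per-call average; it does nothing about frequency scaling, interference, or alignment.

```python
import time


def run_benchmark(workload, warmup_runs=100, timed_runs=1000):
    """Return the mean time per call of `workload`, in nanoseconds."""
    # Warm-up phase: run the workload untimed so caches and similar
    # state settle before measurement begins.
    for _ in range(warmup_runs):
        workload()

    # Timed phase: time the whole batch so that clock overhead and
    # clock resolution are amortized across many calls.
    start = time.perf_counter_ns()
    for _ in range(timed_runs):
        workload()
    end = time.perf_counter_ns()

    return (end - start) / timed_runs


def workload():
    # Hypothetical stand-in for doWorkloadOfInterest().
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    print(f"{run_benchmark(workload):.0f} ns per call")
```

Timing the batch rather than each call keeps the cost of reading the clock out of the result, at the price of hiding the per-call distribution.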
![Page 68: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/68.jpg)
Benchmarking
● Now that we have a benchmark, how do we interpret/report it?
  – We must compare
    ● Benchmark vs expectation/mental model
    ● Different solutions
    ● Over time
    ● Results are often normalized against the baseline
![Page 71: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/71.jpg)
Benchmarking
● Now that we have a benchmark, how do we interpret/report it?
  – We must compare
  – We must remember results are statistical
    ● Show the distribution (e.g. violin plots)
    ● Summarize the distribution (e.g. mean and confidence intervals, box & whisker)
[Seaborn Violinplot] [Seaborn Boxplot] [Seaborn Barplot]
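As one way to summarize a distribution with a mean and confidence interval, here is a small Python sketch. The function name `mean_with_ci` and the sample timings are hypothetical. It assumes a normal approximation (z = 1.96 for roughly 95%); with only a handful of samples a Student's t multiplier would be more appropriate.

```python
import math
import statistics


def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) of an approximate confidence interval.

    Assumes independent samples and a normal approximation; z = 1.96
    corresponds to roughly 95% confidence.
    """
    mean = statistics.mean(samples)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * sem


# Hypothetical timings (milliseconds) of the same workload.
times_ms = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 10.0, 12.0]
mean, half_width = mean_with_ci(times_ms)
print(f"{mean:.2f} ms ± {half_width:.2f} ms")
```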
![Page 75: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/75.jpg)
Benchmarking
● A benchmark suite comprises multiple benchmarks
● Now that we have multiple results, how should we consider them?
  – 2 major scenarios
    ● Hypothesis testing
      – Is solution A different than B?
      – You can use ANOVA
[Figure: Old vs. New results across benchmarks T1 to T6]
![Page 78: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/78.jpg)
Benchmarking
● A benchmark suite comprises multiple benchmarks
● Now that we have multiple results, how should we consider them?
  – 2 major scenarios
    ● Hypothesis testing
    ● Summary statistics
      – Condensing a suite to a single number
      – Intrinsically lossy, but can still be useful
[Figure: Old vs. New results across benchmarks T1 to T6, to be condensed into single numbers. Old: ? New: ? New/Old: ?]
![Page 83: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/83.jpg)
Summary Statistics
Averages of $r_1, r_2, \ldots, r_N$
● Many ways to measure expectation or tendency
● Arithmetic Mean: $\frac{1}{N}\sum_{i=1}^{N} r_i$
● Harmonic Mean: $\frac{N}{\sum_{i=1}^{N} \frac{1}{r_i}}$
● Geometric Mean: $\sqrt[N]{\prod_{i=1}^{N} r_i}$
Each type means something different and has valid uses
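The three means can be written directly from their definitions. This Python sketch uses hypothetical function names and sample values, and assumes all inputs are positive.

```python
import math


def arithmetic_mean(rs):
    # (1/N) * sum of r_i
    return sum(rs) / len(rs)


def harmonic_mean(rs):
    # N / sum of (1 / r_i); requires every r_i to be nonzero
    return len(rs) / sum(1.0 / r for r in rs)


def geometric_mean(rs):
    # N-th root of the product of r_i; requires positive r_i
    return math.prod(rs) ** (1.0 / len(rs))


results = [2.0, 4.0, 8.0]  # hypothetical measurements r_1..r_N
print(arithmetic_mean(results))
print(harmonic_mean(results))
print(geometric_mean(results))
```

For positive inputs the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so the choice of mean changes the reported number.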
![Page 87: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/87.jpg)
Summary Statistics
● Arithmetic Mean: $\frac{1}{N}\sum_{i=1}^{N} r_i$
  – Good for reporting averages of numbers that mean the same thing
  – Used for computing sample means
  – e.g. Timing the same workload many times
Handling Nondeterminism
for (x in 0 to 4) times[x] = doWorkloadOfInterest();
E(time) = arithmean(times)
![Page 95: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/95.jpg)
Summary Statistics
● Arithmetic Mean: $\frac{1}{N}\sum_{i=1}^{N} r_i$
  – Good for reporting averages of numbers that mean the same thing
  – Used for computing sample means
  – e.g. Timing the same workload many times
● Harmonic Mean: $\frac{N}{\sum_{i=1}^{N} \frac{1}{r_i}}$
  – Good for reporting rates
  – e.g. Required throughput for a set of tasks
Given tasks t1, t2, & t3 serving 40 pages each:
  throughput(t1) = 10 pages/sec
  throughput(t2) = 20 pages/sec
  throughput(t3) = 20 pages/sec
What is the average throughput? What should it mean?
  Arithmetic = 16.7 p/s, so 120/16.7 = 7.2 sec
  Harmonic = 15 p/s, so 120/15 = 8 sec
  Actual time for all 120 pages: 40/10 + 40/20 + 40/20 = 8 sec
The harmonic mean identifies the constant rate required for the same time
CAVEAT: If the size of each workload changes, a weighted harmonic mean is required!
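The throughput example can be checked numerically. In this Python sketch the helper names and the uneven workload sizes are hypothetical. The weighted form weights each rate by the amount of work done at that rate, so it reduces to total work divided by total time.

```python
def harmonic_mean(rates):
    return len(rates) / sum(1.0 / r for r in rates)


def weighted_harmonic_mean(rates, weights):
    """Weights are the amount of work (e.g. pages) done at each rate."""
    return sum(weights) / sum(w / r for w, r in zip(weights, rates))


rates = [10.0, 20.0, 20.0]  # pages/sec for t1, t2, t3
pages = [40.0, 40.0, 40.0]  # pages served by each task

# Actual elapsed time if the tasks run one after another: 4 + 2 + 2 sec.
total_time = sum(p / r for p, r in zip(pages, rates))

arithmetic = sum(rates) / len(rates)
harmonic = harmonic_mean(rates)

print(sum(pages) / arithmetic)  # about 7.2 sec, shorter than the real time
print(sum(pages) / harmonic)    # about 8 sec, matching total_time

# Hypothetical unequal workloads: the weighted form is needed here.
uneven_pages = [10.0, 40.0, 40.0]
weighted = weighted_harmonic_mean(rates, uneven_pages)
print(weighted)
```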
![Page 97: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/97.jpg)
Summary Statistics
● Geometric Mean: $\sqrt[N]{\prod_{i=1}^{N} r_i}$
  – Good for reporting results that mean different things
  – e.g. Timing results across many different benchmarks
Any idea why it may be useful here? (A bit of a thought experiment)
![Page 99: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/99.jpg)
Summary Statistics
● Geometric Mean: $\sqrt[N]{\prod_{i=1}^{N} r_i}$
  – Good for reporting results that mean different things
  – e.g. Timing results across many different benchmarks
[Figure: T1 and T2 for Old and for New 1, where one benchmark's time is halved]
What happens to the arithmetic mean?
![Page 100: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/100.jpg)
Summary Statistics
● Geometric Mean: $\sqrt[N]{\prod_{i=1}^{N} r_i}$
  – Good for reporting results that mean different things
  – e.g. Timing results across many different benchmarks
[Figure: T1 and T2 for Old and for New 2, where the other benchmark's time is halved]
What happens to the arithmetic mean?
![Page 101: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/101.jpg)
Summary Statistics
● Geometric Mean: $\sqrt[N]{\prod_{i=1}^{N} r_i}$
  – Good for reporting results that mean different things
  – e.g. Timing results across many different benchmarks
[Figure: Old results for T1 and T2]
The (non) change to T1 dominates any behavior for T2!
![Page 105: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/105.jpg)
Summary Statistics
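● Geometric Mean: $\sqrt[N]{\prod_{i=1}^{N} r_i}$
  – Good for reporting results that mean different things
  – e.g. Timing results across many different benchmarks
[Figure: Old results for T1 and T2]
Geometric:
  Old: $\sqrt{r_1 \times r_2}$
  New 1: $\sqrt{(\frac{1}{2} r_1) \times r_2}$
  New 2: $\sqrt{r_1 \times (\frac{1}{2} r_2)}$
  New 1 $= \sqrt{\frac{1}{2} \times r_1 \times r_2} =$ New 2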
![Page 107: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/107.jpg)
Summary Statistics
● Geometric Mean: $\sqrt[N]{\prod_{i=1}^{N} r_i}$
  – Good for reporting results that mean different things
  – e.g. Timing results across many different benchmarks
  – A 10% difference in any benchmark affects the final value the same way
Note: It doesn't have an intuitive meaning! It does provide a balanced score of performance.
See [Mashey 2004] for deeper insights.
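The claim that the same relative change in any benchmark moves the geometric mean the same way can be checked with a small Python sketch. The timings below are hypothetical: T1 is deliberately much longer than T2, so the arithmetic mean is dominated by T1.

```python
import math


def arithmetic_mean(rs):
    return sum(rs) / len(rs)


def geometric_mean(rs):
    return math.prod(rs) ** (1.0 / len(rs))


# Hypothetical timings (seconds): T1 is a long benchmark, T2 a short one.
old = [100.0, 1.0]
new1 = [50.0, 1.0]   # T1 halved
new2 = [100.0, 0.5]  # T2 halved

for name, suite in [("New 1", new1), ("New 2", new2)]:
    # Ratios relative to the old suite (smaller means faster).
    arith_ratio = arithmetic_mean(suite) / arithmetic_mean(old)
    geo_ratio = geometric_mean(suite) / geometric_mean(old)
    print(f"{name}: arithmetic {arith_ratio:.3f}, geometric {geo_ratio:.3f}")
```

Halving either benchmark scales the geometric mean by the same factor, the square root of one half, while the arithmetic mean barely registers the change to the short benchmark.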
![Page 108: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/108.jpg)
Benchmarking
● In practice, applying good benchmarking & statistics is made easier via frameworks
  – Google benchmark (C & C++)
  – Google Caliper (Java)
  – Nonius
  – Celero
  – Easybench
  – Pyperf
  – ...
![Page 109: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/109.jpg)
Investigating Performance
![Page 115: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/115.jpg)
Profiling
● When benchmark results do not make sense, you should investigate why
  – For resource X, where is X being used, acquired, and/or released?
● Sometimes microbenchmarks provide sufficient insight
● In other cases you will want to profile
  – Collect additional information about resources in an execution
  – The nature of the tool will depend on the resource and the objective
You should already be familiar with tools like gprof or jprofile. We'll examine some more advanced profilers now.
![Page 120: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/120.jpg)
Heap profiling
● Suppose I have a task and it consumes all memory
  – Note: This is not hypothetical. This often happens with grad students!
  – If I can identify where & why memory is consumed, I can remediate
    ● Maybe better algorithm
    ● Maybe competent use of data structures....
● Heap profilers track the allocated memory in a program & its provenance
  – Can identify hotspots, bloat, leaks, short lived allocations, ...
  – Usually sample based, but sometimes event based
  – e.g. Massif, Heaptrack, ...
![Page 122: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/122.jpg)
Heap profiling
int main() {
  std::vector<std::unique_ptr<long[]>> data{DATA_SIZE};
  for (auto &element : data) {
    element = std::make_unique<long[]>(BLOCK_SIZE);
    // do something with element
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  std::this_thread::sleep_for(std::chrono::seconds(1));
  return 0;
}

Collect a heap profile:
valgrind --time-unit=ms --tool=massif <program invocation>
heaptrack <program invocation>

View the results:
massif-visualizer massif.out.<PID>
heaptrack_gui <path to data>
![Page 123: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/123.jpg)
Heap profiling
int main() {
  std::vector<std::unique_ptr<long[]>> data{DATA_SIZE};
  for (auto &element : data) {
    element = std::make_unique<long[]>(BLOCK_SIZE);
    // do something with element
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    element.reset();
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
  std::this_thread::sleep_for(std::chrono::seconds(1));
  return 0;
}
How do we expect this to differ?
![Page 126: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/126.jpg)
CPU Profiling & Flame Graphs
● When CPU is the resource, investigate where the CPU is spent
  – Classic profilers – gprof, oprofile, jprof, ...
● Classic CPU profilers capture a lot of data and force the user to explore & explain it manually
[Figure: call graph of main(), foo(), bar(), baz(), quux(), annotated with 70% and 20% shares]
![Page 129: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/129.jpg)
CPU Profiling & Flame Graphs
● When CPU is the resource, investigate where the CPU is spent
  – Classic profilers – gprof, oprofile, jprof, ...
● Classic CPU profilers capture a lot of data and force the user to explore & explain it manually
● Flame graphs provide a way of structuring and visualizing substantial profiling information
[Figure: flame graph of main(), foo(), bar(), baz(), quux()]
It is easier to see that optimizing baz() could be useful.
![Page 131: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/131.jpg)
CPU Profiling & Flame Graphs
● When CPU is the resource, investigate where the CPU is spent
  – Classic profilers – gprof, oprofile, jprof, ...
● Classic CPU profilers capture a lot of data and force the user to explore & explain it manually
● Flame graphs provide a way of structuring and visualizing substantial profiling information
  – Consumers of CPU on top
  – Ancestry, proportions, components can all be clearly identified
[Figure: flame graph of main(), foo(), bar(), baz(), quux()]
![Page 132: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/132.jpg)
CPU Profiling & Flame Graphs
● Can extract rich information by embedding interesting things in colors
[Gregg, ATC 2017]
![Page 134: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/134.jpg)
CPU Profiling & Flame Graphs
● Flame graphs are not just limited to CPU time!
  – Any countable resource or event can be organized & visualized
● You can also automatically generate them with Clang & Chrome
  – See the XRay project in Clang
![Page 140: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/140.jpg)
Perf & event profiling
● Sometimes low-level architectural effects determine the performance
  – Cache misses
  – Misspeculations
  – TLB misses
How well does sample based profiling work for these?
● Instead, we can leverage low level system counters via tools like perf
perf stat -e <events> -g <command>
perf record -e <events> -g <command>
perf report
perf list
with events like:
task-clock, context-switches, cpu-migrations, page-faults, cycles, instructions, branches, branch-misses, cache-misses, cycle_activity.stalls_total
![Page 143: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/143.jpg)
Profiling for opportunities
● Causal profiling
What should I look at to speed things up?
[Figure: execution timelines made up of repeated foo() calls and other (...) work]
![Page 145: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/145.jpg)
Profiling for opportunities
● Causal profiling
● Profiling for parallelism
[Figure: execution timelines made up of repeated foo() calls and other (...) work]
![Page 146: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/146.jpg)
Improving Performance
![Page 153: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/153.jpg)
Improving Performance
● We can attack performance at several levels
  – Compilers & tuning the build process
  – Managing the organization of data
  – Managing the organization of code
  – Better algorithms & algorithmic modeling
● In all cases, we only care about improving performance of hot code
● Optimizing cold code can hurt software
![Page 157: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/157.jpg)
Compiling for performance
● Enabling optimizations...
● LTO (Link Time Optimization / Whole Program Optimization)
[Figure: traditional build. foo.c → (Compile & Optimize) → foo.o; bar.c → (Compile & Optimize) → bar.o; foo.o + bar.o → (Link) → program]
![Page 158: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/158.jpg)
Compiling for performance
● Enabling optimizations...
● LTO (Link Time Optimization / Whole Program Optimization)
[Figure: LTO build. foo.c → (Compile & Optimize) → foo.o; bar.c → (Compile & Optimize) → bar.o; foo.o + bar.o → (Merge) → program(.o) → (Optimize & Link) → program]
![Page 163: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/163.jpg)
Compiling for performance
● Enabling optimizations...
● LTO
● PGO/FDO (Profile Guided Optimization / Feedback Directed Optimization)
  – Incorporate profile information in optimization decisions

Possible targets: foo() { A } and bar() { B }

Before:
  funPtr = ?
  ...
  funPtr()

After (the profile shows bar is the common target):
  funPtr = ?
  ...
  if funPtr == bar: B’
  else: funPtr()

[Visual Studio profile guided optimizations]
![Page 165: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/165.jpg)
Compiling for performance
● Enabling optimizations...
● LTO
● PGO
● Layout optimization (BOLT and otherwise)
● Polyhedral analysis
![Page 170: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/170.jpg)
Optimizing Your Data
● The basic directions of data optimizations
  – Ensure the data you want is available for the tasks you have
  – Do not spend time processing data you do not need
  – Do not spend extra time managing the data at the system level
Several aspects of high level design may be in tension with these
![Page 173: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/173.jpg)
Optimizing Your Data
● Basic structure packing
  – Smaller aggregates consume less cache

struct S1 { char a; };                      sizeof(S1) == 1
struct S2 { uint32_t b; };                  sizeof(S2) == 4
struct S3 { char a; uint32_t b; char c; };  sizeof(S3) == 12

uint32_t must be 4 byte aligned. Padding is inserted!
![Page 175: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/175.jpg)
Optimizing Your Data
● Basic structure packing
  – Smaller aggregates consume less cache

struct S1 { char a; };                      sizeof(S1) == 1
struct S2 { uint32_t b; };                  sizeof(S2) == 4
struct S3 { char a; uint32_t b; char c; };  sizeof(S3) == 12
struct S4 { char a; char c; uint32_t b; };  sizeof(S4) == 8

Careful ordering improves cache utilization
![Page 179: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/179.jpg)
Optimizing Your Data
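● Basic structure packing
  – Smaller aggregates consume less cache
  – Carefully encoding data or reusing storage can do more
    ● Operate on compressed data
    ● Steal low/high order bits of pointers

// Primary template; only the <PointedTo, int> specialization is sketched here.
template <class PointedTo, class Value> class PointerValuePair;

template <class PointedTo>
class PointerValuePair<PointedTo, int> {
  uintptr_t compact;  // pointer in the high bits, value in the low 3 bits
public:
  // Clear the low 3 bits to recover the pointer (needs 8-byte aligned objects).
  PointedTo* getP() { return reinterpret_cast<PointedTo*>(compact & ~uintptr_t{0x7}); }
  // Keep only the low 3 bits to recover the value.
  int getV() { return static_cast<int>(compact & 0x7); }
};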
![Page 183: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/183.jpg)
Optimizing Your Data
● Managing indirection– Pointers and indirection can stall the CPU while waiting on memory
![Page 184: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/184.jpg)
Optimizing Your Data
● Managing indirection
– Pointers and indirection can stall the CPU while waiting on memory

    std::list numbers = ...;
    for (auto& i : numbers) {
      ...
    }

We already saw this: traversing a linked list is expensive!
Optimizing Your Data
● Managing indirection
– Pointers and indirection can stall the CPU while waiting on memory
[Figure: the list drawn as separate nodes holding 3, 1, 4, 1]
These elements are unlikely to be in cache and unlikely to be prefetched automatically.
Optimizing Your Data
● Managing indirection
– Pointers and indirection can stall the CPU while waiting on memory
[Figure: the same list nodes (3, 1, 4, 1), with the traversal marked "Stall"]
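A small sketch contrasting the two layouts behind the linked-list traversal above. The function names are hypothetical. Both sums compute the same value, and only the memory layout of the container differs. No timing is shown because results depend on the machine and allocator.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <numeric>
#include <vector>

// Node-based traversal: each step follows a pointer to a separately
// allocated node, so the next address is known only after the current
// node has been loaded.
inline std::uint64_t sumList(const std::list<std::uint32_t>& numbers) {
  return std::accumulate(numbers.begin(), numbers.end(), std::uint64_t{0});
}

// Contiguous traversal: addresses advance by a fixed stride, which
// hardware prefetchers can usually follow.
inline std::uint64_t sumVector(const std::vector<std::uint32_t>& numbers) {
  return std::accumulate(numbers.begin(), numbers.end(), std::uint64_t{0});
}

// Copies the list elements into contiguous storage, removing the
// per-element indirection for later traversals.
inline std::vector<std::uint32_t> flatten(
    const std::list<std::uint32_t>& numbers) {
  std::vector<std::uint32_t> flat(numbers.begin(), numbers.end());
  assert(flat.size() == numbers.size());
  return flat;
}
```

Flattening pays a one-time copy, so it is only worthwhile when the data is traversed more often than it is restructured.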
Optimizing Your Data
● Managing indirection
– Pointers and indirection can stall the CPU while waiting on memory
How does this relate to design tools that we have seen?
Optimizing Your Data
● Grouping things that are accessed together
– Guiding spatial design by temporal locality can improve cache utilization
– Cold field outlining

    struct Dog {
      uint32_t friendliness;
      uint32_t age;
      uint32_t ownerID;
      std::string hobby;
      Food treats[10];
    };

    for (Dog& d : dogs) {
      play(d.friendliness, d.hobby);
    }

The loop touches only friendliness and hobby; age, ownerID, and treats are cold.
We can try to push the cold fields out of the cache.
Optimizing Your Data
● Grouping things that are accessed together
– Guiding spatial design by temporal locality can improve cache utilization
– Cold field outlining

    // The original Dog is split into a hot part and a cold part.
    struct Cold {
      uint32_t age;
      uint32_t ownerID;
      Food treats[10];
    };

    struct HotDog {
      uint32_t friendliness;
      std::string hobby;
      std::unique_ptr<Cold> cold;  // cold fields live behind one pointer
    };

    for (HotDog& d : dogs) {
      play(d.friendliness, d.hobby);
    }

Benefits depend on the size of Cold & the access patterns
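A self-contained sketch of cold field outlining, following the `Dog`, `HotDog`, and `Cold` structs from the slides. The `Food` definition and the `outline` and `ageOf` helpers are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the slides' Food type.
struct Food {
  std::uint32_t calories = 0;
};

// Original layout: hot and cold fields share the same cache lines.
struct Dog {
  std::uint32_t friendliness = 0;
  std::uint32_t age = 0;
  std::uint32_t ownerID = 0;
  std::string hobby;
  Food treats[10];
};

// Rarely used fields, moved out of line.
struct Cold {
  std::uint32_t age = 0;
  std::uint32_t ownerID = 0;
  Food treats[10];
};

// Frequently used fields plus one pointer to the cold part.
struct HotDog {
  std::uint32_t friendliness = 0;
  std::string hobby;
  std::unique_ptr<Cold> cold;
};

// Builds the outlined form of a Dog; the cold fields get their own allocation.
inline HotDog outline(const Dog& d) {
  auto cold = std::make_unique<Cold>();
  cold->age = d.age;
  cold->ownerID = d.ownerID;
  for (std::size_t i = 0; i < 10; ++i) cold->treats[i] = d.treats[i];

  HotDog hot;
  hot.friendliness = d.friendliness;
  hot.hobby = d.hobby;
  hot.cold = std::move(cold);
  return hot;
}

// Cold-path access now pays for one extra pointer dereference.
inline std::uint32_t ageOf(const HotDog& d) {
  assert(d.cold && "cold part must be allocated");
  return d.cold->age;
}
```

With this `Food`, `sizeof(HotDog)` is smaller than `sizeof(Dog)` on common standard libraries, so more hot records fit per cache line. The cold path gets slower by one dereference, which is the trade-off the slide's last line refers to.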
Optimizing Your Data
● Grouping things that are accessed together
– Guiding spatial design by temporal locality can improve cache utilization
– Cold field outlining
– AoS vs SoA (Array of Structs vs Struct of Arrays)

    // AoS: one struct per dog
    struct Dog {
      uint32_t friendliness;
      uint32_t age;
      uint32_t ownerID;
      std::string hobby;
      Food treats[10];
    };

    // SoA: one array per field; index i identifies a dog
    struct DogManager {
      std::vector<uint32_t> friendliness;
      std::vector<uint32_t> age;
      std::vector<uint32_t> ownerID;
      std::vector<std::string> hobby;
      std::vector<std::array<Food,10>> treats;
    };

    // dogs is a DogManager
    for (size_t i = 0; i < dogs.friendliness.size(); ++i) {
      play(dogs.friendliness[i], dogs.hobby[i]);
    }
Optimizing Your Data
● Grouping things that are accessed together
– Guiding spatial design by temporal locality can improve cache utilization
– Cold field outlining
– AoS vs SoA (Array of Structs vs Struct of Arrays)
[Figure: AoS stores Dog1, Dog2 back to back, each with all of its fields; SoA stores separate friendliness, age, and hobby arrays, each holding Dog1, Dog2, Dog3, Dog4]
You can pick and choose while still getting good locality
Easier for compilers to vectorize
Also a foundation of modern game engine design (ECS, entity component system)
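A minimal struct-of-arrays sketch based on the slides' `DogManager`, reduced to three fields. The `add` and `totalFriendliness` helpers are hypothetical and only illustrate the "pick and choose" point: a pass over one field reads one dense array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Struct-of-arrays layout: element i of every vector describes dog i.
struct DogManager {
  std::vector<std::uint32_t> friendliness;
  std::vector<std::uint32_t> age;
  std::vector<std::string> hobby;

  // Appends one dog and returns its index, keeping all arrays in step.
  std::size_t add(std::uint32_t f, std::uint32_t a, std::string h) {
    assert(friendliness.size() == age.size() &&
           age.size() == hobby.size());
    friendliness.push_back(f);
    age.push_back(a);
    hobby.push_back(std::move(h));
    return friendliness.size() - 1;
  }
};

// Reads only the friendliness array: a dense run of uint32_t values,
// with no age or hobby bytes pulled into cache alongside them.
inline std::uint64_t totalFriendliness(const DogManager& dogs) {
  std::uint64_t total = 0;
  for (std::uint32_t f : dogs.friendliness) total += f;
  return total;
}
```

The simple loop over a plain integer array is also the shape that auto-vectorizers handle most readily, though whether a given compiler vectorizes it is not guaranteed.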
Optimizing Your Data
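● Loop invariance
– Avoid recomputing the same values inside a loop
– Compilers automate this (loop-invariant code motion, LICM) but cannot always succeed

    // Before: sqrt(2) is recomputed on every iteration
    for (auto i : ...) {
      auto sqrt2 = sqrt(2);
      auto x = f(i, sqrt2);
      ...
    }

    // After: the invariant value is computed once, outside the loop
    auto sqrt2 = sqrt(2);
    for (auto i : ...) {
      auto x = f(i, sqrt2);
      ...
    }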
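One illustrative case where automatic hoisting may fail is possible aliasing. The sketch below uses hypothetical function names. It assumes the caller knows that `scale` does not point into `data`, which is something the compiler generally cannot prove on its own.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// *scale is re-read and its square root recomputed on every iteration.
// Because `scale` might point at an element of `data`, each store to x
// could change *scale, so the compiler may be unable to hoist the sqrt.
inline void scaleAll(std::vector<double>& data, const double* scale) {
  assert(scale != nullptr);
  for (double& x : data) {
    x *= std::sqrt(*scale);
  }
}

// Hoisted by hand. This matches scaleAll only when `scale` does not
// alias `data`; if it does, the two functions give different results.
inline void scaleAllHoisted(std::vector<double>& data, const double* scale) {
  assert(scale != nullptr);
  const double factor = std::sqrt(*scale);
  for (double& x : data) {
    x *= factor;
  }
}
```

Hoisting manually encodes knowledge the optimizer lacks. Checking the generated assembly or a profile is the only way to know whether a particular compiler already did it.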
Optimizing Your Data
● Inner loop locality
– The simplest scenarios are like the matrix example we first saw

    uint32_t matrix[rows*cols];
    for (size_t row = 0; row < rows; ++row) {
      for (size_t col = 0; col < cols; ++col) {
        foo(matrix[cols*row + col]);  // stride of 1 element
      }
    }

[Figure: matrix grid with cells visited in the order 1 2 3 4 5 along a row]
Memory accesses are consecutive!
Optimizing Your Data
● Inner loop locality
– The simplest scenarios are like the matrix example we first saw

    uint32_t matrix[rows*cols];
    for (size_t row = 0; row < rows; ++row) {
      for (size_t col = 0; col < cols; ++col) {
        foo(matrix[rows*col + row]);  // stride of `rows` elements
      }
    }

[Figure: the same matrix grid; consecutive visits land in different rows]
Memory accesses jump around & thrash the cache!
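The two traversal orders can be sketched as a pair of functions over one row-major matrix. The function names are hypothetical. Both return the same sum, and only the order of memory accesses differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Visits elements in storage order: consecutive accesses are adjacent
// in memory (stride of one element).
inline std::uint64_t sumRowOrder(const std::vector<std::uint32_t>& matrix,
                                 std::size_t rows, std::size_t cols) {
  assert(matrix.size() == rows * cols);
  std::uint64_t total = 0;
  for (std::size_t row = 0; row < rows; ++row) {
    for (std::size_t col = 0; col < cols; ++col) {
      total += matrix[cols * row + col];
    }
  }
  return total;
}

// Visits the same elements column by column: consecutive accesses are
// `cols` elements apart, so each one may touch a different cache line.
inline std::uint64_t sumColumnOrder(const std::vector<std::uint32_t>& matrix,
                                    std::size_t rows, std::size_t cols) {
  assert(matrix.size() == rows * cols);
  std::uint64_t total = 0;
  for (std::size_t col = 0; col < cols; ++col) {
    for (std::size_t row = 0; row < rows; ++row) {
      total += matrix[cols * row + col];
    }
  }
  return total;
}
```

For matrices much larger than the cache, timing these two functions is a simple way to observe the effect. The size of the gap depends on the hardware.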
Optimizing Your Data
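● Inner loop locality
– The simplest scenarios are like the matrix example we first saw
– Matrix operations (e.g. multiplication) can require extra work
[Figure: two matrices being multiplied (×)]
Problem: Using the same layout for both matrices creates bad locality, because one is read along its rows and the other along its columns.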
Solution: Transpose first. Implement the multiplication over the transpose instead.
Note: Better solutions further leverage layout & parallelization.
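The transpose idea can be sketched as follows. This is a minimal illustration, not the course's implementation: the `Matrix` type and the `transpose` and `multiply` helpers are hypothetical names, and the matrices are assumed to be row-major with `a.cols == b.rows`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major matrix in one contiguous buffer (hypothetical helper type).
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<std::uint32_t> data;  // element (r, c) lives at data[r * cols + c]
};

// Returns the transpose, so that the columns of `m` become contiguous rows.
inline Matrix transpose(const Matrix& m) {
  Matrix t{m.cols, m.rows, std::vector<std::uint32_t>(m.data.size())};
  for (std::size_t r = 0; r < m.rows; ++r) {
    for (std::size_t c = 0; c < m.cols; ++c) {
      t.data[c * m.rows + r] = m.data[r * m.cols + c];
    }
  }
  return t;
}

// Computes a * b, assuming a.cols == b.rows.
// The innermost loop walks a row of `a` and a row of the transposed `b`,
// so both operands are read sequentially.
inline Matrix multiply(const Matrix& a, const Matrix& b) {
  const Matrix bt = transpose(b);
  Matrix out{a.rows, b.cols, std::vector<std::uint32_t>(a.rows * b.cols)};
  for (std::size_t i = 0; i < a.rows; ++i) {
    for (std::size_t j = 0; j < b.cols; ++j) {
      std::uint32_t sum = 0;
      for (std::size_t k = 0; k < a.cols; ++k) {
        sum += a.data[i * a.cols + k] * bt.data[j * bt.cols + k];
      }
      out.data[i * out.cols + j] = sum;
    }
  }
  return out;
}
```

The transpose costs one extra pass over `b`, which tends to pay off only when the matrices are large enough that strided reads miss the cache; measuring on the target machine is the way to confirm.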
![Page 242: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/242.jpg)
Optimizing Your Data
● Memory management effects
– Data structure packing & access patterns affect deeper system behavior
● What about virtual memory, page tables, & the TLB?
● What about allocation strategies & fragmentation?
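As one small illustration of data structure packing, the sketch below compares two orderings of the same members. The struct names, the fields, and the 64-byte cache line are assumptions made for the example; actual sizes depend on the platform ABI.

```cpp
#include <cstddef>
#include <cstdint>

// Members ordered so that alignment padding is likely between them.
struct Padded {
  std::uint8_t  flag;
  std::uint64_t id;
  std::uint8_t  kind;
  std::uint32_t count;
};

// The same members, ordered from largest to smallest alignment.
struct Compact {
  std::uint64_t id;
  std::uint32_t count;
  std::uint8_t  flag;
  std::uint8_t  kind;
};

// Reordering should never make the struct larger.
static_assert(sizeof(Compact) <= sizeof(Padded),
              "largest-first ordering should not add padding");

// Assumed cache line size; 64 bytes is common but not universal.
constexpr std::size_t kAssumedCacheLine = 64;

// How many objects of type T fit in one assumed cache line.
template <class T>
constexpr std::size_t per_cache_line() {
  return kAssumedCacheLine / sizeof(T);
}
```

On common 64-bit ABIs the first layout typically occupies 24 bytes and the second 16, so more `Compact` objects fit in each cache line and each page.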
![Page 247: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/247.jpg)
Optimizing Your Data
● Designing with clear ownership policies in mind
– Resource acquisition should not happen in hot code
– Use APIs that express intent & prevent copying
“std::string is responsible for almost half of all allocations in the Chrome”

// A non-owning view: it refers to elements without copying them
template<class E>
struct Span {
  template<size_t N>
  Span(const std::array<E,N>& c);
  Span(const std::vector<E>& c);

  const E* first;
  size_t count;
};
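A fuller version of the Span idea might look like the sketch below. The constructor bodies, the `begin`/`end` members, and the `sum` function are additions for illustration and are not from the slides; C++20 code would normally reach for `std::span`, and string-heavy code for `std::string_view`.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// A non-owning view over contiguous elements (illustrative sketch).
template <class E>
struct Span {
  template <std::size_t N>
  Span(const std::array<E, N>& c) : first{c.data()}, count{N} {}
  Span(const std::vector<E>& c) : first{c.data()}, count{c.size()} {}

  const E* begin() const { return first; }
  const E* end() const { return first + count; }

  const E* first;     // points into storage owned by someone else
  std::size_t count;  // number of elements in the view
};

// Accepts either container type without copying or allocating.
inline long sum(Span<int> values) {
  long total = 0;
  for (int v : values) {
    total += v;
  }
  return total;
}
```

Because `sum` takes a view, a caller holding a `std::vector<int>` or a `std::array<int, N>` passes it directly; the signature also states that the function only reads the elements and does not take ownership.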
![Page 250: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/250.jpg)
Optimizing Your Code
● Basic ideas for code optimization
– Avoid branching whenever possible
  Misspeculating over a branch is costly
– Make code that does the same thing occur close together temporally
  Leverage the instruction cache if you can
![Page 258: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/258.jpg)
Optimizing Your Code
● Branch prediction & speculation
– On if statements

for (...) {
  if (foo(c)) {
    bar();
  } else {
    baz();
  }
}

[Figure: branch A is taken 90% of the time, branch B 10%]
Pipeline: A A A
Actual:   A A B
Stall, but relatively infrequently
![Page 260: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/260.jpg)
Optimizing Your Code
● Branch prediction & speculation
– On if statements

for (...) {
  if (foo(c)) {
    bar();
  } else {
    baz();
  }
}

[Figure: branch A is taken 51% of the time, branch B 49%]
Pipeline: A A A
Actual:   A B
Stall, frequently
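One way to sidestep an unpredictable `if` is to turn the condition into arithmetic. The sketch below is an illustration under assumptions: the function names and the threshold-sum task are invented for the example, and an optimizing compiler may already emit branch-free code for the first version, so any gain has to be measured.

```cpp
#include <cstdint>
#include <vector>

// Branching version: the `if` is hard to predict when the values are random.
inline std::uint64_t sum_at_least_branchy(const std::vector<std::uint32_t>& data,
                                          std::uint32_t threshold) {
  std::uint64_t total = 0;
  for (std::uint32_t v : data) {
    if (v >= threshold) {
      total += v;
    }
  }
  return total;
}

// Branch-free version: the comparison result (0 or 1) scales the value,
// so every iteration executes the same instructions.
inline std::uint64_t sum_at_least_branchless(const std::vector<std::uint32_t>& data,
                                             std::uint32_t threshold) {
  std::uint64_t total = 0;
  for (std::uint32_t v : data) {
    total += v * static_cast<std::uint32_t>(v >= threshold);
  }
  return total;
}
```

When the branch is needed, making it predictable helps too, for example by processing sorted or partitioned data so that long runs take the same side.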
![Page 263: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/263.jpg)
Optimizing Your Code
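● Branch prediction & speculation
– On if statements
– On function pointers!

for (...) {
  foo();
}
bar() {}
baz() {}

[Figure: the call through foo reaches target A, bar(), 51% of the time and target B, baz(), 49% of the time]
The same problems arise
Consistent call targets perform better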
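One way to get consistent call targets is to group work by target before dispatching. The sketch below assumes the order of the jobs does not matter and that there are only two targets; `Handler`, `Job`, `add_one`, `twice`, `run_all`, and `run_grouped` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

using Handler = int (*)(int);

inline int add_one(int x) { return x + 1; }
inline int twice(int x) { return x * 2; }

struct Job {
  Handler handler;  // indirect call target
  int input;
};

// Runs jobs in the given order; the call target may change on every iteration.
inline long run_all(const std::vector<Job>& jobs) {
  long total = 0;
  for (const Job& job : jobs) {
    total += job.handler(job.input);
  }
  return total;
}

// Groups jobs by target first, so the indirect call sees one long run per target.
// Takes the jobs by value because it reorders them.
inline long run_grouped(std::vector<Job> jobs) {
  std::stable_partition(jobs.begin(), jobs.end(),
                        [](const Job& job) { return job.handler == &add_one; });
  return run_all(jobs);
}
```

The grouping pass is extra work, so this is only worthwhile when the dispatch loop is hot and the targets are otherwise interleaved unpredictably.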
![Page 267: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/267.jpg)
Optimizing Your Code
● Designing away checks
– Repeated checks can be removed by maintaining invariants

i ← 1
while i < length(A)
    j ← i
    while j > 0 and A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
    i ← i + 1
[Wikipedia’s Insertion Sort]

Can we turn the semantic check into a bounds check?
![Page 269: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/269.jpg)
Optimizing Your Code
● Designing away checks
– Repeated checks can be removed by maintaining invariants

i ← 1
while i < length(A)
    j ← i
    while j > 0 and A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
    i ← i + 1
[Wikipedia’s Insertion Sort]

We just guarantee that A starts with the smallest element!

k ← find_smallest(A)
swap A[0] and A[k]
i ← 1
while i < length(A)
    j ← i
    while A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
    i ← i + 1
Alternatively, store a sentinel just before the array:

A[-1] ← MIN_VALUE
i ← 1
while i < length(A)
    j ← i
    while A[j-1] > A[j]
        swap A[j] and A[j-1]
        j ← j - 1
    i ← i + 1
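The smallest-element-first variant translates to C++ roughly as follows. This is a sketch for `std::vector<int>`; the function name is made up, and `std::min_element` stands in for the pseudocode's `find_smallest`.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Insertion sort whose inner loop needs no `j > 0` bounds check.
inline void insertion_sort_min_first(std::vector<int>& a) {
  if (a.size() < 2) {
    return;  // nothing to sort, and min_element needs a non-empty range
  }
  // Establish the invariant: a[0] holds a smallest element.
  std::iter_swap(a.begin(), std::min_element(a.begin(), a.end()));

  for (std::size_t i = 1; i < a.size(); ++i) {
    std::size_t j = i;
    // a[0] <= every element, so the comparison fails at j == 1 at the latest
    // and j never reaches 0.
    while (a[j - 1] > a[j]) {
      std::swap(a[j - 1], a[j]);
      --j;
    }
  }
}
```

The extra pass to find the minimum is linear, while the check it removes ran once per inner-loop iteration; whether that trade wins in practice depends on the input sizes and should be measured.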
![Page 276: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/276.jpg)
Optimizing Algorithms
● Improving real world algorithmic performance comes from recognizing the interplay between theory and hardware
● Hybrid algorithms
– Constants matter. Use thresholds to select algorithms.
– Use general N log N sorting for N above 300 [Alexandrescu 2019]
● Caching & Precomputing
– If you will reuse results, save them and avoid recomputing
– If all possible results are compact, just compute a table up front
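Both ideas can be sketched in a few lines. The code below is illustrative only: the function names are invented, the threshold of 300 is simply the figure quoted on the slide and would need tuning per machine, and the bit-count table is just one example of a result set small enough to precompute.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Threshold quoted on the slide; the best value is machine-dependent.
constexpr std::size_t kSmallSortThreshold = 300;

// Quadratic, but with small constants and good locality.
inline void insertion_sort(std::vector<int>& a) {
  for (std::size_t i = 1; i < a.size(); ++i) {
    for (std::size_t j = i; j > 0 && a[j - 1] > a[j]; --j) {
      std::swap(a[j - 1], a[j]);
    }
  }
}

// Hybrid: pick the algorithm by input size.
inline void hybrid_sort(std::vector<int>& a) {
  if (a.size() <= kSmallSortThreshold) {
    insertion_sort(a);
  } else {
    std::sort(a.begin(), a.end());
  }
}

// Precomputing: the bit counts of all 256 byte values, built once on first use.
inline const std::array<std::uint8_t, 256>& bit_count_table() {
  static const std::array<std::uint8_t, 256> table = [] {
    std::array<std::uint8_t, 256> t{};
    for (std::size_t i = 1; i < t.size(); ++i) {
      t[i] = static_cast<std::uint8_t>(t[i / 2] + (i & 1u));
    }
    return t;
  }();
  return table;
}

// Counts set bits with four table lookups instead of a loop over 32 bits.
inline unsigned bit_count(std::uint32_t x) {
  const auto& t = bit_count_table();
  return t[x & 0xFFu] + t[(x >> 8) & 0xFFu] + t[(x >> 16) & 0xFFu] + t[x >> 24];
}
```

Common standard library sort implementations already switch to insertion sort for small partitions internally, so an explicit hybrid like this is mainly useful when the element type or workload differs from what the library was tuned for.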
![Page 281: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/281.jpg)
Optimizing Algorithms
● Better performance modeling & algorithms
– The core approaches we use have not adapted to changing contexts
● Classic asymptotic complexity less useful in practice
– It uses an abstract machine model that is too approximate!
– Constants and artifacts of scale can actually dominate the real world performance
  A uniform cost model throws necessary information away
– We want modeling & algorithms that account for artifacts like: memory, I/O, consistency & speculation, shapes of workloads
● Alternative approaches
– I/O complexity, I/O efficiency and cache awareness
[Figure: CPU, Memory 1, Memory 2; block size B]
Complexity measured in block transfers
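To make "complexity measured in block transfers" concrete, two standard bounds from the external-memory (I/O) model are shown below. The symbols N (number of elements) and M (capacity of the faster memory, Memory 1 in the figure) are introduced here for the example; B is the block size from the figure.

```latex
\begin{align*}
\text{scan}(N) &= \Theta\!\left(\frac{N}{B}\right)
  && \text{reading $N$ contiguous elements} \\
\text{sort}(N) &= \Theta\!\left(\frac{N}{B}\,\log_{M/B}\frac{N}{B}\right)
  && \text{comparison-based sorting}
\end{align*}
```

By contrast, touching N elements in an arbitrary order can cost up to N block transfers, which is why a sequential scan can be a factor of B cheaper than scattered access even though both perform the same number of operations under a uniform cost model.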
– Cache oblivious algorithms & data structures
  Similar to I/O, but agnostic to block size
– Parameterized complexity
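A classic example of the cache oblivious style is a recursive matrix transpose, sketched below under assumptions: the function names are invented, the matrices are row-major `std::vector<int>` buffers, and `kBaseCase` is a hypothetical cutoff that only limits recursion overhead rather than being tuned to any cache.

```cpp
#include <cstddef>
#include <vector>

// Cutoff that only limits recursion overhead; it encodes no cache parameters.
constexpr std::size_t kBaseCase = 8;

// Transposes rows [r0, r1) and columns [c0, c1) of the row-major
// rows x cols matrix `src` into the row-major cols x rows matrix `dst`.
inline void transpose_block(const std::vector<int>& src, std::vector<int>& dst,
                            std::size_t rows, std::size_t cols,
                            std::size_t r0, std::size_t r1,
                            std::size_t c0, std::size_t c1) {
  const std::size_t height = r1 - r0;
  const std::size_t width = c1 - c0;
  if (height <= kBaseCase && width <= kBaseCase) {
    for (std::size_t r = r0; r < r1; ++r) {
      for (std::size_t c = c0; c < c1; ++c) {
        dst[c * rows + r] = src[r * cols + c];
      }
    }
    return;
  }
  // Split the longer side in half; sub-blocks shrink until they fit in
  // whatever cache levels exist, without the code knowing their sizes.
  if (height >= width) {
    const std::size_t mid = r0 + height / 2;
    transpose_block(src, dst, rows, cols, r0, mid, c0, c1);
    transpose_block(src, dst, rows, cols, mid, r1, c0, c1);
  } else {
    const std::size_t mid = c0 + width / 2;
    transpose_block(src, dst, rows, cols, r0, r1, c0, mid);
    transpose_block(src, dst, rows, cols, r0, r1, mid, c1);
  }
}

// Convenience wrapper that transposes the whole matrix.
inline std::vector<int> transpose(const std::vector<int>& src,
                                  std::size_t rows, std::size_t cols) {
  std::vector<int> dst(src.size());
  transpose_block(src, dst, rows, cols, 0, rows, 0, cols);
  return dst;
}
```

The recursion never mentions a block size, which is the sense in which the approach is agnostic to it; a cache-aware version would instead tile the loops with an explicit block size chosen for one machine.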
![Page 292: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/292.jpg)
Optimizing Algorithms
● Classic design mistakes [Lu 2012]
– Uncoordinated functions (e.g. lack of batching)
for (auto& action : actions) { action.do() }
Action::do() { acquire(mutex) ... release(mutex) }
vs
acquire(mutex)
for (auto& action : actions) { action.do() }
release(mutex)
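The contrast between per-action locking and batched locking can be sketched in C++ with `std::mutex` and `std::lock_guard`. The `Counter` type and the function names `applyEach` and `applyBatched` are hypothetical, introduced only for illustration; the slide's `Action::do()` is replaced by a simple addition so the sketch compiles. Note that in the batched form the per-action work no longer takes the lock itself.

```cpp
#include <mutex>
#include <vector>

// Hypothetical shared state protected by a mutex.
struct Counter {
  std::mutex mutex;
  long total = 0;
};

// Uncoordinated: every action acquires and releases the lock on its own,
// so N actions pay for N lock/unlock pairs.
void applyEach(Counter& counter, const std::vector<int>& actions) {
  for (int amount : actions) {
    std::lock_guard<std::mutex> guard(counter.mutex);
    counter.total += amount;
  }
}

// Batched: the caller acquires the lock once around the whole loop,
// and the per-action work runs without locking.
void applyBatched(Counter& counter, const std::vector<int>& actions) {
  std::lock_guard<std::mutex> guard(counter.mutex);
  for (int amount : actions) {
    counter.total += amount;
  }
}
```

Batching trades fewer lock operations for a longer critical section, so other threads may wait longer for the mutex; which side wins depends on the workload and should be measured.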
![Page 293: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/293.jpg)
Optimizing Algorithms
● Classic design mistakes [Lu 2012]
– Uncoordinated functions (e.g. lack of batching)
– Skippable functions (e.g. transparent draws)
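A skippable function is work whose result cannot affect the output. A minimal C++ sketch of the transparent-draw case follows, under the assumption that a sprite with alpha 0 contributes nothing to the frame. The `Sprite` type, `drawAll`, and the `skipTransparent` flag are hypothetical, and the "draw" is reduced to counting blended pixels so the saved work is observable.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sprite: an opacity (0 = fully transparent) and the number of
// pixels a draw call would have to blend.
struct Sprite {
  std::uint8_t alpha;
  int pixels;
};

// Stand-in for an expensive renderer: returns how many pixels were blended.
// With skipTransparent set, fully transparent sprites are never drawn.
long drawAll(const std::vector<Sprite>& sprites, bool skipTransparent) {
  long blended = 0;
  for (const Sprite& sprite : sprites) {
    if (skipTransparent && sprite.alpha == 0) {
      continue;  // The draw would not change the frame, so skip it.
    }
    blended += sprite.pixels;
  }
  return blended;
}
```

The check is only safe when the skipped call has no other side effects that callers rely on.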
![Page 295: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/295.jpg)
Optimizing Algorithms
● Classic design mistakes [Lu 2012]
– Uncoordinated functions (e.g. lack of batching)
– Skippable functions (e.g. transparent draws)
– Poor/unclear synchronization
foo() { bar() }
bar() { baz() }
baz() { quux() }
quux() { random() }
random() { acquire(mutex) ... release(mutex) }
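The call chain above hides a lock several levels below the caller, so every thread that calls `foo()` may contend on a mutex it cannot see. The slides do not say how to fix this; one possible approach, sketched here in C++ as an assumption, is to give each thread its own generator so no lock is needed. The names `sharedRandom` and `perThreadRandom` and the fixed seed are hypothetical.

```cpp
#include <mutex>
#include <random>

// Hidden synchronization: one generator shared by all threads, guarded by a
// mutex that callers further up the chain never see.
int sharedRandom(int lo, int hi) {
  static std::mutex mutex;
  static std::mt19937 engine(12345);  // fixed seed, chosen arbitrarily
  std::lock_guard<std::mutex> guard(mutex);
  return std::uniform_int_distribution<int>(lo, hi)(engine);
}

// One alternative (an assumption, not the slides' prescription): each thread
// owns its generator, so calls never contend on a lock.
int perThreadRandom(int lo, int hi) {
  thread_local std::mt19937 engine(std::random_device{}());
  return std::uniform_int_distribution<int>(lo, hi)(engine);
}
```

The per-thread version changes the sequence each thread observes, which matters if reproducible output is required.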
![Page 296: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/296.jpg)
Optimizing Algorithms
● Classic design mistakes [Lu 2012]
– Uncoordinated functions (e.g. lack of batching)
– Skippable functions (e.g. transparent draws)
– Poor/unclear synchronization
– Poor data structure selection
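As a small illustration of data structure selection, the C++ sketch below answers the same membership queries two ways: a linear scan of a `std::vector` per query, and a lookup in a `std::unordered_set` built once. The function names `countHitsLinear` and `countHitsHashed` are hypothetical and the scenario is an assumption, not an example from [Lu 2012].

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Linear scan per query: O(n) work for each lookup.
std::size_t countHitsLinear(const std::vector<int>& haystack,
                            const std::vector<int>& queries) {
  std::size_t hits = 0;
  for (int query : queries) {
    if (std::find(haystack.begin(), haystack.end(), query) != haystack.end()) {
      ++hits;
    }
  }
  return hits;
}

// Build a hash set once, then each lookup is expected O(1).
std::size_t countHitsHashed(const std::vector<int>& haystack,
                            const std::vector<int>& queries) {
  const std::unordered_set<int> index(haystack.begin(), haystack.end());
  std::size_t hits = 0;
  for (int query : queries) {
    if (index.count(query) != 0) {
      ++hits;
    }
  }
  return hits;
}
```

In keeping with the earlier point about constants, the vector scan can still be faster for small inputs because it is contiguous in memory, so the choice is worth benchmarking on realistic sizes.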
![Page 299: Measurement & Performancewsumner/teaching/745/03-measurement.pdf · Benchmarking We must reason rigorously about performance during assessment, investigation, & improvement Assessing](https://reader035.vdocument.in/reader035/viewer/2022070714/5ed65a051f01621abd2d879a/html5/thumbnails/299.jpg)
Summary
● Reasoning rigorously about performance is challenging
● Good tooling can allow you to investigate performance well
● We can improve performance through
– compilers
– managing data
– managing code
– better algorithmic thinking