Mats Björkman, MdH

Measurement-based Research Methods in Computer Engineering

Mats Björkman, Mälardalens Högskola

Overview
Introduction
Experimental-based research methodology
Statistics
Measurements
Methodology
Examples
Pitfalls
Conclusions

Introduction
Measurement-based research is founded in:
  Experimental research methodology
  Statistics

Experimental-based research methodology
Overview (repetition)
Comments

Experimental-based research methodology - Overview
Already the old Greeks…
Two main standpoints:
  Rational methods: it all comes from the brain, everything can be thought out
  Idealistic methods: everything we observe gives us knowledge about the ideal world (e.g. Plato)

Different Methodologies
Rational research meant thinking, thus deductive (logical) methodologies
Idealistic research meant observing and drawing conclusions, thus inductive methodologies

Practice often a mixture
E.g. astronomy, a combination of induction from observations and deduction through e.g. mathematics

Medieval times
Much debate around whether or not God stands above the laws of logic
The question “Why?” was important
Research was always seen through the lens of religion

The Scientific Revolution
Bacon, Copernicus, Kepler, Newton etc.
Focus on “How?”
Nature and religion may be treated separately as long as focus is on How
Led to the development of the “traditional” sciences

Modern Science
Karl Popper - important philosopher
Science is a process of testing and refining hypotheses
Induction problem: Can experience be generalized?
Popper says ‘no’, experiments cannot prove general hypotheses

Modern Science
Falsification is the most important feature of science according to Popper
Hypotheses cannot be proven, but they can be falsified by counter-examples
Theories are compared by their expressiveness and by their abilities to withstand falsification

Modern Science
Since hypotheses cannot be generally proven, corroboration and statistics play important roles
Hypothetical-deductive research methods build on these views of science
However, corroboration versus verification can be and is discussed

Hypothetical-deductive methods
Problem formulation
Hypothesis
Deduction to find evaluation criteria
Experiment(s)/observation(s)
Conclusion (corroboration/verification or falsification)

Problem formulation
The research problem is formulated
Typically, a good problem is addressable through empirical studies

Hypothesis
A hypothesis regarding the answer to the research question is formulated
This is the really creative part of research; scientific intuition and a good “educated guess” are important to success
(Popper: The more “risky” the hypothesis, the “better” the result.)

Deduction
From the hypothesis, the criteria to be used to test it are deduced

Experiments/observations
The “hard work” part of research
Experiments are set up and/or observations are performed in order to corroborate/verify (or falsify) the hypothesis

Corroboration/verification or falsification
Using the deduced criteria on the results of the experiments/observations leads to either corroboration/verification or falsification of the hypothesis

Then iterate…
Modern scientific research is typically a series of hypothetical-deductive situations; each corroboration/verification or falsification gives input to a new or modified research question etc. etc.
Through this process, our scientific theories are expanded and refined

What is “the Truth”?
Experimental research is often more quantitative than qualitative
For quantitative results, confidence levels or margins of error are used in attempts to “encircle” the Truth (should it exist)
Experiments are repeated and/or modified until confidence levels or error margins are satisfactory

What is “the Truth”?
For qualitative results, we must also use statistics.
Even if we believe in induction and that the Truth is possible to find, there are always experimental errors and the like that make 100% certainty impossible to reach
Here too, repeated experiments are needed

Conclusions
Experimental research is an iterative process
Potential falsification is important: experiments without risks are not interesting

Conclusions
Examples:
  If we know the outcome beforehand, the experiment is of no scientific value.
  If there is no way to falsify the hypothesis (e.g. pseudoscience), the experiment is of no scientific value.

Experimental-based research methodology - Comments
100 % does not exist in reality!

Experimental-based research methodology - Comments
In reality, there is always a residual chance/risk that something really weird will happen

Experimental-based research methodology - Comments
Therefore, Popper and his followers are maybe not wrong, but it is kind of irrelevant whether a hypothesis can be “generally proven” or not
(I’m trying to be provocative here…)

Experimental-based research methodology - Comments
If I show that in X percent of all cases, some hypothesis Y holds…
…then according to Popper, this cannot prove the general case…
…but if 1 – X (the risk of Y not holding) is smaller than e.g. the risk of the world being ended by a comet…
…then who cares?

Experimental-based research methodology - Comments
In reality, we always take calculated risks
If a hypothesis is true for all practical purposes, then it is an academic question (a philosophical question) whether or not the hypothesis is TRUE

Experimental-based research methodology - Conclusions
Conclusion: The hypothetical-deductive method is the modern methodology in experimental research
However, not everyone agrees that hypotheses cannot be generally proven (and others don’t care…)

An Experimental Example
The work I did for my PhD thesis around the performance of parallel implementations of communication protocols

Parallel TCP and UDP stacks
On a shared-memory multiprocessor we implemented parallel TCP/IP/Ethernet and UDP/IP/Ethernet stacks
The performance behavior of these stacks gave rise to the research question “What factors limit the performance of these parallel implementations?”

Performance limiting factors
In parallel processing, critical resources must be protected from simultaneous access, in our case by using locks
Hence, these critical sections were the main suspects as performance limiting factors

Our research hypothesis
Our hypothesis then was “locking is the main performance limiting factor”
We built a performance model using only locking and processing
If our hypothesis was right, then the model should behave like the real system

Experiment
Our experiment was to run the model with the same input as the real implementation and compare results

Results
For the TCP stack, results were fairly accurate for low numbers of processors (but far from perfect).
Conclusion: locking “is probably” one major factor (but not the only one)

Results
For the UDP stack, results differed widely.
Conclusion: locking is not a major factor here

Results from conclusions
We need to rethink and refine (iterate)
Locking obviously is one factor, but not the only one
Need to think again and formulate a new hypothesis

New hypothesis
Next to contention for shared software resources, contention for shared hardware resources (e.g. buses, memory) is a likely candidate
New hypothesis: Contention for locks and contention for the bus/memory system are the two main factors

New model
We then built a new model that captured the effects of both locking and bus/memory contention
The same evaluation criteria as before: model and reality should agree

New results
For the new model, the TCP results were very good

New results
While not perfect, UDP results also showed that our new model captured the main behavior of the UDP stack

New results
Conclusion: Lock and bus/memory contention “are” the two main performance limiting factors for the observed implementations

Statistics
Statistics are used for many purposes:
  Quantify results
  Measure confidence
Statistics needed in corroboration process

Statistics – Result quantification
Assume we are measuring some property P that has a certain (but to us unknown) value V
When measuring, we get a measured value V*

Statistics – Result quantification
How are V and V* related?
It depends on our measurement methods
Ideally, our measurement method gives an exact result, i.e. V* = V

Statistics – Result quantification
However, most measurement methods are:
  Statistical by their nature
  Inexact
  Deliberately simplified (model)

Statistics – Measurement methods
Statistical by nature:
Sampling is a typical example: instead of observing a long and possibly continuous process, we take a number of snapshots
These snapshots are statistically representative of the process


Example: Counting cars (vehicles)
We want to know the number of cars/vehicles passing outside Rosenhill on one day
Instead of counting for 24 hours, we can count 10 randomly chosen minutes and multiply by 144.
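
The sampling estimate above can be sketched in a few lines. This is only an illustration: the function name `estimate_daily_count`, the callback `count_in_minute` and the constant traffic of 3 vehicles per minute are hypothetical and not taken from the slides.

```python
import random

MINUTES_PER_DAY = 24 * 60  # 1440 minutes in one day


def estimate_daily_count(count_in_minute, sample_size=10, rng=None):
    """Estimate a daily total from counts in randomly chosen minutes."""
    rng = rng or random.Random()
    # Pick distinct minutes of the day at random.
    minutes = rng.sample(range(MINUTES_PER_DAY), sample_size)
    sampled_total = sum(count_in_minute(m) for m in minutes)
    # Scale up: 1440 / 10 = 144, the factor used on the slide.
    return sampled_total * (MINUTES_PER_DAY / sample_size)


# Hypothetical traffic: exactly 3 vehicles pass in every minute.
estimate = estimate_daily_count(lambda minute: 3, rng=random.Random(1))
print(estimate)
```

With real traffic the counts vary from minute to minute, so the estimate depends on which minutes happen to be drawn.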


Inexactness:
If our measurement tools have lower resolution than the property we are measuring, we introduce measurement errors

Example: Using a simple scale to weigh something will only yield approximate values of the weight


Simplification:
We approximate the problem by introducing an inexact model

Example: Counting cars (again)
We put a rubber tube across the street with a counter that ticks every time the tube is run over by a vehicle wheel (pair)

Example: Counting cars (again)
We get one tick per vehicle axle
We can approximate that all vehicles have 2 axles…
…or use some previously determined value, e.g. 2.042 axles


Note that whereas using 2 axles is a coarser simplification, using 2.042 axles introduces dependencies on some earlier measurements and their reliability and exactness

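
A small sketch of the two conversions from axle ticks to vehicles. The tick count of 10,000 and the function name `vehicles_from_ticks` are made up for illustration; only the factors 2 and 2.042 come from the slides.

```python
def vehicles_from_ticks(axle_ticks, axles_per_vehicle=2.0):
    """Convert counted axle ticks into an estimated number of vehicles."""
    return axle_ticks / axles_per_vehicle


ticks = 10_000  # hypothetical counter reading

coarse = vehicles_from_ticks(ticks)          # assumes exactly 2 axles per vehicle
refined = vehicles_from_ticks(ticks, 2.042)  # relies on an earlier measurement

print(coarse, round(refined, 1))
```

The two estimates differ by about 2 %, and the refined one is only as good as the earlier measurement behind 2.042.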

Statistics – Result quantification
Depending on the number of measurements we make, the measurement precision etc., we will get a measured value V* where we can quantify the relation between V* and the true value V

Statistics – Result quantification
With enough knowledge about all parts of our measurement process, we can determine that:
With a confidence of q, the true value V lies in the range [V* - ε1, V* + ε2] (with some specific values of q, ε1, ε2).
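
One common way to obtain such a range is a confidence interval for the mean of repeated measurements. The sketch below makes assumptions the slides do not state: the measurements are independent, the mean is roughly normally distributed, and the interval is symmetric (both margins equal). The sample values are hypothetical. For very small samples a Student's t quantile would be more appropriate than the normal quantile used here.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, q=0.95):
    """Return (low, high) so that the true mean lies inside with confidence q.

    Uses a normal approximation with a symmetric margin around the sample mean.
    """
    n = len(samples)
    v_star = mean(samples)                 # the measured value V*
    z = NormalDist().inv_cdf(0.5 + q / 2)  # two-sided normal quantile
    margin = z * stdev(samples) / sqrt(n)  # same margin on both sides
    return v_star - margin, v_star + margin


# Hypothetical repeated measurements of the same property.
samples = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
low95, high95 = confidence_interval(samples, q=0.95)
low99, high99 = confidence_interval(samples, q=0.99)
print(low95, high95)
```

Asking for higher confidence widens the interval, which is the trade-off between the two goals of high confidence and small margins.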

Statistics – Result quantification
Two goals then are to:
  Have as high confidence (q) as possible in the results, and
  have as small interval margins (ε1, ε2) as possible.

Statistics – Result quantification
Both confidence levels and margins of error are dependent on the measurement methods we use

Statistics – Result quantification
Therefore, it is a prime task to make measurements as reliable as possible
By removing sources of errors, the results can be improved considerably

Statistics – Result quantification
Repetition is another important point
By repeated experiments, we can obtain better confidence in our results, as well as reduce the margins of error
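
As an illustration of why repetition helps, assume (this is not stated on the slides) n independent measurements with standard deviation σ and an approximately normally distributed mean. The margin of error ε at confidence q then shrinks with the square root of n:

```latex
\[
  \varepsilon \approx z_q \,\frac{\sigma}{\sqrt{n}},
  \qquad
  z_q = \Phi^{-1}\!\left(\frac{1+q}{2}\right)
\]
```

Under these assumptions, quadrupling the number of measurements halves the margin of error at the same confidence level.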

Statistics – Methodology
In order to reduce errors, measurement experiments need to be as controlled as possible
By controlled we mean that we should reduce and (hopefully) control all error sources

Methodology
If there are factors in the environment that cannot be eliminated, they should be measured and quantified
Example: vehicle axles, 2.042 is a quantification of our “conversion error” from axles to vehicles

Methodology - dilemma
Methodology dilemma: Intrusive measurements
Example: Timestamping a code section
Time for timestamping included in reported times…

Timestamping dilemma
[Figure: the code to measure is bracketed by two timestamps. Problem: the measured time t ≠ the execution time of the code to measure.]

Timestamping: reducing error
[Figure: (timestamp, code to measure, timestamp) - (timestamp, timestamp) ≈ code to measure.]
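
A minimal sketch of this correction, assuming Python's `time.perf_counter_ns` as the timestamp source. The helper names and the choice to use the smallest observed back-to-back gap as the overhead are illustrative assumptions, not the method from the slides.

```python
import time


def timestamp_overhead(repeats=1000):
    """Smallest observed gap (ns) between two back-to-back timestamps."""
    best = None
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        gap = t1 - t0
        if best is None or gap < best:
            best = gap
    return best


def measure(code, overhead):
    """Time one call of code() and subtract the timestamping overhead."""
    t0 = time.perf_counter_ns()
    code()
    t1 = time.perf_counter_ns()
    return max(t1 - t0 - overhead, 0)  # never report a negative time


def work():
    # Hypothetical code section to measure.
    sum(range(1000))


overhead = timestamp_overhead()
elapsed = measure(work, overhead)
print(overhead, elapsed)
```

The subtraction only approximates the true cost, since the overhead itself varies from one read to the next.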

Timestamping: reducing error
[Figure: two timestamps bracket N repetitions (1, 2, …, N) of the code to measure; measured time ≅ N ∗ code to measure.]
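
The repetition approach can be sketched as follows, again assuming `time.perf_counter_ns` as the clock. The function name `time_per_run` and the default N are hypothetical.

```python
import time


def time_per_run(code, n=10_000):
    """Average time (ns) of one call to code(), measured over n repetitions."""
    t0 = time.perf_counter_ns()
    for _ in range(n):
        code()
    t1 = time.perf_counter_ns()
    # The cost of the two timestamps is spread over n runs.
    return (t1 - t0) / n


per_run = time_per_run(lambda: sum(range(100)), n=1000)
print(per_run)
```

The loop itself adds a small cost per iteration, so this trades timestamp intrusion for loop overhead. It also reports only an average, not the variation between runs.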

Simulations
One way to avoid intrusive measurements
Problem: Simulations must be verified/corroborated against reality

Simulations
Problem: Simulations lend themselves to simplifications
This means that we introduce modeling errors
These errors must be identified and controlled

Pitfalls
There are many pitfalls in experimental research, especially when it involves measurements

Pitfall: Sampling too sparsely
Example from a very early publication
We performed measurements on protocol stacks in a UNIX environment

Pitfall: Sampling too sparsely
Our controlled measurements did not agree with reality
[Figure: plot comparing the “Controlled” and “Reality” series; axis ticks from 0 to 45.]

Pitfall: Sampling too sparsely
It turned out that our measurement points were not representative
[Figure: plot comparing the “Controlled” and “Reality” series; axis ticks from 0 to 45.]

Pitfall: Sampling too sparsely
We had chosen measurement points (data sizes) in multiples of 1KB
The UNIX mbuf system puts communication data into 128-byte linked buffers. For 1KB, one large 1KB buffer is used instead of 8 such small buffers

Pitfall: Sampling too sparsely
This means:
  Sending 1023 bytes means handling of 8 small buffers
  Sending 1024 bytes means handling of 1 large buffer
  Sending 1025 bytes means handling of 1 large and 1 small buffer
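
The three cases above follow from a simple rule, sketched here as a toy model. It uses only the sizes given on the slides (128-byte small buffers, 1KB large buffers) and is not a description of the actual mbuf implementation; the function name `buffers_needed` is hypothetical.

```python
SMALL = 128   # bytes per small buffer, as stated on the slide
LARGE = 1024  # bytes per large buffer, as stated on the slide


def buffers_needed(size):
    """Return (large, small) buffer counts for a message of `size` bytes."""
    large, rest = divmod(size, LARGE)
    small = -(-rest // SMALL)  # ceiling division for the remaining bytes
    return large, small


for size in (1023, 1024, 1025):
    print(size, buffers_needed(size))
```

Under this model, a change of a single byte around a 1KB boundary changes the number of buffers handled from 8 to 1 to 2.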

Pitfall: Sampling too sparsely
Hence, our measurement points were extreme points, therefore our results deviated from reality’s mean values
[Figure: plot comparing the “Controlled” and “Reality” series; axis ticks from 0 to 45.]
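
To illustrate how sparse sampling can land on extreme points, the sketch below uses the number of buffers handled as a stand-in for cost. That proxy, the toy buffer model and the size range of 1 to 4096 bytes are assumptions for illustration, not measured data.

```python
def buffer_count(size, small=128, large=1024):
    """Total number of buffers handled for `size` bytes (toy model)."""
    big, rest = divmod(size, large)
    return big + -(-rest // small)  # large buffers plus ceil(rest / small)


all_sizes = range(1, 4097)            # every data size up to 4KB
kb_sizes = range(1024, 4097, 1024)    # only multiples of 1KB

mean_all = sum(buffer_count(s) for s in all_sizes) / len(all_sizes)
mean_kb = sum(buffer_count(s) for s in kb_sizes) / len(kb_sizes)

print(mean_kb, round(mean_all, 2))
```

In this model the 1KB sampling points sit at the cheapest sizes, so their mean is well below the mean over all sizes.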

Pitfall: Time correlations
When working in a real system, unknown time correlations may occur. Process scheduling is one typical example.

Pitfall: Time correlations
If the arrival time of packets to a host is timestamped in a process, the timestamps will exhibit a pattern correlated to the scheduling of that process.
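
A toy model of this effect, assuming (purely for illustration) that the timestamping process runs once every 10 ms and stamps all packets that arrived since its last run. Real schedulers are far less regular, and the arrival times are hypothetical.

```python
import math


def observed_timestamp(arrival_ms, quantum_ms=10):
    """Timestamp seen by a process that only runs every quantum_ms."""
    # The packet is stamped at the next scheduling instant after arrival.
    return math.ceil(arrival_ms / quantum_ms) * quantum_ms


arrivals = [1.0, 3.5, 9.9, 12.2, 27.0]  # hypothetical true arrival times (ms)
stamps = [observed_timestamp(a) for a in arrivals]
print(stamps)  # → [10, 10, 10, 20, 30]
```

The recorded timestamps cluster at multiples of the scheduling period, a pattern that comes from the scheduler and not from the traffic.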

Pitfall: Clock resolution
Measuring small time quantities can be hard, and clocks may not be trustworthy on small time scales

Pitfall: Clock resolution
Example: uniqtime in UNIX systems
Uniqtime keeps track of clock reads. If the clock has not ticked since the last read, uniqtime will add a small fraction to the time to avoid “time standing still”.
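
The behaviour described above can be mimicked with a small wrapper around a coarse clock. This is a toy model of the description on the slide, not the actual UNIX code; the class name `UniqClock`, the 1 µs bump and the stuck 10 ms clock are hypothetical.

```python
class UniqClock:
    """Wraps a coarse clock so that successive reads never repeat."""

    def __init__(self, read_coarse, bump=1e-6):
        self._read = read_coarse  # function returning the coarse time (s)
        self._bump = bump         # small fraction added when no tick occurred
        self._last = None

    def now(self):
        t = self._read()
        if self._last is not None and t <= self._last:
            # Clock has not ticked since the last read: nudge time forward.
            t = self._last + self._bump
        self._last = t
        return t


# Hypothetical coarse clock that stays on the same tick for all three reads.
clock = UniqClock(lambda: 0.010)
a, b, c = clock.now(), clock.now(), clock.now()
print(b - a, c - a)
```

The differences between the reads reflect the added fraction, not elapsed time, which is why intervals shorter than a clock tick are unreliable.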

Pitfall: Clock resolution
This means that times of the same order as the clock tick cannot be measured accurately

Pitfall: Trusting authorities
Authorities can make errors, too.
Example: ns (and ns-2) are standard simulators for communication.
ns comes with a large set of standard protocols

Pitfall: Trusting authorities
We were interested in investigating the backoff mechanism in Ethernet.
We found out that the Ethernet implementation in ns was broken (and had been so for a long time).
For several years, many people around the world had used a broken protocol in their simulations

Conclusions
Measurement-based research is a prime example of experimental research methodology
Measurements are tricky, but fun!
If you are interested in a career in measurement-based research, study statistics!

The End

That’s all, Folks!
