An Automated Approach for Recommending When to Stop Performance Tests
Hammam AlGhamdi, Weiyi Shang, Mark D. Syer, Ahmed E. Hassan


Page 1: An Automated Approach for Recommending When to Stop Performance Tests

Hammam AlGhamdi, Weiyi Shang, Mark D. Syer, Ahmed E. Hassan

Page 2: An Automated Approach for Recommending When to Stop Performance Tests

Failures in ultra-large-scale systems are often due to performance issues rather than functional issues.

Page 3: An Automated Approach for Recommending When to Stop Performance Tests

A 25-minute service outage in 2013 cost Amazon approximately $1.7M.

Page 4: An Automated Approach for Recommending When to Stop Performance Tests

Performance testing is essential to prevent these failures.

[Diagram: a pre-defined workload sends requests to the system under test in the performance testing environment, while performance counters, e.g., CPU, memory, I/O and response time, are collected.]

Page 5: An Automated Approach for Recommending When to Stop Performance Tests

Determining the length of a performance test is challenging.

[Timeline: as the test runs, increasingly repetitive data is generated; an optimal stopping time exists.]

Page 6: An Automated Approach for Recommending When to Stop Performance Tests

Determining the length of a performance test is challenging: stopping too early misses performance issues; stopping too late delays the release and wastes testing resources. The optimal stopping time lies between the two.

Page 7: An Automated Approach for Recommending When to Stop Performance Tests

Our approach for recommending when to stop a performance test:

1) Collect the already-generated data
2) Measure the likelihood of repetitiveness
3) Extrapolate the likelihood of repetitiveness
4) Determine whether to stop the test (Yes: STOP; No: return to step 1)

[Flow: collected data → likelihood of repetitiveness → first derivatives → whether to stop the test]
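The four-step loop above can be sketched as a driver like the following. This is a minimal sketch, not the authors' implementation: the step functions are hypothetical stand-ins (random counter samples and a saturating placeholder likelihood), and the stopping threshold is an illustrative assumption.

```python
import random

# Hypothetical stand-ins for the four steps; a real harness would read
# live performance counters and apply the Wilcoxon-based likelihood
# measurement described on the later slides.
def collect_generated_data(state):
    state["data"].append(random.gauss(0.0, 1.0))  # step 1: new counter sample

def measure_likelihood(state):
    # step 2 placeholder: likelihood grows and saturates as data accumulates
    state["likelihoods"].append(1.0 - 1.0 / len(state["data"]))

def derivative_close_to_zero(likelihoods, eps=1e-3):
    # steps 3-4: finite-difference first derivative near zero => stop
    return len(likelihoods) >= 2 and abs(likelihoods[-1] - likelihoods[-2]) < eps

def run_test(max_intervals=1000):
    """Skeleton of the four-step stopping loop from this slide."""
    state = {"data": [], "likelihoods": []}
    for _ in range(max_intervals):
        collect_generated_data(state)
        measure_likelihood(state)
        if derivative_close_to_zero(state["likelihoods"]):
            return len(state["data"])  # recommended stopping point
    return max_intervals
```

With this placeholder likelihood curve the loop stops as soon as consecutive measurements differ by less than `eps`, mirroring the "stop when the first derivative is close to 0" rule.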

Page 11: An Automated Approach for Recommending When to Stop Performance Tests

Step 1: Collect the data that the test has generated so far: performance counters, e.g., CPU, memory, I/O and response time.

Page 12: An Automated Approach for Recommending When to Stop Performance Tests

Step 2: Measure the likelihood of repetitiveness. Select a random time period A (e.g., 30 min) from the collected data.

Page 13: An Automated Approach for Recommending When to Stop Performance Tests

Step 2 (cont.): Search for another, non-overlapping time period B that is NOT statistically significantly different from A.

Page 14: An Automated Approach for Recommending When to Stop Performance Tests

Step 2 (cont.): Apply the Wilcoxon test between the distributions of every performance counter across both periods.

Page 15: An Automated Approach for Recommending When to Stop Performance Tests

Step 2 (cont.): Example Wilcoxon test results between periods A and B:

Counter:  Response time | CPU   | Memory | I/O
p-value:  0.0258        | 0.313 | 0.687  | 0.645

The periods are statistically significantly different in response time!

Page 16: An Automated Approach for Recommending When to Stop Performance Tests

Step 2 (cont.): Results for another candidate period:

Counter:  Response time | CPU   | Memory | I/O
p-value:  0.67          | 0.313 | 0.687  | 0.645

The goal is to find a time period that is NOT statistically significantly different in ALL performance counters!

Page 17: An Automated Approach for Recommending When to Stop Performance Tests

Step 2 (cont.): Did we find a period that is NOT statistically significantly different? Yes: the data is repetitive. No: the data is not repetitive.

Page 18: An Automated Approach for Recommending When to Stop Performance Tests

Step 2 (cont.): Repeat this process a large number of times (e.g., 1,000) to calculate the likelihood of repetitiveness.
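The repeated search just described can be sketched in pure Python as follows. This is a minimal sketch, not the authors' implementation: `rank_sum_p` approximates the two-sided Wilcoxon rank-sum p-value with a normal approximation (no tie correction), and the counter names, period length, and 0.05 significance level are illustrative assumptions.

```python
import math
import random

def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation)."""
    combined = sorted((v, i) for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):  # assign average ranks over ties
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[combined[k][1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(ranks[:n1])                       # rank sum of sample a
    mu = n1 * (n1 + n2 + 1) / 2.0              # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return math.erfc(abs(r1 - mu) / sigma / math.sqrt(2))

def likelihood_of_repetitiveness(counters, period_len, trials=1000, alpha=0.05):
    """For each trial, pick a random period A and search for a
    non-overlapping period B that is NOT significantly different in
    ALL counters; return the fraction of trials with such a match."""
    n = min(len(v) for v in counters.values())
    hits = 0
    for _ in range(trials):
        a = random.randrange(n - period_len + 1)
        for b in range(0, n - period_len + 1, period_len):
            if abs(b - a) < period_len:
                continue  # skip overlapping candidate periods
            if all(rank_sum_p(c[a:a + period_len], c[b:b + period_len]) > alpha
                   for c in counters.values()):
                hits += 1
                break
    return hits / trials
```

For a test whose counters have become stationary, almost every random period finds a statistically similar counterpart, so the likelihood approaches 100%; early in a still-changing test it stays low.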

Page 19: An Automated Approach for Recommending When to Stop Performance Tests

Step 2 (cont.): A new likelihood of repetitiveness is measured periodically (e.g., every 10 min: at 30 min, 40 min, ..., 1 h 10 min, ...) in order to get more frequent feedback on the repetitiveness.

Page 20: An Automated Approach for Recommending When to Stop Performance Tests

Step 2 (cont.): The likelihood of repetitiveness eventually starts stabilizing, i.e., the test generates little new information.

[Plot: likelihood of repetitiveness (1% to 100%) over a 24-hour test (00:00 to 24:00), rising and then flattening.]

Page 21: An Automated Approach for Recommending When to Stop Performance Tests

Step 3: Extrapolate the likelihood of repetitiveness. To know when the repetitiveness stabilizes, we calculate the first derivative of the likelihood curve.

Page 22: An Automated Approach for Recommending When to Stop Performance Tests

Step 4: Determine whether to stop the test. Stop the test if the first derivative is close to 0.
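The stopping rule can be sketched as: given the sequence of periodically measured likelihoods, recommend stopping once the first derivative stays close to zero. This is a simplified sketch using finite differences rather than the extrapolated curve from the deck, and the `eps` threshold and `window` size are illustrative assumptions.

```python
def should_stop(likelihoods, eps=0.001, window=3):
    """Recommend stopping once the first derivative of the
    likelihood-of-repetitiveness curve is close to zero for the
    last `window` measurements (finite-difference approximation)."""
    if len(likelihoods) < window + 1:
        return False  # not enough measurements yet
    recent = likelihoods[-(window + 1):]
    derivs = [recent[i + 1] - recent[i] for i in range(window)]
    return all(abs(d) < eps for d in derivs)
```

For a curve that rises and then plateaus, `should_stop` flips to True only once the plateau is reached, matching the "first derivative close to 0" criterion.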


Page 24: An Automated Approach for Recommending When to Stop Performance Tests

We conduct 24-hour performance tests on three systems: PetClinic, Dell DVD Store, and CloudStore.

Page 25: An Automated Approach for Recommending When to Stop Performance Tests

We evaluate whether our approach: (1) stops the test too early, or (2) stops the test too late, relative to the optimal stopping time.

Page 26: An Automated Approach for Recommending When to Stop Performance Tests

RQ1: Does our approach stop the test too early?

1) Select a random time period from the post-stopping data.
2) Check whether the random time period has a repetitive counterpart in the pre-stopping data.
Repeat 1,000 times.

Finding: the test is likely to generate little new data after the stopping times (preserving more than 91.9% of the information).
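This evaluation can be sketched as follows. A faithful run would reuse the Wilcoxon-based comparison from Step 2 on every counter; here a simple mean-based match and the `close` threshold are illustrative stand-ins.

```python
import random

def information_preserved(pre, post, period_len, trials=1000, close=0.1):
    """RQ1 sketch: fraction of random post-stopping periods that have
    a 'repetitive' counterpart in the pre-stopping data. A period
    matches if its mean is within `close` of some pre-stopping
    period's mean (a stand-in for the per-counter Wilcoxon test)."""
    def mean(xs):
        return sum(xs) / len(xs)
    hits = 0
    for _ in range(trials):
        s = random.randrange(len(post) - period_len + 1)
        m = mean(post[s:s + period_len])
        if any(abs(mean(pre[b:b + period_len]) - m) <= close
               for b in range(0, len(pre) - period_len + 1, period_len)):
            hits += 1
    return hits / trials
```

A value near 1.0 means the post-stopping data is almost entirely repetitive with respect to the pre-stopping data, i.e., little information is lost by stopping.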

Page 27: An Automated Approach for Recommending When to Stop Performance Tests

RQ2: Does our approach stop the test too late? We apply our evaluation approach from RQ1 at the end of every hour during the test (1 h, 2 h, ..., 24 h) to find the most cost-effective stopping time.

Page 28: An Automated Approach for Recommending When to Stop Performance Tests

RQ2 (cont.): The most cost-effective stopping time has:
1. A big difference from the previous hour
2. A small difference from the next hour

[Plot: likelihood of repetitiveness (1% to 100%) over time (00:00 to 06:00), illustrating the most cost-effective stopping time.]

Page 29: An Automated Approach for Recommending When to Stop Performance Tests

RQ2 (cont.): There is a short delay between the recommended stopping times and the most cost-effective stopping times (the majority are under a 4-hour delay).

Pages 30-37: Summary. Determining the length of a performance test is challenging: stopping too early misses performance issues, while stopping too late delays the release and wastes testing resources. Our four-step approach recommends when to stop; the test generates little new data after the recommended stopping times (preserving more than 91.9% of the information), and the majority of recommendations are under a 4-hour delay from the most cost-effective stopping times.