Web-application’s performance testing
Ilkka Myllylä
Reaaliprosessi Ltd
Agenda
Research results about web-application scalability
Performance requirements specification
Load testing tools
Load test preparation
Load test execution
Analysis
– What is a bottleneck and how to find it?
– Optimization and tuning
– What is the root cause and how to fix it?
Research results about web-application scalability
Is scalability a problem?
Research results (Newport Group Inc.): Scalability in production
– as required: 48%
– worse than requirements: 52% (on average 30% worse)
Schedule delays and cost overruns
– as required: 1 month and 70 000 euros
– worse: 2 months and 140 000 euros
What explains the bad results?
– the timing of performance testing is the most important factor
Performance testing timing
[Chart: share of projects with performance as required vs. worse, by when performance testing started – plan and design, beginning of implementation, end of implementation, after deployment, not tested]
Early start of testing is profitable
Efficiency
– tests with one and 2-5 users reveal on average 80% of bottlenecks
Costs
– late fixing costs many times more -> for example, the risk of a wrong architecture
When to test performance ?
Architecture validation
Component performance
New application system performance test
When major changes are made to the application
Realistic performance testing is not easy
Too optimistic results are a common problem
– "It worked in tests!" say 32% (Gartner Group)
Bad environment
– testing environment not equal to production
Bad design and implementation
– load testing tool not used the right way
Wrong tool for the job
– load test tool is not functional / does not include the features needed
Performance requirements specification
Performance requirements testability
Requirements should be exact enough to know how to test them
Usage information
– different users and their profiles?
– transaction amounts?
Response times
– detailed response time requirements?
Technical constraints
– user terminals, software, connection speed?
– technical architecture
– database size
Security and reliability constraints
Completeness of requirements
Sensibility check: are the customer's requirements / calculations right and sensible?
Completeness: what information is missing?
– checklist
System usage information
Goal: a real usage simulation that is close enough to reality
Business / use case scenarios are a good starting point
Close enough?
– not all functions are tested, but the most important and most used ones are
– 3-5 scenarios are usually enough for one application
Usage information: is history info available?
– yes: future scenarios?
– no: estimation and calculation with known facts
Transaction profile
Business scenario     Trans/h typical day   Trans/h peak day   Web server activity   DB activity   Risk
Log in                70                    210                Heavy                 Light         High
Create new account    10                    15                 Moderate              Moderate      Low
Create order          130                   180                Moderate              Moderate      Moderate
Update order          20                    30                 Moderate              Moderate      High
Ship order            40                    90                 Moderate              Heavy         High
User profiles
Business scenario / user   Order Entry Clerk   Order Clerk   Shipping Clerk   Overall
Concurrent sessions        15                  5             10               30
Log in                     15                  5             10               30
Create new account         5                   10            -                15
Create order               100                 30            -                130
Update order               10                  20            -                30
Ship order                 -                   -             90               90
Do the customer's requirements / calculations make sense?
Expectations can be too high
– "Response time for all functions should be below 2 s"
– "In the old system everything was very fast"
Different functions are not equal
– requirements should be set individually for important use cases and their functions
Costs should be involved
– less response time, more costs
– technically "challenging" requirements?
How long is the user willing to wait?
Not a simple thing
– different users have different requirements for the same functions
Research results (Netforecast Inc)
– two important factors: frequency and amount
– strong variation between applications and functions
– a satisfactory average is 10 s
Frequency of use
Frequency: how often does the user need to use the function?
– the more often a user needs a function, the less he/she is willing to wait
Example 1: Frequency
– use case: search customer info
– A. once an hour: response time requirement 5 s
– B. once a month: response time requirement 30 s
Amount of information
Amount: how much valuable information do we get as a result?
– the more information one gets, the more he/she is willing to wait
Example 2: Amount
– A. saving function for a few input fields: response time requirement 3 s
– B. search function for product information (100 fields): response time requirement 10 s
Response time requirements
Business scenario     Maximum acceptable 95%   Satisfactory 95%
Log in                10                       5
Create new account    10                       5
Create order          8                        3
Update order          8                        3
Ship order            15                       8
Performance requirements in contracts
Requirements for performance testing
– appendix about performance requirements
– a test engineer validates testability
– both customer and supplier benefit
Good for the customer
– no extra costs late in the development cycle
Good for the supplier
– no new "surprise" requirements late in the development cycle
What about performance testing tools?
– an expensive investment
Load testing tools
Types of performance testing
Concurrency
– one to several users concurrently in risk scenarios
– normally manual testing
Performance
– requirements for scalability?
– a load testing tool is needed
Load
– max number of users?
Reliability
– is long usage possible without degrading performance?
Load testing tools - markets
Mercury Interactive is the market leader (> 50%)
The big six have 90% of the market
Hundreds of small companies
Prices for major tools are quite high
Growing market
Load testing tools – options to get one
Buy a licence (+ consulting)
– usually priced by virtual user count
Buy a service
– load generation done externally
Rent a licence (+ consulting)
– needed only for a limited time and maybe for just this one project
Load testing tools – main functions
1. Recording and editing scripts
2. Scenario design and running
3. Analysis and reporting
Load testing tools - features
Lots of obligatory features
– script recording
– parametrizing
– scenario design
  - ramp-up and weighting of different scripts running concurrently
– online results / feedback
  - error detection
  - transaction-level response times
– http protocol support
Load testing tools - features
Lots of usually needed features
– distributed clients
– Unix/Linux clients
– multi-protocol support
– multi-speed support
– multi-browser support
– server monitoring
– content validation
– dynamic URLs supported
Features - Make or buy
Some features can be implemented manually
– server monitoring
– analysis and reporting
Usability
– the best tools are really easy to use
– others need lots of work and "programming experience"
Workarounds
– more features than promised, with clever tricks
Good tool combination
Separate load and monitoring tools
– even from different vendors?
– how about profiling?
Script reusability
– same scripts for functional and load testing
Load test preparation
Testing environment
Same as the production environment
– are other applications sharing the same resources (firewall etc.)?
Controllable
– no outside disturbance
"Basic" optimization made
What is basic optimization?
– server parameters are validated by the responsible persons and a list of values is given to the load testers
– database: SQL performance checked and the necessary indexes exist
Without basic optimization a load test is a waste of time
– just the first obvious bottleneck is found and no real information is gained
Load test cases
First each script separately with ramp-up usage
– easier to see straight away what the problem is
Real usage scenario with weighted scripts
– usually many test runs before goals are achieved
– time for repeated tests
– usual usage first, then special occasions
What-if scenarios
– one change at a time to see the influence of the changed factor
Risk-based testing
– testing from different locations and connection speeds
– hacker testing -> DoS attacks etc.
Script selection
User or process scripts?
– both are possible
Example: Petshop application – user-oriented
– create order - returning customer
– create order - new customer
– searching for a customer
Example: Petshop application – process-oriented
– registration
– create order
– search order
Script recording and editing
Script = program code (C, Perl etc.) to execute a test automatically
Basic recording
– execute the test case with recording on
– check and set recording options before starting
– generates a script
Editing
– parametrizing
– transactions
– think time changes
– checkpoints
– comments
Parametrizing
A recorded script includes hard-coded input values
– if we execute a load test with hard-coded values, the results are not realistic (too good or too bad)
Parametrizing = different input values for different virtual users
– all users of the system have different user information
– more realistic load
Test data generation
Parameter data with the right distribution
– generation of test data into text files which the load tools can use
A realistic amount of data in the databases
– backup and restore procedures
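As a hedged sketch, parameter data with the right distribution can be generated from the transaction profile; the weights below reuse the typical-day trans/h figures from the earlier table.

```python
import random

def generate_scenario_mix(n, weights, seed=1):
    """Draw n scenario labels so their frequencies follow the given
    weights (e.g. trans/h from the transaction profile)."""
    rng = random.Random(seed)  # fixed seed keeps the generated data repeatable
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)

# Typical-day weights taken from the transaction profile table
mix = generate_scenario_mix(
    1000, {"log_in": 70, "create_new_account": 10, "create_order": 130,
           "update_order": 20, "ship_order": 40})
```

The resulting labels (and matching input rows) can then be written line by line into the text files the load tool reads.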
Transaction
Detailed response time information inside the script
Exact execution times and problem transactions can be seen
– a script with 10 transactions -> when response time increases, are all the transactions equally slow or just some?
Checkpoint
Functions in the script that check the correctness of results during execution
In some tools checkpoints can be set automatically
– others need manual implementation
They find errors that would otherwise not be seen
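A minimal Python sketch of both ideas (the helpers are illustrative, not a specific tool's functions): transaction markers time each named step, and a checkpoint fails the step when the expected content is missing.

```python
import time

def run_transaction(name, action, results):
    """Time one named transaction, like a tool's start/end transaction markers."""
    start = time.perf_counter()
    response = action()
    results.append((name, time.perf_counter() - start))
    return response

def checkpoint(response, expected_text):
    """Verify correctness of the result, not just its speed:
    an error page can come back fast and still be a failure."""
    if expected_text not in response:
        raise AssertionError(f"checkpoint failed: {expected_text!r} not in response")
    return True
```

Per-transaction timings show which of the steps slows down under load, while checkpoints catch the errors that response times alone would hide.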
Think time
Think time = the time a user spends looking and typing input before making the next request to the server
An important parameter when estimating usage
– less think time means more frequent requests and more load on the servers
Example
– 100 users logged into the system with 10 s average think time = 1 user makes 6 transactions/minute, so 100 users make 600 t/min = 10 tps
– if think time is 30 s, the load is about 3 tps
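The example's arithmetic as a small sketch (the formula is the slide's own rough approximation, which ignores response time):

```python
def estimated_tps(users, think_time_s):
    """Each user fires roughly one request per think-time cycle,
    so the total load is users / think time."""
    return users / think_time_s

# 100 users with 10 s think time -> 10 tps; with 30 s -> about 3.3 tps
```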
Comments
Making and testing scripts = software development
– comments for maintenance
– the tool's own naming is not always good -> changes are needed for readability
Script testing
Execute each script successfully on its own
– at least twice
– with checkpoints and parameters
Scenario creation and testing
Usage information and ramp-up of the different scripts in the same scenario
Designed counters available and working for the test
A test run with a couple of users
Ramp-up
Ramp-up
– the user amount increases little by little
– in real life amounts usually do not change immediately
– when the user amount increases little by little, it is easy to see how response time and utilization develop
– stabilize before the next level of load
Example: 1000 users use the system at the same time
– first 50 users, then 50 more every 10 minutes until response time is bad or errors start to increase
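The example's ramp-up can be written out as a schedule (a sketch; real tools express this in their scenario designer):

```python
def ramp_up_schedule(target_users, initial=50, step=50, interval_min=10):
    """Return (minute, active users) pairs: start with `initial` users
    and add `step` more every `interval_min` minutes up to the target."""
    schedule, users, minute = [], initial, 0
    while True:
        schedule.append((minute, users))
        if users >= target_users:
            break
        minute += interval_min
        users = min(users + step, target_users)
    return schedule
```

In practice the ramp is stopped early once response times degrade or errors start to increase.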
Collection of performance counters
The responsibility for collecting performance counters is usually divided between
– administrators
– developers
– testers -> load tool monitors
Load tool monitors should be tested
– it is not always as easy to get information from the servers as the vendor says
Load test execution
Reset and warm-up
Reset the situation
– old tests should not influence the results
Warm-up
– before the actual test, some usage needs to be done
Synchronizing people involved
The test manager gets a "ready" from all people involved
When the test ends, synchronize again to stop the monitors
Collection of results
Active online following
Counters
– following the online monitors
– response time and throughput
– client and server system counters (CPU, memory, disk)
Error messages
– if lots of errors occur, the test should be stopped
– errors often occur before the application runs out of system resources
Response time
The most important counter for performance
Response time = the time a user needs to wait before being able to continue
Industry standard for response time: 8 seconds
With response time, usage information is needed too
– the simultaneous user amount and what most of them are doing
Example
– 100 simultaneous sessions, 50% update and 50% search
– response time requirement 4 s for 95% of bill inserts and updates; for the other functions the requirement is 8 s
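Percentile requirements like "4 s for 95% of inserts" are checked against the measured samples; a nearest-rank percentile sketch:

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the value at or below which `pct`
    percent of the sorted samples fall."""
    ordered = sorted(samples)
    rank = -(-len(ordered) * pct // 100)  # ceiling division
    return ordered[int(rank) - 1]

def meets_requirement(samples, limit_s, pct=95):
    """True if the pct-percentile response time is within the limit."""
    return percentile(samples, pct) <= limit_s
```

Using a percentile rather than the average keeps a few extreme outliers from dominating the verdict while still ignoring the slowest 5% of requests.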
Throughput
Another important counter for validating scalability
The amount of transactions, events, bytes or hits per second
The usual counter is tps (= transactions per second)
Requirements can be stated as a throughput value
A bottleneck can be seen easily at the saturation point of throughput
Throughput and response time
Performance ”knee”
[Chart: Performance vs. users – response time plotted against 20-180 virtual users, showing the performance "knee" where response time turns sharply upward]
What is a bottleneck and how to find it?
What is a bottleneck and why is it important?
Any resource (software, hardware, network) which limits the speed of the application
– below requirements
– from good to even better (changing requirements)
A bottleneck is a result
– the reason should be analysed and fixed
– for example, disk I/O is the bottleneck and the fix is to distribute the log file and database files to different disks
"A chain is as strong as its weakest link"
– the application is only as fast as the worst bottleneck permits
How can we identify a bottleneck?
Using the application and measuring
– response time and throughput
– resource usage measuring
One user
– relative slowness and resource utilization (= not yet a bottleneck, but possible to see that a bigger amount of users will cause one)
Several users
– trends already possible to see (= 1 user 1 s, 5 users 3 s, 1000 users ? s)
Required amount of users
– the actual max usage scenario
Not so nice features of bottleneck
A real bottleneck influences the load of other resources
– "everything influences everything"
– when the disk is the bottleneck, the processor looks like one too (but is not)
– when the real bottleneck is fixed, the other problem will be solved too
– if we just increase processor power, it does not help
A real bottleneck "creates" other problems but hides them too
– the first bottleneck should be solved in order to see the next real one
Amount and finding of bottlenecks
One application usually has many bottlenecks
– many changes are needed in order to fix them all
One test finds only one bottleneck
– many iterations are needed in order to fix all bottlenecks
Most common bottlenecks in web-applications
[Chart: distribution of bottlenecks in web-applications by percentage – application server (about 40%), database server (about 30%), network (about 20%), web server (about 10%)]
Server counters and profiling
What counters and log/profile information do we need in order to see the bottleneck and the root cause?
Two levels of counters
– system counters, e.g. CPU utilization %
– application software counters, e.g. Oracle cache hit ratio %
Log/profile information
– detail-level resource usage information
Collecting system counters
Memory, CPU, network and disk counters can be collected
– with operating system dependent programs like Windows Performance Monitor or Unix's sar, top etc.
– with load testing programs like LoadRunner or QALoad
Collecting with load testing programs is easier and the information comes in an easy-to-analyze/report form
Counters for all four resources are needed
Interpreting system counters
The most important counters
– CPU – queue length tells if it is too busy
– disk – queue length tells if it is too busy
– network – queue length tells if it is too busy
– memory – hard page faults (disk) tell if it is too small
However, one counter is not enough
– to be sure, more counters are needed
– to see the root cause, more counters are needed
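A sketch of that rule of thumb (the threshold is illustrative): flag resources whose queue-length counters stay high on average, remembering that a flagged resource may only be a side effect of the real bottleneck.

```python
def suspect_bottlenecks(queue_samples, threshold=2.0):
    """Return the resources whose average queue length exceeds the
    threshold -- candidates only, since side effects of the real
    bottleneck can inflate other queues too."""
    return [resource
            for resource, values in queue_samples.items()
            if sum(values) / len(values) > threshold]
```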
Application counters
Collecting with load testing programs is easy and the information comes in an easy-to-analyze/report form
However, not all counters are available to load testing tools
– online monitors (WebSphere, WebLogic) can be used to complement the information
Different products have different counters
– a need to understand that particular product
Profiling tools
Collecting exact information at call level
– memory usage
– disk I/O usage
– response time
Collecting the information may influence the results quite a lot
– one solution is to make two test runs: one without logging/profiling and the other with them
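In Python, call-level response times can be collected with the standard-library profiler; the two-run idea is then simply to execute the workload once under the profiler and once without, and compare.

```python
import cProfile
import io
import pstats

def workload():
    """Stand-in for the code under test."""
    return sum(i * i for i in range(10_000))

def profile_report(func, top=5):
    """Run func under cProfile and return the top functions by
    cumulative time as a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()
```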
Example 1 : One clear bottleneck
One of the four system resources is busy
– easy to see the bottleneck
Example 2: More than one system resource looks bad
However, only one resource is the real bottleneck
– the others are "side effects" of the real bottleneck
Example 3: None of the system resources looks bad
Where is the bottleneck then?
– usually some application software works inefficiently internally, or an interface queue to external systems does not work efficiently
What is the root cause and how to fix it?
How to see the root cause?
Application-level information is usually needed and always good to have
– software code problems can be solved when we see which function is slow
Some root causes are easy to see, while others need sophisticated monitoring and profiling
Software implementation
Database server
– bad SQL from a performance point of view (works, but not efficiently)
– no indexes, or not good enough indexes used
Application server
– object references not freed -> too many objects in the heap
– bad methods used from a performance point of view
The idea is to decrease the load on the hardware resources
Efficiency
Inefficient use of the existing hardware resources
– parametrizing and configuring help
Capacity
A resource is too slow to handle events fast enough
More resources or reconfiguring the existing resources
– e.g. moving CPU capacity from the web server to the db server
Hard constraints and requirements
The client's complicated business logic requirements
– too many bytes needed in the user interface (slow network speed)
– too many different sources of information needed (synchronous)
– long transactions; a single function needs many chained updates
Security requirements
– too many requests to the web server -> encrypted network traffic
Online data needed
– many big updates needed immediately
Bad design
Application tiers
– distribution of tiers possible (= EJB vs. pure Servlet)
Technology
– too much information in the session object
Infrastructure
– incompatible versions from different vendors, from a performance point of view
– needed functionality not available (= distribution not supported)
Tuning
Tuning
– application
– server software
– operating system
– network
Usually a good choice
– fast to do
– small regression risks
Usually tuning is not enough
– changes are needed
Changing
Application code
Application software
Hardware
Network infrastructure
Tuning vs change
Tuning is not so risky
A change is not always possible
In practice both are valid and equally considered
Example : Tuning vs change
A sales system has an application server processor bottleneck
It could be removed with
– more processing power
– less processing needed -> application code change
If the application logic needs to be changed a lot
– more processing power is chosen
If the application logic needs to be changed only a little
– an application code change is chosen
If both are fast, easy and the costs are low
– both are chosen
Removing bottlenecks
Idea: remove the root causes of bottlenecks one by one
Rerun the same test to see the influence
Testing part of system
Sometimes it is difficult to see the bottleneck and the root cause
– more information is needed in order to understand the system better
– testing just one suspect at a time is usually possible but can take much effort
Testing only one part of the system at a time is the ultimate way
Top-down optimizing
When there is plenty of time
– not very fast, but effective
Idea: optimize one level at a time
-> level-by-level readiness
– no jumping between levels
Application code
Application software
Operating system
Hardware
Memory–cache-pool-area usage
Idea: the data or service that the application needs is already in memory as much as possible
System level
– big enough memory -> not much swapping needed
– a proxy server caches content
Application level
– big enough database connection pool -> new objects not needed
– big enough database sort area -> not much swapping needed
Connection and thread pools
Create many objects at startup
– a new user gets an object from the pool
– after use the object returns to the pool
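A minimal pool sketch in Python using a thread-safe queue (illustrative only; real servers ship their own database connection pool implementations):

```python
import queue

class ObjectPool:
    """Create the expensive objects once at startup; users borrow
    and return them instead of constructing new ones per request."""

    def __init__(self, factory, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=None):
        """Block until an object is free (this also bounds concurrency)."""
        return self._pool.get(timeout=timeout)

    def release(self, obj):
        self._pool.put(obj)
```

The fixed pool size is itself a tuning parameter: too small and requests queue for a connection, too large and the database server is overloaded.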
Synchronous and asynchronous traffic
If possible, actions could happen asynchronously (= no need to wait for the action to be ready)
Interfaces to other systems
Distributing load
Between servers
– load balancing
Inside a server
– CPUs
– disks
Between networks
– segments
Cut down features
Sometimes the only possibility is to cut down features and requirements
– the deadline is too near to do other optimizing
– the costs or risks of doing anything else are too big
Making recommendations for corrective actions
Interpreting the results usually needs several different persons
– however, understanding and criticality are needed
The results should be clear
– the usual "It is not our software but yours" conversations can be avoided if nobody can question the results and recommendations
– there is also a need to show where the problem is not!
Example : Internet portal
Application: many background systems produce data for this portal
Response time in the USA is 5 s, where connections are fast
In Asia every connection takes 2 s, and moving elements between the server and the client is slow too
Logic: 12 frames inside each other
Result: opening the first page takes 12 * 2 s + 30 s = 54 s
Requirement: 8 s
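The case's arithmetic as a one-line check (the 2 s per-frame latency and 30 s of transfer are the case's own figures):

```python
def page_load_time_s(frames, per_frame_latency_s, transfer_time_s):
    """Each nested frame costs its own round trip on top of the
    shared transfer time, so latency multiplies with the frame count."""
    return frames * per_frame_latency_s + transfer_time_s
```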
Corrective actions and ideas
Idea 1: faster connections
– not possible -> thousands of internet customers
Idea 2: content nearer to the customer
– pictures partly on the client workstations -> security regulations partly prevent this
– content on servers near the customers (Content Delivery Network) – helps some but not enough
Idea 3: packaging the data
– helps some but not enough
Idea 4: application logic change -> fewer frames
– lots of costs
– requirements achieved
Errors and lessons learned
Internet users with slow connections and in different geographical areas
– can be an important user group
– the technical design failed for this group
Performance testing late in the development cycle
– too late
– did not simulate real usage well enough
Pilot users saved a lot
– the system was not yet widely used when the problems were seen
A solution was found (as usual)
– but fixing took much time and money