EngageOne Compose – Scalability White Paper Page 1 of 27
Customer Engagement
EngageOne® Compose
Performance and scalability whitepaper
EngageOne® Server
Contents
Introduction ............................................................................................................................................ 3
Executive Summary ................................................................................................................................ 4
Architectural Overview ........................................................................................................................... 5
Performance and scalability factors ....................................................................................................... 6
Real-time test scenarios ......................................................................................................................... 6
Reference Configuration ......................................................................................................................... 7
Scaling out............................................................................................................................................. 11
Scaling up .............................................................................................................................................. 16
Batch Processing ................................................................................................................................... 20
Software Platforms ............................................................................................................................... 20
Capacity Planning.................................................................................................................................. 22
1: Estimate the demand ................................................................................................................... 22
2: Estimate the number of application servers required ................................................................. 23
3: Define the database server and shared file system ..................................................................... 25
4: Validate the environment through testing .................................................................................. 27
Diagnostic and Tuning Techniques ....................................................................................................... 27
Introduction
This document describes the performance and scalability characteristics of the EngageOne Compose
server. It is intended to be used by customers and Professional Services staff who need to understand how
the system scales so that capacity can be properly planned.
Throughout this document performance refers to how quickly the system is able to process a request. This
is measured as the time taken for a web application to deliver a page to a user or for a web service to
respond to its caller.
Scalability refers to the system’s ability to increase its throughput whilst maintaining good response times.
For example, if the number of real-time users is doubled the system should be able to sustain twice as
much throughput without an increase in response time. A system with good scalability characteristics can
make use of additional hardware to support the increased load.
When measuring performance and scalability it is important to use test scenarios that represent the way in
which the system will be used in the real world. For EngageOne Compose there are four primary use
cases, divided into real-time and batch.
Real-time use cases
1. Interactive – Users of the EngageOne Interactive application or custom applications that behave in
a similar way. These users create one document at a time by filling in customer-specific data and
then submitting the document for composition. The Interactive application enables users to track
and update their documents in a task list.
2. On-demand – An external system calls the EngageOne web services to compose and deliver a
document.
Batch use cases
3. Accumulated Batch – A document created by an Interactive or On-demand user can have its
output type set to Batch. The system will add it to a collection of documents to be composed
later, as part of a batch. Documents accumulate until the batch is started either manually or by a
timer.
4. Non-accumulated Batch – The system composes a set of documents based on data in an input file.
There are no Interactive or On-demand users involved in the process.
Executive Summary
A reference hardware configuration was used to obtain the numbers below. The details of the hardware
are provided later in the document, but in summary it consisted of two application servers, one file server
and one database server. Increased throughput can be achieved by using larger hardware configurations.
Real-time workload
The table below summarizes the throughput and response times for the real-time scenarios using the
reference hardware.
Test scenario     Throughput (1)              Response time (2)   Max supported users (3)
1. Interactive    8 tps (2-server cluster)    300 ms              2,400 to 14,400
2. On-demand      27 tps (2-server cluster)   400 ms              8,100 to 48,600
Notes:
1. Throughput is expressed as transactions per second. A transaction represents one iteration of the
test scenario.
2. The response times are averages across all of the pages and web service calls in the scenario.
3. The number of supported users varies greatly depending on the workload of each user. The
numbers shown in the ‘Max supported users’ column represent the maximum number of
busy and light users respectively. A busy user creates one document every five minutes and a light
user creates one every 30 minutes.
Later sections of this document describe how the system scales out (by using more servers) and scales up
(by using servers with more cores).
The results show that the system scales out in a nearly linear way when executing the real-time scenarios.
This means that doubling the number of servers results in approximately double the capacity.
The system also scales up nearly linearly, to a maximum of eight cores per server. The system will use
more than eight cores if they are available, but the benefit is no longer linear. When deploying on
hardware with many more than eight cores it is therefore recommended to use virtual machines, each
with four to eight cores.
Batch workload
The batch process runs on a single server. Using the reference hardware configuration, batches of 100,000
documents were processed at the rates shown in the table below.
Batch type Number of docs Time taken Rate (per hour) Rate (per second)
Accumulated 100,000 35:38 minutes 168,382 47
Non-accumulated 100,000 28:00 minutes 214,286 60
Architectural Overview
The EngageOne Compose server is made up of a number of bundles, each of which has a particular role.
The system is designed so that each bundle can be scaled independently to meet the needs of a particular
environment. The bundles are as follows:
Security bundle – Responsible for the authentication and authorization of users and web service
callers.
Core bundle – Responsible for the EngageOne Interactive web application and web services.
Composition bundle – Responsible for composing documents.
Conversion bundle – Responsible for converting the format of attachments. Note that the
conversion bundle was not used in the tests described in this paper.
Batch bundle – Responsible for processing batches of documents and also for purging old
documents from the database and the file system.
The Core and Composition bundles are the most CPU intensive. A typical deployment will use one instance
of core and one instance of composition per application server. Generally, the security bundle is deployed
on two servers – a primary and a replica.
A minimum of two application servers should be used in a production environment for resilience purposes.
Depending on the load placed on the system additional application servers may be required. The
application servers require access to three shared resources: a database server (running Oracle or SQL
Server), a file system and an LDAP server.
The Reference Environment described later in this document is deployed as follows:
The reference system uses two application servers, each of which has the Security, Core and Composition
bundles installed. Additionally, one of the servers has the Batch bundle.
For testing purposes, a load generation tool, JMeter, replaces the real users in this environment. The load
is distributed evenly across the two application servers by a load balancer.
Performance and scalability factors
There are many factors that can influence the response times and throughput of an EngageOne Compose
deployment. These include:
Hardware – The system is capable of scaling out through the addition of extra servers and scaling
up by increasing the processing capacity of each server. There are two shared resources – the
database and the file system – and these must be sized appropriately for the anticipated load as
they cannot be scaled out.
Size and complexity of templates – Larger and more complex templates require more CPU cycles to
process them and use more storage and I/O bandwidth on the file system. The mix of templates
must be considered when sizing a system.
Different types of workload (Interactive, On-demand and Batch) affect the system in different
ways. The mix of workload is therefore an important factor when sizing a system.
Housekeeping processes such as purging work items and maintaining the database are important
for keeping a system running efficiently. The EngageOne Compose Administration Guide provides
advice on maintaining the system for best performance.
Real-time test scenarios
There are two test scenarios that represent the way the system is used by real-time users:
Interactive – This scenario simulates users manually creating documents through the EngageOne
Interactive application. As well as the creation of documents, this scenario also includes populating the
application’s home page with information about tasks that are relevant to the user.
This scenario includes approximately 12 seconds of think time per iteration. The think time is not intended
to be representative of real usage, as a real user would take several minutes to execute the steps in the scenario.
The user actions included in the Interactive scenario are:
Land on the EngageOne Interactive start page.
Log in
Repeat the following steps three times
o Go to home page (populates various task lists)
o Search for template and randomly select one of the three available
o Create a work item based on the template
o Load the ActiveX editor and fill in the required fields
o Search for delivery option and select an option with two channels
o Submit the document
o Return to home page (populates various task lists)
o Wait for the document to be composed
o Open the work item and confirm its delivery
Log off
On-demand – This scenario simulates an environment where an external application is submitting requests
to EngageOne Compose via its web service interface. Each request results in a document being composed
and delivered to two different channels. This scenario runs with no think time, therefore a new request is
made as soon as the previous one completes.
The web service requests made in this scenario are:
Search for template and randomly select one of the three available
Search for delivery option and select an option with two channels
Deliver document
Invalidate session
Batch test scenarios
Whilst the Interactive and On-demand scenarios are the primary focus of this document, the performance
of batch jobs is also tested. Accumulated Batch and Non-accumulated Batch are tested by timing the
total execution time of a manually initiated batch job.
Reference Configuration
The reference configuration consists of a defined set of hardware, software, templates and test scenarios.
Variations to this configuration can be used and the results can be compared with the reference. For
example, it is possible to measure the effect of doubling the number of application servers.
All of the results described in this document were obtained by testing on virtual hardware in Amazon Web
Services (AWS). Therefore, the reference hardware is defined in AWS terminology.
The reference hardware configuration is as follows:
Server purpose      Instance type                                                                  Number used   Notes
Application server  m5.2xl: 4 cores (8 vCPUs), 32GB RAM                                            2             The m5 family of virtual machine instances is designed for general-purpose workloads.
File server         m5.4xl: 8 cores (16 vCPUs), 400GB drive with 5,000 Provisioned IOPS            1             An m5.4xl server was chosen because it guarantees the availability of high I/O throughput to the disk. The file server does not require a large number of cores, but disk I/O is very important.
Database server     db.m4.2xl: 4 cores (8 vCPUs), 32GB RAM, 100GB drive, 2,000 Provisioned IOPS    1             AWS RDS instance running Oracle Enterprise Edition 12.1.
The test scenarios all use three different templates of varying size and complexity. Each iteration of the
scenario randomly chooses one of these templates, so each one is used for 1/3 of the iterations.
Template type Description
Letter Relatively simple. Composed document size is 150KB.
Policy document 20-page document. Composed size is 350KB.
Brochure Three pages long, with images. Composed size is 1.8MB.
Two real-time scenarios are executed as part of the reference configuration – the Interactive and
On-demand scenarios. The two scenarios are tested independently and the results are presented separately.
The tests involve ramping up the load over a period of time and measuring how the throughput and
response times vary with load. For convenience each test is one hour long. Several test runs were
executed with different levels of load to determine the amount of load required to fully saturate the
system.
The throughput for the Interactive and On-demand scenarios using the reference hardware was as follows:
As mentioned previously, a transaction in this context is a single iteration of the test scenario. Each
iteration includes searching for a template and submitting it to be composed. In addition, the Interactive
scenario includes populating the task lists on the user’s home page. The Interactive scenario therefore
does considerably more work during each iteration, which is why Interactive transactions per second (tps)
are much lower than the On-demand transactions per second.
The graph shows that for both test scenarios the throughput rises as the load increases, up to the system’s
saturation point. For the Interactive scenario the maximum throughput is approximately 9 tps and for
On-demand it is approximately 31 tps.
The chart below shows how the average response time changes as the load increases. The response times
shown are averages of all web service calls in the scenario (excluding those required to log on and off the
application).
It is clear that response times increase with load, which is to be expected. Plotting the response time
against the throughput can be helpful when estimating the “usable” capacity of a system. The usable
capacity represents the work that the system can do without encountering unreasonably long response
times.
This chart confirms that the system can deliver a certain level of throughput whilst maintaining good
response times, but if the system is asked to deliver slightly more throughput then the response times will
increase significantly. For the Interactive scenario the usable throughput is 8 tps (compared to its
maximum throughput, which is 9 tps). For the On-demand scenario the usable throughput is around 27
tps (compared to its maximum of 31 tps).
Throughout this paper all throughput numbers refer to usable throughput unless they are specifically
stated as being maximums.
The number of concurrent users that a system can support depends largely on the workload of those
users. For example, a busy user who is creating 12 documents per hour would use six times as much
system resource as a light user who creates only 2 documents per hour. Therefore, a given hardware
configuration can support six times as many light users as busy users. For any implementation it is
important to understand the number of users and their workload before attempting to size the hardware.
The busy and light users described here are just examples and they will not be applicable to all
implementations.
Multiplying the usable throughput by 3600 (number of seconds in an hour) and dividing by the average
number of documents per user per hour gives the maximum number of concurrent users supported for a
given scenario.
System capacity (tps) Number of busy users supported Number of light users supported
8 (Interactive) 8 * 3600 / 12 = 2400 users 8 * 3600 / 2 = 14400 users
27 (On-demand) 27 * 3600 / 12 = 8100 users 27 * 3600 / 2 = 48600 users
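The calculation above can be sketched as a small Python helper. The function name is illustrative, not part of the product; the throughput and per-user rates are the figures from the tables in this paper.

```python
def max_concurrent_users(usable_tps: float, docs_per_user_per_hour: float) -> int:
    """Maximum concurrent users = usable throughput (tps) * 3600 / docs per user per hour."""
    return int(usable_tps * 3600 / docs_per_user_per_hour)

# Interactive scenario: 8 tps usable throughput
print(max_concurrent_users(8, 12))   # busy users (12 docs/hour) -> 2400
print(max_concurrent_users(8, 2))    # light users (2 docs/hour) -> 14400

# On-demand scenario: 27 tps usable throughput
print(max_concurrent_users(27, 12))  # -> 8100
print(max_concurrent_users(27, 2))   # -> 48600
```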
It is best practice to size a system so that the anticipated maximum load does not saturate the system. As
a guideline, the system should never exceed 75% of its saturation point. See the capacity planning section
later in this document for more details.
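The 75% guideline can be expressed as a simple sizing rule: the system's saturation throughput should be at least the anticipated peak load divided by 0.75. A minimal sketch (the function name and example peak load are illustrative):

```python
def required_saturation_tps(peak_tps: float, headroom: float = 0.75) -> float:
    """Saturation throughput needed so the peak load stays below 75% of saturation."""
    return peak_tps / headroom

# A hypothetical 6 tps peak load needs a system that saturates at 8 tps or more.
print(required_saturation_tps(6))  # 8.0
```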
The graph below shows the CPU utilization for the application servers and the database servers in both test
scenarios. The “App” lines show the average of the two application servers.
This graph shows that the application servers’ CPUs reached nearly 90% for the On-demand scenario but it
was a little lower, around 75%, for the Interactive scenario. The situation for the database server is
different – during the On-demand scenario it reached about 45% and during Interactive it reached about
55%. This suggests that the Interactive scenario places more load on the database than the On-demand
scenario.
Another key resource is the shared file system, or Active Drive. All of the application servers write files to
this shared location and it is important that its disk system has sufficient bandwidth to cope with the load.
The bandwidth is measured in IO Operations Per Second (IOPS).
The graph below shows the IOPS consumed during the On-demand and Interactive scenario tests.
The graph shows that the On-demand scenario reached a peak of around 2500 IOPS whilst delivering a
usable throughput of 27 tps. This means that the system uses approximately 93 IOPS for each transaction-
per-second of throughput. This can be rounded up to 100 IOPS per tps for convenience.
The Interactive scenario reached a peak of around 1000 IOPS whilst delivering a usable throughput of 8
tps. The system therefore uses approximately 125 IOPS per tps for the Interactive scenario.
These numbers provide useful input to the capacity planning process, as described later in the document.
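These per-tps figures can be turned into a rough sizing rule for the shared file system. The sketch below uses the rounded values derived above (100 IOPS per tps for On-demand, 125 for Interactive); the function name is illustrative, not part of the product.

```python
# Approximate IOPS consumed per transaction-per-second, from the measurements above.
IOPS_PER_TPS = {"On-demand": 100, "Interactive": 125}

def required_iops(target_tps: float, workload: str) -> float:
    """Estimate the shared file system IOPS needed for a target throughput."""
    return target_tps * IOPS_PER_TPS[workload]

print(required_iops(27, "On-demand"))   # 2700.0 IOPS for the reference On-demand load
print(required_iops(8, "Interactive"))  # 1000.0 IOPS for the reference Interactive load
```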
Scaling out
Scaling out, or horizontal scaling, refers to the ability of a system to handle more load when additional
servers are added to the environment. In the case of EngageOne Compose it is possible to add extra
application servers to the cluster to achieve greater throughput. The database server and shared file
system must be sized appropriately so that they can handle the extra load.
Load tests were conducted with various numbers of application servers, as shown in the table below. In all
cases the specification of the application servers was the same as in the reference test. They were
m5.2xlarge instances, which have 4 cores and 32GB RAM. The file server for the shared file system (Active
Drive) was sized to handle a large amount of load and the same server was used in all test configurations.
However, the database server was sized for each specific test and three different sizes of database server
were used in total.
Number of application servers Database instance
1 db.m4.2xlarge (4 cores)
2 (Reference configuration) db.m4.2xlarge (4 cores)
4 db.m4.4xlarge (8 cores)
8 db.m4.10xlarge (20 cores)
(Note: There is no “8xlarge” instance type)
Note that a single-server test is included for completeness but in a production environment there must
always be at least two servers for resilience.
The two graphs below show the throughput obtained for both the Interactive and On-demand scenarios.
The scenarios are presented separately for clarity.
For both scenarios it is clear that doubling the number of application servers results in a significant
increase in throughput. As before, the response time can be plotted against the throughput to enable the
usable throughput to be estimated.
The usable throughput for each combination of scenario and hardware is as follows:
Number of app servers (4 cores each)   Database cores   Usable tps (Interactive)   Usable tps (On-demand)
1                                      4                5.5                        14
2 (Reference configuration)            4                8                          27
4                                      8                15                         53
8                                      20               39                         90
Plotting these numbers on a graph shows that the system has near-linear scalability: doubling the
number of application servers (and adjusting the size of the database server) gives the system almost
double the capacity.
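The near-linear trend can be checked directly from the table above. This sketch computes the speedup of each configuration relative to a single server (the figures are copied from the table; the code itself is purely illustrative):

```python
# Usable throughput (tps) by number of application servers, from the table above.
usable_tps = {
    "Interactive": {1: 5.5, 2: 8, 4: 15, 8: 39},
    "On-demand": {1: 14, 2: 27, 4: 53, 8: 90},
}

for scenario, results in usable_tps.items():
    base = results[1]
    for servers, tps in sorted(results.items()):
        print(f"{scenario}: {servers} server(s) -> {tps / base:.2f}x single-server throughput")
```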
The shape of the On-demand scenario line is as expected. It is almost straight but curves downwards
slightly at 8 nodes. This shows that the CPU capacity of the application servers is the limiting factor for the
On-demand scenario.
The shape of the Interactive scenario line is a little different. It curves up instead of down at 8 nodes. The
reason for this is that the Interactive scenario causes much more work in the database. The database used
for the 8-node tests had 20 cores (compared with 8 cores for the 4-node test). These additional cores
enable the database to process 2.5 times more work, and hence the overall throughput of the system
more than doubled when moving from 4 to 8 application servers. This shows that the CPU capacity of the
database is the limiting factor for the Interactive scenario.
The graph below shows the maximum CPU utilization of the application servers and the database server
for each hardware configuration.
A number of observations can be made from this graph.
On-demand scenario:
o The application server CPU was relatively high during all tests, ranging from about 90% in
the 2-server test to about 75% in the 8-server test. This shows that each application
server was working near to its limit, which is good. This suggests that additional On-demand
throughput could be achieved by adding more application servers.
o In the 2-server test the database server was loaded to about 50%. This might have a small
impact and slightly better throughput might have been possible with a larger database
instance.
Interactive scenario:
o With 2 and 4 application servers the database CPU utilization was around 55-60%. This
increased to around 90% when 8 application servers were used, even though the database
had 20 cores.
o This shows that the database is the limiting factor for the Interactive scenario and that
deploying more than 8 servers is unlikely to increase the overall capacity of the system.
Scaling up
Scaling up, or vertical scaling, refers to the ability of a system to handle more load when larger servers are
used. In the case of EngageOne Compose it is possible to obtain more throughput by deploying application
servers with additional cores. The database server and shared file system must be sized appropriately so
that they can handle the extra load.
Load tests were conducted using application servers with different numbers of cores, as shown in the table
below. All tests used a cluster of two application servers so that the results can be compared with the
reference test. The larger servers had more RAM than the smaller ones but none of the servers was
constrained by its RAM. The file server for the shared file system (Active Drive) was sized to handle a large
amount of load and the same server was used in all test configurations. However, the database server was
sized for each specific test and three different sizes of database server were used in total.
Note: EC2 “m5” instances were used. These instances are available with 2, 4, 8 and 24 cores. There is no
instance type with 16 cores. The RDS instance types have 4, 8 and 20 cores.
Cores per server Instance type Database instance
2 m5.xlarge db.m4.2xlarge (4 cores)
4 (Reference configuration) m5.2xlarge db.m4.2xlarge (4 cores)
8 m5.4xlarge db.m4.4xlarge (8 cores)
24 m5.12xlarge db.m4.10xlarge (20 cores)
The two graphs below show the throughput obtained for both the Interactive and On-demand scenarios.
The scenarios are presented separately for clarity.
The scale-up results are similar to the scale-out results – adding more cores significantly increases the
capacity of the system. Once again, the response times can be plotted against the throughput to enable
the usable throughput to be estimated.
The usable throughput for each combination of scenario and hardware is as follows:
Cores per app server (2 servers in cluster)   Database cores   Usable tps (Interactive)   Usable tps (On-demand)
2                                             4                7                          13
4 (Reference configuration)                   4                8                          27
8                                             8                19                         50
24                                            20               28                         95
Plotting these numbers on a graph shows that the system is able to make good use of the additional cores
when executing the On-demand scenario. When executing the Interactive scenario, the throughput
doubles when moving from four cores to eight cores but there is a much smaller increase when moving
from 8 to 24 cores. This is because the database was the limiting factor in the 24-core Interactive test, as
explained later.
Looking at the CPU utilization of the Application Servers and Database Server it is clear that the Database
Server is the bottleneck in the Interactive scenario but not in the On-demand scenario.
For the Interactive scenario, when the Application Servers have 8 cores each and the Database Server also
has 8 cores, the Database CPU runs at around 80%. The Database CPU stays at 80% when the Application
servers have 24 cores and the database has 20 cores. 80% is very high for a database CPU and this result
suggests that a larger Database Server might enable the system to sustain more throughput.
For the On-demand scenario, the Database Server remains between 30% and 50% in all tests. The CPUs of
the Application Servers are higher, ranging from around 60% up to 95%. This suggests that the Database
Server was sized appropriately for the workload.
Batch Processing
The previous sections have focused on the scalability of the real-time scenarios: Interactive and
On-demand. Batch scenarios were not included in the Scale-out and Scale-up sections because batch
processing does not scale in the same way.
The Batch bundle is usually installed on a single application server within the EngageOne Compose cluster.
This server may be shared with other bundles or it may be dedicated to batch processing.
Batch is often run during a quiet period, such as overnight. The results presented here were obtained by
running batches when there was no other work happening in the system.
The main unit of measurement for batch processing is the rate of document composition. This can be
expressed as documents per hour, per minute or per second. To be consistent with the other data
presented in this paper the unit will be documents per second.
Batch processing takes place on a single server and it uses a limited number of threads. Therefore, moving
it to a larger server or introducing additional servers has little effect on the rate of composition.
The Reference test environment was used, with the Batch bundle running on a single server. The batch
consisted of equal numbers of the three templates described earlier. The results were as follows:
Batch type Number of docs Time taken Rate (per hour) Rate (per second)
Accumulated 100,000 35:38 minutes 168,382 47
Non-accumulated 100,000 28:00 minutes 214,286 60
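The rates in the table follow directly from the measured times. A minimal sketch of the calculation (the function name is illustrative):

```python
def batch_rates(docs: int, minutes: int, seconds: int) -> tuple[int, int]:
    """Convert a batch run time into documents per hour and per second."""
    elapsed = minutes * 60 + seconds          # total elapsed seconds
    per_hour = docs * 3600 / elapsed
    return round(per_hour), round(per_hour / 3600)

print(batch_rates(100_000, 35, 38))  # Accumulated: (168382, 47)
print(batch_rates(100_000, 28, 0))   # Non-accumulated: (214286, 60)
```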
Software Platforms
The reference configuration uses Linux for the EngageOne servers and Oracle for the database.
EngageOne Compose also supports Windows servers and SQL Server databases. All tests presented in this
paper were carried out on Linux application servers. It is likely that Windows application servers
would give very similar results; this will be validated in a future test.
This section presents the results of running the reference hardware with SQL Server instead of Oracle. The
database server is the same specification as the Oracle server in the Reference test – RDS instance type
db.m4.2xl, which has 4 cores (8 vCPUs) and 32GB RAM with a 100GB drive with 2000 Provisioned IOPS.
SQL Server 2016 Standard Edition was used.
The first graph below compares the throughput of the On-demand and Interactive scenarios using the SQL
Server and Oracle databases.
It can be seen that the throughput of both configurations is very similar. The response time graph below
also shows very similar results for the two types of database.
The Reference configuration (using Oracle) gave usable throughputs of 8 tps for the Interactive Scenario
and 27 tps for the On-demand scenario. The graph below plots the response times against throughput for
the SQL Server tests and shows vertical lines at 8 and 27 tps. It can be seen that the usable throughput on
SQL Server is very similar to the reference results.
Capacity Planning
Capacity planning is the process of estimating the hardware required to support the anticipated workload
in a given deployment. There are four key steps to the process:
1. Estimate the demand placed on the system by its users and batch operations.
2. Using the data presented in this document, estimate the number of application servers required to
support the demand.
3. Determine the specification of the database server and shared file system required to support the
application servers.
4. Carry out tests to validate that the environment will have sufficient capacity to meet the needs of
its users.
1: Estimate the demand
The goal is to estimate the peak throughput that the system needs to support, measured in documents per
second. When the documents per second figure has been estimated it is straightforward to calculate the
hardware requirements.
Often the batch operations can be ignored because they will be run during quiet times of the day. If
batches are to be run at peak times it might be necessary to increase the capacity of the system beyond
the level needed for real-time workloads. Batch is ignored for the example calculations in this section.
Most organizations think about their document production in terms of user numbers or monthly or annual
production figures. This is a good starting point but additional information and some assumptions are
required to arrive at the documents per second figure.
Example 1
Organization A has 2000 back-office staff who will use EngageOne Interactive. The staff all work from
09:00 to 18:00 and they take a one-hour lunch break. On average each user will create 50 new documents
per day.
Average documents per second during working day =
  Documents per day per user          50
  Divided by hours worked per day     / 8
  Divided by seconds in an hour       / 3600
  Multiplied by number of users       x 2000
                                      = 3.5 documents per second
3.5 documents per second is the average, but there will be busy periods and quiet periods. The system
must be sized to easily cope with the busy periods. During the busy period(s) the demand may reach two
or three times the average demand. For this example, we will multiply the average by three and round up
the result, giving a peak load of 11 documents per second being created in EngageOne Interactive.
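The arithmetic above can be sketched as a small helper function (Python used for illustration; the peak factor of three is the assumption made in this example, not a fixed rule):

```python
import math

def peak_docs_per_second(docs_per_user_per_day, hours_worked_per_day,
                         num_users, peak_factor=3):
    """Estimate peak demand from average daily production.

    peak_factor is an assumption: busy periods are taken here to be
    three times the average, as in the worked example above.
    """
    # average docs/sec across the whole working day
    average = docs_per_user_per_day / hours_worked_per_day / 3600 * num_users
    # peak load, rounded up to a whole document per second
    return average, math.ceil(average * peak_factor)

average, peak = peak_docs_per_second(50, 8, 2000)
# average ≈ 3.5 documents per second; peak = 11 documents per second
```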
Example 2
Organization B has exactly the same number of back-office EngageOne Interactive users as Organization A
and they create the same number of documents per day. Their peak load from EngageOne Interactive is
therefore 11 documents per second.
Organization B also plans to integrate EngageOne Compose into their CRM system via the On-demand web
service interface. The CRM system supports the contact center where two shifts of users cover the period
from 07:00 to 22:00 seven days a week. Statistics show that the contact center creates around 50 million
documents per year and that half of these are created between 08:00-11:00 and 18:00-21:00 on
weekdays.
Average documents per second during busy hours =
  Total documents per year                                50,000,000
  Divided by two (half of all docs are in busy periods)   / 2
  Divided by number of weekdays in a year                 / 260
  Divided by number of busy hours in a day                / 6
  Divided by number of seconds in an hour                 / 3600
                                                          = 5 documents per second (rounded up from ≈4.5)
Even during the busy hours there will be periods of higher and lower demand so the system must be able
to cope with additional load. If we assume that the peak load is double the busy-period average, we arrive
at a figure of 10 on-demand documents per second at peak times.
There is some overlap between the busy periods of the back office and the contact center. Therefore, the
deployment needs to support 11 Interactive documents per second plus 10 on-demand documents per
second.
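The on-demand calculation follows the same pattern and can be sketched as follows (the round-up to 5 and the peak factor of two mirror the assumptions in the example; the names are illustrative):

```python
import math

def on_demand_peak(docs_per_year, busy_fraction, busy_days_per_year,
                   busy_hours_per_day, peak_factor=2):
    """Peak on-demand documents per second from annual volume.

    busy_fraction and peak_factor are assumptions taken from the
    worked example: half the volume falls in the busy hours, and
    peak load is taken to be double the busy-period average.
    """
    # average docs/sec during the busy hours only
    busy_average = (docs_per_year * busy_fraction
                    / busy_days_per_year / busy_hours_per_day / 3600)
    # round the busy-period average up, then apply the peak factor
    return math.ceil(busy_average) * peak_factor

on_demand_peak(50_000_000, 0.5, 260, 6)  # → 10 documents per second at peak
```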
2: Estimate the number of application servers required
The starting point is the number of documents per second achieved with the reference configuration,
which uses two application servers with four cores each.
The reference configuration was able to support 8 documents per second for the Interactive scenario or 27
documents per second for the On-demand scenario. Put another way, for each document created through
Interactive, the system can create approximately three on-demand documents. For simplicity, the capacity
of the reference system will be taken to be 8 Interactive documents per second. The number of on-
demand documents will be divided by three to obtain the equivalent number of Interactive documents.
The actual servers used in the deployment environment will probably not be the same as those used in the
reference configuration. The number of documents per second supported by the chosen servers must
therefore be estimated based on the number of cores per server and the type of processor used. The
scale-out results presented in this document show that the capacity of a server is approximately
proportional to the number of cores. However, it is less straightforward to compare the capabilities of
different types of processor and third-party data may need to be consulted.
The graph below shows the maximum throughput for the Interactive and On-demand scenarios for a given
number of cores. It shows that up to 16 cores the same throughput can be achieved by scaling out (more
servers) or scaling up (larger servers). If higher levels of throughput are required it is more efficient to add
more servers than to use larger servers.
Finally, it is not desirable to load the servers right up to their maximum capacity. It is recommended that
only 75% of the server capacity is taken into account for planning purposes.
Example 1
Organization A needs to support a peak load of 11 Interactive documents per second. The reference
configuration provides a maximum usable throughput of 8 documents per second, which is 4 per
application server.
Number of 4-core reference application servers required =
  Total user demand (Interactive docs per second)        11
  Divided by target capacity of each application server  / 3 [= 4 x 75%]
  Plus number of redundant servers (for resilience)      + 1
                                                         = 5 application servers (11 / 3 ≈ 3.7, rounded up to 4)
Using 5 application servers, the system would have a maximum capacity of 5 x 4 = 20 Interactive
documents per second which is significantly more than the 3.5 per second average. This means there is
plenty of “headroom” for busy periods and the system would still be able to cope with the loss of at least
one server.
Example 2
Organization B needs to support a peak load of 11 Interactive documents per second plus 10 on-demand
documents per second. Dividing the on-demand number by three gives the equivalent Interactive number,
which we round up to 4. This gives a total requirement of 11 + 4 = 15 Interactive documents per second.
Number of 4-core reference application servers required =
  Total user demand (Interactive docs per second)        15
  Divided by target capacity of each application server  / 3 [= 4 x 75%]
  Plus number of redundant servers (for resilience)      + 1
                                                         = 6 application servers
Organization B would therefore require 6 application servers of the reference specification, with 4 cores
each.
The organization may prefer to deploy a smaller number of servers, each with more cores. Each 8-core
application server should be able to deliver twice the throughput of a 4-core server. The calculation of
application servers would then be as follows:
Number of 8-core application servers required =
  Total user demand (Interactive docs per second)        15
  Divided by target capacity of each application server  / 6 [= 8 x 75%]
  Plus number of redundant servers (for resilience)      + 1
                                                         = 4 application servers (15 / 6 = 2.5, rounded up to 3)
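The sizing rule used in all three calculations can be expressed as one function (a sketch; the 75% headroom, the divide-by-three Interactive equivalence, and the single redundant server are the assumptions stated above):

```python
import math

def app_servers_required(interactive_tps, on_demand_tps=0,
                         per_server_capacity=4, headroom=0.75,
                         redundant_servers=1):
    """Number of application servers, per the sizing rule above.

    per_server_capacity=4 corresponds to the 4-core reference server;
    use 8 for an 8-core server. On-demand load is converted to its
    Interactive equivalent by dividing by three and rounding up.
    """
    demand = interactive_tps + math.ceil(on_demand_tps / 3)
    # only 75% of each server's capacity is counted for planning
    usable_per_server = per_server_capacity * headroom
    return math.ceil(demand / usable_per_server) + redundant_servers

app_servers_required(11)                             # Example 1 → 5
app_servers_required(11, 10)                         # Example 2, 4-core → 6
app_servers_required(11, 10, per_server_capacity=8)  # Example 2, 8-core → 4
```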
3: Define the database server and shared file system
There are several considerations for the database server:
• It must have enough processing power (CPU cores) to support the expected workload. Results
presented in this document have shown that the database server needs approximately one core
for every two cores in the application server cluster. For example, a cluster of four application
servers with four cores each has a total of 16 cores. Therefore, the database server should have
(at least) 8 cores.
• Its disk system must have enough I/O bandwidth so that it does not become a constraint. The
database disk I/O observed during the performance tests was approximately 60 IOPS per tps for
the Interactive scenario and approximately 20 IOPS per tps for the On-demand scenario. These
should be seen as absolute minimum figures and it is always better to provision disks with the
highest possible throughput capability.
• There must be enough storage available on the disk system. This document focuses on the
performance and scalability aspects of the system and does not provide guidance on disk space
requirements.
For the shared file system, there are two considerations:
• The file system must have enough I/O bandwidth so that it does not become a constraint. In most
deployments a SAN or NAS device incorporating SSD drives will be used. The file system disk I/O
observed during the performance tests was approximately 125 IOPS per tps for the Interactive
scenario and approximately 100 IOPS per tps for the On-demand scenario. Again, it is always
better to provision hardware with the highest possible throughput capability.
• The shared file system must have enough space to store all of the in-progress and completed
documents, batch jobs, and other files. This document focuses on the performance and scalability
aspects of the system and does not provide guidance on disk space requirements.
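The ratios above can be combined into a single sizing sketch (the 60/20 and 125/100 IOPS-per-tps figures are the observed minimums from the performance tests; treat the outputs as lower bounds and provision above them):

```python
import math

def size_infrastructure(interactive_tps, on_demand_tps, total_app_cores):
    """Minimum database cores and disk/file-system IOPS.

    Uses the ratios observed in the performance tests: one database
    core per two application cores; 60/20 IOPS per tps on the database
    disk and 125/100 IOPS per tps on the shared file system for the
    Interactive/On-demand scenarios respectively.
    """
    return {
        "db_cores": math.ceil(total_app_cores / 2),
        "db_iops": 60 * interactive_tps + 20 * on_demand_tps,
        "fs_iops": 125 * interactive_tps + 100 * on_demand_tps,
    }

size_infrastructure(11, 0, 16)
# Example 1 → {"db_cores": 8, "db_iops": 660, "fs_iops": 1375}
```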
Example 1
Organization A has estimated a peak load of 11 documents per second using the Interactive application.
This load can be handled by 4 x 4-core application servers (ignoring the fifth server, which is just for
redundancy).
The peak load of 11 Interactive documents per second can be handled by a database server with 8 cores
(16 cores divided by 2). This level of load will generate approximately 660 IOPS (60 IOPS multiplied by 11
Interactive documents per second). A database disk system with at least 1000 IOPS should therefore be provisioned.
11 Interactive documents per second would require approximately 1375 IOPS on the shared file system
(125 IOPS per tps multiplied by 11 documents per second). A file system with at least 2000 IOPS should
therefore be provisioned.
Example 2
Organization B has estimated a peak load equivalent to 15 documents per second using the Interactive
application. This load can be handled by 5 x 4-core application servers (ignoring the sixth server, which is
just for redundancy).
The peak load of 15 Interactive documents per second can be handled by a database server with 10 cores
(20 cores divided by 2). Depending on the available infrastructure, it might be more practical to deploy a
server with 12 or 16 cores. This level of load will generate approximately 900 IOPS (60 IOPS multiplied by
15 Interactive documents per second). A database disk system with at least 1200 IOPS should therefore be
provisioned, ideally 1500 IOPS, or more if available.
15 Interactive documents per second would require approximately 1875 IOPS on the shared file system
(125 IOPS per tps multiplied by 15 documents per second). A file system with at least 2500 IOPS should
therefore be provisioned.
4: Validate the environment through testing
There are many factors that can affect the capacity of a system. For example, the templates used in the
performance tests were designed to represent real world documents, but every organization has different
requirements and some templates will be more complex than others. More complex templates might
require more CPU during the composition phase and also may generate more I/O traffic in the shared file
system.
It is not possible to simulate every variation of a user’s workflow, and in some implementations the system
might be used in very different ways. It is therefore essential that some testing is carried out to validate
any assumptions made during the capacity planning process. The tests should use the templates from the
production system and test cases should be built to simulate the way the users work. If there are multiple
groups of users carrying out different types of work, test cases should be created for each type of user.
The techniques illustrated in this document can be used to measure the capacity of a “reference” system
and the results can then be extrapolated to validate the capacity planning assumptions.
Diagnostic and Tuning Techniques
It is often possible to identify a system’s bottleneck by looking at some basic resource information.
A key indicator is the CPU utilization on each of the servers. If the application servers’ CPUs are running at
more than 75% it suggests that additional application servers are required. If the database server’s CPU is
high it suggests that either the server is not large enough for the load or that some database maintenance
is required.
Measuring the disk I/O of the shared file system will help to identify if there is a constraint whilst reading
or writing to the disk. The number of bytes per second or IOPS should be measured if possible, and these
numbers should be compared with the stated capabilities of the file server, SAN or NAS.
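This first-pass triage could be scripted roughly as follows (names and structure are illustrative; the 75% threshold for the database CPU is an assumption, since the text above only describes it as "high"):

```python
def triage(app_cpu_percents, db_cpu_percent, fs_iops_observed, fs_iops_rated):
    """Rough bottleneck triage from basic resource measurements."""
    findings = []
    # application servers above 75% CPU suggest more servers are needed
    if any(cpu > 75 for cpu in app_cpu_percents):
        findings.append("application tier: CPU over 75%, add servers")
    # high database CPU suggests a larger server or overdue maintenance
    if db_cpu_percent > 75:
        findings.append("database: consider a larger server or maintenance")
    # shared file system running at its rated IOPS is a likely constraint
    if fs_iops_observed >= fs_iops_rated:
        findings.append("shared file system: I/O at rated capacity")
    return findings

triage([80, 78], 40, 900, 2000)
# → ["application tier: CPU over 75%, add servers"]
```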
If these basic measurements do not yield any useful results, more detailed information can be collected
from the Java Virtual Machines (JVMs) used by the EngageOne bundles. The EngageOne Compose
Administration Guide explains how to enable Java Management Extensions (JMX) so that third party tools
can monitor resource usage in the JVMs. The resources that can be monitored include:
• Memory usage
• CPU usage
• Threads
The database is a key, shared resource and it is important that it operates efficiently. Housekeeping tasks
should be carried out regularly and a DBA should ensure that appropriate maintenance jobs are run.
Vendor-specific monitoring tools such as SQL Server Profiler and Oracle Enterprise Manager can help to
identify performance issues in their respective databases.