Match 31-bit WebSphere Application Server performance with new features in 64-bit Java on System z

Solve 31-bit virtual memory crowding issues without sacrificing performance by using heap compression and large pages

Kishor Patil, Marcel Mitran, Jim Cunningham
Software developers, IBM Software Group Applications and Integration Middleware

May 2009

© Copyright IBM Corporation, 2009.



Table of contents

Abstract
Introduction
Prerequisites
  Large heap support for 64-bit Java in compressed references mode
  Large page support
Compressed references support
  Java object shape and compressed references
  Object header compression
  Object reference compression schemes
  Object reference compression for a heap size of 2 GB or less
  Object reference compression for heap sizes greater than 2 GB
  Special compression optimization for 2 GB to 6 GB heap sizes
  Compressed references support on IBM z/OS
  Compressed references support on Linux on System z
  IBM SDK6 JVM behavior on IBM z/OS with -Xcompressedrefs option
Large page support
  Large page support on IBM z/OS
  Large page support in Linux on System z
  Java 6 exploitation of large page support
Using verbose:gc log for compressed references and large pages
Performance analysis and guidelines
  Multithreaded Java benchmark on z/OS
  Performance projections for 64-bit compressed references with heap sizes greater than 2 GB
  DayTrader benchmark running on IBM WebSphere Application Server V7 for z/OS
  Java Heap footprint savings using compressed references
  Garbage collection time savings using compressed references
Setting Java options in WebSphere Application Server
  Converting a migrated WebSphere Application Server on IBM z/OS to run in 64-bit mode
  Enabling compressed references mode in 64-bit WebSphere Application Server V7 on IBM z/OS
  Enabling large page support for Java heaps in WebSphere Application Server on IBM z/OS
  Enabling compressed references and/or large page support options in IBM WebSphere Application Server V7 on Linux on System z
Conclusion
Resources
About the authors
  Acknowledgements
Trademarks and special notices


Abstract

This article describes a pair of new features available in the IBM® Developer Kit for Java™ 6, 64-bit edition. The compressed references and large pages features were added to the IBM J9 Java virtual machine (JVM) and the IBM Testarossa Just-in-Time (JIT) compiler. Their purpose is to relieve the memory footprint growth incurred when migrating from a 31-bit JVM to a 64-bit JVM. This footprint growth typically increases system memory requirements and also degrades throughput performance. This paper shows that it is possible to recover the 31-bit footprint and throughput performance with the 64-bit JVM for heap sizes up to 30 GB. We review the advantages and disadvantages of the 31-bit and 64-bit SDKs, provide a brief implementation overview, and discuss the performance characteristics of various combinations of heap sizes and Java options.

All developers are encouraged to read this article, but the intended audience is enterprise application developers who are deploying Java workloads on the IBM System z10™ mainframe.


Introduction

IBM® Developer Kit for Java™ 6 offers 31-bit and 64-bit editions of the Java virtual machine (JVM) for the z/OS platform. The 31-bit edition has traditionally provided the best application performance and footprint.

As Java applications grow in complexity and scale, the limited virtual memory range (2 GB) of the 31-bit address space puts pressure on Java and native heap usage, which can result in out-of-memory errors. As a result, there is a growing trend toward adopting the 64-bit edition of the JVM.

The heap relief provided by the 64-bit edition comes at a performance and footprint cost. The overhead of 64-bit wide object references can require up to 40% more Java heap. The inherently bigger objects also reduce data locality. This contributes to higher Translation Look-aside Buffer (TLB) and data cache miss rates, resulting in worse application performance.

To overcome this performance bottleneck, IBM has introduced two features. Large page support was added in the latest IBM System z® servers (IBM System z10). The compressed references feature was added in the IBM 64-bit SDK for z/OS, Java Technology Edition, V6 (SDK6). This article describes some best practices for taking advantage of the new hardware and the IBM SDK6 to improve 64-bit footprint and performance.


Prerequisites

Large heap support for 64-bit Java in compressed references mode

- IBM 64-bit SDK for z/OS, Java Technology Edition, V6, March 2009 Maintenance Rollup APAR PK82091
- IBM® 64-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 6, SR4
- IBM z/OS V1R7 or later, or 64-bit Linux on System z
- For best performance, IBM System z hardware using IBM System z10 processors or later
- IBM z/OS support APAR OA26294
- IBM WebSphere Application Server V7.0.0.3 (includes the prerequisite Java release)
- Java command-line option -Xcompressedrefs to enable this feature

Large page support

- IBM 64-bit SDK for z/OS, Java Technology Edition, V6, September 2008 Maintenance Rollup APAR PK65878
- IBM® 64-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 5
- IBM® 31-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 5
- IBM z/OS V1R9 or later, or 64-bit Linux on System z™ with kernel 2.6.25 or later (SUSE SLES 10 SP2 or RHEL 5) running in an LPAR
- IBM System z™ hardware using IBM System z10 processors or later
- IBM z/OS support APAR OA20902 (only needed for IBM z/OS V1R9)
- IBM z/OS support APAR OA25485
- IBM WebSphere Application Server V7.0.0.1 (includes the prerequisite Java release)
- Java command-line option -Xlp to enable this feature


Compressed references support

Many workloads that use the 31-bit JVM are reaching the limits of the 31-bit virtual memory range. Moving these workloads to the 64-bit JVM is desirable to relieve heap constraints. However, it comes at the cost of increased footprint and reduced throughput.

Some workloads have shown up to a 45% increase in average object size, because the object header and references double in width. Keeping the same heap size as in 31-bit mode usually results in more frequent garbage collection. In some cases, the application may experience an out-of-memory error if garbage collection cannot satisfy the application's memory requirements. Using a larger heap addresses these issues but significantly increases the real memory footprint.

Data locality is also significantly reduced, because the data cache can hold fewer objects. This results in a higher rate of data cache and TLB misses, so application performance is typically worse. For instance, in a multithreaded benchmark we observed a 19% performance gap between the 31-bit and 64-bit JVMs, even with a moderately bigger Java heap. Similarly, for a WebSphere Application Server banking application, we observed an 8% performance gap.

Java object shape and compressed references

Figure 1 shows an object with two references in the 31-bit, 64-bit, and 64-bit compressed references IBM SDK6 runtime environments:

Figure 1. Java object shapes in 31-bit and 64-bit Java VMs

Each object has two parts: the object header and the object instance fields. The object header contains a reference to the class, 32-bit flags, and a monitor word, which holds the thread ID of the lock owner, if any. Padding may be required to enforce alignment constraints.

As Figure 1 shows, the 64-bit object requires twice the memory used by the 31-bit object. When the 64-bit JVM with compressed references is used, the 64-bit object size is reduced back to the same size as the 31-bit object.


Object header compression

The clazz field is a reference to memory that contains class-related data such as static fields, a reference to the class loader, and the virtual function table. In compressed references mode, the class data is allocated below the 2 GB virtual address, so the reference fits into 32 bits. Similarly, the monitor field is compressed to 32 bits by allocating thread data below the 2 GB virtual address. Because every field in the object header is then 32 bits wide, the object header needs no padding.

Object reference compression schemes

The object data includes instance fields such as integers, floats, doubles, chars, bytes, and object references. In compressed references mode, 64-bit object references are compressed to 32-bit values by using one of several compression schemes. The schemes represent a trade-off between two factors: the maximum Java heap size, and the path length incurred for compressing and decompressing the references.

The user specifies the maximum Java heap size with the -Xmx option. Even if the application specifies a smaller initial heap with -Xms, the compression scheme is chosen based on the maximum heap size rather than the initial heap size.

Object reference compression for a heap size of 2 GB or less

For heap sizes of 2 GB or less, the Java heap is allocated in a virtual address range below 2^32. Within this address range, the most significant 32 bits of every Java object reference are zero. Only the low word is therefore needed to represent the reference.

Although a theoretical 4 GB heap could be allocated in this range, operating system restrictions limit the size of the Java heap to 2 GB. For Linux® on System z, the amount of heap that can be allocated depends on the specifics of the Linux distribution; this is covered in a later section.

On IBM System z, SDK6 uses 32-bit instructions to store or compare object references. When an object reference needs to be dereferenced, the high 32 bits must be cleared to build a well-formed 64-bit pointer. This decompression is achieved by zero-extending the compressed 32-bit value when loading it from the Java heap.
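The following minimal Java model makes the zero-extension step concrete. It is hypothetical and is not JVM or JIT source code. It uses an address between 2^31 and 2^32, which is within the range where a heap of 2 GB or less can be placed on z/OS. It contrasts zero-extension with Java's ordinary int-to-long widening, which sign-extends and would corrupt such an address.

```java
// Hypothetical model of shift-by-0 reference compression; not JVM or JIT source code.
final class ShiftZeroDecompression {

    // Store path: an address below 2^32 is kept as its low 32 bits.
    static int compress(long address) {
        if ((address >>> 32) != 0) {
            throw new IllegalArgumentException("address is not below 2^32");
        }
        return (int) address;
    }

    // Load path: zero-extend the 32-bit value, which clears the high 32 bits.
    static long decompress(int compressed) {
        return Integer.toUnsignedLong(compressed);
    }

    // For contrast: Java's implicit int-to-long widening sign-extends,
    // which corrupts any address at or above 2^31.
    static long signExtend(int compressed) {
        return compressed;
    }

    public static void main(String[] args) {
        long address = 0x9000_0000L; // between 2^31 and 2^32
        int compressed = compress(address);
        System.out.printf("zero-extended: 0x%x%n", decompress(compressed));
        System.out.printf("sign-extended: 0x%x%n", signExtend(compressed));
    }
}
```

Running `main` prints `0x90000000` for the zero-extended value and `0xffffffff90000000` for the sign-extended one. This is why the load path has to clear the high word rather than simply widen the value.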

On the IBM System z9® processor and earlier models, the zero-extension and subsequent dereference may result in an address generation interlock (AGI) pipeline stall. The IBM System z10 pipeline implements a bypass that removes the stall for this event, thus providing the best performance for 64-bit compressed references. As mentioned earlier, the System z10 processor also includes large page support, another key feature for 64-bit performance.

Object reference compression for heap sizes greater than 2 GB

Java heaps larger than 2 GB cannot be allocated in the range below 2^32, so the most significant 32 bits of the references may be non-zero. However, all objects in the 64-bit Java heap are aligned on 8-byte (64-bit) boundaries, so the least significant 3 bits of every object address are always zero. The IBM JVM therefore compresses object references by shifting the address right by 1, 2, or 3 bits. The shift amount depends on where the top of the Java heap falls, which in turn depends on the maximum requested heap size.


Table 1 below summarizes the maximum heap sizes and the respective shift amounts for each reference compression scheme.

Max heap size (-Xmx)    Top of the heap located    Shift amount used for object reference compression*
2 GB or less            below 2^32                 0
6 GB or less            below 2^33                 1
14 GB or less           below 2^34                 2
30 GB or less           below 2^35                 3
Greater than 30 GB      above 2^35                 Only supported without compressed references

Table 1. Supported shifting modes for compressed references

* On Linux on System z, the kernel and other programs may fragment the virtual memory ranges below 2^35. This may force the Java heap into a higher virtual memory range. The result can be smaller maximum heap thresholds and different shift values than those stated in Table 1.
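The z/OS thresholds in Table 1 follow from simple arithmetic. Assume the heap starts at 2^31, as it does on z/OS, where the JVM places the compressed references heap in the 2^31 to 2^35 range. A 32-bit value shifted left by s can then address up to 2^{32+s} bytes, so the largest heap for shift s is sketched below. The limit s ≤ 3 comes from the 8-byte object alignment, which leaves only three always-zero low bits.

```latex
\[
  H_{\max}(s) = 2^{32+s} - 2^{31}\ \text{bytes}, \qquad s \in \{0, 1, 2, 3\}
\]
\[
  H_{\max}(0) = 2\ \text{GB}, \quad
  H_{\max}(1) = 6\ \text{GB}, \quad
  H_{\max}(2) = 14\ \text{GB}, \quad
  H_{\max}(3) = 30\ \text{GB}
\]
```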

The compression scheme is determined at JVM startup and depends on the virtual address range in which the top of the Java heap falls. The same compression scheme is applied to all object references for the life of that JVM.
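The selection rule can be sketched as follows: pick the smallest shift for which the top of the heap still fits. This Java method is a hypothetical illustration, not the J9 implementation. It assumes the top-of-heap address is exclusive (one past the last heap byte). The demo assumes the heap starts at 2^31, as described for z/OS.

```java
// Hypothetical sketch of choosing a compression shift from the top-of-heap address.
final class ShiftSelector {

    static final int MAX_SHIFT = 3; // 8-byte object alignment leaves 3 always-zero low bits

    /**
     * Returns the smallest shift s (0..3) such that every address below topOfHeap
     * fits in 32 bits after shifting right by s, or -1 if no shift is enough.
     * topOfHeap is exclusive (one past the last heap byte).
     */
    static int chooseShift(long topOfHeap) {
        for (int s = 0; s <= MAX_SHIFT; s++) {
            if (topOfHeap <= (1L << (32 + s))) {
                return s;
            }
        }
        return -1; // compressed references not possible for this heap placement
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long base = 2 * gb; // assumed heap base at 2^31
        for (long heapGb : new long[] {2, 6, 14, 30, 31}) {
            long top = base + heapGb * gb;
            System.out.println(heapGb + " GB heap -> shift " + chooseShift(top));
        }
    }
}
```

With the assumed base, the 2, 6, 14, and 30 GB heaps map to shifts 0 through 3, matching Table 1. A 31 GB heap returns -1, which corresponds to the case where the JVM cannot use compressed references.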

Compression is applied to an object reference before it is stored in another object's instance field. The reference is shifted right by the selected amount, and only the least significant 32 bits of the shifted value are stored.

When an object reference is read from the heap in order to be dereferenced, it is loaded as a 32-bit compressed value and zero-extended into a 64-bit register. It is then shifted left by the same amount. The result is a fully formed 64-bit virtual address that can be dereferenced.
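Here is a minimal Java model of the store and load paths just described: shift right and truncate on store, then zero-extend and shift left on load. It is a hypothetical illustration, not J9 code. It assumes 8-byte aligned object addresses, as stated earlier, and the shift amounts from Table 1.

```java
// Hypothetical model of shift-based reference compression (not J9 source code).
final class ShiftedReferences {

    // Store path: shift right by the selected amount and keep the low 32 bits.
    static int compress(long address, int shift) {
        if ((address & 0x7L) != 0) {
            throw new IllegalArgumentException("object addresses are assumed 8-byte aligned");
        }
        long shifted = address >>> shift;
        if ((shifted >>> 32) != 0) {
            throw new IllegalArgumentException("address out of range for this shift");
        }
        return (int) shifted;
    }

    // Load path: zero-extend the 32-bit value into 64 bits, then shift left.
    static long decompress(int compressed, int shift) {
        return Integer.toUnsignedLong(compressed) << shift;
    }

    public static void main(String[] args) {
        long address = 0x7_FFFF_FFF8L; // highest 8-byte aligned address below 2^35
        int compressed = compress(address, 3);
        System.out.printf("compressed=0x%08x restored=0x%x%n",
                compressed, decompress(compressed, 3));
    }
}
```

For the shift-by-3 example, the compressed value is `0xffffffff`. Decompressing it restores `0x7fffffff8`, the highest aligned address a 30 GB heap can reach.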

The additional shift operations add path length, which represents a performance cost to the Java application. This extra path length is paid in exchange for a reduced footprint and improved data locality. In some applications, the benefits of the reduced footprint and improved locality provide a net gain in overall performance.

Special compression optimization for 2 GB to 6 GB heap sizes

For the shift-by-1 compression scheme (2 GB to 6 GB Java heap), the IBM JVM on System z can often eliminate the cost of the shift operation. It does this by exploiting the base and index registers of z/Architecture memory references. Explicit shifting is still required for array accesses and in the GC runtime. As such, the shift-by-1 compression scheme still performs worse than the shift-by-0 scheme. However, shift-by-1 typically performs better than the shift-by-2 or shift-by-3 compression schemes.
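One way base/index addressing can absorb a shift by 1 is sketched below. The text does not spell out the exact JIT code, so this is an assumption. A z/Architecture operand address is formed as base register + index register + displacement. If the zero-extended compressed value is named as both base and index, the hardware computes c + c, which equals c << 1, with no separate shift instruction. The Java below only models that arithmetic. It plausibly also hints at why array accesses still need an explicit shift: there the index register is presumably already needed for the element offset.

```java
// Hypothetical illustration of absorbing a shift-by-1 into base + index addressing.
// It models address arithmetic only; it is not generated JIT code.
final class BaseIndexShiftByOne {

    // z/Architecture-style operand address: base register + index register + displacement.
    static long effectiveAddress(long base, long index, long displacement) {
        return base + index + displacement;
    }

    // Explicit decompression (zero-extend, shift left by 1) followed by a field offset.
    static long explicitShift(int compressed, long displacement) {
        return (Integer.toUnsignedLong(compressed) << 1) + displacement;
    }

    // Same address with no shift: use the zero-extended value as both base and index.
    static long baseIndexTrick(int compressed, long displacement) {
        long c = Integer.toUnsignedLong(compressed);
        return effectiveAddress(c, c, displacement);
    }

    public static void main(String[] args) {
        int compressed = 0xC000_0000; // decompresses to 6 GB, the top of the shift-by-1 range
        long fieldOffset = 16;
        System.out.println(explicitShift(compressed, fieldOffset)
                == baseIndexTrick(compressed, fieldOffset));
    }
}
```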


Compressed references support on IBM z/OS

APAR OA26294 provides a direct assembler interface to the z/OS Real Storage Manager (RSM). This interface allows memory to be allocated in the 2 GB (2^31) to 32 GB (2^35) virtual address range, and the IBM JVM uses it to allocate the Java heap in that range. Memory allocated through the RSM interface can be backed by either 4 KB pages or large pages.

Compressed references support on Linux on System z

The size and location of the Java heap on Linux on System z depend on the Linux distribution and the application configuration. This section provides some tips for allocating the largest possible Java heap in the 0 to 2^32 virtual address range.

For Linux on System z, 64-bit executables are loaded by default at virtual address 2^31. Because the Java heap must be contiguous, Java is limited to approximately 2 GB below the 2^31 boundary. The linker selects the base address at which the application is placed. To change it, modify the default linker script and use it to link a custom Java launcher. The default linker script can be captured by running ld --verbose as follows:

% ld --verbose > myLinkerScript

The first line after the SECTIONS statement in myLinkerScript should read:

PROVIDE (__executable_start = 0x80000000); . = 0x80000000 + SIZEOF_HEADERS;

Changing the 0x80000000 value produces a modified linker script that can be used to link a custom Java launcher. The new value must be less than 2^32, and it is best to move __executable_start toward the lower end of that address range. Inspect the system memory map to find the virtual address from which the largest contiguous virtual address range can be made available for the Java heap. The memory map is found in /proc/XXXX/maps, where XXXX is the process ID (PID) of the Java process.
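Inspecting the memory map by hand is tedious, so here is a hypothetical helper for that step. It is not part of the SDK. It reads the lines of a /proc/<pid>/maps file and reports the largest unmapped gap below 2^32. It relies only on the standard "start-end" hexadecimal address field at the start of each maps line. It does not account for other lower-bound restrictions the kernel may impose, so treat its output as a starting point.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the SDK: finds the largest unmapped gap below
// a limit, given the lines of a /proc/<pid>/maps file.
final class MapsGapFinder {

    /** Returns {gapStart, gapLength} of the largest unmapped range in [0, limit). */
    static long[] largestGapBelow(List<String> mapsLines, long limit) {
        List<long[]> ranges = new ArrayList<>();
        for (String line : mapsLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Each line starts with "start-end" in hexadecimal, e.g. "80000000-80010000 r-xp ..."
            String[] bounds = trimmed.split("\\s+")[0].split("-");
            ranges.add(new long[] {
                Long.parseUnsignedLong(bounds[0], 16), Long.parseUnsignedLong(bounds[1], 16)});
        }
        ranges.sort((a, b) -> Long.compareUnsigned(a[0], b[0]));

        long bestStart = 0;
        long bestLength = 0;
        long cursor = 0; // end of the highest mapping seen so far (clamped to limit)
        for (long[] range : ranges) {
            if (Long.compareUnsigned(range[0], limit) >= 0) {
                break; // sorted, so every remaining mapping starts at or above the limit
            }
            if (range[0] - cursor > bestLength) {
                bestStart = cursor;
                bestLength = range[0] - cursor;
            }
            long end = Long.compareUnsigned(range[1], limit) < 0 ? range[1] : limit;
            cursor = Math.max(cursor, end);
        }
        if (limit - cursor > bestLength) { // gap between the last mapping and the limit
            bestStart = cursor;
            bestLength = limit - cursor;
        }
        return new long[] {bestStart, bestLength};
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self";
        List<String> lines =
            Files.readAllLines(Paths.get("/proc", pid, "maps"), StandardCharsets.ISO_8859_1);
        long[] gap = largestGapBelow(lines, 1L << 32);
        System.out.printf("largest gap below 2^32: start=0x%x, length=%d MB%n",
                gap[0], gap[1] >> 20);
    }
}
```

Passing the PID of a running Java process as the argument inspects that process; with no argument, the tool inspects its own JVM.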

IBM SDK6 JVM behavior on IBM z/OS with -Xcompressedrefs option

When the -Xcompressedrefs option is specified with an -Xmx value of up to 30 GB, the JVM tries to allocate the Java heap in the 2^31 to 2^35 virtual range. The shift amount for object compression is selected automatically, based on where the top of the Java heap is located. The verbose garbage collection log can be used to find the shift value used by the JVM, as described in a later section. As discussed earlier, application performance depends on the shift value.

If the requested heap cannot be allocated below virtual address 2^35, the 64-bit JVM fails to start. It does not automatically fall back to the default (uncompressed) mode.


Large page support

Virtual memory gives applications the illusion that they can allocate and use all the memory addressable by a pointer. As such, a 31-bit application can use up to 2 GB (2^31 bytes) of virtual memory, while a 64-bit application can use up to 2^64 bytes of virtual memory.

The amount of real storage on a system can be much smaller than the amount of virtual memory in use. To maintain the illusion of large virtual storage, the operating system keeps track of virtual memory ranges. It dynamically maps them to real (or absolute) storage using Dynamic Address Translation (DAT) structures. To speed up virtual-to-real address translation, a special hardware cache called the Translation Look-aside Buffer (TLB) holds recently used virtual-to-real mappings.

If an application has a large virtual footprint, the TLB may not provide enough coverage to map the application's working set. When a translation is not found in the TLB, a full lookup in the DAT structures is required, which can degrade application performance significantly. Because the TLB is a fixed-size hardware buffer, the maximum amount of memory it can map is determined by the page size. Most standard hardware and operating systems use 4 KB pages. However, with growing application footprints, support for larger page sizes has recently emerged.
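The toy Java model below illustrates why page size matters. It is not a model of the actual System z10 TLB: it treats the TLB as a small LRU cache of virtual page numbers, and the 512-entry size and 64 MB working set are hypothetical values chosen for illustration. The model sweeps the working set twice and counts the misses that would each require a DAT lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model, not the System z10 TLB: an LRU cache of virtual page numbers.
final class TlbModel {

    /** Counts misses for a sequence of virtual addresses, given a page size and entry count. */
    static long countMisses(long[] addresses, long pageSize, int entries) {
        Map<Long, Boolean> tlb = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > entries; // evict the least recently used translation
            }
        };
        long misses = 0;
        for (long address : addresses) {
            long page = address / pageSize;   // virtual page number
            if (tlb.get(page) == null) {      // get() also refreshes LRU order on a hit
                misses++;                     // a real miss would walk the DAT tables here
                tlb.put(page, Boolean.TRUE);
            }
        }
        return misses;
    }

    public static void main(String[] args) {
        int entries = 512;             // hypothetical TLB size
        long footprint = 64L << 20;    // hypothetical 64 MB working set
        long stride = 4096;            // touch one location per 4 KB
        int n = (int) (footprint / stride);
        long[] sweep = new long[2 * n]; // sweep the working set twice
        for (int i = 0; i < sweep.length; i++) {
            sweep[i] = (i % n) * stride;
        }
        System.out.println("4 KB pages: " + countMisses(sweep, 4L << 10, entries) + " misses");
        System.out.println("1 MB pages: " + countMisses(sweep, 1L << 20, entries) + " misses");
    }
}
```

With 4 KB pages, the 512 entries cover only 2 MB, so all 32768 accesses miss. With 1 MB pages, the 64 MB working set needs only 64 entries, so only the first touch of each page misses. This is the 256-fold coverage difference described below for z/OS.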

Large page support on IBM z/OS

The IBM System z10 processors introduced support for 1 MB pages. IBM z/OS provides an assembler interface for allocating virtual memory backed by large pages in z/OS V1R9, through APAR OA20902 and APAR OA25485. Large pages are defined at z/OS system Initial Program Load (IPL) by using the LFArea=xxxxxxG keyword in IEASYSxx. Currently, large pages are fixed (not swappable) and are backed by real memory. Hence, the large page area should be sized so that other application code using normal pages can run without encroaching on the large pages.

The system can move storage between the two page sizes in some cases. If it is constrained for 4 KB pages while large pages are still available, z/OS converts the available large pages into 4 KB pages. Conversely, should the need for large pages arise, z/OS tries to coalesce previously demoted 1 MB pages to satisfy the large page request, and may do so by swapping out some 4 KB pages. However, large pages that are already allocated are not swappable and cannot be converted to 4 KB pages.

Currently, IBM z/OS supports allocating large pages only above the 2 GB virtual memory bar, so they are not available to 31-bit applications. In addition, IBM z/OS does not currently support locating 64-bit executable code above the 2 GB bar, so application executable code cannot reside in memory backed by large pages. However, 64-bit applications can still gain performance by allocating data in virtual memory backed by large pages, in the following two ways:

1. Because each 1 MB large page maps 256 times more virtual memory than a 4 KB page, fewer TLB entries are required to map the data footprint of the application. This more efficient use of TLB resources results in fewer TLB misses on data accesses.

2. Because fewer TLB entries are needed for the application's data footprint, more TLB entries are available for executable code. TLB misses on instruction fetches can therefore be reduced or eliminated.


Our experiments in a controlled environment showed a 7% performance improvement for a Java multithreaded benchmark on z/OS. The current limitations of large page support described above are subject to change in future z/OS releases.

Large page support in Linux on System z

Large pages can also be exploited on Linux on System z (kernel level 2.6.25 or later). However, there are many differences in setup, applicability, and performance when compared with z/OS.

The Linux on System z kernel supports 2 MB large pages (vs. 1 MB pages on z/OS) and provides them through the hugetlbfs API. When the Linux on System z kernel is running in an LPAR on IBM System z10 hardware, it emulates each 2 MB large page by using two real 1 MB large pages. When running on older hardware, or under IBM z/VM, the kernel uses software simulation to provide the same support, but with little performance benefit.

Our experiments in a controlled environment showed a 2% performance improvement from large pages on a Java multithreaded benchmark on Linux on System z.

Linux on System z does not have any restrictions on virtual memory ranges for large page exploitation, so 31-bit applications can also benefit from large pages.

Linux on System z can also use large pages for executable code, so processor stalls due to TLB misses on instruction fetches can be reduced or eliminated.

Java 6 exploitation of large page support

IBM 64-bit SDK for z/OS, Java Technology Edition, V6 has recently introduced support for optionally allocating the Java heap and internal virtual machine data using large pages. Two Linux releases also support large pages: IBM® 64-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 5, and IBM® 31-bit JRE for Linux® on System z™ architecture, Java™ Technology Edition, Version 5.

By default, the JVM allocates normal 4 KB pages. The user can request that the JVM allocate the Java heap using large pages by specifying the -Xlp option. The JVM falls back to normal 4 KB pages in two cases: the system does not have large pages enabled, or it does not have enough large pages available to satisfy the allocation request. The -verbose:gc option can be used to determine which page size was actually used.
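Because the fallback is silent, it can help to first confirm that the options actually reached the JVM, for example when they are set through an application server's configuration. The sketch below uses the standard `java.lang.management` API to check the JVM's input arguments. It reports only what was requested, not whether large pages were granted; the verbose GC log remains the place to see that. The class name is hypothetical. It also assumes the options appear verbatim in the input argument list, which may vary with how they were supplied.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Sketch: reports whether -Xlp and -Xcompressedrefs were passed to this JVM.
// It shows only what was requested, not whether large pages were actually granted.
final class RequestedOptions {

    static boolean requested(List<String> jvmArgs, String option) {
        return jvmArgs.contains(option); // exact match on the option text
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String option : new String[] {"-Xlp", "-Xcompressedrefs"}) {
            System.out.println(option + " requested: " + requested(jvmArgs, option));
        }
    }
}
```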

The performance benefit from large pages depends on the application characteristics. If the application experiences TLB misses with normal pages, large pages can provide a significant performance boost. Java applications that allocate many objects and trigger frequent garbage collection typically benefit from large pages. The data access pattern also affects the benefit of large pages.


Using verbose:gc log for compressed references and large pages

IBM J9 garbage collector verbose logs now report the compressed references and large pages modes used by the JVM instance. Verbose logging is enabled by specifying the -verbose:gc command-line option.

In the example garbage collector verbose log, the compressedRefs attribute indicates that the JVM is using compressed references mode. The compressedRefsShift attribute indicates that the compression shift amount is zero, so the Java heap was allocated below virtual address 2^32. The pageSize attribute indicates that 1 MB (large) pages are used to back the Java heap.
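The Java sketch below pulls the compressedRefs, compressedRefsShift, and pageSize attributes out of a saved verbose GC log and interprets them as described above. The attribute names come from the text. The element format is an assumption: the code expects each attribute as `<attribute name="..." value="..." />` and accepts decimal or 0x-prefixed values. Check it against a log produced by your own JVM level before relying on it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch for reading compressed references and page size attributes from a verbose GC log.
// ASSUMPTION: each attribute is written as <attribute name="..." value="..." />.
final class VerboseGcAttributes {

    private static final Pattern ATTRIBUTE =
        Pattern.compile("<attribute\\s+name=\"([^\"]+)\"\\s+value=\"([^\"]*)\"");

    static Map<String, String> parse(String log) {
        Map<String, String> attributes = new HashMap<>();
        Matcher m = ATTRIBUTE.matcher(log);
        while (m.find()) {
            attributes.put(m.group(1), m.group(2)); // later occurrences overwrite earlier ones
        }
        return attributes;
    }

    // Accepts decimal or 0x-prefixed hexadecimal values.
    static long parseNumber(String value) {
        return value.startsWith("0x") ? Long.parseLong(value.substring(2), 16) : Long.parseLong(value);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java VerboseGcAttributes <verbose-gc-log>");
            return;
        }
        String log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        Map<String, String> a = parse(log);
        System.out.println("compressedRefs: " + a.get("compressedRefs"));
        String shift = a.get("compressedRefsShift");
        if (shift != null) {
            System.out.println("top of heap at or below 2^" + (32 + parseNumber(shift)));
        }
        String pageSize = a.get("pageSize");
        if (pageSize != null) {
            System.out.println("heap page size: " + (parseNumber(pageSize) >> 10) + " KB");
        }
    }
}
```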


Performance analysis and guidelines

Compressed references and large pages performance were measured with several benchmarks on IBM System z10 processor-based systems. Results from these measurements are presented in this section. The following charts show performance improvements based on measurements obtained using standard IBM benchmarks in a controlled environment. The actual throughput that any user application will experience depends on the user's system configuration, workload, and application characteristics. Therefore, there is no assurance that an individual user can achieve throughput improvements equivalent to the performance ratios stated here. Users may experience significantly better or worse application performance.

Both compressed references and large pages reduce data access latencies. The two features are complementary, although they address the same performance bottleneck. The compressed references feature compresses heap pointers so that more objects fit in the data caches, TLB, and pages on the system. Large page support increases the amount of memory covered by each TLB entry, which increases the TLB capacity and reduces the number of misses.

Multithreaded Java benchmark on z/OS

This benchmark was run on a dedicated 16-way System z10 z/OS LPAR with 16 GB of memory. The application spawns an increasing number of worker threads to measure the scalability of the system. Overall throughput increases as the number of worker threads increases. Once system capacity is reached, throughput levels off and no longer increases as worker threads are added. This benchmark allocates many objects and therefore drives a lot of garbage collection (GC). As such, it showcases improvements in data locality and in GC performance well.
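To illustrate the kind of workload described above, here is a minimal sketch of an allocation-heavy scalability harness. It is a hypothetical stand-in written for this explanation, not the IBM benchmark: each unit of work builds and discards a chain of small, reference-heavy objects, so pointer size and garbage collection dominate, and throughput is reported for a doubling number of worker threads. The class name, operation counts, and chain length are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical allocation-heavy scalability harness (not the IBM benchmark).
 * Each operation builds a chain of small, reference-heavy objects and then
 * drops it, so the run is dominated by allocation and garbage collection.
 */
class AllocationScalability {

    /** One operation: link `nodes` two-slot arrays into a chain, then let it become garbage. */
    static long doWork(int nodes) {
        Object[] head = null;
        long created = 0;
        for (int i = 0; i < nodes; i++) {
            Object[] node = new Object[2]; // two reference slots: size depends on pointer width
            node[0] = head;                // link to the previous node
            head = node;
            created++;
        }
        return created; // the chain is unreachable once this method returns
    }

    /** Runs opsPerThread operations on each of `threads` workers and returns operations per second. */
    static double measure(int threads, int opsPerThread, int nodesPerOp) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Long> worker = () -> {
                long done = 0;
                for (int op = 0; op < opsPerThread; op++) {
                    doWork(nodesPerOp);
                    done++;
                }
                return done;
            };
            List<Future<Long>> results = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(worker));
            }
            long totalOps = 0;
            for (Future<Long> f : results) {
                totalOps += f.get(); // waits for each worker to finish
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            return totalOps / seconds;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int maxThreads = 2 * Runtime.getRuntime().availableProcessors();
        for (int threads = 1; threads <= maxThreads; threads *= 2) {
            double opsPerSecond = measure(threads, 200, 10_000);
            System.out.printf("%d threads: %.0f ops/s%n", threads, opsPerSecond);
        }
    }
}
```

On a real system, throughput from a harness like this would be expected to rise with the thread count until the processors are saturated and then level off; absolute numbers from such a small sketch should not be compared with the published results.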


(Chart: multithreaded benchmark, 64-bit compared to 31-bit performance; z10 16-way, z/OS 1.9, Java 6 SR3: 64-bit 2 GB, -19%; 64-bit 2 GB Large Pages, -12%; 64-bit 2 GB CompRefs, +1%; 64-bit 2 GB CompRefs + LargePages, +5%)

Figure 2. Multithreaded Benchmark Performance Comparison

The maximum Java heap available to the 31-bit JVM when running this workload is 1450 MB. The 64-bit edition of IBM SDK6 can allocate a heap larger than 1450 MB, but even with the heap increased to 2 GB, it experiences a 19% drop in throughput compared with the 31-bit edition. This drop in throughput is due to a 40% increase in heap footprint: the larger object size significantly reduces data locality, which increases data cache and TLB misses both in the application code and during garbage collection.

The effect of TLB misses can be reduced by using large pages to back the Java heap. Backing the 2 GB heap of the 64-bit edition of Java 6 with large (1 MB) pages reduces the performance gap to 12%.

The loss of data locality can be reduced by using heap compression. When the 64-bit edition runs in compressed references mode with a 2 GB heap backed by normal 4 KB pages, it outperforms the 31-bit edition by 1%.

When both features are combined, the 64-bit edition running in compressed references mode with a 2 GB heap backed by large (1 MB) pages outperforms the 31-bit edition by 5%, because of the combined effect of improved data locality and fewer TLB misses.

Performance projections for 64-bit compressed references with heap sizes greater than 2 GB

To measure the overhead of the shift-based compression schemes, performance measurements were done with the Java heap fixed at 2 GB (to allow a direct comparison with shift-by-0 results). The multithreaded Java benchmark measurements showed that the shift-by-2 and shift-by-3 compression schemes add an overhead of 3.12% over the shift-by-0 scheme. For shift-by-1, which benefits from an optimization that exploits the memory reference instructions available in the z/Architecture, the cost of shifting adds only 2.24% performance overhead over the shift-by-0 scheme.

However, increasing the heap size beyond 2 GB may reduce garbage collection overhead enough that the extra overhead of the shift-by-1, shift-by-2, or shift-by-3 compression schemes becomes an acceptable trade-off.
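The shift-based schemes can be pictured with a simple model. This sketch assumes, for illustration only, that a compressed reference is a 64-bit address's offset from a heap base, shifted right by N bits and stored in 32 bits, with objects aligned on 2^N-byte boundaries so that no bits are lost; it is not the IBM SDK's implementation. In this model, the extra shift performed when compressing and decompressing each reference is the cost of shifting measured above, and each extra bit of shift doubles the addressable heap, from 4 GB with shift-by-0 to 32 GB with shift-by-3.

```java
/**
 * Illustrative model of shift-by-N compressed references (an assumption for
 * teaching purposes, not the IBM SDK's actual implementation).
 */
class CompressedRefs {

    /** Largest heap addressable with 32-bit compressed values and shift N: 4 GB * 2^N. */
    static long maxHeapBytes(int shift) {
        return (1L << 32) << shift;
    }

    /** Compress a 64-bit address: offset from heapBase, shifted right by N, kept in 32 bits. */
    static int compress(long address, long heapBase, int shift) {
        long offset = address - heapBase;
        long alignMask = (1L << shift) - 1;
        if (offset < 0 || offset >= maxHeapBytes(shift) || (offset & alignMask) != 0) {
            throw new IllegalArgumentException("address not representable with shift " + shift);
        }
        return (int) (offset >>> shift); // may be negative as a signed int; treated as unsigned below
    }

    /** Decompress: zero-extend the 32-bit value, shift left by N, and add the base. */
    static long decompress(int compressed, long heapBase, int shift) {
        return heapBase + (Integer.toUnsignedLong(compressed) << shift);
    }

    public static void main(String[] args) {
        for (int shift = 0; shift <= 3; shift++) {
            System.out.printf("shift-by-%d: max heap %d GB%n", shift, maxHeapBytes(shift) >> 30);
        }
        long base = 0x1_0000_0000L;        // hypothetical heap start at 4 GB
        long addr = base + 0x3_0000_0008L; // 8 bytes past the 12 GB mark, 8-byte aligned
        int ref = compress(addr, base, 3);
        System.out.println("round trip ok: " + (decompress(ref, base, 3) == addr));
    }
}
```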

This paper focuses on scenarios where a moderate increase in heap size (to 2 GB or less) is tolerated to bridge the performance gap between 64-bit and 31-bit. To limit the scope of this paper, discussion of large heaps (beyond 2 GB) and of small heaps below 2 GB (the same size as the 31-bit heap) is kept brief.

DayTrader benchmark running on IBM WebSphere Application Server V7 for z/OS

The DayTrader benchmark is a three-tier stock trading application running on IBM WebSphere Application Server. It is an IBM variant of Apache DayTrader [http://cwiki.apache.org/GMOxDOC20/daytrader.html].

Figure 3. DayTrader benchmark performance comparison

The benchmark was run in a three-tier setup consisting of a dedicated 12-way System z10 LPAR running WebSphere Application Server V7.0.0 on z/OS, a dedicated 8-way System z10 LPAR running IBM DB2® on z/OS, and two client machines driving an automated workload of 2000 users performing various trading activities.

The baseline measurement is the 31-bit WebSphere Application Server V7 with a 900 MB heap, which is the typical maximum 31-bit Java heap when running WebSphere Application Server on z/OS.

The 64-bit WebSphere Application Server V7 can allocate a much bigger Java heap than 900 MB. With a 2 GB Java heap, however, it lags behind the 31-bit edition by 7.83%.


(Chart: DayTrader 1.2 Java performance, 64-bit compared to the 31-bit base; z10 12+8 three-tier hardware configuration: 64-bit 2GB, -7.83%; 64-bit 2GB+CR, -4.85%; 64-bit 2GB+LP, -2.46%; 64-bit 2GB+CR+LP, 0.00%)


When we ran the 64-bit WebSphere Application Server V7 in compressed references mode with a 2 GB heap, the throughput gap was reduced to 4.85%.

When we used large pages to back the 2 GB heap but ran the 64-bit WebSphere Application Server V7 in default mode, the throughput gap was reduced to 2.46%.

When the 2 GB heap is combined with both compressed references and large pages, the 64-bit WebSphere Application Server V7 matches the performance of the 31-bit WebSphere Application Server V7.

Java Heap footprint savings using compressed references

To fully understand the performance improvements offered by compressed references, it is important to understand their effect on garbage collection. The earlier section entitled “Java Object Shape and Compressed References” gave an example showing the reduction in size of 64-bit objects when compressed references are used. Figure 4 demonstrates these savings by showing the amount of garbage collected per request for the DayTrader benchmark.

(Chart: JVM heap footprint savings with compressed references; garbage per request in KB: 31-bit, 134; 64-bit, 193; 64-bit + CR, 131)

Figure 4. Amount of garbage collected per request for the DayTrader benchmark

Figure 4 shows that, with a 900 MB heap, the amount of garbage collected grows from 134 KB per request in 31-bit mode to 193 KB (+44%) in standard 64-bit mode. With 64-bit compressed references, however, this amount drops back to 131 KB, roughly equivalent to 31-bit mode. Each value is computed by dividing the amount of heap memory freed during a given time interval by the total number of requests completed during that interval. Verbose GC logging must be enabled to acquire the information necessary for this analysis.
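The garbage-per-request calculation above is simple arithmetic once the freed-byte counts have been pulled out of the verbose GC output (for example, enabled with -verbose:gc). The sketch below assumes that extraction has already been done; the class name and the input values in main are hypothetical, and parsing the log format is not shown. It also reproduces the +44% figure from the Figure 4 values.

```java
import java.util.List;

/** Hypothetical post-processing of verbose GC data for one measurement interval. */
class GarbagePerRequest {

    /** KB of garbage per request = total bytes freed / requests completed in the same interval. */
    static double kbPerRequest(List<Long> bytesFreedPerCollection, long requestsCompleted) {
        if (requestsCompleted <= 0) {
            throw new IllegalArgumentException("requestsCompleted must be positive");
        }
        long totalFreed = 0;
        for (long freed : bytesFreedPerCollection) {
            totalFreed += freed;
        }
        return totalFreed / 1024.0 / requestsCompleted;
    }

    /** Relative change versus a baseline, for example 193 KB vs 134 KB. */
    static double relativeChange(double value, double baseline) {
        return value / baseline - 1.0;
    }

    public static void main(String[] args) {
        // Hypothetical interval: three collections freeing 400 MB each, 9,000 requests.
        List<Long> freed = List.of(400L << 20, 400L << 20, 400L << 20);
        System.out.printf("%.1f KB per request%n", kbPerRequest(freed, 9_000));
        // Figure 4 values: 31-bit 134 KB, 64-bit 193 KB, 64-bit + CR 131 KB.
        System.out.printf("64-bit vs 31-bit: %+.0f%%%n", 100 * relativeChange(193, 134));
        System.out.printf("64-bit + CR vs 31-bit: %+.1f%%%n", 100 * relativeChange(131, 134));
    }
}
```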


Garbage collection time savings using compressed references

The previous section showed the footprint savings realized with 64-bit compressed references. While these savings allow increased scalability by freeing space for more objects in a JVM heap of the same size, they do not tell the complete story of garbage collection efficiency. Perhaps an even more important metric is the amount of time spent in garbage collection. Figure 5 shows GC time relative to the time required for GC in 31-bit mode with a 900 MB JVM heap. The time interval used is the same as in the previous section on JVM footprint savings.

(Chart: garbage collection time savings with compressed references; time relative to 31-bit with a 900 MB heap: 31-bit (900 MB heap), 1.00; 64-bit Standard (900 MB heap), 1.92; 64-bit Compressed (900 MB heap), 1.44; 64-bit Standard (2048 MB heap), 0.90; 64-bit Compressed (2048 MB heap), 0.63)

Figure 5. Time spent doing garbage collection for the DayTrader benchmark

Figure 5 shows that the time spent doing garbage collection in standard 64-bit mode is almost double (1.92x) that of 31-bit mode. In 64-bit compressed references mode, this time is reduced to less than 1.5x (1.44x). As expected, GC time is reduced even further by increasing the JVM heap size to 2048 MB, but this is due solely to less frequent GC cycles. With a 2048 MB heap, the improvement from compressed references mode (from 0.90x down to 0.63x) is similar in magnitude to the 900 MB case.
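Figure 5 normalizes each configuration's GC time to the 31-bit, 900 MB case. A minimal sketch of that normalization follows, together with the fraction of GC time that compressed references save relative to standard 64-bit at each heap size. The measured seconds in main are hypothetical; the relative values are those read from Figure 5. Computed this way, compressed references save 25% of standard 64-bit GC time with a 900 MB heap and 30% with a 2048 MB heap, consistent with the similar magnitude noted above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Normalizes GC time to a baseline configuration, as in Figure 5. */
class RelativeGcTime {

    /** Divides every configuration's GC time by the baseline configuration's GC time. */
    static Map<String, Double> normalize(Map<String, Double> gcSeconds, String baseline) {
        Double base = gcSeconds.get(baseline);
        if (base == null || base <= 0) {
            throw new IllegalArgumentException("missing or non-positive baseline: " + baseline);
        }
        Map<String, Double> relative = new LinkedHashMap<>();
        gcSeconds.forEach((config, seconds) -> relative.put(config, seconds / base));
        return relative;
    }

    /** Fraction of GC time saved by compressed references relative to standard 64-bit. */
    static double savingsFromCompression(double standard, double compressed) {
        return 1.0 - compressed / standard;
    }

    public static void main(String[] args) {
        Map<String, Double> gcSeconds = new LinkedHashMap<>(); // hypothetical measurements
        gcSeconds.put("31-bit, 900 MB", 50.0);
        gcSeconds.put("64-bit standard, 900 MB", 96.0);
        gcSeconds.put("64-bit compressed, 900 MB", 72.0);
        normalize(gcSeconds, "31-bit, 900 MB")
            .forEach((config, rel) -> System.out.printf("%s: %.2f%n", config, rel));

        // Relative values read from Figure 5.
        System.out.printf("900 MB heap: %.0f%% less GC time with CR%n", 100 * savingsFromCompression(1.92, 1.44));
        System.out.printf("2048 MB heap: %.0f%% less GC time with CR%n", 100 * savingsFromCompression(0.90, 0.63));
    }
}
```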


Setting Java options in WebSphere Application Server

Converting a migrated WebSphere Application Server on IBM z/OS to run in 64-bit mode:

With WebSphere Application Server V7, the default mode when configuring new servers is 64-bit. The configuration tool still allows new servers to be configured as 31-bit, but that selection must be made explicitly.

However, when existing servers are migrated, their addressing mode is preserved; that is, a 31-bit server will be migrated as a 31-bit server, and a 64-bit server will be migrated as a 64-bit server.

To switch the addressing mode of a specific server to 64-bit, navigate to the Application Server settings page in the administrative console: Servers > Server Types > WebSphere application servers > server_name. Select the Run in 64-bit JVM mode check box, and then recycle the server to make the change effective.

Refer to the WebSphere Application Server 7 Information Center topic entitled "Converting a migrated server to run in 64-bit mode" for considerations when switching addressing modes.
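After recycling the server, one way to confirm the addressing mode is to inspect JVM system properties from code running in that server. The sketch assumes that sun.arch.data.model reports "64" on a 64-bit JVM and falls back to the IBM-specific com.ibm.vm.bitmode property; either may be absent on a given JVM level, so treat this as a quick check rather than an authoritative test.

```java
/** Quick check of the running JVM's data model after switching the server to 64-bit. */
class JvmBitness {

    /**
     * Returns true if the reported data model is "64". Which property carries
     * the value is JVM-dependent (assumption: sun.arch.data.model first, then
     * the IBM-specific com.ibm.vm.bitmode).
     */
    static boolean is64Bit(String dataModel, String ibmBitMode) {
        String value = dataModel != null ? dataModel : ibmBitMode;
        return "64".equals(value);
    }

    public static void main(String[] args) {
        boolean sixtyFour = is64Bit(System.getProperty("sun.arch.data.model"),
                                    System.getProperty("com.ibm.vm.bitmode"));
        System.out.println("64-bit JVM: " + sixtyFour
            + " (os.arch=" + System.getProperty("os.arch") + ")");
    }
}
```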

Enabling compressed references mode in 64-bit WebSphere Application Server V7 on IBM z/OS:

To enable a 64-bit JVM to run in the compressed references mode, you need to specify a new environment variable in WebSphere Application Server configuration:

In the administrative console, click Servers > Server Types > WebSphere application servers > server_name. Click the Configuration tab, and then, under the Server Infrastructure section, click Java and process management > Process definition > servant. Then, in the Additional Properties section, click Environment entries.

Add or update the environment entry for IBM_JAVA_OPTIONS as follows:

If an environment entry named IBM_JAVA_OPTIONS already exists, edit it to append the Java option -Xcompressedrefs to the existing value.

Otherwise, click New to create a new environment entry.

Fill in the following values in their respective fields of the form:


Name: IBM_JAVA_OPTIONS
Value: -Xcompressedrefs
Description: Enable 64-bit Compressed References mode

Click Apply to update the WebSphere Application Server environment.

Restart WebSphere Application Server so that it starts in compressed references mode.

The above procedure updates the ‘was.env’ file in the WebSphere Application Server configuration directory, so the setting applies to all regions (servant, control, and adjunct).
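The add-or-append rule for IBM_JAVA_OPTIONS can be expressed as a small, idempotent string function. This is a hypothetical helper for illustration: it only computes the new value, it does not edit was.env or call any WebSphere API, and it assumes options are separated by whitespace. The -Xmx1g value in the example is an arbitrary placeholder for an existing entry.

```java
import java.util.Arrays;

/** Hypothetical helper computing the new IBM_JAVA_OPTIONS value. */
class JavaOptionsEntry {

    /**
     * Creates the value if there is no entry, appends the option if it is
     * missing, and leaves the value unchanged if the option is already present.
     */
    static String withOption(String existingValue, String option) {
        if (existingValue == null || existingValue.isBlank()) {
            return option; // no entry yet: the new entry holds just this option
        }
        String trimmed = existingValue.trim();
        boolean present = Arrays.asList(trimmed.split("\\s+")).contains(option);
        return present ? trimmed : trimmed + " " + option;
    }

    public static void main(String[] args) {
        System.out.println(withOption(null, "-Xcompressedrefs"));
        System.out.println(withOption("-Xmx1g", "-Xcompressedrefs")); // placeholder existing value
        System.out.println(withOption(System.getenv("IBM_JAVA_OPTIONS"), "-Xcompressedrefs"));
    }
}
```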

Note that supplying -Xcompressedrefs as a generic JVM argument causes WebSphere Application Server to fail to start with an unsupported Java option error. If the application requires a Java heap larger than 30 GB, the 64-bit default mode should be used instead.
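The 30 GB guideline and the z/OS configuration rule above can be captured in a hypothetical helper like the following. The threshold comes from the text; the class, enum, and method names are illustrative assumptions, and Runtime.maxMemory() is used only to show the check against the current JVM's maximum heap.

```java
/**
 * Hypothetical helper encoding this paper's guidance for 64-bit
 * WebSphere Application Server V7 on z/OS. The 30 GB threshold comes from
 * the text; the names are illustrative only.
 */
class AddressingModeAdvice {
    static final long GB = 1L << 30;
    static final long COMPRESSED_REFS_LIMIT = 30 * GB; // guideline stated in the text

    enum Mode { COMPRESSED_REFERENCES, DEFAULT_64_BIT }

    /** Heaps above the guideline use 64-bit default mode; smaller heaps can use compressed references. */
    static Mode recommend(long maxHeapBytes) {
        if (maxHeapBytes <= 0) {
            throw new IllegalArgumentException("heap size must be positive");
        }
        return maxHeapBytes > COMPRESSED_REFS_LIMIT ? Mode.DEFAULT_64_BIT : Mode.COMPRESSED_REFERENCES;
    }

    /** Where the text says each mode is configured on z/OS. */
    static String whereToConfigure(Mode mode) {
        return mode == Mode.COMPRESSED_REFERENCES
            ? "add -Xcompressedrefs to the IBM_JAVA_OPTIONS environment entry (not Generic JVM arguments)"
            : "leave -Xcompressedrefs out (64-bit default mode)";
    }

    public static void main(String[] args) {
        long currentMax = Runtime.getRuntime().maxMemory(); // maximum heap of this JVM
        Mode mode = recommend(currentMax);
        System.out.println(mode + ": " + whereToConfigure(mode));
    }
}
```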

Enabling large page support for Java heaps in WebSphere Application Server on IBM z/OS:

To use large pages with WebSphere Application Server V7.0.0, large pages must first be set up on the z/OS system running on an IBM System z10 processor. Instructions for doing this are outlined in the documentation for IBM z/OS support APAR OA25485.

Because large pages are only available above the 2^31 virtual address on z/OS, WebSphere Application Server needs to run in 64-bit mode.

To use large pages for the Java heap, the -Xlp option must be specified as a generic JVM argument in the WebSphere Application Server configuration. In the administrative console for WebSphere Application Server, click:

Servers > Server Types > WebSphere application servers > server_name.

Click the Configuration tab, and then, under the Server Infrastructure section, click:

Java and process management > Process definition > servant > Java virtual machine.

Enter the -Xlp command-line argument in the Generic JVM arguments field.

When using WebSphere Application Server V7.0.0.1 on z/OS, -Xlp can be applied to the configuration options of the adjunct, servant, and control regions separately by following the same procedure as above.

These changes can be confirmed by checking the following configuration files in the WebSphere Application Server configuration directory: adjunct.jvm.options, servant.jvm.options, and control.jvm.options.

If limited large pages are available, it is recommended that large pages be applied to the servant region first.
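Checking the three region option files can be scripted. The sketch below is a hypothetical helper that assumes one option per line in adjunct.jvm.options, servant.jvm.options, and control.jvm.options, and that the caller passes the configuration directory. Because files on z/OS may be EBCDIC-encoded, the character set is a parameter; using the JVM default charset in main is an assumption that may need to be changed.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical check of which region option files contain a given JVM option. */
class RegionOptionsCheck {
    static final List<String> REGION_FILES =
        List.of("servant.jvm.options", "control.jvm.options", "adjunct.jvm.options");

    /** Maps each region file name to whether a line equal to `option` was found (missing file = false). */
    static Map<String, Boolean> regionsWithOption(Path configDir, String option, Charset charset)
            throws IOException {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : REGION_FILES) {
            Path file = configDir.resolve(name);
            boolean found = false;
            if (Files.isRegularFile(file)) {
                for (String line : Files.readAllLines(file, charset)) {
                    if (line.trim().equals(option)) {
                        found = true;
                        break;
                    }
                }
            }
            result.put(name, found);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path configDir = Paths.get(args.length > 0 ? args[0] : "."); // pass the real configuration directory
        regionsWithOption(configDir, "-Xlp", Charset.defaultCharset())
            .forEach((file, present) -> System.out.println(file + ": " + (present ? "-Xlp set" : "-Xlp not set")));
    }
}
```

Consistent with the advice above, the servant region is listed first.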


Enabling compressed references and/or large page support options in IBM WebSphere Application Server V7 on Linux on System z:

Both command-line options, -Xcompressedrefs and -Xlp, can be specified as generic JVM arguments to enable compressed references and large pages in WebSphere Application Server running on Linux on System z.
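On either platform, a quick sanity check is to list the arguments the running JVM received. The sketch uses the standard java.lang.management API, RuntimeMXBean.getInputArguments(). It is hedged in two ways: options supplied through an environment variable such as IBM_JAVA_OPTIONS may not appear in that list, and the presence of -Xlp only shows that the option was passed, not that large pages were actually obtained.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Reports whether the paper's options appear among the running JVM's input arguments. */
class OptionsInEffect {

    static boolean hasOption(List<String> inputArguments, String option) {
        return inputArguments.contains(option);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String option : List.of("-Xcompressedrefs", "-Xlp")) {
            System.out.println(option + ": "
                + (hasOption(jvmArgs, option) ? "passed to this JVM" : "not in the input arguments"));
        }
    }
}
```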


Conclusion

As applications continue to grow in complexity and scale, the limitations of the 31-bit address space on z/OS and Linux on System z are becoming more apparent. IBM WebSphere Application Server V7.0.0.3 uses IBM 64-bit SDK for z/OS, Java Technology Edition, V6, March 2009 Maintenance Rollup, which supports a pair of features for improving the performance of 64-bit Java. Compressed references and large pages are complementary features that help alleviate the performance bottleneck caused by the increased footprint of 64-bit Java. More specifically, the compressed references feature compresses heap pointers so that more objects fit into the hardware data caches, the translation lookaside buffer (TLB), and the pages available on the system. Large pages increase the area of memory addressed by each TLB entry, thus reducing the number of page-table accesses.

By exploiting compressed references, customer applications running in 64-bit mode can now achieve a Java heap footprint similar to that observed in 31-bit mode. Additionally, these customers can now move the Java heap above the 2^31 virtual address, which frees up below-the-bar storage for other uses while allowing the Java heap to be backed with large pages. By combining larger Java heaps, heap compression, and large pages, customer applications can see significantly improved throughput, in some cases outperforming the 31-bit edition.

The two flagship performance features of IBM 64-bit SDK for z/OS, Java Technology Edition, V6 (March 2009 Maintenance Rollup) showcase how IBM is exploiting the latest IBM System z hardware, driving changes in the z/OS operating system, adding intrinsic innovations in compiler technology, and making them available to IBM customers through changes in middleware such as WebSphere Application Server.


Resources

These Web sites provide useful references to supplement the information contained in this document:

WebSphere Compressed References Technology white paper

ftp://ftp.software.ibm.com/software/webserver/appserv/was/WAS_V7_64-bit_performance.pdf

Translation Look Aside Buffer (TLB)

http://en.wikipedia.org/wiki/Translation_Lookaside_Buffer

Large page support on Linux

http://linuxgazette.net/155/krishnakumar.html

SHARE presentation on z/OS Real Storage Manager (RSM) Large Page Support

http://ew.share.org/proceedingmod/abstract.cfm?abstract_id=19388&conference_id=20

IBM System z10 support for large pages

http://www.research.ibm.com/journal/rd/531/tzortzatos.pdf

IBM z/OS APAR OA20902 for large page support

http://www-01.ibm.com/support/docview.wss?rs=112&context=SWG90&context=SWGA0&context=SWGB0&context=SWG80&q1=large+page+access&uid=isg1OA20902&loc=en_US&cs=utf-8&lang=en

IBM z/OS APAR OA25485 for large page support

http://www-01.ibm.com/support/docview.wss?rs=112&context=SWG90&context=SWGA0&context=SWGB0&context=SWG80&q1=large+page+access&uid=isg1OA25485&loc=en_US&cs=utf-8&lang=en

IBM z/OS APAR OA26294 for large compressed references heap support

http://www-01.ibm.com/support/docview.wss?rs=112&context=SWG90&context=SWGA0&context=SWGB0&context=SWG80&q1=OA26294&uid=isg1OA26294&loc=en_US&cs=utf-8&lang=en

IBM Developer Kits for Java download

https://www.ibm.com/developerworks/java/jdk/

Apache DayTrader

http://cwiki.apache.org/GMOxDOC20/daytrader.html


About the authors

Kishor Patil is a software developer with the IBM Testarossa JIT compiler team at IBM Toronto Lab. You can reach Kishor at [email protected].

Marcel Mitran is a technical manager with the IBM Testarossa JIT compiler team at the IBM Toronto Lab. You can reach Marcel at [email protected].

Jim Cunningham is a performance analyst with IBM WebSphere team at IBM Poughkeepsie Lab. You can reach Jim at [email protected].

Acknowledgements:

The authors would like to thank and acknowledge the following individuals from IBM for their contributions to this paper:

TR-JIT team: Derek Inglis, Joran Siu and Levon Stepanian

Java Performance team: Clark Goodrich, James Perlik and John Rankin

WebSphere Application Server on z/OS team: Mike Cox, Colette Manoni, and William Scott

z/OS RSM team: Elpida Tzortzatos


Trademarks and special notices

© Copyright IBM Corporation 2009.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Information is provided "AS IS" without warranty of any kind.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.