Linux Porting, Performance Measurements, and Scaling Advantage
using the z13 (IBM Mainframes)
Moriyoshi Ohara, STSM, Emerging Workload Characterization & Acceleration, IBM Research
June 2015, MongoDB World, New York, USA
Authors: Bryan Chan, Dale Hoffman, Yasushi Negishi, Moriyoshi Ohara, Hartmut Penner, Stefan Wirag, Otto Wohlmuth
MongoDB for Linux on z Systems
©2015 IBM Corporation 31 May 2015
Trademarks
Notes:
Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the
user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.
IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.
All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual
customer configurations and conditions.
This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services
available in your area.
All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.
Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.
This information provides only general descriptions of the types and portions of workloads that are eligible for execution on Specialty Engines (e.g., zIIPs, zAAPs, and IFLs) ("SEs"). IBM authorizes customers to use IBM SE only to execute the processing of Eligible Workloads of specific Programs
expressly authorized by IBM as specified in the “Authorized Use Table for IBM Machines” provided at www.ibm.com/systems/support/machine_warranties/machine_code/aut.html (“AUT”). No other workload processing is authorized for execution on an SE. IBM offers SE at a lower price than
General Processors/Central Processors because customers are authorized to use SEs only to process certain types and/or amounts of workloads as specified by IBM in the AUT.
* Registered trademarks of IBM Corporation
The following are trademarks or registered trademarks of other companies.
* Other product and service names might be trademarks of IBM or other companies.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and
other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Windows Server and the Windows logo are trademarks of the Microsoft group of countries.
ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.
Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
The following are trademarks of the International Business Machines Corporation in the United States and/or other countries.
AIX*
BladeCenter*
Build Forge*
CICS*
ClearCase*
Cognos*
DB2*
DB2 Connect
Domino*
FileNet*
HiperSockets
IMS
Informix*
InfoSphere
Lotus*
Maximo*
MQSeries*
Parallel Sysplex*
POWER*
POWER7*
Proventia*
PR/SM
Quickr
Rational*
Smarter Cities
SPSS*
System z*
Tivoli*
WebSphere*
z/Architecture*
zEnterprise*
z/OS*
z/VM*
Agenda
Linux on z Systems (IBM Mainframes)
MongoDB porting to Linux on z Systems
MongoDB performance measurements and results
MongoDB in a Docker environment on z Systems
Future directions
Linux on z Systems – Structure
Many Linux software packages did not require any code change
[Diagram: layered structure of Linux on z Systems. Labels: Linux Applications; GNU Runtime Environment (with backend); GNU Compiler Suite (with backend); Linux Kernel (Process Management, Memory Management, Network Protocols, Filesystems, Generic Drivers; Architecture Independent Code); Platform Dependent Code; HW Dependent Drivers; Instruction Set Architecture and I/O Hardware]
1.81% platform-specific code in Linux Kernel 2.6.25
0.55% platform-specific code in Glibc 2.5
0.28% platform-specific code in GCC 4.1
Linux is Linux, but are all Linux infrastructure solutions identical?
No. While Linux is Linux, the underlying infrastructure (hardware and infrastructure software) directly affects the Linux workloads.
Enterprise grade Linux solution
While “Linux is Linux”, the underlying platform differentiates the Linux solutions.
An “enterprise grade Linux” solution, in our understanding, has defined characteristics:
– IT simplicity: up to hundreds of different workloads can run in parallel on one server
– Easy workload integration of new and existing data and applications
– Flexible server provisioning, simple to manage
– High productivity, based on efficient systems and life cycle management
– Highest resource utilization levels
– High levels of quality of service: security, availability, reliability
“Enterprise-grade isn’t just about specific features, rather it is about delivering a strategy that enables a consistent architectural model with the support and service necessary for [the] … complex environment that organizations find themselves in.”
- Ben Kepes, contributor to Forbes, www.forbes.com/sites/benkepes/2013/12/18/what-does-enterprise-grade-really-mean
Linux on z13
The enterprise grade Linux solution
z13 (1)
– Up to 10 TB: >3X more available memory
– Up to 141 configurable cores
– Up to 85 configurable LPARs
– IBM zAware: maximize service levels
– Larger cache: more workloads per server
– Crypto Express5S: performance and function
– SMT, SIMD: enhanced performance
Enterprise grade Linux solution:
– IBM GDPS® Virtual Appliance: continuous availability & disaster recovery
– IBM Spectrum Scale (IBM GPFS technology): clustered file system
– SOD*: KVM for z Systems: open source virtualization
– IBM Infrastructure Suite: management suite for z/VM and Linux
– IBM Wave for z/VM: intuitive virtualization management
– IBM z/VM: virtualization with efficiency at scale
– IBM z13: unmatched server technology & capacity
* All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
(1) Total capacity improvement over zEC12 of 40+ percent
Linux on IBM z13
Data center simplicity inside one server
Trusted operations
Unrivaled economics
– LPAR: Logical Partition = subset of hardware resources, virtualized as a separate computer; up to 85 LPARs can be configured
– IFL: Integrated Facility for Linux = core; up to 141 cores (IFLs) on IBM z13™ (z13)
– Virtualization Management: hypervisor providing efficiency at scale and virtualization management for easy administration, provisioning, automation
– Linux Guest: virtual Linux guests running workloads such as mobile, analytics, databases, Java™ apps, etc., in a cloud; up to thousands of Linux guests can be hosted on a z13
[Diagram: physical resources (memory, IFLs, I/O and network) are presented as virtualized resources in LPARs; the LPARs run z/OS®, z/VSE®, Linux, and virtualization management hosting Linux guests]
Linux on IBM z Systems
The real alternative to x86 server sprawl
It’s easy and cost-effective. - Dundee City Council
Great degree of flexibility and scalability. - Halkbank
Quickly and cost-effectively deploy innovative services. - Banca Carige
Maintenance and support effort reduced by at least 65%. - Algar Telecom
Operates even when resources are at 100% utilization. - Bank of Tokyo-Mitsubishi UFJ
Differentiates in level of service and quality of service. - L3C LLP
Unmatched Linux capacity: a full room of servers versus one footprint the size of a refrigerator
Expanding MongoDB's platform coverage
IBM has ported MongoDB to z and POWER
– The ports boil down to byte order-related fixes
– No advanced architecture skills required (Linux is Linux is Linux)
What is the best way forward to enable MongoDB for z and POWER (and likely other big-endian platforms)?
Platform      x86  ARM      SPARC  POWER    IBM z
Availability  Yes  LE only  No     LE only  No
On-going porting effort
Community patched MongoDB 1.8 to build and run on SPARC
– Keep the in-memory BSON data in little-endian format to avoid complications with wire protocol and memory-mapped files
– Use templates and operator overloading to avoid explicit byte swapping and code bloat
IBM migrated the patch to MongoDB 2.6
– V8 3.14 had been ported to both POWER and z
– Cannot use the bundled version of V8 (build with --use-system-v8)
Porting of MongoDB 3.0 is complete
Differences in the MongoDB 3.0 port
SERVER-14852 introduced AAE-safe read/write primitives
– Idea very similar to the original port
– New code should be written to be endianness-agnostic using these primitives
– Causes code bloat in mmap_v1 storage engine, compared to existing patch
New WiredTiger storage engine needs some big-endian patches, and a write barrier implementation for s390x (few lines of code in wiredtiger/src/include/gcc.h)
gperftools: now required for the thread-caching memory allocator (tcmalloc)
– Changes needed in gperftools-2.2/src/base/linux_syscall_support.h
– Build with --allocator=system for now
Current status: 100% success on scons smokeCppUnittests and scons smoke
Replace reinterpret casts with DataView APIs
Reading little-endian data
    ll = *reinterpret_cast<const unsigned long long *>( value() ); // before
    ll = ConstDataView( value() ).readLE<unsigned long long>(); // after
Writing little-endian data
    reinterpret_cast<int *>( buf )[0] = documentSize(); // before
    DataView( buf ).writeLE((int) documentSize()); // after
Other APIs exist
– Big-endian and native-endian data types
– Data cursors for converting multiple data items from longer document buffers
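The before/after lines on the DataView slide can be illustrated with a minimal, self-contained sketch. The free functions readLE and writeLE below are hypothetical stand-ins written for this example; they are not MongoDB's DataView classes and only show the idea: build the value byte by byte so the result does not depend on the host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical helpers for illustration only; they mimic the idea behind
// readLE/writeLE but are NOT MongoDB's DataView implementation.

// Assemble an integer from little-endian bytes, least significant byte first.
// The buffer is never reinterpreted as a wider type, so the result is the
// same on little-endian and big-endian hosts.
template <typename T>
T readLE(const char* bytes) {
    static_assert(std::is_integral<T>::value && sizeof(T) <= 8,
                  "integral types of up to 64 bits only");
    assert(bytes != nullptr);
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        v |= static_cast<std::uint64_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
    }
    return static_cast<T>(v);
}

// Store an integer as little-endian bytes, least significant byte first.
template <typename T>
void writeLE(char* bytes, T value) {
    static_assert(std::is_integral<T>::value && sizeof(T) <= 8,
                  "integral types of up to 64 bits only");
    assert(bytes != nullptr);
    const std::uint64_t v = static_cast<std::uint64_t>(value);
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        bytes[i] = static_cast<char>((v >> (8 * i)) & 0xFFu);
    }
}
```

A call such as writeLE<std::int32_t>(buf, documentSize()) would then leave the same four bytes in buf on x86 and on s390x, which is what the wire protocol and on-disk formats expect.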
Templatize types of mmap_v1 data fields
Declare fields with the “little” template to force little-endian operations on such fields
Template fitted with overloaded operators to avoid changing all uses
– Much less impact on code than using the DataView APIs
Example:
Before:
    class BtreeData_V0 {
        DiskLoc parent;
        DiskLoc nextChild;
        unsigned short _wasSize;
        unsigned short _reserved1;
        int flags;
        int emptySize;
        int topSize;
        int n;
        // ...
    };
After:
    class BtreeData_V0 {
        DiskLoc parent;
        DiskLoc nextChild;
        little<unsigned short> _wasSize;
        little<unsigned short> _reserved1;
        little<int> flags;
        little<int> emptySize;
        little<int> topSize;
        little<int> n;
        // ...
    };
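To make the “little” template idea concrete, here is a minimal sketch of how such a wrapper could be written. It is an assumption about the general shape only; the template in the actual port may differ in layout, operator coverage, and naming, and the raw() accessor exists purely so the example can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical sketch of a "little" wrapper; not the code from the port.
// The value is always stored as little-endian bytes, whatever the host order.
template <typename T>
class little {
    static_assert(std::is_integral<T>::value && sizeof(T) <= 8,
                  "integral types of up to 64 bits only");

public:
    little() : bytes_{} {}
    little(T v) { store(v); }  // implicit, so `field = 5` keeps compiling
    little& operator=(T v) { store(v); return *this; }
    operator T() const { return load(); }  // implicit, so `int x = field` keeps compiling
    little& operator+=(T v) { store(static_cast<T>(load() + v)); return *this; }

    // Exposed only for this illustration: the stored little-endian bytes.
    const unsigned char* raw() const { return bytes_; }

private:
    T load() const {
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            v |= static_cast<std::uint64_t>(bytes_[i]) << (8 * i);
        }
        return static_cast<T>(v);
    }

    void store(T value) {
        const std::uint64_t v = static_cast<std::uint64_t>(value);
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            bytes_[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFFu);
        }
    }

    unsigned char bytes_[sizeof(T)];
};
```

Because the conversions are implicit, existing code that reads or assigns a field declared as little<int> needs no change, which is the "much less impact" point made on the slide. The wrapper has the same size as the wrapped type, so the on-disk layout of a struct is unaffected.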
Guard explicit changes with macro
Example: swap the order of two fields so that a comparison of the struct can be performed with a single 64-bit comparison, in both little-endian and big-endian
    class OpTime {
    #if MONGO_CONFIG_BYTE_ORDER == 4321
        unsigned secs;
        unsigned i;
    #else
        unsigned i;
        unsigned secs;
    #endif
    };
Try to minimize use of this macro for better code quality and reliability
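The field-swap trick on the OpTime slide can be checked with a small standalone sketch. Everything here is hypothetical: OpTimeSketch, makeOpTime, asUint64, and lessThan are names invented for the example, and the byte order is detected with the GCC/Clang predefined macro __BYTE_ORDER__ instead of MONGO_CONFIG_BYTE_ORDER (compilers that do not define it are assumed to target little-endian hosts).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: GCC/Clang predefined macros; other compilers fall back to
// little-endian.
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SKETCH_BIG_ENDIAN 1
#else
#define SKETCH_BIG_ENDIAN 0
#endif

// Hypothetical stand-in for OpTime: the more significant field (secs) is
// placed where the high-order half of a 64-bit integer lives on this host.
struct OpTimeSketch {
#if SKETCH_BIG_ENDIAN
    std::uint32_t secs;
    std::uint32_t i;
#else
    std::uint32_t i;
    std::uint32_t secs;
#endif
};

static_assert(sizeof(OpTimeSketch) == sizeof(std::uint64_t),
              "two 32-bit fields must pack into 64 bits");

inline OpTimeSketch makeOpTime(std::uint32_t secs, std::uint32_t i) {
    OpTimeSketch t;
    t.secs = secs;
    t.i = i;
    return t;
}

// View the whole struct as one 64-bit value: (secs << 32) | i on either
// byte order, thanks to the guarded field order.
inline std::uint64_t asUint64(const OpTimeSketch& t) {
    std::uint64_t v;
    std::memcpy(&v, &t, sizeof v);
    return v;
}

// Ordering by (secs, i) with a single 64-bit comparison.
inline bool lessThan(const OpTimeSketch& a, const OpTimeSketch& b) {
    return asUint64(a) < asUint64(b);
}

// Small usage check.
inline void selfCheck() {
    assert(asUint64(makeOpTime(1, 0)) == (std::uint64_t(1) << 32));
    assert(lessThan(makeOpTime(1, 0xFFFFFFFFu), makeOpTime(2, 0)));
}
```

Without the guarded field order, a big-endian host would see i in the high-order half and the single comparison would sort by i first, which is why this is one of the few places where an explicit byte-order macro is hard to avoid.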
Light impact on code base (3.0)
0.4% files patched (1 new file added), 0.14% code modified
– Excluding gperftools at the moment
MongoDB resources for z customers
Where is the code?
• Open-source port: https://github.com/ibm-linux-on-z/mongo
• How-To: https://github.com/ibm-linux-on-z/docs/wiki/Building-MongoDB
• No official binaries yet; technology preview available on request
I need help!
• If you find bugs with the z port, let us know!
• For general help using MongoDB, try the community:
  – http://www.mongodb.org/about/community/
  – https://plus.google.com/communities/115421122548465808444
  – https://groups.google.com/d/forum/mongodb-user
• Reference manual at http://docs.mongodb.org/manual/
What are our plans with MongoDB on z?
• Port version 3.0+ and merge IBM port back to master branch
• Propose partnership with MongoDB to support z customers
• Containerize MongoDB for cloud environments on z
Who do I contact?
• Dale Hoffman ([email protected]) & Stefan Wirag ([email protected]) for use cases and customer questions
• Bryan Chan ([email protected]) for technical assistance
Experimental environment
MongoDB 3.0 w/o sharding
– YCSB benchmark (client emulator): 20 CPU cores
– MongoDB daemon: 1 to 8 CPU cores
MongoDB 2.6 w/ sharding
– YCSB benchmark (client emulator): 10 CPU cores
– MongoDB routing service: 10 CPU cores
– MongoDB daemons: 1 to 8 CPU cores, one core per daemon
Hardware configuration
Memory: 64GB
IBM z13
– Bare-metal LPAR, 28 cores provisioned
– DS8800 High Performance Storage
Intel Haswell E5-2697 v3
– Lenovo server
– 14 cores per socket, 28 cores total
– WD1002FAEX SATA3.0 HDD
Intel Haswell E5-2699 v3
– HP ProLiant server
– 18 cores per socket, 36 cores total
– HP P440 RAID controller
[Photos: z13; Intel Haswell E5-2697 v3 on Big Horn Peak Chassis; Intel Haswell E5-2699 v3 on HP ProLiant DL380 Gen9 Server]
Software configuration
Benchmark
– YCSB (client emulator): v0.1.4
– Workloads
    Write-heavy (50% reads, 50% writes)
    Read-mostly (95% reads, 5% writes)
    Read-only (100% reads)
– Parameters
    Operation Count=10000000
    Record Count=100000
    mongodb.writeConcern=normal (2.6)
    WiredTiger cacheSize=32GB (3.0)
    target=1000000
MongoDB
– v2.6.6
– v3.0.0
System Software
– Linux Distribution: RHEL 7.1 (Maipo)
– Linux Kernel: 3.10.0-229
– Java for YCSB: IBM SDK Java 7.1.3.0
– File System: xfs
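As a quick sanity check on the benchmark parameters, the expected number of reads and writes per run follows directly from the operation count and the workload mix. The sketch below is illustrative arithmetic only; OperationSplit, expectedSplit, and averageOpsPerRecord are hypothetical names, and YCSB itself is a Java tool that takes these values as workload properties.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types for illustration; not part of YCSB or MongoDB.
struct OperationSplit {
    std::uint64_t reads;
    std::uint64_t writes;
};

// Expected read/write split for a run of `operationCount` operations in which
// `readPercent` percent are reads. Integer arithmetic keeps the result exact.
inline OperationSplit expectedSplit(std::uint64_t operationCount, unsigned readPercent) {
    assert(readPercent <= 100);
    OperationSplit s;
    s.reads = operationCount / 100 * readPercent +
              (operationCount % 100) * readPercent / 100;
    s.writes = operationCount - s.reads;
    return s;
}

// Average number of operations per record. This is only an average; the
// request distribution used by a given YCSB workload may be skewed.
inline std::uint64_t averageOpsPerRecord(std::uint64_t operationCount,
                                         std::uint64_t recordCount) {
    assert(recordCount > 0);
    return operationCount / recordCount;
}
```

With Operation Count=10000000, the write-heavy mix gives 5,000,000 reads and 5,000,000 writes, the read-mostly mix gives 9,500,000 reads and 500,000 writes, and with Record Count=100000 each record is touched 100 times on average, so the working set is small relative to the 64GB of memory.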
MongoDB 2.6 performance on z13
YCSB driving workload through mongos to 8 mongod instances (i.e. shards)
Write-related issues with 2.6 observed
– z13 provided better out-of-the-box performance in write-related workloads
– MongoDB 3.0 expected to fix these issues
[Bar chart: throughput (transactions/sec, axis 0 to 160000) for the write-heavy, read-mostly, and read-only workloads; series: E5-2697 v3 HT, z13 noSMT]
MongoDB 2.6 scaling
MongoDB 2.6 did not scale well on Haswell with write-heavy workloads
MongoDB 2.6 on z13 scaled up to 8 cores but was expected to run into scaling issues beyond this
MongoDB 3.0 expected to fix the scaling limitations
[Line chart: throughput (transactions/sec, axis 0 to 140000) vs. number of cores assigned to MongoDB daemons (1, 2, 4, 8); series: z13 noSMT (read-mostly), z13 noSMT (write-heavy), E5-2697 v3 HT (read-mostly), E5-2697 v3 HT (write-heavy)]
MongoDB 3.0 experiments
Used WiredTiger storage engine
Eliminated sharding to simplify setup, and to stress-test scalability of mongod
– Multiple YCSB threads drive workload directly to a single mongod instance
– Various numbers of cores were assigned to mongod
Results are very preliminary; the 3.0 port for z is fresh
Two different Haswell servers were used (E5-2697 v3 as well as E5-2699 v3)
Initial runs on z13 did not have SMT enabled (default)
– Ran on Haswell with HyperThreading both enabled and disabled, for comparison
– Future runs in next few weeks will enable SMT on z13
MongoDB 3.0 scalability
With write-heavy workloads, MongoDB scales better on z13 than on Haswell
– NUMA impact is relatively small on a z13 LPAR spanning multiple nodes
z13 provides 1.3x to 2.3x advantage over Haswell
– SMT when enabled expected to improve this
[Line chart: YCSB A (write-heavy); throughput (transactions/sec, axis 0 to 160000) vs. number of cores assigned to MongoDB daemon (1, 2, 4, 6, 8); series: E5-2697 v3 noHT, E5-2697 v3 HT, E5-2699 v3 noHT, E5-2699 v3 HT, z13 noSMT]
MongoDB 3.0 scalability
With read-mostly workloads, MongoDB similarly scales better on z13 than on Haswell
z13 provides 1.2x to 1.7x advantage over Haswell
– SMT when enabled expected to improve this
[Line chart: YCSB B (read-mostly); throughput (transactions/sec, axis 0 to 350000) vs. number of cores assigned to MongoDB daemon (1, 2, 4, 6, 8); series: E5-2697 v3 noHT, E5-2697 v3 HT, E5-2699 v3 noHT, E5-2699 v3 HT, z13 noSMT]
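The deck does not spell out how the "advantage" figures were derived; a natural reading is the ratio of throughputs measured at the same core count. The sketch below shows that ratio together with a simple scaling-efficiency measure. Both function names are hypothetical, and scaling efficiency is an extra metric introduced here for illustration, not one reported in the slides.

```cpp
#include <cassert>

// Hypothetical helpers; names and definitions are illustrative assumptions.

// Ratio of two throughputs measured at the same core count, e.g. system A
// versus system B. A value of 1.5 means A delivered 1.5x the throughput of B.
inline double throughputRatio(double throughputA, double throughputB) {
    assert(throughputB > 0.0);
    return throughputA / throughputB;
}

// Scaling efficiency: measured speedup over the 1-core run divided by the
// ideal (linear) speedup for `cores` cores. 1.0 means perfectly linear.
inline double scalingEfficiency(double throughputAtOneCore,
                                double throughputAtNCores,
                                unsigned cores) {
    assert(throughputAtOneCore > 0.0);
    assert(cores > 0);
    return (throughputAtNCores / throughputAtOneCore) / cores;
}
```

For example, with placeholder numbers that are not measurements from this deck, a system that goes from 100 to 400 transactions/sec when moving from 1 to 8 cores has a scaling efficiency of 0.5.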
Docker
What is Docker?
– Docker is a platform for developers and sysadmins to develop, ship, and run applications
– Docker lets you quickly assemble applications from components and eliminates the friction that can come when shipping code
– Docker lets you get your code tested and deployed into production as fast as possible
Why Docker?
– Faster delivery of your applications
– Deploy and scale more easily
– Get higher density and run more workloads
– Faster deployment makes for easier management
Source: https://docs.docker.com/
[Diagram: a server runs a host OS and the Docker Engine; containers for App A, App A′, App B, and several instances of App B′ run on top of shared bins/libs]
Containers are isolated, but share OS and, where appropriate, bins/libraries
MongoDB w/ sharding as Docker container on z Systems
[Diagram: a Dockerfile and a Supervisord config file are used to create a Docker container holding the Mongo instance: mongos routers, mongod shard1 through shardn (databases db1 through dbn), and mongod config1 (cf1)]
Future directions
Update the MongoDB 3.0 port by end of June
– Complete the rewrite of the port to use the new primitives, fix unit tests
– Port WiredTiger (complete), gperftools, etc. to z Systems
– Port client drivers: C++ (complete), Node.js, Java, Python, etc.
– Catch up with master branch and contribute code
Performance work
– Re-run the performance benchmark with SMT enabled on z13 for MongoDB 3.0
– Benchmarking of more sophisticated MongoDB configurations
– Benchmarking of more complex workloads incorporating MongoDB
If you have use cases that you want us to look at, then please contact us
Appendix
Projections for SMT-2 on z13
[Line chart: YCSB A (write-heavy); throughput (transactions/sec, axis 0 to 200000) vs. number of cores assigned to mongod (1, 2, 4, 6, 8); series: E5-2697 v3 noHT, E5-2697 v3 HT, E5-2699 v3 noHT, E5-2699 v3 HT, z13 noSMT, z13 SMT2 (proj.)]
Projections for SMT-2 on z13
[Line chart: YCSB B (read-mostly); throughput (transactions/sec, axis 0 to 400000) vs. number of cores assigned to mongod (1, 2, 4, 6, 8); series: E5-2697 v3 noHT, E5-2697 v3 HT, E5-2699 v3 noHT, E5-2699 v3 HT, z13 noSMT, z13 SMT2 (proj.)]
Enabling Open Source Docker for z customers
Docker binaries for technology preview
• http://www.ibm.com/developerworks/linux/linux390/docker.html
• Open Source Docker for RHEL and SLES
• Compute checksum & compare against the checksum listed in Download section.
• “HowTo” document for first steps: http://containerz.blogspot.com/
Private Registry Creation
• Easily build your own base image for your containers
• No Ubuntu? Easily create a RHEL 7.x base image using mkimage-yum.sh: https://github.com/docker/docker/blob/master/contrib/mkimage-yum.sh
• Easily create a 'test private registry' docker image to run on Linux on z Systems
• Use base image you created & follow the process from https://github.com/docker/docker-registry/blob/master/ADVANCED.md
• Registered users can push-pull images to repository
• Docker is Docker is Docker … on Linux on Z too!
z Images
• IBM managed images uploaded to "ibmcom" and default namespaces in Docker hub
• Post-fix image names with "_s390x" until multi-arch support is available.
• Establishing closer partnership with Docker – discussions in progress
• Can accommodate in the Docker hub public registry with push/pull capability
  – IBM z images
  – Public 3rd party z images
Contacts
• Dale Hoffman ([email protected]) for Docker use cases and customer input
• Utz Bacher ([email protected]) for binary & “HowTo” critique
• Doug Davis ([email protected]) for questions on Docker hub images