8/21/2019 OEL - RHEL - Performance Tuning
Red Hat Subject Matter Experts
Red Hat Enterprise Linux 6
Performance Tuning Guide
Optimizing subsystem throughput in Red Hat Enterprise Linux 6
Edition 4.0
Red Hat Subject Matter Experts
Edited by
Don Domingo
Laura Bailey
Legal Notice
Copyright 2011 Red Hat, Inc. and others.
This document is licensed by Red Hat under the Creative Commons Attribution-ShareAlike 3.0 Unported License. If you distribute this document, or a modified version of it, you must provide attribution to Red Hat, Inc. and provide a link to the original. If the document is modified, all Red Hat trademarks must be removed.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux is the registered trademark of Linus Torvalds in the United States and other countries.
Java is a registered trademark of Oracle and/or its affiliates.
XFS is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack Word Mark and OpenStack Logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.
Abstract
The Performance Tuning Guide describes how to optimize the performance of a system running Red Hat Enterprise Linux 6. It also documents performance-related upgrades in Red Hat Enterprise Linux 6. While this guide contains procedures that are field-tested and proven, Red Hat recommends that you properly test all planned configurations in a testing environment before applying them to a production environment. You should also back up all your data and pre-tuning configurations.
http://creativecommons.org/licenses/by-sa/3.0/
Table of Contents
Preface
1. Document Conventions
1.1. Typographic Conventions
1.2. Pull-quote Conventions
1.3. Notes and Warnings
2. Getting Help and Giving Feedback
2.1. Do You Need Help?
2.2. We Need Feedback!
Chapter 1. Overview
1.1. How to read this book
1.1.1. Audience
1.2. Release overview
1.2.1. Summary of features
1.2.1.1. New features in Red Hat Enterprise Linux 6
1.2.1.1.1. New in 6.5
1.2.2. Horizontal Scalability
1.2.2.1. Parallel Computing
1.2.3. Distributed Systems
1.2.3.1. Communication
1.2.3.2. Storage
1.2.3.3. Converged Networks
Chapter 2. Red Hat Enterprise Linux 6 Performance Features
2.1. 64-Bit Support
2.2. Ticket Spinlocks
2.3. Dynamic List Structure
2.4. Tickless Kernel
2.5. Control Groups
2.6. Storage and File System Improvements
Chapter 3. Monitoring and Analyzing System Performance
3.1. The proc File System
3.2. GNOME and KDE System Monitors
3.3. Built-in Command-line Monitoring Tools
3.4. irqbalance
3.5. Tuned and ktune
3.6. Application Profilers
3.6.1. SystemTap
3.6.2. OProfile
3.6.3. Valgrind
3.6.4. Perf
3.7. Red Hat Enterprise MRG
Chapter 4. CPU
Topology
Threads
Interrupts
4.1. CPU Topology
4.1.1. CPU and NUMA Topology
4.1.2. Tuning CPU Performance
4.1.2.1. Setting CPU Affinity with taskset
4.1.3. Hardware performance policy (x86_energy_perf_policy)
4.1.4. turbostat
4.1.5. numastat
4.1.6. NUMA Affinity Management Daemon (numad)
4.1.6.1. Benefits of numad
4.1.6.2. Modes of operation
4.1.6.2.1. Using numad as a service
4.1.6.2.2. Using numad as an executable
4.2. CPU Scheduling
4.2.1. Realtime scheduling policies
4.2.2. Normal scheduling policies
4.2.3. Policy Selection
4.3. Interrupts and IRQ Tuning
4.4. Enhancements to NUMA in Red Hat Enterprise Linux 6
4.4.1. Bare-metal and Scalability Optimizations
4.4.1.1. Enhancements in topology-awareness
4.4.1.2. Enhancements in Multi-processor Synchronization
4.4.2. Virtualization Optimizations
Chapter 5. Memory
5.1. Huge Translation Lookaside Buffer (HugeTLB)
5.2. Huge Pages and Transparent Huge Pages
5.3. Using Valgrind to Profile Memory Usage
5.3.1. Profiling Memory Usage with Memcheck
5.3.2. Profiling Cache Usage with Cachegrind
5.3.3. Profiling Heap and Stack Space with Massif
5.4. Capacity Tuning
5.5. Tuning Virtual Memory
Chapter 6. Input/Output
6.1. Features
6.2. Analysis
6.3. Tools
6.4. Configuration
6.4.1. Completely Fair Queuing (CFQ)
6.4.2. Deadline I/O Scheduler
6.4.3. Noop
Chapter 7. File Systems
7.1. Tuning Considerations for File Systems
7.1.1. Formatting Options
7.1.2. Mount Options
7.1.3. File system maintenance
7.1.4. Application Considerations
7.2. Profiles for file system performance
7.3. File Systems
7.3.1. The Ext4 File System
7.3.2. The XFS File System
7.3.2.1. Basic tuning for XFS
7.3.2.2. Advanced tuning for XFS
7.3.2.2.1. Optimizing for a large number of files
7.3.2.2.2. Optimizing for a large number of files in a single directory
7.3.2.2.3. Optimising for concurrency
7.3.2.2.5. Optimising for sustained metadata modifications
7.4. Clustering
7.4.1. Global File System 2
Chapter 8. Networking
8.1. Network Performance Enhancements
Receive Packet Steering (RPS)
Receive Flow Steering
getsockopt support for TCP thin-streams
Transparent Proxy (TProxy) support
8.2. Optimized Network Settings
Socket receive buffer size
8.3. Overview of Packet Reception
CPU/cache affinity
8.4. Resolving Common Queuing/Frame Loss Issues
8.4.1. NIC Hardware Buffer
8.4.2. Socket Queue
8.5. Multicast Considerations
8.6. Receive-Side Scaling (RSS)
8.7. Receive Packet Steering (RPS)
8.8. Receive Flow Steering (RFS)
8.9. Accelerated RFS
Revision History
Preface
1. Document Conventions
This manual uses several conventions to highlight certain words and phrases and draw attention to specific pieces of information.
1.1. Typographic Conventions
Four typographic conventions are used to call attention to specific words and phrases. These conventions, and the circumstances they apply to, are as follows.
Mono-spaced Bold
Used to highlight system input, including shell commands, file names and paths. Also used to highlight keys and key combinations. For example:
To see the contents of the file my_next_bestselling_novel in your current working directory, enter the cat my_next_bestselling_novel command at the shell prompt and press Enter to execute the command.
The above includes a file name, a shell command and a key, all presented in mono-spaced bold and all distinguishable thanks to context.
Key combinations can be distinguished from an individual key by the plus sign that connects each part of a key combination. For example:
Press Enter to execute the command.
Press Ctrl+Alt+F2 to switch to a virtual terminal.
The first example highlights a particular key to press. The second example highlights a key combination: a set of three keys pressed simultaneously.
If source code is discussed, class names, methods, functions, variable names and returned values mentioned within a paragraph will be presented as above, in mono-spaced bold. For example:
File-related classes include filesystem for file systems, file for files, and dir for directories. Each class has its own associated set of permissions.
Proportional Bold
This denotes words or phrases encountered on a system, including application names; dialog-box text; labeled buttons; check-box and radio-button labels; menu titles and submenu titles. For example:
Choose System → Preferences → Mouse from the main menu bar to launch Mouse Preferences. In the Buttons tab, select the Left-handed mouse check box and click Close to switch the primary mouse button from the left to the right (making the mouse suitable for use in the left hand).
To insert a special character into a gedit file, choose Applications → Accessories → Character Map from the main menu bar. Next, choose Search → Find from the Character Map menu bar, type the name of the character in the Search field and click Next. The character you sought will be highlighted in the
Character Table. Double-click this highlighted character to place it in the Text to copy field and then click the Copy button. Now switch back to your document and choose Edit → Paste from the gedit menu bar.
The above text includes application names; system-wide menu names and items; application-specific menu names; and buttons and text found within a GUI interface, all presented in proportional bold and all distinguishable by context.
Mono-spaced Bold Italic or Proportional Bold Italic
Whether mono-spaced bold or proportional bold, the addition of italics indicates replaceable or variable text. Italics denotes text you do not input literally or displayed text that changes depending on circumstance. For example:
To connect to a remote machine using ssh, type ssh username@domain.name at a shell prompt. If the remote machine is example.com and your username on that machine is john, type ssh john@example.com.
The mount -o remount file-system command remounts the named file system. For example, to remount the /home file system, the command is mount -o remount /home.
To see the version of a currently installed package, use the rpm -q package command. It will return a result as follows: package-version-release.
Note the words in bold italics above: username, domain.name, file-system, package, version and release. Each word is a placeholder, either for text you enter when issuing a command or for text displayed by the system.
Aside from standard usage for presenting the title of a work, italics denotes the first use of a new and important term. For example:
Publican is a DocBook publishing system.
1.2. Pull-quote Conventions
Terminal output and source code listings are set off visually from the surrounding text.
Output sent to a terminal is set in mono-spaced roman and presented thus:
books        Desktop   documentation  drafts  mss     photos   stuff  svn
books_tests  Desktop1  downloads      images  notes   scripts  svgs
Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:
static int kvm_vm_ioctl_deassign_device(struct kvm *kvm,
				struct kvm_assigned_pci_dev *assigned_dev)
{
	int r = 0;
	struct kvm_assigned_dev_kernel *match;

	mutex_lock(&kvm->lock);

	match = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
				      assigned_dev->assigned_dev_id);
	if (!match) {
		printk(KERN_INFO "%s: device hasn't been assigned before, "
		       "so cannot be deassigned\n", __func__);
		r = -EINVAL;
		goto out;
	}

	kvm_deassign_device(kvm, match);

	kvm_free_assigned_device(kvm, match);

out:
	mutex_unlock(&kvm->lock);

	return r;
}
1.3. Notes and Warnings
Finally, we use three visual styles to draw attention to information that might otherwise be overlooked.
Note
Notes are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should have no negative consequences, but you might miss out on a trick that makes your life easier.
Important
Important boxes detail things that are easily missed: configuration changes that only apply to the current session, or services that need restarting before an update will apply. Ignoring a box labeled Important will not cause data loss but may cause irritation and frustration.
Warning
Warnings should not be ignored. Ignoring warnings will most likely cause data loss.
2. Getting Help and Giving Feedback
2.1. Do You Need Help?
If you experience difficulty with a procedure described in this documentation, visit the Red Hat Customer Portal at http://access.redhat.com. Through the customer portal, you can:
search or browse through a knowledgebase of technical support articles about Red Hat products.
submit a support case to Red Hat Global Support Services (GSS).
access other product documentation.
Red Hat also hosts a large number of electronic mailing lists for discussion of Red Hat software and technology. You can find a list of publicly available mailing lists at https://www.redhat.com/mailman/listinfo. Click on the name of any mailing list to subscribe to that list or to access the list archives.
2.2. We Need Feedback!
If you find a typographical error in this manual, or if you have thought of a way to make this manual better, we would love to hear from you! Please submit a report in Bugzilla: http://bugzilla.redhat.com/ against the product Red Hat Enterprise Linux 6.
When submitting a bug report, be sure to mention the manual's identifier: doc-Performance_Tuning_Guide
If you have a suggestion for improving the documentation, try to be as specific as possible when describing it. If you have found an error, please include the section number and some of the surrounding text so we can find it easily.
Chapter 1. Overview
The Performance Tuning Guide is a comprehensive reference on the configuration and optimization of Red Hat Enterprise Linux. While this release also contains information on Red Hat Enterprise Linux 5 performance capabilities, all instructions supplied herein are specific to Red Hat Enterprise Linux 6.
1.1. How to read this book
This book is divided into chapters discussing specific subsystems in Red Hat Enterprise Linux. The Performance Tuning Guide focuses on three major themes per subsystem:
Features
Each subsystem chapter describes performance features unique to (or implemented differently in) Red Hat Enterprise Linux 6. These chapters also discuss Red Hat Enterprise Linux 6 updates that significantly improved the performance of specific subsystems over Red Hat Enterprise Linux 5.
Analysis
The book also enumerates performance indicators for each specific subsystem. Typical values for these indicators are described in the context of specific services, helping you understand their significance in real-world, production systems.
In addition, the Performance Tuning Guide also shows different ways of retrieving performance data (that is, profiling) for a subsystem. Note that some of the profiling tools showcased here are documented elsewhere in greater detail.
Configuration
Perhaps the most important information in this book is the set of instructions on how to adjust the performance of a specific subsystem in Red Hat Enterprise Linux 6. The Performance Tuning Guide explains how to fine-tune a Red Hat Enterprise Linux 6 subsystem for specific services.
Keep in mind that tweaking a specific subsystem's performance may affect the performance of another, sometimes adversely. The default configuration of Red Hat Enterprise Linux 6 is optimal for most services running under moderate loads.
The procedures enumerated in the Performance Tuning Guide were tested extensively by Red Hat engineers in both lab and field. However, Red Hat recommends that you properly test all planned configurations in a secure testing environment before applying them to your production servers. You should also back up all data and configuration information before you start tuning your system.
1.1.1. Audience
This book is suitable for two types of readers:
System/Business Analyst
This book enumerates and explains Red Hat Enterprise Linux 6 performance features at a high level, providing enough information on how subsystems perform for specific workloads (both by default and when optimized). The level of detail used in describing Red Hat Enterprise Linux 6 performance features helps potential customers and sales engineers understand the suitability of this platform in providing resource-intensive services at an acceptable level.
The Performance Tuning Guide also provides links to more detailed documentation on each feature whenever possible. At that detail level, readers can understand these performance features enough to form a high-level strategy in deploying and optimizing Red Hat Enterprise Linux 6. This allows readers to both develop and evaluate infrastructure proposals.
This feature-focused level of documentation is suitable for readers with a high-level understanding of Linux subsystems and enterprise-level networks.
System Administrator
The procedures enumerated in this book are suitable for system administrators with RHCE skill level (or its equivalent, that is, 3-5 years' experience in deploying and managing Linux). The Performance Tuning Guide aims to provide as much detail as possible about the effects of each configuration; this means describing any performance trade-offs that may occur.
The underlying skill in performance tuning lies not in knowing how to analyze and tune a subsystem. Rather, a system administrator adept at performance tuning knows how to balance and optimize a Red Hat Enterprise Linux 6 system for a specific purpose. This means also knowing which trade-offs and performance penalties are acceptable when attempting to implement a configuration designed to boost a specific subsystem's performance.
1.2. Release overview
1.2.1. Summary of features
1.2.1.1. New features in Red Hat Enterprise Linux 6
Read this section for a brief overview of the performance-related changes included in Red Hat Enterprise Linux 6.
1.2.1.1.1. New in 6.5
Updates to the kernel remove a bottleneck in memory management and improve performance by allowing I/O load to spread across multiple memory pools when the irq-table size is 1 GB or greater.
The cpupowerutils package has been updated to include the turbostat and x86_energy_perf_policy tools. These tools are newly documented in Section 4.1.4, turbostat and Section 4.1.3, Hardware performance policy (x86_energy_perf_policy).
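As a quick, hedged sketch of how these two tools are typically invoked (exact options vary by package version; consult the man pages):

```shell
# Report processor frequency and idle-state statistics for the duration
# of a workload (here a 5-second sleep stands in for a real job):
turbostat sleep 5

# Bias the hardware energy/performance policy toward performance on all
# CPUs (requires root; "normal" and "powersave" are the other settings):
x86_energy_perf_policy performance
```

Both commands need root privileges and hardware/MSR support to produce output.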
CIFS now supports larger rsize options and asynchronous readpages, allowing for significant increases in throughput.
GFS2 now provides an Orlov block allocator to increase the speed at which block allocation takes place.
The virtual-hosts profile in tuned has been adjusted. The value for kernel.sched_migration_cost is now 5000000 nanoseconds (5 milliseconds) instead of the kernel default 500000 nanoseconds (0.5 milliseconds). This reduces contention at the run queue lock for large virtualization hosts.
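The value can be inspected, or applied as a one-off on a running system, with sysctl; a minimal sketch (the tuned profile itself remains the supported mechanism):

```shell
# Show the current scheduler migration cost, in nanoseconds:
sysctl kernel.sched_migration_cost

# Apply the virtual-hosts value manually (root required; not persistent
# across reboots unless also added to /etc/sysctl.conf):
sysctl -w kernel.sched_migration_cost=5000000
```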
The latency-performance profile in tuned has been adjusted. The value for power management quality of service, cpu_dma_latency requirement, is now 1 instead of 0.
Several optimizations are now included in the kernel copy_from_user() and copy_to_user() functions, improving the performance of both.
The perf tool has received a number of updates and enhancements, including:
Perf can now use hardware counters provided by the System z CPU-measurement counter facility. There are four sets of hardware counters available: the basic counter set, the problem-state counter set, the crypto-activity counter set, and the extended counter set.
A new command, perf trace, is now available. This enables strace-like behavior using perf infrastructure to allow additional targets to be traced. For further information, refer to Section 3.6.4, Perf.
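A hedged invocation sketch (option details vary with the perf version shipped):

```shell
# Trace syscall-level activity of a short-lived command, strace-style:
perf trace ls /tmp

# Or attach to an already-running process by PID (2345 is a placeholder):
perf trace -p 2345
```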
A script browser has been added to enable users to view all available scripts for the current perf data file.
Several additional sample scripts are now available.
Ext3 has been updated to reduce lock contention, thereby improving the performance of multi-threaded write operations.
KSM has been updated to be aware of NUMA topology, allowing it to take NUMA locality into account while coalescing pages. This prevents performance drops related to pages being moved to a remote node. Red Hat recommends avoiding cross-node memory merging when KSM is in use.
This update introduces a new tunable, /sys/kernel/mm/ksm/merge_nodes, to control this behavior. The default value (1) merges pages across different NUMA nodes. Set merge_nodes to 0 to merge pages only on the same node.
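A sketch of checking and changing this tunable (requires root and a kernel that exposes the file):

```shell
# Show the current KSM cross-node merging policy (1 = merge across nodes):
cat /sys/kernel/mm/ksm/merge_nodes

# Restrict merging to pages on the same NUMA node, as recommended above:
echo 0 > /sys/kernel/mm/ksm/merge_nodes
```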
hdparm has been updated with several new flags, including --fallocate, --offset, and -R (Write-Read-Verify enablement). Additionally, the --trim-sector-ranges and --trim-sector-ranges-stdin options replace the --trim-sectors option, allowing more than a single sector range to be specified. Refer to the man page for further information about these options.
1.2.2. Horizontal Scalability
Red Hat's efforts in improving the performance of Red Hat Enterprise Linux 6 focus on scalability. Performance-boosting features are evaluated primarily based on how they affect the platform's performance in different areas of the workload spectrum, that is, from the lonely web server to the server farm mainframe.
Focusing on scalability allows Red Hat Enterprise Linux to maintain its versatility for different types of workloads and purposes. At the same time, this means that as your business grows and your workload scales up, re-configuring your server environment is less prohibitive (in terms of cost and man-hours) and more intuitive.
Red Hat makes improvements to Red Hat Enterprise Linux for both horizontal scalability and vertical scalability; however, horizontal scalability is the more generally applicable use case. The idea behind horizontal scalability is to use multiple standard computers to distribute heavy workloads in order to improve performance and reliability.
In a typical server farm, these standard computers come in the form of 1U rack-mounted servers and blade servers. Each standard computer may be as small as a simple two-socket system, although some server farms use large systems with more sockets. Some enterprise-grade networks mix large and small systems; in such cases, the large systems are high performance servers (for example,
database servers) and the small ones are dedicated application servers (for example, web or mail servers).
This type of scalability simplifies the growth of your IT infrastructure: a medium-sized business with an appropriate load might only need two pizza box servers to suit all their needs. As the business hires more people, expands its operations, increases its sales volumes and so forth, its IT requirements increase in both volume and complexity. Horizontal scalability allows IT to simply deploy additional machines with (mostly) identical configurations as their predecessors.
To summarize, horizontal scalability adds a layer of abstraction that simplifies system hardware administration. By developing the Red Hat Enterprise Linux platform to scale horizontally, increasing the capacity and performance of IT services can be as simple as adding new, easily configured machines.
1.2.2.1. Parallel Computing
Users benefit from Red Hat Enterprise Linux's horizontal scalability not just because it simplifies system hardware administration, but also because horizontal scalability is a suitable development philosophy given the current trends in hardware advancement.
Consider this: most complex enterprise applications have thousands of tasks that must be performed simultaneously, with different coordination methods between tasks. While early computers had a single-core processor to juggle all these tasks, virtually all processors available today have multiple cores. Effectively, modern computers put multiple cores in a single socket, making even single-socket desktops or laptops multi-processor systems.
As of 2010, standard Intel and AMD processors were available with two to sixteen cores. Such processors are prevalent in pizza box or blade servers, which can now contain as many as 40 cores. These low-cost, high-performance systems bring large system capabilities and characteristics into the mainstream.
To achieve the best performance and utilization of a system, each core must be kept busy. This means that 32 separate tasks must be running to take advantage of a 32-core blade server. If a blade chassis contains ten of these 32-core blades, then the entire setup can process a minimum of 320 tasks simultaneously. If these tasks are part of a single job, they must be coordinated.
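The arithmetic above can be sketched directly in the shell:

```shell
# Tasks needed to keep every core of a chassis of ten 32-core blades busy:
cores_per_blade=32
blades=10
echo $((cores_per_blade * blades))   # prints 320
```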
Red Hat Enterprise Linux was developed to adapt well to hardware development trends and ensure that businesses can fully benefit from them. Section 1.2.3, Distributed Systems explores the technologies that enable Red Hat Enterprise Linux's horizontal scalability in greater detail.
1.2.3. Distributed Systems
To fully realize horizontal scalability, Red Hat Enterprise Linux uses many components of distributed computing. The technologies that make up distributed computing are divided into three layers:
Communication
Horizontal scalability requires many tasks to be performed simultaneously (in parallel). As such, these tasks must have interprocess communication to coordinate their work. Further, a platform with horizontal scalability should be able to share tasks across multiple systems.
Storage
Storage via local disks is not sufficient in addressing the requirements of horizontal scalability. Some form of distributed or shared storage is needed, one with a layer of abstraction that allows a single storage volume's capacity to grow seamlessly with the addition of new storage hardware.
Management
The most important duty in distributed computing is the management layer. This management layer coordinates all software and hardware components, efficiently managing communication, storage, and the usage of shared resources.
The following sections describe the technologies within each layer in more detail.
1.2.3.1. Communication
The communication layer ensures the transport of data, and is composed of two parts:
Hardware
Software
The simplest (and fastest) way for multiple systems to communicate is through shared memory. This entails the usage of familiar memory read/write operations; shared memory has the high bandwidth, low latency, and low overhead of ordinary memory read/write operations.
Ethernet
The most common way of communicating between computers is over Ethernet. Today, Gigabit Ethernet (GbE) is provided by default on systems, and most servers include 2-4 ports of Gigabit Ethernet. GbE provides good bandwidth and latency. This is the foundation of most distributed systems in use today. Even when systems include faster network hardware, it is still common to use GbE for a dedicated management interface.
10GbE
Ten Gigabit Ethernet (10GbE) is rapidly growing in acceptance for high end and even mid-range servers. 10GbE provides ten times the bandwidth of GbE. One of its major advantages is with modern multi-core processors, where it restores the balance between communication and computing. You can compare a single core system using GbE to an eight core system using 10GbE. Used in this way, 10GbE is especially valuable for maintaining overall system performance and avoiding communication bottlenecks.
Unfortunately, 10GbE is expensive. While the cost of 10GbE NICs has come down, the price of interconnect (especially fibre optics) remains high, and 10GbE network switches are extremely expensive. We can expect these prices to decline over time, but 10GbE today is most heavily used in server room backbones and performance-critical applications.
Infiniband
Infiniband offers even higher performance than 10GbE. In addition to TCP/IP and UDP network connections used with Ethernet, Infiniband also supports shared memory communication. This allows Infiniband to work between systems via remote direct memory access (RDMA).
The use of RDMA allows Infiniband to move data d irectly between systems without the overhead of
TCP/IP or socket connections. In turn, this reduces latency, which is critical to some applica tions.
Infiniban d is most commonly used in High Performance Technical Computing(HPTC) applications
which requ ire high bandwid th, low la tency and low overhead . Many supercomputing app lica tions
benefit from this, to the point that the best way to improve performance is by investing in Infiniband
rather than faster processors or more memory.
RoCE
RDMA over Converged Ethernet (RoCE) implements Infiniband-style communications (including RDMA) over a 10GbE infrastructure. Given the cost improvements associated with the growing volume of 10GbE products, it is reasonable to expect wider usage of RDMA and RoCE in a wide range of systems and applications.
Each of these communication methods is fully supported by Red Hat for use with Red Hat Enterprise Linux 6.
1.2.3.2. Storage
An environment that uses distributed computing uses multiple instances of shared storage. This can mean one of two things:
Multiple systems storing data in a single location
A storage unit (e.g. a volume) composed of multiple storage appliances
The most familiar example of storage is the local disk drive mounted on a system. This is appropriate for IT operations where all applications are hosted on one host, or even a small number of hosts. However, as the infrastructure scales to dozens or even hundreds of systems, managing as many local storage disks becomes difficult and complicated.
Distributed storage adds a layer to ease and automate storage hardware administration as the business scales. Having multiple systems share a handful of storage instances reduces the number of devices the administrator needs to manage.
Consolidating the storage capabilities of multiple storage appliances into one volume helps both users and administrators. This type of distributed storage provides a layer of abstraction to storage pools: users see a single unit of storage, which an administrator can easily grow by adding more hardware. Some technologies that enable distributed storage also provide added benefits, such as failover and multipathing.
NFS
Network File System (NFS) allows multiple servers or users to mount and use the same instance of remote storage via TCP or UDP. NFS is commonly used to hold data shared by multiple applications. It is also convenient for bulk storage of large amounts of data.
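As a rough sketch, an NFS export can be mounted on the command line or made persistent through /etc/fstab. The server name, export path, and options below are hypothetical placeholders; adjust them to your environment.

```shell
# Hypothetical server and export names; mounting requires root.
# Mount an NFS export over TCP:
mount -t nfs -o proto=tcp fileserver:/export/data /mnt/data

# Equivalent /etc/fstab entry for a persistent mount:
#   fileserver:/export/data  /mnt/data  nfs  proto=tcp,rsize=32768,wsize=32768  0 0
```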
SAN
Storage Area Networks (SANs) use either Fibre Channel or iSCSI protocol to provide remote access to storage. Fibre Channel infrastructure (such as Fibre Channel host bus adapters, switches, and storage arrays) combines high performance, high bandwidth, and massive storage. SANs separate storage from processing, providing considerable flexibility in system design.
The other major advantage of SANs is that they provide a management environment for performing major storage hardware administrative tasks. These tasks include:
Controlling access to storage
Managing large amounts of data
Provisioning systems
Backing up and replicating data
Taking snapshots
Supporting system failover
Ensuring data integrity
Migrating data
GFS2
The Red Hat Global File System 2 (GFS2) file system provides several specialized capabilities. The basic function of GFS2 is to provide a single file system, including concurrent read/write access, shared across multiple members of a cluster. This means that each member of the cluster sees exactly the same data "on disk" in the GFS2 file system.
GFS2 allows all systems to have concurrent access to the "disk". To maintain data integrity, GFS2 uses a Distributed Lock Manager (DLM), which only allows one system to write to a specific location at a time.
GFS2 is especially well-suited for failover applications that require high availability in storage.
For further information about GFS2, refer to the Global File System 2 guide. For further information about storage in general, refer to the Storage Administration Guide. Both are available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
1.2.3.3. Converged Networks
Communication over the network is normally done through Ethernet, with storage traffic using a dedicated Fibre Channel SAN environment. It is common to have a dedicated network or serial link for system management, and perhaps even heartbeat[2]. As a result, a single server is typically on multiple networks.
Providing multiple connections on each server is expensive, bulky, and complex to manage. This gave rise to the need for a way to consolidate all connections into one. Fibre Channel over Ethernet (FCoE) and Internet SCSI (iSCSI) address this need.
FCoE
With FCoE, standard Fibre Channel commands and data packets are transported over a 10GbE physical infrastructure via a single converged network adapter (CNA). Standard TCP/IP Ethernet traffic and Fibre Channel storage operations can be transported via the same link. FCoE uses one physical network interface card (and one cable) for multiple logical network/storage connections.
FCoE offers the following advantages:
Reduced number of connections
FCoE reduces the number of network connections to a server by half. You can still choose to have multiple connections for performance or availability; however, a single connection provides both storage and network connectivity. This is especially helpful for pizza box servers and blade servers, since they both have very limited space for components.
Lower cost
Reduced number of connections immediately means reduced number of cables, switches, and other networking equipment. Ethernet's history also features great economies of scale; the cost of networks drops dramatically as the number of devices in the market goes from millions to billions, as was seen in the decline in the price of 100Mb Ethernet and gigabit Ethernet devices.
Similarly, 10GbE will also become cheaper as more businesses adapt to its use. Also, as CNA hardware is integrated into a single chip, widespread use will also increase its volume in the market, which will result in a significant price drop over time.
iSCSI
Internet SCSI (iSCSI) is another type of converged network protocol; it is an alternative to FCoE. Like Fibre Channel, iSCSI provides block-level storage over a network. However, iSCSI does not provide a complete management environment. The main advantage of iSCSI over FCoE is that iSCSI provides much of the capability and flexibility of Fibre Channel, but at a lower cost.
[1] Red Hat Certified Engineer. For more information, refer to
http://www.redhat.com/training/certifications/rhce/.
[2] Heartbeat is the exchange of messages between systems to ensure that each system is still functioning. If a system "loses heartbeat" it is assumed to have failed and is shut down, with another system taking over for it.
Chapter 2. Red Hat Enterprise Linux 6 Performance Features
2.1. 64-Bit Support
Red Hat Enterprise Linux 6 supports 64-bit processors; these processors can theoretically use up to 16 exabytes of memory. As of general availability (GA), Red Hat Enterprise Linux 6.0 is tested and certified to support up to 8 TB of physical memory.
The size of memory supported by Red Hat Enterprise Linux 6 is expected to grow over several minor updates, as Red Hat continues to introduce and improve more features that enable the use of larger memory blocks. For current details, see https://access.redhat.com/site/articles/rhel-limits. Example improvements (as of Red Hat Enterprise Linux 6.0 GA) are:
Huge pages and transparent huge pages
Non-Uniform Memory Access improvements
These improvements are outlined in greater detail in the sections that follow.
Huge pages and transparent huge pages
The implementation of huge pages in Red Hat Enterprise Linux 6 allows the system to manage memory use efficiently across different memory workloads. Huge pages dynamically utilize 2 MB pages compared to the standard 4 KB page size, allowing applications to scale well while processing gigabytes and even terabytes of memory.
Huge pages are difficult to manually create, manage, and use. To address this, Red Hat Enterprise Linux 6 also features the use of transparent huge pages (THP). THP automatically manages many of the complexities involved in the use of huge pages.
For more information on huge pages and THP, refer to Section 5.2, Huge Pages and Transparent Huge Pages.
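The scaling benefit follows from simple arithmetic: fewer, larger pages mean far fewer page-table entries and TLB misses for the same region of memory. A quick sketch using the page sizes quoted above:

```shell
# Compare how many pages the kernel must manage to map a 1 GB region
# with 4 KB base pages versus 2 MB huge pages.
region_kb=$((1024 * 1024))        # a 1 GB region, expressed in KB
small_pages=$((region_kb / 4))    # 4 KB base pages
huge_pages=$((region_kb / 2048))  # 2 MB huge pages
echo "4KB pages: $small_pages, 2MB huge pages: $huge_pages"
```

Mapping the same 1 GB region requires 262144 entries with 4 KB pages but only 512 with 2 MB huge pages, which is why huge pages help memory-intensive applications scale.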
NUMA improvements
Many new systems now support Non-Uniform Memory Access (NUMA). NUMA simplifies the design and creation of hardware for large systems; however, it also adds a layer of complexity to application development. For example, NUMA implements both local and remote memory, where remote memory can take several times longer to access than local memory. This feature has performance implications for operating systems and applications, and should be configured carefully.
Red Hat Enterprise Linux 6 is better optimized for NUMA use, thanks to several additional features that help manage users and applications on NUMA systems. These features include CPU affinity, CPU pinning (cpusets), numactl and control groups, which allow a process (affinity) or application (pinning) to "bind" to a specific CPU or set of CPUs.
For more information about NUMA support in Red Hat Enterprise Linux 6, refer to Section 4.1.1, CPU and NUMA Topology.
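As an illustration of the tools mentioned above, numactl can bind a process and its memory to a node at launch. The application path below is a hypothetical placeholder; numactl is shipped in the numactl package.

```shell
# Hypothetical application path. Confine a process's CPUs and memory
# allocations to NUMA node 0:
numactl --cpunodebind=0 --membind=0 ./my_app

# Show the NUMA topology the kernel sees (node count, CPUs per node,
# and free memory per node):
numactl --hardware
```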
2.2. Ticket Spinlocks
A key part of any system design is ensuring that one process does not alter memory used by another process. Uncontrolled data change in memory can result in data corruption and system crashes. To prevent this, the operating system allows a process to lock a piece of memory, perform an operation, then unlock or "free" the memory.
One common implementation of memory locking is through spinlocks, which allow a process to keep checking to see if a lock is available and take the lock as soon as it becomes available. If there are multiple processes competing for the same lock, the first one to request the lock after it has been freed gets it. When all processes have the same access to memory, this approach is "fair" and works quite well.
Unfortunately, on a NUMA system, not all processes have equal access to the locks. Processes on the same NUMA node as the lock have an unfair advantage in obtaining the lock. Processes on remote NUMA nodes experience lock starvation and degraded performance.
To address this, Red Hat Enterprise Linux implemented ticket spinlocks. This feature adds a reservation queue mechanism to the lock, allowing all processes to take a lock in the order that they requested it. This eliminates timing problems and unfair advantages in lock requests.
While a ticket spinlock has slightly more overhead than an ordinary spinlock, it scales better and
provides better performance on NUMA systems.
2.3. Dynamic List Structure
The operating system requires a set of information on each processor in the system. In Red Hat Enterprise Linux 5, this set of information was allocated to a fixed-size array in memory. Information on each individual processor was obtained by indexing into this array. This method was fast, easy, and straightforward for systems that contained relatively few processors.
However, as the number of processors for a system grows, this method produces significant overhead. Because the fixed-size array in memory is a single, shared resource, it can become a bottleneck as more processors attempt to access it at the same time.
To address this, Red Hat Enterprise Linux 6 uses a dynamic list structure for processor information. This allows the array used for processor information to be allocated dynamically: if there are only eight processors in the system, then only eight entries are created in the list. If there are 2048 processors, then 2048 entries are created as well.
A dynamic list structure allows more fine-grained locking. For example, if information needs to be updated at the same time for processors 6, 72, 183, 657, 931 and 1546, this can be done with greater parallelism. Situations like this obviously occur much more frequently on large, high-performance systems than on small systems.
2.4. Tickless Kernel
In previous versions of Red Hat Enterprise Linux, the kernel used a timer-based mechanism that continuously produced a system interrupt. During each interrupt, the system polled; that is, it checked to see if there was work to be done.
Depending on the setting, this system interrupt or timer tick could occur several hundred or several thousand times per second. This happened every second, regardless of the system's workload. On a lightly loaded system, this impacts power consumption by preventing the processor from effectively using sleep states. The system uses the least power when it is in a sleep state.
The most power-efficient way for a system to operate is to do work as quickly as possible, go into the deepest sleep state possible, and sleep as long as possible. To implement this, Red Hat Enterprise Linux 6 uses a tickless kernel. With this, the interrupt timer has been removed from the idle loop, transforming Red Hat Enterprise Linux 6 into a completely interrupt-driven environment.
The tickless kernel allows the system to go into deep sleep states during idle times, and respond quickly when there is work to be done.
For further information, refer to the Power Management Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
2.5. Control Groups
Red Hat Enterprise Linux provides many useful options for performance tuning. Large systems, scaling to hundreds of processors, can be tuned to deliver superb performance. But tuning these systems requires considerable expertise and a well-defined workload. When large systems were expensive and few in number, it was acceptable to give them special treatment. Now that these systems are mainstream, more effective tools are needed.
To further complicate things, more powerful systems are being used now for service consolidation. Workloads that may have been running on four to eight older servers are now placed into a single server. And as discussed earlier in Section 1.2.2.1, Parallel Computing, many mid-range systems nowadays contain more cores than yesterday's high-performance machines.
Many modern applications are designed for parallel processing, using multiple threads or processes to improve performance. However, few applications can make effective use of more than eight threads. Thus, multiple applications typically need to be installed on a 32-CPU system to maximize capacity.
Consider the situation: small, inexpensive mainstream systems are now at parity with the performance of yesterday's expensive, high-performance machines. Cheaper high-performance machines gave system architects the ability to consolidate more services to fewer machines.
However, some resources (such as I/O and network communications) are shared, and do not grow as fast as CPU count. As such, a system housing multiple applications can experience degraded overall performance when one application hogs too much of a single resource.
To address this, Red Hat Enterprise Linux 6 now supports control groups (cgroups). Cgroups allow administrators to allocate resources to specific tasks as needed. This means, for example, being able to allocate 80% of four CPUs, 60GB of memory, and 40% of disk I/O to a database application. A web application running on the same system could be given two CPUs, 2GB of memory, and 50% of available network bandwidth.
As a result, both database and web applications deliver good performance, as the system prevents both from excessively consuming system resources. In addition, many aspects of cgroups are self-tuning, allowing the system to respond accordingly to changes in workload.
A cgroup has two major components:
A list of tasks assigned to the cgroup
Resources allocated to those tasks
Tasks assigned to the cgroup run within the cgroup. Any child tasks they spawn also run within the cgroup. This allows an administrator to manage an entire application as a single unit. An administrator can also configure allocations for the following resources:
CPUsets
Memory
I/O
Network (bandwidth)
Within CPUsets, cgroups allow administrators to configure the number of CPUs, affinity for specific
CPUs or nodes[3], and the amount of CPU time used by a set of tasks. Using cgroups to configure CPUsets is vital for ensuring good overall performance, preventing an application from consuming excessive resources at the cost of other tasks while simultaneously ensuring that the application is not starved for CPU time.
I/O bandwidth and network bandwidth are managed by other resource controllers. Again, the resource controllers allow you to determine how much bandwidth the tasks in a cgroup can consume, and ensure that the tasks in a cgroup neither consume excessive resources nor are starved of resources.
Cgroups allow the administrator to define and allocate, at a high level, the system resources that various applications need (and will) consume. The system then automatically manages and balances the various applications, delivering good, predictable performance and optimizing the performance of the overall system.
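A minimal sketch of the CPUset case follows, assuming the Red Hat Enterprise Linux 6 libcgroup layout in which the cgconfig service mounts controllers under /cgroup; the group name and the $DB_PID variable are hypothetical placeholders.

```shell
# Requires root; paths assume controllers are mounted under /cgroup
# (the RHEL 6 libcgroup/cgconfig layout).
mkdir /cgroup/cpuset/dbgroup
echo 0-3 > /cgroup/cpuset/dbgroup/cpuset.cpus   # give the group four CPUs
echo 0   > /cgroup/cpuset/dbgroup/cpuset.mems   # memory from NUMA node 0
echo "$DB_PID" > /cgroup/cpuset/dbgroup/tasks   # $DB_PID: hypothetical database PID
```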
For more information on how to use control groups, refer to the Resource Management Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
2.6. Storage and File System Improvements
Red Hat Enterprise Linux 6 also features several improvements to storage and file system management. Two of the most notable advances in this version are ext4 and XFS support. For more comprehensive coverage of performance improvements relating to storage and file systems, refer to Chapter 7, File Systems.
Ext4
Ext4 is the default file system for Red Hat Enterprise Linux 6. It is the fourth generation version of the EXT file system family, supporting a theoretical maximum file system size of 1 exabyte, and single file maximum size of 16TB. Red Hat Enterprise Linux 6 supports a maximum file system size of 16TB, and a single file maximum size of 16TB. Other than a much larger storage capacity, ext4 also includes several new features, such as:
Extent-based metadata
Delayed allocation
Journal check-summing
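A minimal sketch of creating and inspecting an ext4 file system follows; the device name is a hypothetical placeholder, and mkfs destroys any existing data on the device.

```shell
# Hypothetical device name; requires root and erases the device.
mkfs.ext4 /dev/sdb1
mount /dev/sdb1 /mnt/data

# Inspect the feature flags the file system was created with
# (extents, huge_file, and so on):
tune2fs -l /dev/sdb1 | grep -i features
```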
For more information about the ext4 file system, refer to Section 7.3.1, The Ext4 File System.
XFS
XFS is a robust and mature 64-bit journaling file system that supports very large files and file systems on a single host. This file system was originally developed by SGI, and has a long history of running on extremely large servers and storage arrays. XFS features include:
Delayed allocation
Dynamically-allocated inodes
B-tree indexing for scalability of free space management
Online defragmentation and file system growing
Sophisticated metadata read-ahead algorithms
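The online growing feature in particular is a common administrative task; a brief sketch follows, with a hypothetical logical volume name.

```shell
# Hypothetical logical volume; requires root and erases the device.
mkfs.xfs /dev/vg0/biglv
mount /dev/vg0/biglv /mnt/big

# XFS file systems are grown online, while mounted, after the
# underlying volume has been enlarged:
xfs_growfs /mnt/big
```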
While XFS scales to exabytes, the maximum XFS file system size supported by Red Hat is 100TB. For more information about XFS, refer to Section 7.3.2, The XFS File System.
Large Boot Drives
Traditional BIOS supports a maximum disk size of 2.2TB. Red Hat Enterprise Linux 6 systems using BIOS can support disks larger than 2.2TB by using a new disk structure called the GUID Partition Table (GPT). GPT can only be used for data disks; it cannot be used for boot drives with BIOS; therefore, boot drives can only be a maximum of 2.2TB in size. The BIOS was originally created for the IBM PC; while BIOS has evolved considerably to adapt to modern hardware, the Unified Extensible Firmware Interface (UEFI) is designed to support new and emerging hardware.
Red Hat Enterprise Linux 6 also supports UEFI, which can be used to replace BIOS (still supported). Systems with UEFI running Red Hat Enterprise Linux 6 allow the use of GPT and 2.2TB (and larger) partitions for both boot partition and data partition.
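A brief sketch of labelling a large data disk with GPT using parted; the device name is a hypothetical placeholder, and mklabel destroys the existing partition table.

```shell
# Hypothetical disk; requires root and destroys the partition table.
# Label the disk with GPT, then create one partition spanning it:
parted /dev/sdc mklabel gpt
parted /dev/sdc mkpart primary 1MiB 100%
```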
Important UEFI for 32-bit x86 systems
Red Hat Enterprise Linux 6 does not support UEFI for 32-bit x86 systems.
Important UEFI for AMD64 and Intel 64
Note that the boot configurations of UEFI and BIOS differ significantly from each other. Therefore, the installed system must boot using the same firmware that was used during installation. You cannot install the operating system on a system that uses BIOS and then boot this installation on a system that uses UEFI.
Red Hat Enterprise Linux 6 supports version 2.2 of the UEFI specification. Hardware that supports version 2.3 of the UEFI specification or later should boot and operate with Red Hat Enterprise Linux 6, but the additional functionality defined by these later specifications will not be available. The UEFI specifications are available from http://www.uefi.org/specs/agreement/.
[3] A node is generally defined as a set of CPUs or cores within a socket.
Chapter 3. Monitoring and Analyzing System Performance
This chapter briefly introduces tools that can be used to monitor and analyze system and application performance, and points out the situations in which each tool is most useful. The data collected by each tool can reveal bottlenecks or other system problems that contribute to less-than-optimal performance.
3.1. The proc File System
The proc "file system" is a directory that contains a hierarchy of files that represent the current state of the Linux kernel. It allows applications and users to see the kernel's view of the system.
The proc directory also contains information about the hardware of the system, and any currently running processes. Most of these files are read-only, but some files (primarily those in /proc/sys) can be manipulated by users and applications to communicate configuration changes to the kernel.
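For example, kernel tunables can be read directly from /proc and, as root, changed at runtime; the value shown here is illustrative only.

```shell
# Read a kernel tunable directly from /proc (works for any user):
pid_max=$(cat /proc/sys/kernel/pid_max)
echo "kernel.pid_max = $pid_max"

# Writable files under /proc/sys can be changed at runtime (as root);
# the sysctl command exposes the same tunables with dotted names:
#   echo 65536 > /proc/sys/kernel/pid_max
#   sysctl -w kernel.pid_max=65536
```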
For further information about viewing and editing files in the proc directory, refer to the Deployment Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
3.2. GNOME and KDE System Monitors
The GNOME and KDE desktop environments both have graphical tools to assist you in monitoring and modifying the behavior of your system.
GNOME System Monitor
The GNOME System Monitor displays basic system information and allows you to monitor system processes, and resource or file system usage. Open it with the gnome-system-monitor command in the Terminal, or click on the Applications menu, and select System Tools > System Monitor.
GNOME System Monitor has four tabs:
System
Displays basic information about the computer's hardware and software.
Processes
Shows active processes, and the relationships between those processes, as well as detailed information about each process. It also lets you filter the processes displayed, and perform certain actions on those processes (start, stop, kill, change priority, etc.).
Resources
Displays the current CPU time usage, memory and swap space usage, and network usage.
File Systems
Lists all mounted file systems alongside some basic information about each, such as the file system type, mount point, and memory usage.
For further information about the GNOME System Monitor, refer to the Help menu in the application, or to the Deployment Guide, available from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
KDE System Guard
The KDE System Guard allows you to monitor current system load and processes that are running. It also lets you perform actions on processes. Open it with the ksysguard command in the Terminal, or click on the Kickoff Application Launcher and select Applications > System > System Monitor.
There are two tabs to KDE System Guard:
Process Table
Displays a list of all running processes, alphabetically by default. You can also sort processes by a number of other properties, including total CPU usage, physical or shared memory usage, owner, and priority. You can also filter the visible results, search for specific processes, or perform certain actions on a process.
System Load
Displays historical graphs of CPU usage, memory and swap space usage, and network usage. Hover over the graphs for detailed analysis and graph keys.
For further information about the KDE System Guard, refer to the Help menu in the application.
3.3. Built-in Command-line Monitoring Tools
In addition to graphical monitoring tools, Red Hat Enterprise Linux provides several tools that can be used to monitor a system from the command line. The advantage of these tools is that they can be used outside run level 5. This section discusses each tool briefly, and suggests the purposes to which each tool is best suited.
top
The top tool provides a dynamic, real-time view of the processes in a running system. It can display a variety of information, including a system summary and the tasks currently being managed by the Linux kernel. It also has a limited ability to manipulate processes. Both its operation and the information it displays are highly configurable, and any configuration details can be made to persist across restarts.
By default, the processes shown are ordered by the percentage of CPU usage, giving an easy view into the processes that are consuming the most resources.
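For scripting or logging, top can also run non-interactively; a brief sketch:

```shell
# Capture a single snapshot of top in batch mode, rather than using
# the interactive display (useful when collecting data from scripts):
top -b -n 1 | head -15
```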
For detailed information about using top, refer to its man page: man top.
ps
The ps tool takes a snapshot of a select group of active processes. By default this group is limited to processes owned by the current user and associated with the same terminal.
It can provide more detailed information about processes than top, but is not dynamic.
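A typical invocation sketch, widening the snapshot to all processes and sorting by CPU usage (the field list shown is only one of many possible):

```shell
# List every process with its CPU and memory share, highest CPU first:
ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -10
```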
For detailed information about using ps, refer to its man page: man ps.
vmstat
vmstat (Virtual Memory Statistics) outputs instantaneous reports about your system's processes, memory, paging, block I/O, interrupts and CPU activity.
Although it is not dynamic like top, you can specify a sampling interval, which lets you observe system activity in near-real time.
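For example, a sampling interval and count can be supplied directly:

```shell
# Report once every 5 seconds, 3 reports in total. The first line is
# an average since boot; later lines cover each 5-second interval:
vmstat 5 3
```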
For detailed information about using vmstat, refer to its man page: man vmstat.
sar
sar (System Activity Reporter) collects and reports information about today's system activity so far. The default output covers today's CPU utilization at ten minute intervals from the beginning of the day:

12:00:01 AM CPU %user %nice %system %iowait %steal %idle
12:10:01 AM all 0.10 0.00 0.15 2.96 0.00 96.79
12:20:01 AM all 0.09 0.00 0.13 3.16 0.00 96.61
12:30:01 AM all 0.09 0.00 0.14 2.11 0.00 97.66
...
This tool is a useful alternative to attempting to create periodic reports on system activity through top or similar tools.
For detailed information about using sar, refer to its man page: man sar.
3.4. irqbalance
irqbalance is a command line tool that distributes hardware interrupts across processors to improve system performance. It runs as a daemon by default, but can be run once only with the --oneshot option.
The following parameters are useful for improving performance.
--powerthresh
Sets the number of CPUs that can idle before a CPU is placed into powersave mode. If more CPUs than the threshold are more than 1 standard deviation below the average softirq workload and no CPUs are more than one standard deviation above the average, and have more than one irq assigned to them, a CPU is placed into powersave mode. In powersave mode, a CPU is not part of irq balancing so that it is not woken unnecessarily.
--hintpolicy
Determines how irq kernel affinity hinting is handled. Valid values are exact (the irq affinity hint is always applied), subset (the irq is balanced, but the assigned object is a subset of the affinity hint), or ignore (the irq affinity hint is ignored completely).
--policyscript
Defines the location of a script to execute for each interrupt request, with the device path and irq number passed as arguments, and a zero exit code expected by irqbalance. The script defined can specify zero or more key value pairs to guide irqbalance in managing the passed irq.
The following are recognized as valid key value pairs.
ban
Valid values are true (exclude the passed irq from balancing) or false (perform balancing on this irq).
balance_level
Allows user override of the balance level of the passed irq. By default the balance level is based on the PCI device class of the device that owns the irq. Valid values are none, package, cache, or core.
numa_node
Allows user override of the NUMA node that is considered local to the passed irq. If information about the local node is not specified in ACPI, devices are considered equidistant from all nodes. Valid values are integers (starting from 0) that identify a specific NUMA node, and -1, which specifies that an irq should be considered equidistant from all nodes.
--banirq
The interrupt with the specified interrupt request number is added to the list of banned interrupts.
You can also use the IRQBALANCE_BANNED_CPUS environment variable to specify a mask of CPUs that are ignored by irqbalance.
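A minimal policy script might look like the following sketch. The script path and the specific rule it applies (banning one particular irq number) are hypothetical, but the argument order and key value output follow the description above.

```shell
# Hypothetical policy script for use with --policyscript.
# irqbalance passes the sysfs device path as $1 and the irq number
# as $2, and reads key=value pairs from the script's stdout.
cat > /tmp/irq-policy.sh <<'EOF'
#!/bin/sh
irq="$2"
if [ "$irq" -eq 19 ]; then
    echo "ban=true"             # keep this irq out of balancing entirely
else
    echo "ban=false"
    echo "balance_level=core"   # override the PCI-class-derived level
fi
exit 0                          # irqbalance expects a zero exit code
EOF
chmod +x /tmp/irq-policy.sh

# What irqbalance would read for irq 42 on a hypothetical device:
/tmp/irq-policy.sh /sys/devices/pci0000:00/0000:00:1f.2 42
```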
For further details, see the man page:
$ man irqbalance
3.5. Tuned and ktune
Tuned is a daemon that monitors and collects data on the usage of various system components, and uses that information to dynamically tune system settings as required. It can react to changes in CPU and network use, and adjust settings to improve performance in active devices or reduce power consumption in inactive devices.
The accompanying ktune partners with the tuned-adm tool to provide a number of tuning profiles that are pre-configured to enhance performance and reduce power consumption in a number of specific use cases. Edit these profiles or create new profiles to create performance solutions tailored to your environment.
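Typical tuned-adm usage can be sketched as follows (switching profiles requires root):

```shell
# List the available profiles, activate one, and confirm which is in use:
tuned-adm list
tuned-adm profile enterprise-storage
tuned-adm active

# Disable tuned and ktune tuning entirely:
tuned-adm off
```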
The profiles provided as part of tuned-adm include:
default
The default power-saving profile. This is the most basic power-saving profile. It enables only the disk and CPU plug-ins. Note that this is not the same as turning tuned-adm off, where both tuned and ktune are disabled.
latency-performance
A server profile for typical latency performance tuning. This profile disables dynamic tuning
mechanisms and transparent hugepages. It uses the performance governor for p-states
through cpuspeed, and sets the I/O scheduler to deadline. Additionally, in Red Hat
Enterprise Linux 6.5 and later, the profile requests a cpu_dma_latency value of 1. In Red
Hat Enterprise Linux 6.4 and earlier, cpu_dma_latency requested a value of 0.
throughput-performance
A server profile for typical throughput performance tuning. This profile is recommended if
the system does not have enterprise-class storage. throughput-performance disables power
saving mechanisms and enables the deadline I/O scheduler. The CPU governor is set to
performance. kernel.sched_min_granularity_ns (scheduler minimal preemption
granularity) is set to 10 milliseconds, kernel.sched_wakeup_granularity_ns
(scheduler wake-up granularity) is set to 15 milliseconds, vm.dirty_ratio (virtual memory dirty ratio) is set to 40%, and transparent huge pages are enabled.
enterprise-storage
This profile is recommended for enterprise-sized server configurations with enterprise-class
storage, including battery-backed controller cache protection and management of on-disk
cache. It is the same as the throughput-performance profile, with one addition: file
systems are re-mounted with barrier=0.
virtual-guest
This profile is optimized for virtual machines. It is based on the enterprise-storage profile, but also decreases the swappiness of virtual memory. This profile is available in
Red Hat Enterprise Linux 6.3 and later.
virtual-host
Based on the enterprise-storage profile, virtual-host decreases the swappiness of virtual memory and enables more aggressive writeback of dirty pages. Non-root and non-boot
file systems are mounted with barrier=0. Additionally, as of Red Hat Enterprise Linux
6.5, the kernel.sched_migration_cost parameter is set to 5 milliseconds. Prior to Red
Hat Enterprise Linux 6.5, kernel.sched_migration_cost used the default value of 0.5 milliseconds. This profile is available in Red Hat Enterprise Linux 6.3 and later.
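These profiles are applied with the tuned-adm tool. A typical sequence looks like the following sketch; the profile name is just one of the profiles listed above, and switching profiles requires root privileges:

```shell
tuned-adm list                        # show the profiles installed on this system
tuned-adm profile enterprise-storage  # switch to the named profile
tuned-adm active                      # confirm which profile is now active
tuned-adm off                         # disable tuned and ktune entirely
```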
Refer to the Red Hat Enterprise Linux 6 Power Management Guide, available from
http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/, for further information about tuned and ktune.
3.6. Application Profilers
Profiling is the process of gathering information about a program's behavior as it executes. You
profile an application to determine which areas of a program can be optimized to increase the
program's overall speed, reduce its memory usage, etc. Application profiling tools help to simplify
this process.
There are three supported profiling tools for use with Red Hat Enterprise Linux 6: SystemTap,
OProfile and Valgrind. Documenting these profiling tools is outside the scope of this guide;
however, this section does provide links to further information and a brief overview of the tasks for
which each profiler is suitable.
3.6.1. SystemTap
SystemTap is a tracing and probing tool that lets users monitor and analyze operating system
activities (particularly kernel activities) in fine detail. It provides information similar to the output of
tools like netstat, top, ps and iostat, but includes additional filtering and analysis options for the
information that is collected.
SystemTap provides a deeper, more precise analysis of system activities and application behavior to
allow you to pinpoint system and application bottlenecks.
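As an illustrative sketch only, a one-line script in the style of the SystemTap Beginners Guide traces open() calls system-wide. SystemTap scripts must be run as root and require kernel debuginfo packages matching the running kernel:

```shell
# Print process name, PID, and file name for every open() system call;
# press Ctrl-C to stop. Requires root and matching kernel-debuginfo packages.
stap -e 'probe syscall.open { printf("%s(%d) open (%s)\n", execname(), pid(), filename) }'
```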
The Function Callgraph plug-in for Eclipse uses SystemTap as a back-end, allowing it to thoroughly
monitor the status of a program, including function calls, returns, times, and user-space variables,
and display the information visually for easy optimization.
For further information about SystemTap, refer to the SystemTap Beginners Guide, available from
http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/ .
3.6.2. OProfile
OProfile (oprofile) is a system-wide performance monitoring tool. It uses the processor's dedicated
performance monitoring hardware to retrieve information about the kernel and system executables,
such as when memory is referenced, the number of L2 cache requests, and the number of hardware
interrupts received. It can also be used to determine processor usage, and which applications and
services are used most.
OProfile can also be used with Eclipse via the Eclipse OProfile plug-in. This plug-in allows users to
easily determine the most time-consuming areas of their code, and perform all command-line
functions of OProfile with rich visualization of the results.
However, users should be aware of several OProfile limitations:
Performance monitoring samples may not be precise: because the processor may execute
instructions out of order, a sample may be recorded from a nearby instruction, instead of the
instruction that triggered the interrupt.
Because OProfile is system-wide and expects processes to start and stop multiple times, samples
from multiple runs are allowed to accumulate. This means you may need to clear sample data
from previous runs.
It focuses on identifying problems with CPU-limited processes, and therefore does not identify
processes that are sleeping while they wait on locks or for other events.
For further information about using OProfile, refer to the Deployment Guide, available from
http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/, or to the oprofile
documentation on your system, located in /usr/share/doc/oprofile-<version>/.
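On Red Hat Enterprise Linux 6, a minimal OProfile session with the legacy opcontrol interface might look like the following sketch; it must be run as root, and the workload is a placeholder:

```shell
opcontrol --no-vmlinux     # profile without kernel symbols (use --vmlinux=<path> for kernel data)
opcontrol --start          # start collecting samples
# ... run the workload to be profiled here ...
opcontrol --shutdown       # stop the daemon and flush collected samples
opreport --symbols         # summarize the samples per symbol
opcontrol --reset          # clear accumulated samples before the next run
```

Note that --reset addresses the accumulation behavior described above: without it, samples from earlier runs remain in the report.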
3.6.3. Valgrind
Valgrind provides a number of detection and profiling tools to help improve the performance and
correctness of your applications. These tools can detect memory and thread-related errors as well as
heap, stack and array overruns, allowing you to easily locate and correct errors in your application
code. They can also profile the cache, the heap, and branch-prediction to identify factors that may increase application speed and minimize application memory use.
Valgrind analyzes your application by running it on a synthetic CPU and instrumenting the existing
application code as it is executed. It then prints "commentary" clearly identifying each process
involved in application execution to a user-specified file descriptor, file, or network socket. The level
of instrumentation varies depending on the Valgrind tool in use, and its settings, but it is important to
note that executing the instrumented code can take 4-50 times longer than normal execution.
Valgrind can be used on your application as-is, without recompiling. However, because Valgrind
uses debugging information to pinpoint issues in your code, if your application and support libraries
were not compiled with debugging information enabled, recompiling to include this information is
highly recommended.
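For instance, a typical run under the memcheck tool against a hypothetical binary might look like the sketch below; ./myapp is a placeholder, and compiling it with -g first lets Valgrind report file and line numbers:

```shell
# Compile with debugging information first, e.g. gcc -g -O1 myapp.c -o myapp
# Then run the application on Valgrind's synthetic CPU and report leaks in full:
valgrind --tool=memcheck --leak-check=full ./myapp
```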
As of Red Hat Enterprise Linux 6.4, Valgrind integrates with gdb (GNU Project Debugger) to improve
debugging efficiency.
More information about Valgrind is available from the Developer Guide, available from
http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/, or by using the man
valgrind command when the valgrind package is installed. Accompanying documentation can also be found in:
/usr/share/doc/valgrind-<version>/valgrind_manual.pdf
/usr/share/doc/valgrind-<version>/html/index.html
For information about how Valgrind can be used to profile system memory, refer to Section 5.3,
Using Valgrind to Profile Memory Usage.
3.6.4. Perf
The perf tool provides a number of useful performance counters that let the user assess the impact
of other commands on their system:
perf stat
This command provides overall statistics for common performance events, including instructions executed and clock cycles consumed. You can use the option flags to gather
statistics on events other than the default measurement events. As of Red Hat Enterprise
Linux 6.4, it is possible to use perf stat to filter monitoring based on one or more
specified control groups (cgroups). For further information, read the man page: man perf-stat.
perf record
This command records performance data into a file which can be later analyzed using
perf report. For further details, read the man page: man perf-record.
perf report
This command reads the performance data from a file and analyzes the recorded data. For
further details, read the man page: man perf-report.
perf list
This command lists the events available on a particular machine. These events will vary
based on the performance monitoring hardware and the software configuration of the
system. For further information, read the man page: man perf-list.
perf top
This command performs a similar function to the top tool. It generates and displays a
performance counter profile in real time. For further information, read the man page: man
perf-top.
perf trace
This command performs a similar function to the strace tool. It monitors the system calls
used by a specified thread or process and all signals received by that application.
Additional trace targets are available; refer to the man page for a full list.
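The commands above can be combined into a short sampling session, sketched below. The dd invocation is purely a stand-in workload, and system-wide sampling with -a requires appropriate privileges:

```shell
perf list                                               # events this machine can count
perf stat -- dd if=/dev/zero of=/dev/null count=100000  # counter totals for one command
perf record -a sleep 5                                  # sample the whole system for 5 seconds
perf report                                             # analyze the perf.data file just written
```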
More information about perf is available in the Red Hat Enterprise Linux Developer Guide, available
from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_Linux/.
3.7. Red Hat Enterprise MRG
Red Hat Enterprise MRG's Realtime component includes Tuna, a tool that allows users to both
adjust the tunable values on their system and view the results of those changes. While it was developed
for use with the Realtime component, it can also be used to tune standard Red Hat Enterprise Linux
systems.
With Tuna, you can adjust or disable unnecessary system activity, including:
BIOS parameters related to power management, error detection, and system management
interrupts;
network settings, such as interrupt coalescing, and the use of TCP;
journaling activity in journaling file systems;
system logging;
whether interrupts and user processes are handled by a specific CPU or range of CPUs;
whether swap space is used; and
how to deal with out-of-memory exceptions.
For more detailed conceptual information about tuning Red Hat Enterprise MRG with the Tuna
interface, refer to the "General System Tuning" chapter of the Realtime Tuning Guide. For detailed
instructions about using the Tuna interface, refer to the Tuna User Guide. Both guides are available
from http://access.redhat.com/site/documentation/Red_Hat_Enterprise_MRG/.
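On the command line, Tuna's affinity controls look roughly like the following sketch (the CPU range is illustrative, and the commands must be run as root):

```shell
tuna --cpus=1-3 --isolate   # move movable threads and IRQs off CPUs 1-3
tuna --show_threads         # display current thread-to-CPU affinities
```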
Chapter 4. CPU
The term CPU, which stands for central processing unit, is a misnomer for most systems, since central
implies single, whereas most modern systems have more than one processing unit, or core.
Physically, CPUs are contained in a package attached to a motherboard in a socket. Each socket on
the motherboard has various connections: to other CPU sockets, memory controllers, interrupt
controllers, and other peripheral devices. To the operating system, a socket is a logical grouping of
CPUs and associated resources. This concept is central to most of our discussions on CPU tuning.
Red Hat Enterprise Linux keeps a wealth of statistics about system CPU events; these statistics are
useful in planning out a tuning strategy to improve CPU performance. Section 4.1.2, Tuning CPU
Performance discusses some of the more useful statistics, where to find them, and how to analyze
them for performance tuning.
Topology
Older computers had relatively few CPUs per system, which allowed an architecture known as
Symmetric Multi-Processor (SMP). This meant that each CPU in the system had similar (or symmetric) access to available memory. In recent years, CPU count-per-socket has grown to the point that trying
to give symmetric access to all RAM in the system has become very expensive. Most high CPU count
systems these days have an architecture known as Non-Uniform Memory Access (NUMA) instead of
SMP.
AMD processors have had this type of architecture for some time with their HyperTransport (HT)
interconnects, while Intel has begun implementing NUMA in their QuickPath Interconnect (QPI)
designs. NUMA and SMP are tuned differently, since you need to account for the topology of the
system when allocating resources for an application.
Threads
Inside the Linux operating system, the unit of execution is known as a thread. Threads have a register
context, a stack, and a segment of executable code which they run on a CPU. It is the job of the
operating system (OS) to schedule these threads on the available CPUs.
The OS maximizes CPU utilization by load-balancing the threads across available cores. Since the
OS is primarily concerned with keeping CPUs busy, it does not make optimal decisions with respect
to application performance. Moving an application thread to a CPU on another socket can worsen
performance more than simply waiting for the current CPU to become available, since memory access
operations can slow drastically across sockets. For high-performance applications, it is usually
better for the designer to determine where threads are placed. Section 4.2, CPU Scheduling
discusses how to best allocate CPUs and memory to best execute application threads.
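Explicit thread placement can be sketched with the standard taskset and numactl utilities; ./myapp and the CPU and node numbers below are placeholders, not recommendations:

```shell
taskset -c 0-3 ./myapp                        # pin the application's threads to CPUs 0-3
numactl --cpunodebind=0 --membind=0 ./myapp   # run on node 0's CPUs, allocating from node 0's memory
```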
Interrupts
One of the less obvious (but nonetheless important) system events that can impact application
performance is the interrupt (also known as an IRQ in Linux). These events are handled by the
operating system, and are used by peripherals to signal the arrival of data or the completion of an
operation, such as a network write or a timer event.
The manner in which the OS or CPU that is executing application code handles an interrupt does not affect the application's functionality. However, it can impact the performance of the application. This
chapter also discusses tips on preventing interrupts from adversely impacting application
performance.
4.1. CPU Topology
4.1.1. CPU and NUMA Topology
The first computer processors were uniprocessors, meaning that the system had a single CPU. The
illusion of executing processes in parallel was done by the operating system rapidly switching the
single CPU from one thread of execution (process) to another. In the quest for increasing system
performance, designers noted that increasing the clock rate to execute instructions faster only worked up to a point (usually the limitations on creating a stable clock waveform with the current
technology). In an effort to get more overall system performance, designers added another CPU to the
system, allowing two parallel streams of execution. This trend of adding processors has continued
over time.
Most early multiprocessor systems were designed so that each CPU had the same logical path to
each memory location (usually a parallel bus). This let each CPU access any memory location in the
same amount of time as any other CPU in the system. This type of architecture is known as a
Symmetric Multi-Processor (SMP) system. SMP is fine for a small number of CPUs, but once the CPU
count gets above a certain point (8 or 16), the number of parallel traces required to allow equal
access to memory uses too much of the available board real estate, leaving less room for peripherals.
Two new concepts combined to allow for a higher number of CPUs in a system:
1. Serial buses
2. NUMA topologies
A serial bus is a single-wire communication path with a very high clock rate, which transfers data as
packetized bursts. Hardware designers began to use serial buses as high-speed interconnects
between CPUs, and between CPUs and memory controllers and other peripherals. This means that
instead of requiring between 32 and 64 traces on the board from each CPU to the memory subsystem, there was now one trace, substantially reducing the amount of space required on the board.
At the same time, hardware designers were packing more transistors into the same space by reducing
die sizes. Instead of putting individual CPUs directly onto the main board, they started packing them
into a processor package as multi-core processors. Then, instead of trying to provide equal access
to memory from each processor package, designers resorted to a Non-Uniform Memory Access
(NUMA) strategy, where each package/socket combination has one or more dedicated memory areas
for high speed access. Each socket also has an interconnect to other sockets for slower access to
the other sockets' memory.
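The NUMA layout described above can be inspected on a running system; for example (the output varies by machine):

```shell
numactl --hardware             # list nodes, their CPUs, memory sizes, and inter-node distances
ls /sys/devices/system/node/   # per-node topology information exported by the kernel
```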
As a simple NUMA example, suppose we have a two-socket motherboard, where each socket has
been populated with a quad-core package. This means the total number of CPUs in the system is
eight; four in each socket. Each socket also has an attached memory bank with four gigabytes of
RAM,