

Delivering Performance on Sun: System Tuning

    Technical White Paper

Greg Schmitz and Allan Esclamado



© 1999 Sun Microsystems, Inc.

    901 San Antonio Road, Palo Alto, California 94303 U.S.A

RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013 and FAR 52.227-19.

    The product described in this manual may be protected by one or more U.S. patents, foreign patents, or pending applications.

    TRADEMARKS

Sun, Sun Microsystems, the Sun logo, Solaris, NFS, Ultra, UltraComputing, Ultra Enterprise, XGL, XIL, Creator, Creator3D, SunVTS, and OpenWindows are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.

    THIS PUBLICATION IS PROVIDED AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,

    INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A

    PARTICULAR PURPOSE, OR NON-INFRINGEMENT.

    THIS PUBLICATION COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES

    ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN

    NEW EDITIONS OF THE PUBLICATION. SUN MICROSYSTEMS, INC. MAY MAKE IMPROVEMENTS AND/OR

    CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS PUBLICATION AT ANY TIME.


    Contents

    1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

    2. Tuning the Solaris Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    Reconfiguring the Solaris Kernel. . . . . . . . . . . . . . . . . . . . . . . . . 4

    3. Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    Memory Paging and Swapping . . . . . . . . . . . . . . . . . . . . . . . . . . 9

    Processes and Paging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Managing Swap Space: The swap Command . . . . . . . . . . . . . 13

    Process Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Address Space Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    4. Tuning Disk Subsystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

    Disk Performance Specifications . . . . . . . . . . . . . . . . . . . . . . . . . 17

Disk Interface Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    RAID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

Disk Activity and Performance: iostat . . . . . . . . . . . . . . . . . . . 32

    Managing Disk Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


Volume Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

    5. File System Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    Unix File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

    Tmpfs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

    VERITAS File System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

Tuning File Systems: tunefs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    6. Networking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

    Selecting the Right Network Technology . . . . . . . . . . . . . . . . . . 44

    Ensuring Data Integrity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

    Evaluating System Performance . . . . . . . . . . . . . . . . . . . . . . . . . 51

Reducing Network Congestion . . . . . . . . . . . . . . . . . . . . . . . . . . 51

    7. Summary of Guidelines for System Performance Evaluation 61

    A. Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77


1. Introduction

This document focuses on techniques for tuning the Sun computing environment for maximum performance. It is aimed at system administrators and sophisticated end users who do their own system administration. Each chapter concentrates on a different subsystem of the computing environment and the specific things that can be done to increase performance. The subsystem topics are presented in the order recommended for tuning analysis and optimization:

Chapter 2, Tuning the Solaris Kernel, identifies the parameters that can be tuned to ensure the Solaris operating environment operates at peak efficiency.

Chapter 3, Memory, discusses the impact of paging and swapping schemes, and identifies tools administrators can use to effectively monitor and manage memory and swap space.

Chapter 4, Tuning Disk Subsystems, focuses on the key performance criteria, important disk technologies, and data layout schemes that ensure high performance I/O. Sun tools that can aid in the detection, isolation, and correction of performance-related issues are also discussed.

Chapter 5, File System Tuning, looks at the high performance file systems available from Sun, as well as the tools and techniques that can be employed to ensure that these file systems continue to operate at peak efficiency.


Chapter 6 focuses on the networking technologies available today, as well as the issues surrounding choosing a networking technology, ensuring data integrity, evaluating network performance, and reducing network congestion. Tools available to aid these efforts on Sun systems are also discussed.

Chapter 7 highlights the major functions of the Solaris kernel, and discusses the key tables and environment variables that need to be tuned in order to ensure it operates efficiently.

Other Performance White Papers and Sun Documentation

This paper, as well as companion documents, can be found at http://www.sun.com on the World Wide Web. These documents include:

Delivering Performance on Sun: Optimizing Applications for Solaris helps software developers construct high performance applications with the use of proper compiler flags. It also discusses the development of high performance multithreaded applications.

Delivering Performance on Sun: Tools for High Performance Distributed Computing addresses performance issues for distributed computing platforms and the compute-intensive applications they run.

In addition, all Sun documentation can be found at http://docs.sun.com.


2. Tuning the Solaris Kernel

The Solaris operating system, called the kernel, is loaded into memory when the system boots. The kernel manages all of the physical resources of the computing system. Applications interface with the kernel through a series of system calls.

Key kernel functions include:

Implementing many types of file systems (UFS, NFS, tmpfs, etc.) and permitting processes to create, read, write, and access these files.

Managing daemons, system processes which manage system resources.

Moving large groups of information, called pages, to and from the disk into main memory as needed. This concept is known as virtual memory.

Keeping track of all active processes and deciding which one gets to run next.

Managing device drivers, the programs that control physical devices like disk drive subsystems.

Managing network services and IPC facilities.

Providing system management functions, such as booting and error handling.

Providing other functions that make system resources available to processes.


Reconfiguring the Solaris Kernel

The standard Solaris kernel is dynamic, supporting all devices, pseudo-devices, and options by automatically loading modules when they are needed. Unlike previous versions of SunOS, there is no need to manually configure the Solaris kernel.

After new hardware has been installed, however, the kernel needs to be reconfigured and a reconfiguration reboot (boot -r) performed before the new hardware can be recognized and used by the system. For some configuration changes it is sufficient to run the drvconfig command followed by the disks or tapes command to recognize devices recently added to the system.
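A minimal sketch of the two approaches described above (the disk name used for verification is hypothetical):

    # Full reconfiguration reboot: create /reconfigure and reboot,
    # or type 'boot -r' at the OpenBoot ok prompt.
    touch /reconfigure
    init 6

    # Lighter-weight alternative for some changes: rebuild the device
    # tree and create /dev links for newly attached disks or tapes.
    drvconfig
    disks
    tapes

    # Verify that the new disk (for example, c1t2d0) is now visible.
    format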

The /etc/system File

The /etc/system file houses a wide variety of system configuration and tuning parameters. Administrators can tune many of the parameters contained in /etc/system to increase system performance, including buffer cache, file system, STREAMS, interprocess communication, memory management, and other parameters:

bufhwm, the maximum size for buffer cache memory usage, in KB

ufs_ninode, the maximum size of the inode table

ncsize, the size of the directory name lookup cache

nstrpush, the maximum number of STREAMS pushes allowed

strmsgsz, the maximum size for a STREAMS message a user can create

strctlsz, the maximum size of the ctl part of a message

strthresh, the maximum amount of dynamic memory the STREAMS subsystem can consume, in bytes

sadcnt, the number of sad devices

msginfo_msgmap, the number of entries in the message map

msginfo_msgmax, the maximum message size

msginfo_msgmnb, the maximum bytes on a queue

msginfo_msgmni, the number of message queue identifiers

msginfo_msgssz, the segment size of a message (should be a multiple of the word size)

msginfo_msgtql, the number of system message headers

msginfo_msgseg, the number of message segments (must be < 32768)

seminfo_semmap, the number of entries in the semaphore map

seminfo_semmni, the number of semaphore identifiers

seminfo_semmns, the number of semaphores in the system

seminfo_semmnu, the number of processes using the undo facility

seminfo_semmsl, the maximum number of semaphores, per id

seminfo_semopm, the maximum number of operations, per semaphore call

seminfo_semume, the maximum number of undo structures per process

seminfo_semvmx, the semaphore maximum value

seminfo_semaem, the maximum value for adjustment on exit

shminfo_shmmax, the maximum shared memory segment size (set shmsys:shminfo_shmmax = 0xffffffff, or set shmsys:shminfo_shmmax = 4294967295)

shminfo_shmmin, the minimum shared memory segment size

shminfo_shmmni, the number of shared memory identifiers

shminfo_shmseg, the number of segments, per process

lotsfree, if freemem drops below lotsfree, the system starts to steal pages from processes

tune_t_fsflushr, the rate at which fsflush is run, in seconds

tune_t_minarmem, the minimum available resident (not swappable) memory needed to avoid deadlock, in pages

tune_t_minasmem, the minimum available swappable memory needed to avoid deadlock, in pages

tune_t_flckrec, the maximum number of active frlocks

lwp_default_stksize, the size of the kernel stack for lwps. This value should not be adjusted unless there is a kernel stack overflow. lwp_default_stksize is expressed in bytes and must be a multiple of PAGESIZE bytes.

npty, the total number of 4.0 or 4.1 pseudo-ttys configured

pt_cnt, the total number of 5.7 pseudo-ttys configured

priority_paging, enables the system to place a boundary around the file cache, ensuring that file system I/O does not cause application paging

consistent_coloring, identifies the method used to improve hit ratios for the CPU external cache: 0 (uses various virtual address bits), 1 (physical address is set to virtual address), 2 (bin hopping), 6 (SBSD page coloring scheme)

Administrators can modify these kernel parameters while the system is running via the adb command, or by editing the /etc/system file and rebooting the system.
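As a sketch of what such entries look like, the following /etc/system fragment sets a few of the parameters listed above; the values are illustrative assumptions, not recommendations, and should be derived from the guidelines and references in this chapter:

    * /etc/system -- sample tuning entries (illustrative values only)
    * Buffer cache, directory name lookup cache, and inode cache sizes
    set bufhwm = 8000
    set ncsize = 8000
    set ufs_ninode = 8000
    * Fence the file cache off from application memory
    set priority_paging = 1
    * System V IPC limits use the module:variable form
    set shmsys:shminfo_shmmax = 0xffffffff
    set msgsys:msginfo_msgmni = 100
    set semsys:seminfo_semmni = 100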

More information on /etc/system tunable parameters can be found in the System Administration Guide, Volume II manual located at http://docs.sun.com on the World Wide Web.


Kernel Tables

A variety of kernel tables can be tuned by system administrators to enhance the performance of the Solaris operating environment. Table 2-1 identifies these tables and the relevant environment variables they control, as well as suggested settings for each.

Table 2-1 System administrators can tune key Solaris kernel tables

    Kernel Table                  Variable     Default Setting
    Process Table                 max_nprocs   20 + 16 * maxusers
    Per-User Process Limit        maxuprc      max_nprocs - 5
    Inode Cache                   ufs_ninode   max_nprocs + 16 + maxusers + 64
    Quota Table                   ndquot       (maxusers * NMOUNT) / 4 + max_nprocs
    Directory Name Lookup Cache   ncsize       max_nprocs + 16 + maxusers + 64

Many of the traditional Unix kernel limits have been removed over the years. Solaris now dynamically sets limits that increase on large memory configurations. Traditional tables that no longer exist or need to be tuned include the text table, open file table, callout table, and clist table.

More information on configuring kernel tables can be found in Sun Performance and Tuning, Java and the Internet, Chapter 14: Kernel Algorithms and Tuning.

maxusers Kernel Parameter

Over time, the Solaris kernel has been modified to enable it to scale according to the hardware capabilities and the workload it is running. By self-configuring and tuning, Solaris reduces administrative burdens and dynamically adapts to changing business needs. While earlier versions needed to be hand-tuned, each successive version of Solaris 2 has converted hand-adjusted values into adaptively changing limits.

One kernel parameter that remains an important factor in system performance is the maxusers environment variable. As evidenced in Table 2-1, many Solaris kernel parameters are defined relative to maxusers, rendering it perhaps the most critical variable administrators can tune. Indeed, maxusers



determines whether or not the Solaris kernel has sufficient room to accommodate users comfortably. (It is important to note that maxusers is not a limit on the number of simultaneous users supported by the system.)

Increasing the maxusers parameter increases the size of the Solaris kernel, and consequently decreases the amount of memory available. Administrators should, therefore, follow a few simple guidelines to ensure the kernel is tuned effectively for users of the system:

The maxusers variable is automatically set by Solaris according to the amount of physical memory available in the system. The automatic minimum value for maxusers is 8, and the automatic maximum is 1024.

The maxusers parameter is set to a little less than the amount of RAM in the system, in megabytes.

For NFS servers with small amounts of memory (under 512 MB), it can be beneficial to increase the DNLC and inode cache limits.

There is no need to tune in most cases, and only a small impact on performance can be expected from any kernel tuning.
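As a sketch, the current value of maxusers can be read from the live kernel with adb (a technique covered in the reference below), and overridden, if truly necessary, with an /etc/system entry; the value 256 is an arbitrary illustration:

    # Read the current maxusers value from the running kernel
    echo "maxusers/D" | adb -k /dev/ksyms /dev/mem

    * /etc/system override (normally left to automatic sizing)
    set maxusers = 256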

Additional information on the impact of the maxusers kernel parameter can be found in Sun Performance and Tuning, Java and the Internet, Chapter 14: Kernel Algorithms and Tuning.


3. Memory

Memory Paging and Swapping

When memory becomes tight, the kernel begins to employ techniques known as paging and swapping. Paging occurs when the kernel puts part of an executing program out to disk, enabling the pages it occupied to be used for other purposes. When the process needs to access the page of memory that was moved to disk, a page fault occurs, forcing the process to wait until the page is copied back into memory. This scheme fosters the efficient use of memory, as only the parts of a program that are in use at the time the program is running need to be in memory and available for use.

Swapping occurs when the kernel stores an entire currently inactive process to disk, enabling the pages it occupied to be used by other executing programs. To continue, the process is paged back in gradually. Swapping is a part of the normal housekeeping performed by the Solaris kernel.

Paging and Swapping Parameters

The Solaris kernel utilizes several values stored in the /etc/system file to determine when to start paging and swapping, actions that begin and end when the memory system crosses a threshold set as part of the kernel paging algorithm:

lotsfree, the amount of memory required by the page daemon. If memory drops below this limit, the page daemon starts looking for pages to reuse.

desfree, the level at which swapping begins.


minfree, the minimum memory level acceptable to the system.

slowscan, the number of pages scanned per second when lotsfree is reached.

fastscan, the number of pages scanned per second when minfree is reached.

handspreadpages, the distance between the hand that sets a reference flag and the hand that clears it. The two hands keep track of which pages of memory have been accessed recently. If the pages have not been accessed in a certain amount of time, they are paged out.

maxpgio, the maximum number of page I/O operations per second the system will schedule. maxpgio is typically set to 40, assuming there is little swap space on multiple devices. If system swap space is on fast and/or multiple disks, maxpgio should be increased to take advantage of the higher performance devices.
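By way of illustration, these parameters are set in /etc/system like any other tunable; the figures below are assumptions for a system whose swap space is spread across several fast disks, not general recommendations:

    * /etc/system -- example paging tunables (illustrative values only)
    * Allow more page-out I/O per second when swap is on multiple fast disks
    set maxpgio = 120
    * Raise the free-memory threshold (in pages) at which scanning begins
    set lotsfree = 1024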

More information on paging in the Solaris kernel can be found in Sun Performance and Tuning, Java and the Internet, Chapter 13: RAM and Virtual Memory. Information on the /etc/system file can be found in the System Administration Guide, Volume II located at http://docs.sun.com on the World Wide Web.

Processes and Paging

Understanding the number of processes running on a system and how they are affected by paging is essential to overall performance. System administrators should routinely monitor processes and learn how they are utilizing system resources. To aid this effort, Sun has provided several tools for administrators as part of the Solaris operating environment: vmstat, swap, sar, ps, and Find Process. Each tool can be used in a variety of ways, including determining memory, paging, and swapping measurements.

Memory Measurements

The first step in evaluating processes is determining the paging and swapping activity occurring on the system. Both the vmstat and sar commands can be used to monitor paging activity. While sar is better for information logging purposes, vmstat tends to provide more detailed information. Correlations exist between the output of these tools, as discussed below.


Swap space (vmstat swap, sar -r freeswap, and swap -s)

Each command describes the amount of available swap space, although in different forms. The vmstat swap command identifies the amount of available swap space in KB, while sar -r freeswap reports it in 512-byte blocks. The swap -s command shows available swap space in addition to other metrics. Administrators should note that when available swap space is exhausted the system will be unable to use additional memory.

Free memory (vmstat free and sar -r freemem)

Administrators can determine the amount of free memory: the pages of RAM that are immediately ready to be used when a process starts execution or needs more memory. The vmstat free command reports free memory in KB, while the sar -r freemem command reports it in pages.

Reclaims (vmstat re)

The number of pages reclaimed from the free memory list are called reclaims. Pages that have been taken from a process but not yet reused by another can be reclaimed, thereby avoiding the full page fault that would typically result. The vmstat re command can be used to identify reclaimed memory.
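A short usage sketch of the three commands discussed above; the intervals and counts are arbitrary:

    # Report virtual memory statistics every 5 seconds (swap and free
    # columns are in KB; re is reclaims; sr is the page scan rate).
    vmstat 5

    # Log free memory (pages) and free swap (512-byte blocks)
    # every 5 seconds, 12 times.
    sar -r 5 12

    # Print a one-line summary of allocated, reserved, and available swap.
    swap -s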

Page Fault Measurements

Identifying when page faults are occurring is an important step in evaluating system performance.

Attaches to existing pages (vmstat at and sar -p atch)

The vmstat at and sar -p atch commands measure the number of attaches to shared memory pages.

Pages paged in (vmstat pi and sar -p pgin, ppgin)

The vmstat pi command reports the number of KB/second paged in, while the sar command reports the number of page faults and the number of pages paged in from swap space or by file system read operations. Administrators should note that the file system block size is 8 KB. Consequently, there may be two pages, or 8 KB, paged in per page fault on machines with 4 KB pages. Also noteworthy is the fact that UltraSPARC machines employ 8 KB pages; almost all other CPUs utilize 4 KB pages.


Minor faults (vmstat mf and sar -p vflt)

Minor page faults are caused by address space or hardware address translation faults that can be resolved without performing a page-in operation. Such operations are fast and require little CPU utilization, eliminating the need for a process to have to stop and wait.

Other fault types (sar -p pflt, slock, vmstat -s copy-on-write, zero fill)

Many other types of page fault conditions are possible, and can be identified using the above commands. More information on these and other fault conditions can be found in the corresponding man pages, as well as Sun Performance and Tuning, Java and the Internet, Chapter 13: RAM and Virtual Memory.

Page-out Measurements

The vmstat and sar commands provide information on the amount of data paged out to the swap space or file system:

Pages paged out (vmstat po and sar -g pgout, ppgout)

The vmstat po command reports the number of KB/second paged out, while sar reports the number of page-outs and the number of pages paged out to swap or file system space. Administrators should note that the clustering of swap space writes can lead to a large number of pages written per page-out.

Pages freed (vmstat fr and sar -g pgfree)

The Solaris operating environment keeps track of the pages freed: the rate at which memory is returned to the free list. The vmstat fr command reports the amount of memory freed in KB per second, while the sar command reports the number of pages freed per second.

Short-term memory deficit (vmstat de)

Deficit is a paging parameter that provides a measure of hysteresis for the page scanner when there is a period of high memory demand. If the value returned by the vmstat de command is non-zero, then memory was being consumed quickly and extra free memory will be reclaimed in anticipation of its impending use.


Page daemon scanning rate (vmstat sr and sar -g pgscan)

The vmstat sr and sar -g pgscan commands can be used to determine the number of pages scanned by the page daemon as it looks for pages used infrequently. If the page daemon scanning rate stays above 200 pages per second for long periods of time, then a memory shortage is likely.

Pages freed by the inode cache

Statistics related to the inode cache used by the Unix file system (UFS) are also indicators of system performance. In particular, ufs_ipf measures UFS inode cache reuse, a feature that can cause pages to be freed when an inactive inode is freed. ufs_ipf is the number of inodes with pages freed as a percentage of the total number of inodes freed.

Swapping Measurements

The sar and vmstat commands also supply system swapping information.

Pages swapped in (vmstat -S si and sar -w swpin, bswin)

The vmstat -S si command reports the number of KB/second swapped in, while the sar -w swpin command reports the number of swap-in operations. The sar -w bswin command reports the number of 512-byte blocks swapped in.

Pages swapped out (vmstat -S so and sar -w swpot, bswot)

The vmstat -S so command reports the number of KB/second swapped out, while the sar -w swpot command reports the number of swap-out operations and sar -w bswot reports the number of 512-byte blocks swapped out.
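For completeness, a brief sketch of the swapping-specific forms of these commands (interval and count again arbitrary):

    # Include swap-in and swap-out (si/so) columns, sampling every 5 seconds.
    vmstat -S 5

    # Report swap-in/swap-out operations and 512-byte blocks transferred.
    sar -w 5 12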

More information on analyzing paging statistics can be found in Sun Performance and Tuning, Java and the Internet, Chapter 13: RAM and Virtual Memory.

Managing Swap Space: The swap Command

The efficient operation of a system relies on the effective utilization of both memory and swap space. System administrators must ensure the system has sufficient swap resources so the paging and swapping activities performed by the kernel can operate smoothly. Sun provides the swap utility in the Solaris operating environment to facilitate the addition, deletion, and monitoring of the swap areas used by the memory manager.


Adding a swap area

The -a option of the swap command can be used by system administrators to add a swap area to the system. A swap file must be specified, such as /dev/dsk/c0t0d0s1, as well as a length for the swap area in 512-byte blocks. A swap space must be at least 16 blocks in length.

Deleting a swap area

The -d option of the swap command can be used by system administrators to delete a swap area from the system. Once deleted, blocks will no longer be allocated from this area. In addition, all swap blocks previously in use in this swap area will have been moved to other swap areas.

Listing swap area status

The -l option of the swap command can be used by system administrators to list the status of all swap areas on the system. Several pertinent pieces of information are reported, including: the path name for the swap area, the major/minor device number if a block special device is employed, the length of the swap area, and the available free space.

Printing summary information

The -s option of the swap command can be used by system administrators to print summary information for all swap areas on the system. Total swap space usage and availability is provided. System administrators can determine the total amount of swap space allocated, the amount not currently allocated but reserved for future use, the total amount that is either allocated or reserved, and the amount available for future reservation and allocation.
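A usage sketch tying the four options together; the swap file path and size are hypothetical:

    # Create and register a 512 MB swap file (1048576 512-byte blocks).
    mkfile 512m /export/swapfile
    swap -a /export/swapfile

    # Later, remove it from use; its blocks migrate to other swap areas.
    swap -d /export/swapfile

    # List each swap area, then print a system-wide summary.
    swap -l
    swap -s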

Distributing Swap Space

To achieve peak system performance, system administrators must distribute swap space so it can be utilized effectively. Following a few simple guidelines can dramatically improve system performance:

Place swap partitions or swap files on as many different disks as possible.

Place swap partitions or swap files on the fastest disks in the system.

If more than one disk is available per controller, limit the swap space to only one of those disks.


Although swapping to remote (NFS) file systems is permitted, this practice should be avoided if at all possible. Swapping to remote file systems not only degrades swap performance, it also increases network traffic.

Avoid placing swap files on file systems that are suspected of excess fragmentation or are almost full. If possible, swap files should be placed in a dedicated disk partition to eliminate fragmentation concerns.

More information on managing swap space can be found in the swap(1M) man page.

Process Manager

The new Process Manager in Solaris 7 enables administrators to quickly obtain process-related information in the CDE environment (Figure 3-1). The Process Manager enables administrators to:

Show user, system, or all processes

Sort processes by resource consumption

Suspend, resume, or kill processes

Diagnose sick processes and contact their owning user via Address Manager

More information on the Process Manager can be found in the Solaris operating environment administrative guides located at http://docs.sun.com on the World Wide Web.

Obtaining Process Information: ps

System administrators need to obtain information about active processes in the system. The ps command aids this effort by reporting the process ID, terminal identifier, cumulative execution time, and the command name by default. More information, including processes owned by a specific user, can be obtained by utilizing the options specified in the ps(1) man page.
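Two common invocations, shown as a sketch (the user name is a placeholder):

    # Full listing of every process on the system.
    ps -ef

    # Processes owned by a particular user, with full details.
    ps -fu alice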

Address Space Map

The /usr/proc/bin/pmap command can be used to print the address space map for each running process. Furthermore, the new -x option available in the Solaris 7 operating environment prints resident/shared/private mapping details, providing more information than ever before.
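For example (the process ID 1234 is an arbitrary placeholder):

    # Print the address space map of process 1234, including resident,
    # shared, and private sizes for each mapping (Solaris 7 -x option).
    /usr/proc/bin/pmap -x 1234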


More information on the pmap command can be found in the proc(1M) man page.


4. Tuning Disk Subsystems

This chapter first describes options for selecting disk subsystems which may improve system performance. Following is a description of the tools and utilities for analyzing the performance of these disk subsystems.

It is not unusual for storage capacity requirements to grow by as much as 100% per year. Coincident with this growth is an increasing need to ensure ready access to data whenever it is needed. System administrators know that raw capacity and performance do not stand alone: both scalability and high availability are important practical requirements for ensuring economical and reliable operation and information access. Indeed, disk subsystems must be configured and tuned to match the performance, availability, and cost requirements of users.

Focusing on disk subsystems, this chapter describes key performance criteria, introduces important disk technologies, and describes Sun tools that can aid in the detection, isolation, and correction of performance-related issues. References for more detailed tuning information are provided throughout.

Disk Performance Specifications

A number of factors are helpful in determining the relative performance of disk subsystems:

Disk capacity, the amount of data that can be stored on the disk or in the disk subsystem.


Seek time, the amount of time it takes to move the disk drive's heads from one track to another, and is a product of the performance of the actuator. Disk drives typically spend ten times more time seeking than transferring data, making seek time perhaps the most important indicator of a disk's performance. Some disk vendors quote the time it takes to traverse the entire disk; others cite the time it takes to traverse one-third of the disk.

Rotational speed, the speed at which the disk spins. After the head has moved to the appropriate track, the disk must rotate until the data is underneath the head. The disk's rotational speed determines its latency: the amount of time it takes to set up for a data transfer.

Data transfer rate, the amount of data that can be transferred from the disk to the host per unit time. Both theoretical maximum and typical transfer rates are often quoted. Data transfer rate is quoted in Kilobytes/second, Megabytes/second, or Gigabytes/second.

Zone bit recording, a technique which enables data to be recorded at variable densities throughout the disk, thereby providing higher capacities. This method breaks the surface of the disk into different zones, each with a different number of bytes per track, enabling the bit density to remain constant across the disk's surface. Drives with this feature offer improved transfer rate and capacity.

Bus bandwidth, the amount of data that can be handled by the system bus. SCSI bus bandwidth can be made faster and wider.

Controller, a device which coordinates read and write requests between the host and the disk drive.

Buffer size, the amount of cache employed on the disk controller. Some drives utilize separate read and write buffers; others use a unified cache.

Mean Time Between Failure (MTBF), the average time between failures.

Mean Time to Data Loss (MTDL), the average time a disk subsystem can operate before another failure causes data to be lost.

More information on these and related topics can be found in Sun Performance and Tuning, Java and the Internet, Chapter 8: Disks, and Configuration and Capacity Planning for Solaris Servers, Chapter 7: Storage Subsystem Architecture.


Table 4-1 highlights the key characteristics of the various SCSI standards.

Table 4-1 The SCSI protocols provide a wide variety of performance characteristics

    Characteristic             SCSI              Fast SCSI            Fast and Wide SCSI   Ultra SCSI
    Asynchronous/Synchronous   Asynchronous or   Synchronous,         Synchronous,         Synchronous,
                               Synchronous       Single-ended or      Differential         Differential
                                                 Differential
    Transfer Rate              5 MB/second       10 MB/second         20 MB/second         40 MB/second
    Number of Pins             50 pins           50 pins              68 pins              68 pins
    Width                      8 bits            8 bits               16 bits              16 bits
    Maximum Cable Length       6 m               3 m synchronous,     3 m synchronous,     3 m synchronous,
                                                 25 m differential    25 m differential    12 m differential
    Number of Targets          7                 7                    15                   15

More information on SCSI can be found in Configuration and Capacity Planning for Solaris Servers, Chapter 7: Storage Subsystem Architecture.

Fibre Channel

Fibre Channel is a high performance interconnect standard designed for bidirectional, point-to-point communications between high performance workstations, peripherals, and large host systems. It offers a variety of benefits over other link-level protocols:

Efficiency and high performance

Scalability

Operation over long distances

Easy to cable

Supports popular high level protocols

Supports switched networks

Secure from RF snooping

Invulnerable to RF interference



Theory of Operation

Channels are a generic class of communications technology designed to move data from one point to another with high speed and minimum latency. For speed, functions such as error correction and retry-on-busy are typically implemented in hardware. Because they do not have sophisticated software-based routing and switching capabilities, channels typically operate in environments where every device on the channel is defined and known in advance.

At its most basic, Fibre Channel is just a high speed serial connection between two buffers. Indifferent to the format of the data, Fibre Channel only provides the error checking and necessary signaling to ensure the reliable movement of data between the source and destination buffers.

Fibre Channel employs a mechanism known as a fabric to establish connections between ports. The fabric, like the telephone network, has all the needed intelligence to make routing decisions. The only concern of a port is to ensure its proper connection to the fabric. Each fabric has an address space that allows up to 16 million addresses. A fabric can be simple, or a mix of switches and hubs that connect a complex and widely dispersed network of ports.

Reliability

There are no limits on the size of a transfer between applications using Fibre Channel. Each sequence is broken into frames, the frame size being negotiated by the ports and the fabric. A 32-bit cyclic redundancy check ensures that transmission errors can be reliably detected.

Fibre Channel Arbitrated Loop

An important enhancement to Fibre Channel has been the development of Fibre Channel Arbitrated Loop (FC-AL), developed specifically to meet the needs of storage interconnects. Employing a simple loop topology, FC-AL can support both simple configurations and sophisticated arrangements of hubs, switches, servers, and storage systems (Figure 4-1). Furthermore, by using SCSI protocols over the much faster, more robust Fibre Channel link, FC-AL provides higher levels of performance without requiring expensive and complex changes to existing device drivers.


Figure 4-1 FC-AL's loop topology can support both simple and complex configurations.

Key Benefits of FC-AL

FC-AL offers a host of benefits over other protocols, enabling storage systems based on it to foster the creation of new applications that take full advantage of some impressive specifications:

Gigabit bandwidth

FC-AL is capable of data transfer rates of up to 100 MB/second (dual ported), with 400 MB/second envisioned for the future, far faster than SCSI, Serial Storage Architecture (SSA), or P1394 (Firewire).

Suitability for networks

In addition to performance, Fibre Channel is ideal for building networks. Employing hubs and switches similar to those used in networks, Fibre Channel enables complex arrangements of storage and systems to be connected together in highly scalable, highly available networks, or fabrics.

Use of existing SCSI protocols

FC-AL allows SCSI command packets to be sent over a high speed physical medium, reducing software and firmware costs and minimizing impact on existing software.


Node addressability far better than SCSI

With the ability to support up to 126 FC-AL devices on a single host adaptor, cost and implementation complexity are greatly reduced. Using optical fiber media, a single FC-AL loop can support nodes with a separation of up to ten kilometers from the host.

Unique device naming convention

FC-AL employs unique, 64-bit world wide names for devices. Disk drives, host adapters, and interface boards are identified by a world wide name, enabling improved diagnostics and software support.

Greatly simplified wiring and cabling requirements

Because Fibre Channel is a largely optical, simple serial protocol, electrical interference and expensive cabling are much less of an issue than with the complex parallel data paths used by SCSI.

In addition, FC-AL implementations support redundant data paths, hot pluggable components, multiple host connections, and dual ported drives, features that 15-year-old SCSI technology was never intended to support.

Configuration Flexibility

Fibre Channel Arbitrated Loop enables storage systems to implement several loop configurations, each designed to address varying levels of performance and availability. Utilizing bypass circuits and multiplexors, FC-AL based storage devices like the Sun StorEdge A5000 permit the construction of multiple loops within a single enclosure, improving both performance and availability (Figure 4-2).

More information on Fibre Channel technology can be found in Sun Performance and Tuning, Java and the Internet, Chapter 8: Disks, and Configuration and Capacity Planning for Solaris Servers, Chapter 7: Storage Subsystem Architecture, and in the white papers listed in the References section of this document.


Figure 4-2 FC-AL's loop topology supports both single and dual loop configurations.

RAID

Redundant Arrays of Inexpensive Disks, or RAID systems, were first formally defined by researchers at Berkeley in 1987. The most important part of their idea was to combine the power of small, inexpensive drives to achieve performance equal to a single large, expensive disk. Because RAID required multiple spindles, features were also needed that could protect against the data loss that could be caused by the failure of any one of the drives. These original safeguards subsequently became an important feature in their own right, with many users purchasing RAID systems just for their availability.

Today, disk arrays are sold as an intelligently managed collection of disk drives organized to optimize performance for a particular set of tasks. All RAID systems achieve higher performance and/or reliability by replicating or spanning data across multiple disks. Exactly how this is done has profound effects on subsystem performance, cost, and reliability.

Two metrics are used in assessing the performance of disk subsystems:

Transfer Rate

Transfer rate is the speed (measured in MB/second) with which a subsystem can move data through its controller. In RAID systems, read and write transfer rate performance can vary considerably, and is particularly valuable for applications that have to move large amounts of data quickly, like document imaging or digital video.


I/O Operations per Second (IOPS)

IOPS are a measure of the ability of a storage system to handle multiple, independent I/O requests in a certain period of time. RAID systems with high transfer rates do not always have good transaction rate performance. Database and transaction processing systems are examples of applications that typically require high I/O rate performance.

Recovery After a Failure

When a disk in a protected RAID system does fail, the array continues to operate in reduced mode. In reduced mode, array performance may be lowered, depending on the RAID level being used and whether reads or writes predominate. Upon encountering a failure, the simplest RAID systems inform the administrator of the problem, leaving them to organize the replacement of failed components and to begin the restoration of normal operations. The most sophisticated systems can automatically begin reconstructing data to a reserve drive called a hot spare. Hot sparing allows the array to automatically return to full performance quickly, and permits administrators to defer maintenance to a more convenient time. Some systems even allow defective drives to be replaced without interrupting processing or removing power, but such hot swapping is sometimes less useful than a system that can instantly begin recovery using a hot spare.

Popular RAID Levels

Can RAID subsystems actually eliminate the I/O bottleneck and provide very high levels of performance? In some circumstances, RAID configurations can dramatically improve performance, and in others may actually impair it. To accommodate the various kinds of demands put on storage systems, different levels of RAID have been defined, each with their own special characteristics.

Five RAID levels, numbered 1 through 5, were defined by the Berkeley researchers. Since that time, a few more have been added, some simply variations of the original five. The four RAID levels offered in the Sun StorEdge array product family, levels 0, 1, 5, and 0+1, are discussed here.


    Level 0 - Striping

Striping is an industry term for breaking a data stream up and placing it across multiple disks in equal-sized chunks or stripe blocks. Each chunk is sequentially written to successive drives in the array. The set of sequential chunks that begins with the first drive and ends with the last drive forms the stripe (Figure 4-3). Array management software running on the host, on a special controller inside the array, or some combination of both, is responsible for making the array look like a normal, or virtual, disk to the operating system and applications.

Figure 4-3 RAID 0 breaks data into equal sized chunks of data in sets that form a stripe. In this figure, the stripe width is three, with chunks 1, 2, and 3 forming a single stripe.

Each disk in a stripe is generally assumed to be on its own independent data channel, allowing the transfer rate of a RAID 0 implementation to approach the sum of the transfer rates of each of the drives (presuming that the intervening software and hardware has sufficient throughput).

RAID 0 can also provide excellent IOPS performance, because data is spread across many independent spindles and actuators, helping to balance the I/O load. Optimizing RAID 0 for transfer rate or IOPS performance requires administrators to make configuration decisions about the relationship of the sizes of the chunks in a stripe and the average size of disk transactions.


Level 1 - Mirroring

Used for a long time in fault tolerant applications, this type of array is also called shadowing or mirroring. Designed to optimize data availability rather than speed, RAID 1 duplicates each write transaction to one or more mirror disks (Figure 4-4).

Mirrored systems can be used to improve the IOPS performance of read operations, because the least busy drive can be selected to service requests. Because both drives must be involved in write operations, write performance may be slightly worse than an independent drive.

Figure 4-4 RAID 1, or mirroring, provides data redundancy by recording data multiple times on independent spindles.

Level 5 - Striping with Interleaved Parity

RAID 5 exploits the performance gains of striping while mitigating its vulnerability to data loss by adding error correction information (called parity) to the data.

RAID 5 allows independent access to individual drives in a group with striped data. If the data is interleaved across the group of drives in blocks equal in size or larger than most I/O requests, then drives can respond independently to requests, and even to multiple requests simultaneously. This has the effect of allowing RAID 5 to have very good random read performance. Conversely, if the block sizes in a stripe are smaller than the average size of an I/O request, forcing multiple drives to work together, RAID 5 systems can also enjoy good sequential performance.


    Figure 4-5 RAID 5 interleaves parity and data across the stripe to improve

    performance.

With RAID 5, both the parity information and data are interleaved across the array in a cyclic pattern (Figure 4-5). If one of the disks fails, a simple algorithm allows the missing data to be recovered from the remaining disks. RAID 5 is a good compromise for those needing good random performance without incurring the expense of mirroring.

RAID 5's failing comes from its poor performance on small writes. When a single-block write command is issued to a RAID 5 configuration, the system must first complete four I/O operations and two parity calculations in a read-modify-write sequence, degrading random write performance.

Cached RAID 5

Using a non-volatile cache in the disk controller, write operations can be optimized to use full-width stripes, resulting in very good write performance. This is often known as hardware RAID 5 or controller-based RAID. Sun's StorEdge A1000, A3000, and A7000 subsystems perform high speed cached RAID 5 operations for write-intensive workloads.


RAID 0+1 - Striping plus Mirroring

Useful configurations have been created by combining two existing RAID levels. One of the more popular is known as RAID 0+1, which combines the reliability of mirroring (RAID 1) with the performance of striping (RAID 0) (Figure 4-6).

The reliability of a striped and mirrored system is excellent because of the high redundancy afforded by mirroring. RAID 0+1 systems can tolerate the failure of one or more disks, and continue to deliver data with virtually no performance degradation, unlike RAID 5. RAID 0+1 systems do carry, however, the higher cost of mirrored systems, as data requiring protection needs twice the disk space of simple independent spindles.

Some subsystems also support RAID 1+0, in which data is first mirrored and then striped. This scheme can withstand more individual disk failures and maintain data availability.

Figure 4-6 RAID 0+1 combines mirroring and striping to achieve excellent availability and performance.


Optimizing the RAID Configuration

The various RAID levels have distinctive strengths, and therefore tend to be best for certain kinds of applications (Table 4-2). Database management, for example, requires good random performance, and independent spindles are the best solution. If reliability is a concern, however, mirroring (RAID 1) can give excellent performance, but is expensive for large data sets. RAID 5 offers slightly worse throughput, but with good redundancy and economy. Clearly, finding the right solution requires a good understanding of the I/O load.

Table 4-2 RAID levels provide varying degrees of availability with corresponding trade-offs in performance and cost

    RAID 0 (Striping)
      Strengths:  Improves I/O performance
      Weaknesses: No redundancy; a single disk failure causes all data to be lost

    RAID 1 (Mirroring)
      Strengths:  Improves data availability; may improve read performance
      Weaknesses: Expensive; potentially decreased write performance

    RAID 0+1 (Striping Plus Mirroring)
      Strengths:  Improves data availability without sacrificing performance
      Weaknesses: Expensive

    RAID 5 (Striping Plus Distributed Parity)
      Strengths:  Improves data availability; error correction information; lower cost than mirroring
      Weaknesses: Survives only a single disk failure; poor small write performance

Interlace Size

The interlace size specifies the chunk size: the amount of data written to each disk before moving on to the next component in a stripe. Optimally setting the interlace size can help ensure the RAID configuration performs well for both read and write operations. RAID 0 and RAID 5 configurations require different settings for the interlace size, depending on the type of I/O:

In striped configurations experiencing a high volume of random access I/O, the interlace size should be set large relative to the size of a typical I/O request.

In striped configurations subject to mostly sequential access I/O, the interlace size should be set small relative to the size of the typical I/O request. One typical strategy to determine the interlace size is to divide the typical I/O size by the number of disks in the stripe (Figure 4-7).



Figure 4-7 Setting the interlace size in striped configurations fosters higher sequential performance (Interlace_Size = I/O_Size / Number_of_components).

RAID 5 configurations perform best when write requests align with the stripe and the I/O is a multiple of the interlace size. Indeed, full stripe writes can be utilized if the size of a typical I/O request equals the interlace size multiplied by one less than the number of components in the stripe (Figure 4-8).

Figure 4-8 Setting the interlace size in RAID 5 configurations promotes faster write operations (I/O_Size = Interlace_Size * (Number_of_components - 1)).
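For instance, in the hypothetical five-column RAID 5 device sketched below, a 64 KB interlace means that full-stripe writes become possible for I/O requests of 64 KB x (5 - 1) = 256 KB:

    # Hypothetical Solstice DiskSuite RAID 5 metadevice with five columns
    # and a 64 KB interlace; aligned 256 KB writes fill a whole stripe.
    metainit d30 -r c1t1d0s2 c2t1d0s2 c3t1d0s2 c4t1d0s2 c5t1d0s2 -i 64k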

    More information on configuring and optimizing RAID systems can be found

    in Sun Performance and Tuning, Java and the Internet, Chapter 8: Disks, and

    Configuration and Capacity Planning for Solaris Servers, Chapter 7: Storage

    Subsystem Architecture.

Host-Based RAID versus Controller-Based RAID

In recent years, much has been said about the implementation options available for RAID subsystems, and contention has arisen over the perceived benefits and drawbacks of both host-based RAID and controller-based RAID devices. While traditional controller-based RAID subsystems employing SCSI technology were long seen as the answer to higher performance by minimizing the amount of traffic flowing between the host and peripheral, the advent of 100 MB/second Fibre Channel is bridging the gap. Utilizing a faster transport medium like Fibre Channel, the performance of host-based RAID storage systems competes with, and in many cases surpasses, that of their controller-based RAID counterparts. In addition, the lower cost and higher degree of flexibility offered with host-based RAID solutions like the Sun StorEdge A5000 ensures that users have access to the best price/performance configurations available.



Disk Activity and Performance – iostat

Two factors that impact overall system throughput are CPU utilization and disk I/O activity. If there are insufficient CPU resources to process data and execute tasks, data will not be streamed to devices and performance will suffer. In addition, it is possible for disks to become saturated, rendering them incapable of handling higher loads of traffic.

The iostat utility bundled in the Solaris operating environment helps system administrators determine disk activity patterns and performance. In order to iteratively report disk I/O activity as well as CPU utilization, the Solaris kernel maintains a series of counters:

The number of reads and writes per disk
The number of bytes read and bytes written per disk
Time stamps
The residence time and cumulative residence length product for each queue

These statistics enable iostat to produce highly accurate measures of throughput, utilization, queue lengths, transaction rates, and service times (Figure 4-9).

Figure 4-9 The iostat utility reports I/O statistics which can be analyzed to determine CPU and disk bottlenecks.

More information on iostat can be found in the iostat(1M) man page.

# iostat -xp
                          extended device statistics
device    r/s   w/s  kr/s  kw/s  wait  actv  svc_t  %w  %b
dad0      0.0   0.0   0.0   0.0   0.0   0.0    0.0   0   0
dad0,a    0.0   0.0   0.0   0.0   0.0   0.0    0.0   0   0
dad0,b    0.0   0.0   0.0   0.0   0.0   0.0    0.0   0   0
dad0,c    0.0   0.0   0.0   0.0   0.0   0.0    0.0   0   0
dad1      0.0   0.2   0.1   1.4   0.0   0.0   61.7   0   0
dad1,a    0.0   0.2   0.1   1.4   0.0   0.0   61.7   0   0
dad1,b    0.0   0.0   0.0   0.0   0.0   0.0    7.9   0   0
dad1,c    0.0   0.0   0.0   0.0   0.0   0.0    0.0   0   0
fd0       0.0   0.0   0.0   0.0   0.0   0.0    0.0   0   0
sd2       0.0   0.0   0.0   0.0   0.0   0.0    0.0   0   0
nfs1      0.0   0.0   0.0   0.0   0.0   0.0    0.0   0   0
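A typical way to use iostat is to sample at a fixed interval and watch for disks that stay close to 100 percent busy (%b) or show persistently high service times (svc_t); such disks are candidates for load balancing. A simple sketch:

    # Report extended device statistics every 5 seconds, 10 times.
    iostat -x 5 10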


Managing Disk Space

Data availability depends on disk drives being able to read and write data at any given time. System administrators must ensure that adequate disk and file system space exists for users and applications. Quotas can be used to constrain users from utilizing too much disk space.
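As a minimal sketch of enabling UFS quotas (the file system and user name are illustrative), the quotas file is created at the root of the file system, limits are assigned, and quota checking is turned on:

    # Create the quotas file at the file system root.
    touch /export/home/quotas
    chmod 600 /export/home/quotas
    # Set block and inode limits for a hypothetical user, then enable quotas.
    edquota jdoe
    quotaon /export/home
    # Report current usage against the configured limits.
    repquota /export/home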

Determining Free Disk Blocks and Files – df

To determine the number of free disk blocks and files on the disks in the system, administrators can employ the df command included in the Solaris operating environment. The df command displays the amount of disk space occupied by currently mounted file systems, the amount of used and available space, as well as the amount of each file system's capacity that has been used (Figure 4-10).

Figure 4-10 The df utility reports free disk blocks and files, enabling administrators to quickly isolate capacity problems that can lead to decreased performance and availability.

More information on df can be found in the df(1M) man page.

# df -ak
Filesystem              kbytes     used    avail  capacity  Mounted on
/proc                        0        0        0      0%    /proc
/dev/dsk/c0t1d0s0      3875742   816409  3020576     22%    /
fd                           0        0        0      0%    /dev/fd
swap                    637832    55312   582520      9%    /tmp
auto.shared                  0        0        0      0%    /shared
auto.scde                    0        0        0      0%    /scde
auto.src                     0        0        0      0%    /src
-hosts                       0        0        0      0%    /net
auto.doe                     0        0        0      0%    /doe
auto.ws                      0        0        0      0%    /ws
auto_home                    0        0        0      0%    /home
-xfn                         0        0        0      0%    /xfn
nevermore:vold(pid236)       0        0        0      0%    /vol


Determining Disk Usage – du

The ability to ensure that disks do not completely fill aids both performance and availability. System administrators can obtain a quick gauge of disk usage by executing the du command. The du command recursively describes the number of kilobytes contained in all files and directories (Figure 4-11). The size of the file space allocated to a directory is reported as the sum of the total space allocated to the files contained in the directory, as well as the space allocated to the directory itself. All file sizes are reported in, and rounded to, the nearest 512-byte unit.

Figure 4-11 The du utility describes the amount of disk space allocated to files and directories.

# du -as /tmp
306128  /tmp

More information on the du command can be found in the du(1M) man page.
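Combined with sort, du also gives a quick ranking of the largest directories; the path below is illustrative:

    # Per-directory usage (in kilobytes) under /export/home, largest last.
    du -sk /export/home/* | sort -n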

System Activity Reporter – sar

The Solaris operating environment includes a system activity reporter (sar) that samples cumulative activity counters at specified intervals. The sar command can also be used to determine storage related information, including:

Disk activity levels for system disks (-d option)
Buffer cache activity (-b option)
File access system routine usage (-a option)
Process status (-v option)
Average queue length (-q option)

More information on the sar command can be found in the sar(1M) man page.
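A brief usage sketch, sampling at five-second intervals:

    # Disk activity for 12 intervals.
    sar -d 5 12
    # Buffer cache read and write hit rates (%rcache, %wcache).
    sar -b 5 12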



Volume Management

The combination of disk drives and intelligent software to manage them is commonly referred to as a storage management system. With increasing numbers of disks and a higher demand for availability in mission-critical environments, storage management systems have a number of requirements:

Reliability
Organizations need assurance that the data they depend on is accurate. Storage management systems must implement the features necessary to ensure the accurate storage and retrieval of data.

Data and system availability
Many organizations run 7x24. In these environments, it is imperative for data to be available on demand. For many, unavailable data can mean a significant loss in income. When a system fails, data must be accessible to an alternate host. In the event of a disk failure, some means must exist for the data stored on that disk to be made available to users.

Performance
Storage management systems must provide the performance required by today's demanding applications and environments.

Management of large amounts of data
Users are generating more data than ever before. Organizations need a method for organizing disks and the data on them such that data reliability and availability are maintained without sacrificing performance.

Simplified administration
Managing a large number of disks can be daunting and challenging. Storage management systems must provide tools to ease this task.

Application integration
A storage management system must fit seamlessly into existing customer server, database and OLTP environments.

Sun provides two tools to aid these efforts: the Sun StorEdge Volume Manager and RAID Manager. More information on Sun volume management solutions can be found at http://www.sun.com/storage/products.html on the World Wide Web.


5. File System Tuning

The Solaris operating environment supports several file system types, each designed to provide high performance to users. These file systems include the standard Unix File System (UFS), a RAM disk file system (Tmpfs), and the VERITAS file system (VxFS) discussed below. Two additional file systems, CacheFS and NFS, are geared toward enhancing the performance of network operations and are discussed in chapter 7 of this document.

Unix File System

The Unix File System (UFS) is the standard file system used to manage disk drives in the Solaris operating environment. Based on the Berkeley Fast File System, UFS is designed for high performance. Its tree-like structure, managed by a structure known as an inode, enables fast access to directories and files. With UFS, all files that are read are stored in RAM. If RAM becomes full, files are flushed to disk and retrieved when needed. Consequently, the more RAM in the system, the better the overall file system performance.

UFS and Transaction Logging

When unscheduled system downtime occurs, such as a system panic or power failure, partially completed updates to the file system are possible. If partial updates exist, the file system can be left in an inconsistent state. To fix these inconsistencies, UFS file systems are checked at boot time (with fsck and quotacheck) and repaired, if possible. Because multiple and complex


directory hierarchies often exist, and because fsck must traverse each and every hierarchy to ensure its integrity and consistency, the fsck process is often time consuming.

The combination of Solaris 2.4 and the Solstice DiskSuite storage management tool offers a new feature called UFS Logging, sometimes referred to as journalling. UFS logging takes the (logically) synchronous nature of updating a file system and makes it asynchronous. Updates to a file system are not made directly to the disk or partition containing the file system. Instead, user data is written to the device containing the file system, but the file system disk structures are not modified; they are logged instead.

Updates are applied to the file system structure at key times to ensure consistency. Changes made to the file system by unfinished system calls are discarded, ensuring that the file system is in a consistent state. This means that logging file systems do not have to be checked at boot time, speeding the reboot process.

System administrators should consider the following when configuring UFS file systems with transaction logging:

The absolute minimum size for a transaction log is 1 MB of disk space. A recommended minimum is 1 MB of log space for every 1 GB of file system space.

The larger the log, the more simultaneous file system operations per second. As a general rule, administrators should plan on 1 MB of log space for every 100 MB of file system space being logged.

Small file systems, or primarily read-intensive file systems, will not benefit from transaction logging. However, file systems experiencing a significant number of updates should be logged.

More information about UFS transaction logging can be found in the Solstice DiskSuite User's Guide and Solstice DiskSuite Reference Guide available at http://www.docs.sun.com on the World Wide Web.
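A minimal Solstice DiskSuite sketch of enabling UFS logging is shown below; the metadevice and slice names are illustrative, with the master device holding the existing file system and a small dedicated slice holding the log:

    # Master and log metadevices, then the trans (logging) metadevice.
    metainit d11 1 1 c1t0d0s6      # master: holds the UFS being logged
    metainit d12 1 1 c2t0d0s3      # dedicated log slice
    metainit d10 -t d11 d12        # trans metadevice combining the two
    mount /dev/md/dsk/d10 /export  # mount the logged file system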

Configuring UFS File Systems for Maximum Performance

The advent of very large disk drives and RAID subsystems is leading to ever increasing file system sizes. Users of large file systems need to reset a number of UFS parameters to ensure efficient disk utilization and I/O performance:


inode density
When UFS was introduced, most files were very small, averaging only 1 to 2 KB in size. Today, files are approaching 64 KB or larger. To ensure the file system utilizes space efficiently, system administrators should set the inode density to one inode per 8 KB of file space.

minfree
Disks have grown tremendously since the minfree=10% rule was set back in 1980. At that time, 10 percent of a disk ranged from 2.5 MB to 5 MB. Today, 10 percent of a typical disk is several hundred megabytes. For large file systems, or those using disk drives larger than 2 GB, minfree should be set to 1 percent to reduce file system overhead and wasted disk space. On large file systems, minfree is automatically reduced (Solaris 2.6 and later).

maxcontig
The file system cluster size, or maxcontig, specifies the maximum number of blocks belonging to a single file that can be allocated contiguously before inserting a rotational delay of the disk. Performance may be improved if the file system I/O cluster size is an integral multiple of the stripe width of file systems on RAID 0 or RAID 5 volumes.

cylinders per cylinder group
Volumes and file systems that utilize more than 8 GB of disk space may need to increase the size of a cylinder group from 16 cylinders per cylinder group to as many as 256 cylinders per cylinder group.

    More information on configuring UFS file systems can be found in...?
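These parameters map onto newfs and tunefs options; a hedged sketch for a hypothetical large file system (the device name is illustrative) might look like:

    # One inode per 8 KB of file space, 1 percent minfree, and
    # 256 cylinders per cylinder group at file system creation time.
    newfs -i 8192 -m 1 -c 256 /dev/rdsk/c1t2d0s6
    # Raise maxcontig (in file system blocks) on an existing file system.
    tunefs -a 128 /dev/rdsk/c1t2d0s6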

Tmpfs

The Solaris operating environment includes support for Tmpfs, a temporary file system that utilizes memory to cache writes scheduled to go to disk. System administrators can improve system performance by making extensive use of Tmpfs to store short-lived files. This scheme enables temporary files to be written and accessed without the typical performance impact associated with writing information to disk.


A system is a candidate for Tmpfs usage if:

It runs applications that create temporary, short-lived files. (Systems that run compilers are a good example of this type of environment.)
It has a lot of memory and swap space.
It employs slow disks.

Since many applications already employ the /tmp directory on Solaris systems for temporary files, it is a natural candidate for Tmpfs. System administrators should note, however, that any directory can utilize Tmpfs.

    More information on Tmpfs can be found in...?
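On most Solaris systems, /tmp is already mounted as Tmpfs through an /etc/vfstab entry similar to the sketch below; a comparable entry can be added for any other directory that holds short-lived files:

    #device    device    mount   FS      fsck    mount    mount
    #to mount  to fsck   point   type    pass    at boot  options
    swap       -         /tmp    tmpfs   -       yes      -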

VERITAS File System

The VERITAS File System is a high-performance, fast recovery, enterprise file system. Typically employed with very large mass storage subsystems from Sun, the VERITAS File System offers a host of features and benefits to users, including:

Enterprise file system
The VERITAS File System is designed as an enterprise file system, providing superior file system performance in certain applications and increased availability over the standard UNIX file system (UFS).

Availability
High availability is an essential requirement of enterprises running mission-critical applications. The VERITAS File System provides fast recovery after system crashes, decreasing system downtime and increasing data and application availability. Administrative operations such as file system backup, resizing, and defragmentation can occur on-line. Administrators no longer need to take the file system off-line to perform these common operations, increasing file system availability and making administration more convenient.

High performance
Using a more efficient extent-based allocation scheme, hashed directories, as well as layout and caching modifications, the VERITAS File System provides the high I/O performance demanded by large enterprise databases and other applications.


Extent-Based Allocation

The VERITAS File System allocates space in contiguous segments, or extents, instead of small fixed-size disk blocks. This can accelerate sequential I/O by reducing seek and disk rotation times. Performance is also improved as fewer I/O operations are needed to read and write large amounts of data.

Extent sizes can be chosen automatically by the VERITAS File System based on the I/O pattern of the file, or can be explicitly selected by the administrator.
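As a rough sketch (exact options vary by VxFS release; the device and mount point are illustrative), a VERITAS file system is created and mounted with the standard Solaris commands and the vxfs file system type:

    # Build a VxFS file system on a raw device, then mount it.
    mkfs -F vxfs /dev/rdsk/c1t3d0s4
    mount -F vxfs /dev/dsk/c1t3d0s4 /data01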

    More information on the VERITAS File System can be found at

    http://www.sun.com/storage/software/veritas.html on the World Wide Web.

Tuning File Systems – tunefs

The layout of a file system can have a dramatic effect on I/O performance. When a file system is over 90 percent full, system administrators should optimize it for time, ensuring higher throughput. The tunefs utility is designed to change the dynamic parameters of a file system that affect layout policies.

More information on tunefs can be found in the tunefs(1M) man page.
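A short sketch of the optimization described above (the device name is illustrative):

    # Favor throughput (time) over space on a nearly full file system.
    tunefs -o time /dev/rdsk/c0t1d0s7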


6. Networking

The last decade has seen a ten-thousand-fold increase in computer performance. This increase in power, along with advances in operating systems and graphical desktop environments, has spurred the development of an array of new software tools and technologies. These new capabilities in software have in turn pushed computing resources to their limit, causing users to demand higher performance hardware. Networks are developing through a similar technology and demand-driven cycle.

Networks must not only keep pace with technological advancements, they must ensure that sufficient resources are available to meet the demands of increasingly sophisticated applications. To meet these needs, networks must perform well and respond to client requests as needs dictate.

When a network server is unable to respond to a client's request, the client typically retransmits the request a number of times. Each retransmission induces additional system overhead and generates more network traffic. System administrators can mitigate the extent of excessive retransmissions by improving three key conditions: data integrity, system performance, and network congestion.
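One quick way to gauge whether excessive retransmissions are occurring is to inspect the protocol statistics the kernel already keeps; on Solaris, netstat reports cumulative TCP counters, including retransmitted segments:

    # Display per-protocol statistics; the TCP section includes
    # retransmission counters worth tracking over time.
    netstat -s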


    Selecting the Right Network Technology

    Local Area Networks

Local Area Networks, or LANs, provide a means to connect systems within a limited distance, facilitating information and resource sharing in a multivendor environment. While adequate for small, centralized environments, existing LAN technologies are quickly becoming a bottleneck in larger distributed computing environments.

In the future, with resources as likely to be remote as local, networks must be able to accommodate the rapid movement of large amounts of data. With escalating needs for collaboration, organizations are demanding advanced functionality, such as video multicasting, time-synchronous applications, and integration with wide area networks. Enterprises require high speed networks (100 Megabits/second or above) but will not accept expensive changes to their existing hardware, software and training. To keep up with the expanding demands of computing, Sun believes that three technologies have promise: Ethernet, FDDI, and ATM.

    Ethernet

Conventional Ethernet was designed when it was ambitious for an affordable departmental system to deliver 1 SPECmark. Provided as a standard feature or low-cost option with nearly every workstation, 10 Megabit/second Ethernet (Ethernet or Ethernet 10) is a ubiquitous, inexpensive alternative which is simple to deploy. Its popularity has made possible the economies of scale that come from volume manufacturing, which in turn has made Ethernet inexpensive.

Although it has extensive capabilities, such as ease of connectivity, and a stable installed base, Ethernet is inadequate for today's ambitious requirements (Figure 7-2). Its inability to scale to more than a few hundred users, or easily accommodate bandwidth-intensive applications, makes it difficult to use in many environments.

Because Ethernet is collision-based, only one device can transmit at a given time. Should two devices attempt to transmit simultaneously, their data packets will collide, resulting in lower aggregate throughput. A 10BaseT Ethernet network is considered saturated at 35 percent utilization.


Consequently, approximately 500 KB per second of traffic can saturate a 10BaseT LAN. The use of switches and routers can minimize collisions and improve LAN performance by breaking the network into pieces and isolating traffic to smaller segments.
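Collision levels on a shared Ethernet segment can be checked from any attached host; a steadily rising collision count relative to output packets suggests the segment is approaching saturation:

    # Per-interface packet, error, and collision counts.
    netstat -i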

Figure 7-1 Today, networking technology is characterized by inadequate performance, complex interconnections, and multiple protocols: a typical campus backbone runs at 10 Mbps, with bridges/routers and servers feeding 10BaseT hubs that each serve 10-20 workstations in a workgroup.

Fiber Distributed Data Interface

Primarily used to satisfy the needs of data communication users, the Fiber Distributed Data Interface (FDDI) has become a standard for networks requiring 100 Megabit/second transmission speeds. Providing media access control, data encoding and decoding, as well as defining drivers for fiber-optic components and a multilayered network management scheme, FDDI often is used as the backbone for lower speed networks such as Ethernet 10.

Although a good investment when speed is required, FDDI has limitations. It is more expensive to deploy than Ethernet, it offers limited bandwidth, and it is less popular, and therefore more expensive, than other networking technologies. Nevertheless, for users that require standardization, a high degree of interoperability, flexibility and robustness, FDDI is one of the most mature high speed network technologies available today.



    Fast Ethernet

For users needing a lower-cost alternative to FDDI, Fast Ethernet (100 Megabit/second Ethernet) may be the networking technology of choice. Fast Ethernet is an extension to the Ethernet 10 standard, supporting a wide range of applications with increased bandwidth. Fast Ethernet maintains compatibility with the installed base of wiring currently employed for Ethernet. It has the ability to automatically sense 10 Megabits/second or 100 Megabits/second operation and can adjust to either speed. Fast Ethernet ensures maximum compatibility with existing environments and applications, providing an investment-preserving migration path from conventional Ethernet.

Gigabit Ethernet

The transition of faster interfaces, such as Fast Ethernet, to a widespread desktop technology has created another problem for network administrators. As the number of devices entering the network at 100 Mbps grows, the demand for higher speed at the backbone and server level also increases. Organizations are finding they need to increase the level of network performance to support computing power demands.

Expected to become the solution of choice for network administrators, Gigabit Ethernet provides a raw data bandwidth of 1000 Mbps (or 1 Gigabit per second) that aims to relieve the increased network congestion experienced at the backbone and server levels. Not just a high performance networking technology, Gigabit Ethernet also gives administrators the flexibility they need: new full-duplex operating modes for switch-to-switch and switch-to-server connections, and half-duplex modes for shared connections using repeaters and the carrier sense multiple access/collision detection (CSMA/CD) access method.

More information on Gigabit Ethernet and its application can be found in the Sun and Gigabit Ethernet white paper located on the World Wide Web at http://www.sun.com/products-n-solutions/hw/networking/whitepapers/gigabitwp.html.

Asynchronous Transfer Mode

Perhaps the most promising new network technology is Asynchronous Transfer Mode (ATM). Based on cell switching technology, ATM uses small, fixed-sized cells and inexpensive, high-bandwidth, low-latency switches. With data rates from 51 Megabits per second to over 2 Gigabits per second, ATM offers the performance needed by the most demanding interactive applications (Figure 7-2).

Figure 7-2 Advanced networking technologies have performance far exceeding that of Ethernet, the most commonly used network today (Ethernet: 10 Mbps; Fast Ethernet: 100 Mbps; FDDI: 100 Mbps; ATM: 51 Mbps to 2.5 Gbps).

ATM is independent of the speed of the line on which it runs, making it adaptable to most wiring technologies. ATM can run full duplex, permitting the simultaneous transmission and receipt of data without interruption. Its flexibility and low latency allow real-time applications like video and audio to be supported along with data services.

A scalable technology offering a deterministic quality of service, ATM allows the network to be tailored to the application. A global, mobile, scalable, and application transparent protocol, ATM allows the mixture of data, voice and video. With its increased performance and ability to support advanced applications, ATM holds the promise of being the premier network solution for the rest of the decade and beyond (Figure 7-3).



Figure 7-3 ATM has many characteristics that make it the preferred solution for most future networking requirements: data segmented in fixed 53-byte cells, circuit orientation for easy routing, transparency to data type, multimedia support, real-time capability, low latency, dynamic bandwidth, a star topology with full bandwidth to each user, easy network maintenance, seamless integration between local and wide-area networks, scalable bandwidth, large volume and low cost, and wide vendor acceptance.

    Wide Area Networks

Wide Area Networks, or WANs, provide a means to connect systems which are distributed over a large geographic area, facilitating information and resource sharing. WANs must be low-cost, high speed, and easily integrated with LANs. Three technologies are emerging as leading WAN contenders: Frame Relay, Integrated Services Digital Network (ISDN) and Asynchronous Transfer Mode (ATM).

Frame Relay

Frame Relay is employed to efficiently transfer information with minimal delay. Designed to handle data communication with large, bursty data packets, it uses a transmission channel only when there is data present. This offers significant advantages over LAN technologies that tie up lines for the entire duration of a session. Frame Relay's ability to efficiently provide such on-demand transmission has garnered considerable attention. However, its inability to provide consistent performance, its imprecise specifications for congestion avoidance and recovery mechanisms, and its inefficiency in transmitting voice and video preclude Frame Relay from being a multimedia WAN solution.

    Integrated Services Digital Network

Integrated Services Digital Network (ISDN) emerged through the need for a single integrated structure that efficiently accommodated a variety of services. The most widely used network technology, ISDN has addressed the growing needs of digital data communications in a variety of industries. The ability to transport information digitally allows a single line to transmit up to 1.5 Megabits per second while providing superior quality of service over analog

    tr