shell scripts for administrators


Upload: grodzinski

Post on 08-May-2015



Shell Scripts for Administrators

IN THIS CHAPTER

Managing statistics

Performing backups

Working with users

There’s no place where shell script programming is more useful than for the Linux system administrator. The typical Linux system administrator has a myriad of jobs that need to be done daily, from monitoring disk space and users to backing up important files. Shell scripts can make the life of the system administrator much easier! You can accomplish all of the basic system administration functions automatically using simple shell scripts. This chapter demonstrates some of the capabilities you have using shell scripts.

Monitoring System Statistics

One of the core responsibilities of the Linux system administrator is to ensure that the system is running properly. To accomplish this task, there are lots of different system statistics that you must monitor. Creating automated shell scripts to monitor specific situations can be a lifesaver.

This section shows how to create simple shell scripts that monitor and report problems as soon as possible, without you even having to be logged in to the Linux system.

Monitoring disk free space

One of the biggest problems with multi-user Linux systems is the amount of available disk space. In some situations, such as in a file sharing server, disk space can fill up almost immediately just because of one careless user.


Part V Advanced Topics

This shell script will monitor the available disk space on a specific volume, and send out an e-mail message if the available disk space goes below a set threshold.

The required functions

To automatically monitor the available disk space, you’ll first need to use a command that can display that value. The best command to monitor disk space is the df command (see Chapter 4). The basic output of the df command looks like this:

$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1              3197228   2453980    580836  81% /
varrun                  127544       100    127444   1% /var/run
varlock                 127544         4    127540   1% /var/lock
udev                    127544        44    127500   1% /dev
devshm                  127544         0    127544   0% /dev/shm
/dev/hda3               801636    139588    621328  19% /home
$

The df command shows the current disk space statistics for all of the real and virtual disks on the system. For the purposes of this exercise, we’ll monitor the size of the root filesystem. We’ll first need to parse the results of the df command to extract only the line for the root filesystem.

There are a couple of different ways to do this. Probably the easiest is to use the sed command to search for the line that ends with a forward slash:

/dev/hda1 3197228 2453980 580836 81% /

The sed script to build this uses the dollar sign (to indicate the end of the line), and the forward slash (which you’ll need to escape since it’s a sed special character). That will look like this:

$ df | sed -n '/\/$/p'
/dev/hda1 3197228 2453980 580836 81% /
$

Now that you’ve got the statistics for the root filesystem volume, the next step is to isolate the percentage used value in the line. To do that, you’ll need to use the gawk command:

$ df | sed -n '/\/$/p' | gawk '{print $5}'
81%
$

This is close, but there’s still one small problem. The gawk command lets you filter out the fifth data field, but the value also includes the percent symbol. You’ll need to remove that so you can use the value in a mathematical equation. That’s easily accomplished by using the sed command again:

$ df | sed -n '/\/$/p' | gawk '{print $5}' | sed 's/%//'
81
$

Now you’re in business! The next step is to use this information to create the script.
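Before building the script, the whole pipeline can be verified against a captured copy of the df output shown above. This is only a sketch: it replays the sample from a variable instead of running df, and uses awk (gawk accepts the same program) so it runs anywhere:

```shell
# Captured sample of the df output from above; a live script would
# run `df` itself instead of echoing a saved copy.
df_sample='Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda1 3197228 2453980 580836 81% /
/dev/hda3 801636 139588 621328 19% /home'

# The same pipeline as in the text: keep the line ending in /,
# print the fifth field, strip the percent sign.
used=$(echo "$df_sample" | sed -n '/\/$/p' | awk '{print $5}' | sed 's/%//')
echo "$used"    # 81
```

Replaying saved output like this is a handy way to test text-processing pipelines without touching a live system.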


Shell Scripts for Administrators 27

Creating the script

Now that you know how to extract the used disk space value, you can use that formula to store the value to a variable. You can then check the variable against a predetermined number to indicate when the used space has exceeded your set limit.

Here’s an example of code using this technique:

$ cat diskmon
#!/bin/bash
# monitor available disk space

SPACE=`df | sed -n '/\/$/p' | gawk '{print $5}' | sed 's/%//'`

if [ $SPACE -ge 90 ]
then
   echo "Disk space on root at $SPACE% used" | mail -s "Disk warning" rich
fi
$

And there you have it. A simple shell script that’ll check the available disk space on the root filesystem and send an e-mail message if the used space is at 90% or more.
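The threshold comparison can also be factored into a small function so it can be exercised without a full disk. This is a sketch, not the book’s script: the check_space name and its two parameters are assumptions added for illustration:

```shell
# check_space USED LIMIT: print a warning line when the used
# percentage is at or over the limit (hypothetical helper).
check_space() {
    used=$1
    limit=$2
    if [ "$used" -ge "$limit" ]
    then
        echo "Disk space on root at ${used}% used"
    fi
}

# In the real script the warning would be piped to mail, e.g.:
#   check_space "$SPACE" 90 | mail -s "Disk warning" rich
check_space 91 90
```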

Running the script

Before having the diskmon script run automatically, you’ll want to test it out a few times manually to ensure that it does what you think it should do. To test it, change the value that it checks against to a value lower than the current disk usage percentage:

if [ $SPACE -ge 40 ]

When you run the script, it should send you a mail message:

$ ./diskmon
$ mail
Mail version 8.1.2 01/15/2001.  Type ? for help.
"/var/mail/rich": 1 message 1 new
>N  1 rich@testbox      Tue Feb  5 06:22   16/672   Disk warning
&
Message 1:
From rich@testbox Tue Feb  5 06:22:26 2008
Date: Tue, 5 Feb 2008 06:22:26 -0500
From: rich <rich@testbox>
To: rich@testbox
Subject: Disk warning

Disk space on root at 81% used

& q
$


It worked! Now you can set the shell script to execute at set times to monitor the disk activity. You do this using the cron table (see Chapter 13).

How often you need to run this script depends on how active your file server is. For a low-volume file server, you may only have to run the script once a day:

30 0 * * * /home/rich/diskmon

This cron table entry runs the shell script every day at 12:30 AM. For a high-volume file server environment, you may have to monitor this a few times a day:

30 0,8,12,16 * * * /home/rich/diskmon

This cron table entry runs the shell script four times a day, at 12:30 AM, 8:30 AM, 12:30 PM, and 4:30 PM.

Catching disk hogs

If you’re responsible for a Linux server with lots of users, one problem that you’ll often bump up against is who’s using all of the disk space. This age-old administration question is sometimes harder to figure out than others.

Unfortunately, despite the importance of tracking user disk space usage, there’s no single Linux command that can provide that information for you. Instead, you need to write a shell script that pieces other commands together to extract the information you’re looking for. This section walks you through this process.

The required functions

The first tool you’ll need to use is the du command (see Chapter 4). This command displays the disk usage for individual files and directories. The -s option lets you summarize totals at the directory level. This will come in handy when calculating the total disk space used by an individual user. Just use this command for the /home directory contents to summarize for each user’s $HOME directory:

# du -s /home/*
40      /home/barbara
9868    /home/jessica
40      /home/katie
40      /home/lost+found
107340  /home/rich
5124    /home/test
#

Okay, that’s a start. You can now see the total listing (in KB) for the $HOME directory totals. Depending on how your /home directory is mounted, you may or may not also see a special directory called lost+found, which isn’t a user account.

To get statistics for all of the user $HOME directories, you must be logged in as the root user account when running this script.


To get rid of that, we use the grep command with the -v option, which prints all of the lines except ones that contain the specified text:

# du -s /home/* | grep -v lost
40      /home/barbara
9868    /home/jessica
40      /home/katie
107340  /home/rich
5124    /home/test
#

Next, let’s get rid of the full pathname so all we see are the user names. This sounds like a job for the sed command:

# du -s /home/* | grep -v lost | sed 's/\/home\///'
40      barbara
9868    jessica
40      katie
107340  rich
5124    test
#

Much better. Now, let’s sort this output so that it appears in descending order:

# du -s /home/* | grep -v lost | sed 's/\/home\///' | sort -g -r
107340  rich
9868    jessica
5124    test
40      katie
40      barbara
#

The sort command sorts numerical values when you use the -g option, and will sort in descending order when you include the -r option.

There’s just one more piece of information that you’re interested in, and that’s the total amount of space used for all of the users. Here’s how to get that value:

# du -s /home
122420  /home
#

This is all the information you’ll need to create the disk hogs report. Now you’re ready to push this into a shell script.

Creating the script

Now that you know how to extract the raw data for the report, it’s time to figure out a script that can read the report, parse the raw data values, and display it in a format that’s presentable to a user.


The easiest way to manipulate data for reports is with the gawk command. The report will have three sections:

■ A header with text identifying the columns

■ The body of the report, showing the user, their total disk space used, and a percentage of the total they’re consuming

■ A footer which shows the total disk space usage for all users

The gawk command can perform all of these functions as a single command, using the BEGIN and END tags (see Chapter 16).

Here’s the diskhogs script that puts all of the elements together:

# cat diskhogs
#!/bin/bash
# calculate disk usage and report per user

TEMP=`mktemp -t tmp.XXXXXX`
du -s /home/* | grep -v lost | sed 's/\/home\///' | sort -g -r > $TEMP
TOTAL=`du -s /home | gawk '{print $1}'`
cat $TEMP | gawk -v n="$TOTAL" 'BEGIN {
   print "Total Disk Usage by user";
   print "User\tSpace\tPercent"
}

{
   printf "%s\t%d\t%6.2f%%\n", $2, $1, ($1/n)*100
}

END {
   print "--------------------------";
   printf "Total\t%d\n", n
}'
rm -f $TEMP
#

The script sends the result from the formula used to generate the raw data to a temporary file, then stores the result of the total disk space formula in the variable $TOTAL.

Next, the script retrieves the raw data in the temporary file, and sends it to the gawk command. The gawk command retrieves the $TOTAL value and assigns it to a local variable called n.

The code in the gawk command first creates the report header in the BEGIN section:

BEGIN {
   print "Total Disk Usage by user";


   print "User\tSpace\tPercent"
}

It then uses the printf command to format and rearrange the text from the raw data:

{
   printf "%s\t%d\t%6.2f%%\n", $2, $1, ($1/n)*100
}

This is the section that processes each line of output from the du command. The printf command allows us to format the output to make a nice table. If you happen to have long usernames on your system, you may need to fudge the formatting some to get it to turn out.

The other trick here is that the diskhogs script passes the $TOTAL variable value to the gawk script via the gawk command line parameter:

-v n=$TOTAL

Now the gawk variable n is equal to the total user disk space, and you can then use that value anywhere in the gawk script.
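The -v mechanism is easy to test in isolation. The sketch below feeds two fixed du-style records through the same kind of program; the names total and n are stand-ins for the script’s values, and awk is used in place of gawk:

```shell
# Pass a shell value into the awk program with -v, then use it
# to compute each record's share of the total.
total=200
result=$(printf '50 rich\n150 jessica\n' |
    awk -v n="$total" '{ printf "%s %.0f%%\n", $2, ($1/n)*100 }')
echo "$result"
```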

Finally, the script ends the output display by showing the total amount of user disk space used:

END {
   print "--------------------------";
   printf "Total\t%d\n", n
}'

This uses the n variable, which contains the value from the $TOTAL shell variable.

Running the script

Putting it all together, when you run the script you should get a nice report:

# ./diskhogs
Total Disk Usage by user
User    Space   Percent
rich    107340   87.68%
jessica 9868      8.06%
test    5124      4.19%
katie   40        0.03%
barbara 40        0.03%
--------------------------
Total   122420
#

Now you’ve got a report that you can send off to your boss and feel proud of!

If you really do have disk hogs on your system, it may take a long time for the du command to calculate all the space they’re using. This causes quite a delay before you see any output from the diskhogs script. Be patient!


Watching CPU and memory usage

The core statistics of any Linux system are the CPU and memory usage. If these values start getting out of control, things can go wrong very quickly on the system. This section demonstrates how to write scripts to help you monitor and track the CPU and memory usage on your Linux system, using a couple of basic shell scripts.

The required functions

As with the other scripts shown so far in this chapter, the first step is to determine exactly what data you want to produce with your scripts. There are a few different commands that you can use to extract CPU and memory information for the system.

The most basic system statistics command is the uptime command:

$ uptime
 09:57:15 up 3:22, 3 users, load average: 0.00, 0.08, 0.28

$

The uptime command gives us a few different basic pieces of information that we can use:

■ The current time

■ The number of days, hours, and minutes the system has been operational

■ The number of users currently logged into the system

■ The one, five, and fifteen minute load averages
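Pulling individual values out of that line takes a little care, because the field positions shift with the uptime (for example, "up 3:22," is one field, while "up 10 days," is three). The sketch below replays the sample line shown above; keying the load average off its label is my own assumption rather than the book’s method, but it is less fragile than counting fields:

```shell
# Sample uptime line from above, replayed so the parsing can be
# checked without a live system.
sample='09:57:15 up 3:22, 3 users, load average: 0.00, 0.08, 0.28'

# Field 4 holds the user count in this sample's layout.
users=$(echo "$sample" | awk '{print $4}')

# Split on the "load average: " label, then take the first value.
load1=$(echo "$sample" | awk -F'load average: ' '{print $2}' | cut -d, -f1)
```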

Another great command for extracting system information is the vmstat command. Here’s an example of the output from the vmstat command:

$ vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 178660  13524   4316  72076    8   10    80    22  127  124  3  1 92  4  0
$

The first time you run the vmstat command, it displays the average values since the last reboot. To get the current statistics, you must run the vmstat command with command line parameters:

$ vmstat 1 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 178660  13524   4316  72076    8   10    80    22  127  124  3  1 92  4  0
 0  0 178660  12845   4245  71890    8   10    80    22  127  124  3  1 86 10  0
$

The second line contains the current statistics for the Linux system. As you can see, the output from the vmstat command is somewhat cryptic. Table 27-1 explains what each of the symbols means.


TABLE 27-1

The vmstat Output Symbols

Symbol Description

r The number of processes waiting for CPU time

b The number of processes in uninterruptible sleep

swpd The amount of virtual memory used (in KB)

free The amount of physical memory not used (in KB)

buff The amount of memory used as buffer space (in KB)

cache The amount of memory used as cache space (in KB)

si The amount of memory swapped in from disk (in KB per second)

so The amount of memory swapped out to disk (in KB per second)

bi Number of blocks received from a block device

bo Number of blocks sent to a block device

in The number of CPU interrupts per second

cs The number of CPU context switches per second

us Percent of CPU time spent running non-kernel code

sy Percent of CPU time spent running kernel code

id Percent of CPU time spent idle

wa Percent of CPU time spent waiting for I/O

st Percent of CPU time stolen from a virtual machine

This is a lot of information. You probably don’t need to record all of the values from the vmstat command; just a few will do. The free memory, and percent of CPU time spent idle, should give you a good snapshot of the system status (and by now you can probably guess exactly how we’ll extract those values from the output).

You may have also noticed that the output from the vmstat command includes table heading information, which we obviously don’t want in our data. To solve that problem, you can use the sed command to display only lines that have a numerical value in them:

$ vmstat 1 2 | sed -n '/[0-9]/p'
1 0 172028 8564 5276 62244 9 13 98 28 160 157 5 2 89 5 0
0 0 178660 12845 4245 71890 8 10 80 22 127 124 3 1 86 10 0
$


That’s better, but now we need to get only the second line of data. Another call to the sed editor can solve that problem:

$ vmstat 1 2 | sed -n '/[0-9]/p' | sed -n '2p'
0 0 178660 12845 4245 71890 8 10 80 22 127 124 3 1 86 10 0
$

Now you can easily extract the data value you want using the gawk program.
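Here is that extraction step run end to end on a captured copy of the vmstat output, so the field numbers can be checked deterministically (fields 4 and 15 are the free-memory and idle-CPU columns); awk stands in for gawk:

```shell
# Captured vmstat 1 2 output; only the data lines contain digits,
# so the first sed keeps them and the second keeps the 2nd sample.
vm_sample='procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0 178660  13524   4316  72076    8   10    80    22  127  124  3  1 92  4  0
 0  0 178660  12845   4245  71890    8   10    80    22  127  124  3  1 86 10  0'

line=$(echo "$vm_sample" | sed -n '/[0-9]/p' | sed -n '2p')
free=$(echo "$line" | awk '{print $4}')
idle=$(echo "$line" | awk '{print $15}')
```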

Finally, you’ll want to tag each data record with a date and timestamp to indicate when the snapshots were taken. The date command is handy, but the default output from the date command might be a little cumbersome. You can simplify the date command output by specifying another format:

$ date +"%m/%d/%Y %k:%M:%S"
02/05/2008 19:19:26
$

That should look much better in our output. Speaking of the output, you should also consider how you want to record the data values.

For data that you sample on a regular basis, often it’s best to output the data directly to a log file. You can create the log file in your $HOME directory, appending data each time you run the shell script. When you want to see the results, you can just view the log file.

You should also spend some time considering the format of the log file. You’ll want to ensure that the data in the log file can be read easily (after all, that’s the whole purpose of this script).

There are many different methods you can use to format the data in the log file. A popular format is comma-separated values (CSV). This format places each record of data on a separate line, and separates the data fields in the record with commas. This is a popular format for people who love spreadsheets, as it’s easily imported into a spreadsheet.

However, staring at a CSV file of data is not the most exciting thing in the world. If you want to provide a more aesthetically appealing report, you can create an HTML document.

HTML has been the standard method for formatting Web pages for years. It uses simple tags to delineate data types within the Web page. However, HTML is not just for Web pages. You’ll often find HTML used in e-mail messages as well. Depending on the MUA client (see Chapter 26), you may or may not be able to view an embedded HTML e-mail document. A better solution is to create the HTML report, and attach it to the e-mail message.

The script will save data in a CSV-formatted file, so you can always access the raw data to import into a spreadsheet. When the system administrator runs the report script, that will reformat the data into an HTML report. How cool is that?

Creating the capture script

Since you need to sample system data at a regular interval, you’ll need two separate scripts. One script will capture the data and save it to the log file. This script should be run on a regular basis


from the cron table. The frequency depends on how busy your Linux system is. For most systems, running the script once an hour should be fine.

The second script should output the report data and e-mail it to the appropriate individual(s). Most likely you won’t want to e-mail a new report every time you get a new data sampling. You’ll probably want to run the second script as a cron job at a lower frequency, such as once a day, first thing in the day.

The first script, used to capture the data, is called capstats. Here’s what it looks like:

$ cat capstats
#!/bin/bash
# script to capture system statistics

OUTFILE=/home/rich/capstats.csv
DATE=`date +%m/%d/%Y`
TIME=`date +%k:%M:%S`

TIMEOUT=`uptime`
VMOUT=`vmstat 1 2`

USERS=`echo $TIMEOUT | gawk '{print $4}'`
LOAD=`echo $TIMEOUT | gawk '{print $9}' | sed 's/,//'`
FREE=`echo "$VMOUT" | sed -n '/[0-9]/p' | sed -n '2p' | gawk '{print $4}'`
IDLE=`echo "$VMOUT" | sed -n '/[0-9]/p' | sed -n '2p' | gawk '{print $15}'`

echo "$DATE,$TIME,$USERS,$LOAD,$FREE,$IDLE" >> $OUTFILE
$

This script mines the statistics from the uptime and vmstat commands and saves them in variables. The script then writes the values to the file defined by the $OUTFILE variable. For this example, I just saved the file in my $HOME directory. You should modify this location to what suits your environment best.

After creating the capstats script, you should probably test it from the command line before having it run regularly from your cron table:

$ ./capstats
$ cat capstats.csv
02/06/2008,10:39:57,4,0.26,57076,87
$

The script created the new file, and placed the statistic values in the proper places. Just to make sure that subsequent runs of the script don’t overwrite the file, test it again:

$ ./capstats
$ cat capstats.csv
02/06/2008,10:39:57,4,0.26,57076,87
02/06/2008,10:41:52,4,0.14,46292,88
$


As hoped, the second time the script ran it appended the new statistics data to the end of the file. Now you’re ready to place this in the cron table. To run it once every hour, create this cron table entry:

0 * * * * /home/rich/capstats

You’ll need to use the full pathname of where you place your capstats shell script file for the cron table. Now you’re capturing statistics once every hour without having to do anything else!

Generating the report script

Now that you have a file full of raw data, you can start working on the script to generate a fancy report for your boss. The best tool for this is the gawk command.

The gawk command allows you to extract raw data from a file and present it in any manner necessary. First, test this from the command line, using the new capstats.csv file created by the capstats script:

$ cat capstats.csv | gawk -F, '{printf "%s %s - %s\n", $1, $2, $4}'
02/06/2008 10:39:57 - 0.26
02/06/2008 10:41:52 - 0.14
02/06/2008 10:50:01 - 0.06
02/06/2008 11:00:01 - 0.18
02/06/2008 11:10:01 - 0.03
02/06/2008 11:20:01 - 0.07
02/06/2008 11:30:01 - 0.03
$

You need to use the -F option for the gawk command to define the comma as the field separator character in your data. After that, you can retrieve each individual data field and display it as you need.

For the report, we’ll be using HTML format. This allows us to create a nicely formatted report with a minimum amount of work. The browser that displays the report will do all the hard work of formatting and displaying the report. All you need to do is insert the appropriate HTML tags to format the data.

The easiest way to display spreadsheet data in HTML is with the <table> tag. The table tag allows you to create a table with rows and cells. You define the start of a row using the <tr> tag, and the end of the row with the </tr> tag. Similarly, you define cells using the <td> and </td> tag pair.

The HTML for a full table looks like this:

<html>
<body>
<h2>Report title</h2>
<table border="1">


<tr>
<td>Date</td><td>Time</td><td>Users</td>
<td>Load</td><td>Free Memory</td><td>%CPU Idle</td>
</tr>
<tr>
<td>02/05/2008</td><td>11:00:00</td><td>4</td>
<td>0.26</td><td>57076</td><td>87</td>
</tr>
</table>
</body>
</html>

Each data record is part of a <tr>/</tr> tag pair. Each data field is within its own <td>/</td> tag pair.
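The record-to-row transformation is easy to try on one fixed CSV line before wiring it into the full script; this sketch uses awk in place of gawk and a sample record from above:

```shell
# Convert one CSV record into an HTML table row: each field
# becomes a <td> cell inside one <tr> row.
row=$(echo '02/05/2008,11:00:00,4,0.26,57076,87' |
    awk -F, '{ printf "<tr><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td><td>%s</td></tr>", $1, $2, $3, $4, $5, $6 }')
echo "$row"
```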

When you display the HTML report in a browser, the browser creates the table automatically for you, as shown in Figure 27-1.

FIGURE 27-1

Displaying data in an HTML table


For the script, all you need to do is generate the HTML heading code by using echo commands, generate the data HTML code by using the gawk command, then close out the table, again by using echo commands.

Once you have your HTML report generated, you’ll want to redirect it to a file for mailing. The mutt command (see Chapter 26) is a great tool for easily sending e-mail attachments.

Here’s the reportstats script, which will generate the HTML report and mail it off:

$ cat reportstats
#!/bin/bash
# parse capstats data into daily report

FILE=/home/rich/capstats.csv
TEMP=/home/rich/capstats.html
MAIL=`which mutt`
DATE=`date +"%A, %B %d, %Y"`

echo "<html><body><h2>Report for $DATE</h2>" > $TEMP
echo "<table border=\"1\">" >> $TEMP
echo "<tr><td>Date</td><td>Time</td><td>Users</td>" >> $TEMP
echo "<td>Load</td><td>Free Memory</td><td>%CPU Idle</td></tr>" >> $TEMP
cat $FILE | gawk -F, '{
   printf "<tr><td>%s</td><td>%s</td><td>%s</td>", $1, $2, $3;
   printf "<td>%s</td><td>%s</td><td>%s</td></tr>\n", $4, $5, $6;
}' >> $TEMP
echo "</table></body></html>" >> $TEMP

$MAIL -a $TEMP -s "Stat report for $DATE" rich < /dev/null
rm -f $TEMP
$

Since the mutt command uses the file name of the attached file as the attachment file name, it’s best not to create the report file using the mktemp command. Instead, I gave the file a more descriptive name. The script deletes the file at the end, so it’s not too important where you create the file.

Most e-mail clients can automatically detect the type of file attachment by the file extension. For this reason, you should end your report file name with .html.

Running the script

After creating the reportstats script, give it a test run from the command line and see what happens:

$ ./reportstats
$


FIGURE 27-2

Viewing the report attachment in Evolution

Well, that wasn’t too exciting. The real test now is to view your mail message, preferably in a graphical e-mail client such as KMail or Evolution. Figure 27-2 demonstrates viewing the message from the Evolution e-mail client.

The Evolution e-mail client provides the option of either viewing the attachment separate from the client window, or within the client window. Figure 27-2 demonstrates viewing the attached report within the client window. Notice how the data is all nicely formatted using the HTML tables, just as it was when viewing from the browser!

Performing Backups

Whether you’re responsible for a Linux system in a business environment or just using it in a home environment, a loss of data can be catastrophic. To help prevent bad things from happening to your data, it’s always a good idea to perform regular backups.


However, what’s a good idea and what’s practical are often two separate things. Trying to arrange a backup schedule to store important files can be a challenge. This is another place where shell scripts often come to the rescue.

This section demonstrates two different methods for using shell scripts to back up data on your Linux system.

Archiving data files

Often the cause for lost data isn’t a catastrophic system failure. Many times, data is lost in that “oh no” moment. You know, that millisecond of time just after you click the Delete button. For this reason, it’s always a good idea to have a separate archived copy of data you’re working on lying around on the system.

If you’re using your Linux system to work on an important project, you can create a shell script that automatically takes snapshots of your working directories, then stores them in a safe place on the system. While not a protection against a catastrophic hardware failure, this is a great safeguard against file corruption or accidental deletions.

This section shows how to create an automated shell script that can take snapshots of your working directory and keep an archive of past versions of your data.

The required functions

The workhorse for archiving data in the Linux world is the tar command (see Chapter 4). The tar command is used to archive an entire directory into a single file. Here’s an example of creating an archive file of a working directory using the tar command:

$ tar -cf archive.tar /home/rich/test
tar: Removing leading `/' from member names
$

The tar command responds with a warning message that it’s removing the leading forward slash from the pathname to convert it from an absolute pathname to a relative pathname. This allows you to extract the tar archive file anywhere you want in your filesystem. You’ll probably want to get rid of that message in your script. You do that by redirecting STDERR to the /dev/null file (see Chapter 12):

$ tar -cf archive.tar /home/rich/test 2> /dev/null
$

Now it’s ready to be used in a script.

If you’re using a Linux distribution that includes a graphical desktop, be careful about archiving your $HOME directory. While this may be tempting, the $HOME directory contains lots of configuration and temporary files related to the graphical desktop, and it will create a much larger archive file than you probably intended. Pick a subdirectory in which to store your working files, and use that as the archive directory.


After creating the archive file, you can send it to your favorite compression utility to reduce its size:

$ ls -l archive.tar
-rw-rw-r-- 1 rich rich 30720 2008-02-06 12:26 archive.tar
$ gzip archive.tar
$ ls -l archive*
-rw-rw-r-- 1 rich rich 2928 2008-02-06 12:26 archive.tar.gz
$

You now have the first component to your archive system.
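As a side note, tar can also compress while it archives: the -z option runs the archive through gzip, collapsing the two steps into one. A quick sketch on a throwaway directory (the paths here are temporary stand-ins, not the book’s):

```shell
# Create a small working directory, then archive and compress it
# in a single step with tar's -z (gzip) option.
workdir=$(mktemp -d)
echo "test data" > "$workdir/notes.txt"

tar -zcf "$workdir/archive.tar.gz" -C "$workdir" notes.txt 2> /dev/null

# gzip -t verifies the compressed archive's integrity.
gzip -t "$workdir/archive.tar.gz" && echo "archive OK"
```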

The next step is to create a rolling archive system. If you save your working directory on a regular basis, you may not necessarily want the newest archive copy to immediately overwrite the previous archive copy.

To prevent this, you'll need to create a method to automatically provide a unique filename for each archive copy. The easiest way to do that is to incorporate the date and time in your filenames.

As you've seen from the ''Monitoring System Statistics'' section, you can format the date command to produce information in just about any format you desire. This comes in handy when creating unique filenames.

For example, if you want to create a filename using the two-digit year, the month, and the day, you'd do something like this:

$ DATE=`date +%y%m%d`
$ FILE=tmp$DATE
$ echo $FILE
tmp080206
$

While it may look odd to squish the $DATE variable at the end of a text string, it's perfectly legal in the bash shell. The shell appends the variable's value to the string, creating a unique filename.

Using a date to uniquely identify a file can get tricky when looking for the most recent file version. Make sure that you remember the format you use to tag the date (such as month-day-year or year-month-day). Often when sorting filenames it is good to specify the year first, then the month, and finally, the day.
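To see why year-first stamps are easier to work with, here's a quick demonstration (the filenames are made-up examples, not output from the book's scripts):

```shell
# Year-month-day stamps sort chronologically with a plain lexical sort:
printf '%s\n' archive080102 archive071231 | sort
# archive071231  (Dec 31, 2007)
# archive080102  (Jan 2, 2008)

# Month-day-year stamps do not -- Jan 2, 2008 sorts before Dec 31, 2007:
printf '%s\n' archive010208 archive123107 | sort
# archive010208  (Jan 2, 2008)
# archive123107  (Dec 31, 2007)
```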

You should now have enough information to start building the script. The next section walks you through creating the archive script.

Creating a daily archive script

The archdaily script automatically creates an archived version of your working directory in a separate location, using the day to uniquely identify the file. Here's the code for that script:

$ cat archdaily
#!/bin/bash


# archive a working directory

DATE=`date +%y%m%d`
FILE=archive$DATE
SOURCE=/home/rich/test
DESTINATION=/home/rich/archive/$FILE

tar -cf $DESTINATION $SOURCE 2> /dev/null
gzip $DESTINATION
$

The archdaily script generates a unique filename based on the year, month, and day that it runs. It also uses environment variables to define the source directory to archive, so that can be easily changed. You can even use a command line parameter to make the archdaily program even more versatile.

The $DESTINATION variable appends the full pathname for the archived file. This too can be easily changed to an alternative directory if needed.
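As a sketch of the command line parameter idea, here's one possible variation (this is my own hypothetical version, not the book's script -- the function form and the ARCHDIR override are assumptions added to keep the example self-contained):

```shell
#!/bin/bash
# archparam - hypothetical variation of archdaily that accepts the
# source directory as a command line parameter

archparam() {
    if [ $# -ne 1 ]
    then
        echo "Usage: archparam source-directory" >&2
        return 1
    fi

    SOURCE=$1
    DATE=`date +%y%m%d`
    # ARCHDIR (an assumed override) lets a caller redirect the archive
    # location; it defaults to the book's destination directory
    DESTINATION=${ARCHDIR:-/home/rich/archive}/archive$DATE

    tar -cf "$DESTINATION" "$SOURCE" 2> /dev/null
    gzip "$DESTINATION"
}
```

You'd then run it as `archparam /home/rich/test`, and reuse the same script for any directory you want to snapshot.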

Testing the archdaily script is pretty straightforward:

$ ./archdaily
$ ls -al /home/rich/archive
total 24
drwxrwxr-x 2 rich rich 4096 2008-02-06 12:50 .
drwx------ 37 rich rich 4096 2008-02-06 12:50 ..
-rw-rw-r-- 1 rich rich 3046 2008-02-06 12:50 archive080206.gz
$

The data is now safely archived away.

Creating an hourly archive script

If you're in a high-volume production environment where files are changing rapidly, a daily archive might not be good enough. If you want to increase the archiving frequency to hourly, you'll need to take another item into consideration.

If you're backing up files hourly, and trying to use the date command to timestamp each filename, things can get pretty ugly pretty quickly. I don't think you'd want to have to sift through a directory of files with filenames looking like this:

archive080206110233.gz

Instead of placing all of the archive files in the same folder, you'd be better off creating a directory hierarchy for your archived files. Figure 27-3 demonstrates this principle.

The archive directory contains directories for each month of the year, using the month number as the directory name. Each month's directory in turn contains folders for each day of the month (using the day's numerical value as the directory name). This allows you to just timestamp the individual archive files, then place them in the appropriate directory for the day and month.


FIGURE 27-3

Creating an archive directory hierarchy

[The figure shows a directory tree: the base directory /home/rich/archive contains month directories (01, 02, ...), and each month directory contains day directories (01, 02, ...) -- the Base, Month, and Day levels of the hierarchy.]

Now you have a new challenge to solve. Your script must create the individual month and day directories automatically, and know that if they already exist, they don't need to be created. As it turns out, this isn't as hard as it sounds!

If you peruse the command line options for the mkdir command (see Chapter 3), you'll find the -p command line option. This option allows you to create directories and subdirectories in a single command, plus the added benefit that it doesn't produce an error message if the directory already exists. Perfect!
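A quick check at the command line confirms both behaviors (the path here is just a scratch example):

```shell
# Create a nested month/day path in one call:
mkdir -p /tmp/archtest/02/06

# Run it again -- an existing path produces no error and no message:
mkdir -p /tmp/archtest/02/06
echo $?
# 0
```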

We’re now ready to create the archhourly script. Here’s what it looks like:

$ cat archhourly
#!/bin/bash
# archive a working directory hourly

DAY=`date +%d`
MONTH=`date +%m`
TIME=`date +%k%M`


SOURCE=/home/rich/test
BASEDEST=/home/rich/archive

mkdir -p $BASEDEST/$MONTH/$DAY

DESTINATION=$BASEDEST/$MONTH/$DAY/archive$TIME
tar -cf $DESTINATION $SOURCE 2> /dev/null
gzip $DESTINATION
$

The script retrieves the day and month values from the date command, along with the timestamp used to uniquely identify the archive file. It then uses that information to create the archive directory for the day (or to silently move on if it already exists). Finally, the script uses the tar and gzip commands to create the archive and compress it.
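One caveat worth noting (my own observation, not a point the script addresses): the %k format pads single-digit hours with a leading space rather than a zero, so a 9:05 a.m. run would embed a space in the archive filename. The zero-padded %H format avoids this:

```shell
# %k space-pads the hour ( 0-23) while %H zero-pads it (00-23).
# At 9:05 a.m., `date +%k%M` yields " 905" but `date +%H%M` yields "0905".
TIME=$(date +%H%M)
echo "archive$TIME"    # always four digits, no embedded space
```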

Just as with the archdaily script, it's a good idea to test the archhourly script before putting it in the cron table:

$ ./archhourly
$ ls -al /home/rich/archive/02/06
total 32
drwxrwxr-x 2 rich rich 4096 2008-02-06 13:20 .
drwxrwxr-x 3 rich rich 4096 2008-02-06 13:19 ..
-rw-rw-r-- 1 rich rich 3145 2008-02-06 13:19 archive1319.gz
$ ./archhourly
$ ls -al /home/rich/archive/02/06
total 32
drwxrwxr-x 2 rich rich 4096 2008-02-06 13:20 .
drwxrwxr-x 3 rich rich 4096 2008-02-06 13:19 ..
-rw-rw-r-- 1 rich rich 3145 2008-02-06 13:19 archive1319.gz
-rw-rw-r-- 1 rich rich 3142 2008-02-06 13:20 archive1320.gz
$

The script worked fine the first time, creating the appropriate month and day directories, then creating the archive file. Just to test things out, I ran it a second time to see if it would have a problem with the existing directories. The script again ran just fine and created the second archive file. It's now ready for the cron table!
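For reference, the cron table entries might look something like this (the script paths are assumptions; adjust them to wherever you store the scripts):

```
# run archdaily every night at 2:00 a.m.
0 2 * * * /home/rich/archdaily

# run archhourly at the top of every hour
0 * * * * /home/rich/archhourly
```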

Storing backups off-site

While creating archives of important files is a good idea, it's certainly no replacement for a full backup that's stored in a separate location. In large commercial Linux environments, system administrators often have the luxury of tape drives to copy important files off of the server for safe storage.

For smaller environments this is not always an option. Just because you can't afford a fancy backup system doesn't mean that you have to have your important data at risk. You can put together a shell script that automatically creates an archive file, then sends it outside of the Linux system. This section describes just how to do that.


The required functions

You've already seen the core of the archiving system, the tar and gzip commands. The trick is getting the compressed archive file off of the Linux system. If you have the e-mail system configured (see Chapter 26), that can be your portal to the outside world!

While system administrators don't often think of e-mail as a way to archive data, it's a perfectly acceptable method in a pinch for getting your archived data to another safe location. I wouldn't recommend trying to archive your entire Linux system and send it out as an e-mail attachment, but archiving a working directory and using it as an attachment should not be a problem.

Let’s take a look at this script and see how it works.

Creating the script

The mailarch script creates a normal archive file as you did with the archdaily and archhourly scripts. Once it creates the archive file, it uses the Mutt e-mail client to send the file to a remote e-mail address.

Here’s the mailarch script:

$ cat mailarch
#!/bin/bash
# archive a working directory and e-mail it out

MAIL=`which mutt`
DATE=`date +%y%m%d`
FILE=archive$DATE
SOURCE=/home/rich/test
DESTINATION=/home/rich/archive/$FILE
ZIPFILE=$DESTINATION.zip

tar -cf $DESTINATION $SOURCE 2> /dev/null
zip $ZIPFILE $DESTINATION
$MAIL -a $ZIPFILE -s "Archive for $DATE" [email protected] < /dev/null
$

One thing you may notice about the mailarch script is that I use the zip compression utility instead of the gzip command. This allows you to create a .zip file that is more easily handled on non-Linux systems, such as Microsoft Windows workstations.

Running the script

The mailarch script runs the same way as the archdaily script, by placing it in the cron table to run at a specific time of the day (usually late at night). When the script runs, it creates the archive file, compresses it using the zip command, then sends it off in an e-mail message. Figure 27-4 demonstrates what the e-mail message you receive should look like.

Congratulations, you now have a method for backing up your important files from your Linux system.


FIGURE 27-4

E-mail message with archive file

Summary

This chapter put some of the shell-scripting information presented in the book to good use in the Linux system administration world. When you're responsible for managing a Linux system, whether it's a large multi-user system or your own system, there are lots of things you need to watch. Instead of poring over log files and manually running commands, you can create shell scripts to do the work for you.

The chapter demonstrated how to use the df command to determine available disk space, then use the sed and gawk commands to retrieve specific information from the data. Passing the output from a command to sed and gawk to parse data is a common function in shell scripts, so it's a good idea to know how to do it.

Next the chapter showed how to use the du command to determine the disk space used by individual users. The output from the du command was again passed to the sed and gawk commands to help filter out just the data that we were interested in.


The next section walked you through the world of creating system logs and reports. The capstats shell script captured vital system statistics on a regular basis, extracted the data from the commands, and stored it in a comma-separated value file. This data could then be imported into a spreadsheet, but instead you saw how to create another script to create a formatted report from the data. The reportstats shell script used HTML formatting to place the raw system data into an HTML table, then used the Mutt e-mail client to send the file to a remote user.

The chapter closed out by discussing using shell scripts for archiving and backing up data files on the Linux system. The tar and gzip commands are popular commands for archiving data. The chapter showed how to use them in shell scripts to create archive files, and manage the archive files in an archive directory. Finally, you saw how to e-mail an archive file to a remote user as a crude form of backup for important data files.

Thanks for joining me on this journey through the Linux command line and shell scripting. I hope that you've enjoyed the journey and have learned how to get around on the command line, and how to create shell scripts to save you time. But don't stop your command line education there. There's always something new being developed in the open source world, whether it's a new command line utility, or a full-blown shell. Stay in touch with the Linux community, and follow along with the new advances and features.


Quick Guide to bash Commands

As you've seen throughout this book, the bash shell contains lots of features and thus has lots of commands available. This appendix provides a concise guide to allow you to quickly look up a feature or command that you can use from the bash command line or from a bash shell script.

Built-In Commands

The bash shell includes many popular commands built into the shell. This provides for faster processing times when using these commands. Table A-1 shows the built-in commands available directly from the bash shell.

The built-in commands provide higher performance than external commands, but the more built-in commands that are added to a shell, the more memory it consumes with commands that you may never use. The bash shell also contains external commands that provide extended functionality for the shell. These are discussed in the next section.
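To check which category a given command falls into, bash's type built-in reports how the shell will resolve a name (a quick sketch; the exact wording of the output can vary slightly between bash versions):

```shell
# Ask bash how it resolves each name:
type cd      # reports that cd is a shell builtin
type -t cd   # prints just the classification
# builtin
type -t ls   # in a clean script shell (no aliases), ls is external
# file
```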

Bash Commands

Besides the built-in commands, the bash shell utilizes external commands to allow you to maneuver around the filesystem and manipulate files and directories. Table A-2 shows the common external commands you'll want to use when working in the bash shell.

You can accomplish just about any task you need to on the command line using these commands.
