Introduction to Unix (CA263)
File Processing
Guide to UNIX Using Linux, Third Edition 2
Objectives
• Explain UNIX and Linux file processing
• Use basic file manipulation commands to create, delete, copy, and move files and directories
• Employ commands to combine, cut, paste, rearrange, and sort information in files
• Create a script file
• Use the join command to link files using a common field
• Use the awk command to create a professional-looking report
Understanding File Structures
• Files can be structured in many ways, depending on the kind of data they store
– Employee information can be stored with one record per line and the fields separated by a delimiter such as a colon. This type of record is known as a variable-length record.
– Another way to create a record is to give each field a fixed number of columns. This type of record is known as a fixed-length record.
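The two record styles can be compared with a small sketch. The file names and the employee data below are hypothetical, and the column widths (10, 10, and 12 characters) are an assumption chosen only for illustration.

```shell
# Hypothetical sample data: the same employee stored in both record styles.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Variable-length record: fields separated by a colon delimiter.
printf 'Smith:Anna:Accounting\n' > employees_var

# Fixed-length record: last name in columns 1-10, first name in 11-20,
# department in 21-32 (each field padded with spaces).
printf '%-10s%-10s%-12s\n' Smith Anna Accounting > employees_fixed

# Pick out the department by field number, then by column position.
dept_var=$(cut -d: -f3 employees_var)
dept_fixed=$(cut -c21-30 employees_fixed)
echo "$dept_var"
echo "$dept_fixed"
```

With a delimiter, a field is found by counting separators; with fixed-length records, it is found by counting character positions.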
Understanding File Structures (continued)
• UNIX/Linux stores data, such as letters and product records, as flat ASCII files
• Three kinds of regular files are
– Unstructured ASCII characters: you can store any kind of data in any order. You cannot retrieve a particular column; you have to print everything to get what you need.
– Unstructured ASCII records: data is stored as a sequence of fixed-length records, for example the same kinds of information about different people on different rows.
– Unstructured ASCII trees: data is structured as a tree of records that can be organized as fixed-length or variable-length records. Each record contains a key that helps in finding the record quickly.
Manipulating Files
• Creating files
• Deleting files when no longer needed
• Finding a file
• Combining files using the paste command and output redirection
• Separating files using the cut command
• Sorting the contents of a file
Create Files
• The redirection sign (>) and the touch command both create an empty file. If the file already exists, > empties it, whereas touch only updates its timestamps.
$ > accountsfile
$ touch accountsfile2
Syntax: touch [-options] [filename(s)]
Useful options include:
-a update the access time only
-m update the modification time only
-c do not create the file if it does not exist
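A minimal sketch of the two ways to create an empty file, plus the -c option. It assumes an sh or bash shell and runs in a temporary directory; accountsfile3 is a hypothetical name used only to show that -c creates nothing.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

> accountsfile          # redirection alone creates an empty file
touch accountsfile2     # touch creates the file if it does not exist
touch -c accountsfile3  # -c: do not create a missing file

ls
```

Only accountsfile and accountsfile2 should appear in the listing, because -c told touch to leave the missing file alone.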
Delete Files
• Delete files or directories permanently when no longer needed
$ rm -i phonebook.bak
Syntax: rm [-options] [filename / directory]
Useful options include:
-i display a warning and ask for confirmation before deleting a file
-r remove a directory and everything it contains
Removing Directories
• Remove an empty directory permanently when no longer needed
$ rmdir documents
Syntax: rmdir [-options] [directory]
Useful options include:
-p remove the directory and then each parent directory that becomes empty
• Note: The directory must be empty to be deleted with the rmdir command. The -i and -r options belong to rm, not rmdir; use rm -r to remove a directory together with everything it contains.
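The difference between rmdir and rm -r can be sketched as follows. The directory and file names are hypothetical, and everything happens inside a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir documents
touch documents/phonebook.bak

# rmdir refuses to remove a directory that still contains files.
if rmdir documents 2>/dev/null; then
    echo "removed"
else
    echo "rmdir failed: documents is not empty"
fi

rm documents/phonebook.bak   # delete the file first...
rmdir documents              # ...then the empty directory can be removed

# rm -r removes a directory and everything in it in one step.
mkdir -p reports/2024
touch reports/2024/summary
rm -r reports
```

Because rm -r deletes without a second chance, combining it with -i is a reasonable precaution when working on real data.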
Finding a File
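• Finding a file helps you locate it in the directory structure
$ find . -name phonebook.bak
Syntax: find [pathname] [-name filename]
Useful options include:
Pathname: the directory where the search starts; . is the current directory
-name indicates that you are searching for a file with a specific name. You can use wildcards in the name; quote the pattern (for example 'phone*') so that the shell does not expand it first.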
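A short sketch of searching by exact name and by wildcard. The directory layout is hypothetical and is built in a temporary directory so the search results are predictable.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p source/backup
touch phonebook source/backup/phonebook.bak source/notes.txt

# Search the current directory (.) and everything below it by exact name.
find . -name phonebook.bak

# Quote wildcard patterns so the shell passes them to find unexpanded.
matches=$(find . -name 'phonebook*' | sort)
echo "$matches"
```

The second search matches both phonebook and phonebook.bak, wherever they sit below the starting directory.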
Combining Files
• Combining files using output redirection
– cat command: concatenates the text of two or more files, one after the other; output redirection saves the result in a new file
– paste command: joins the text of different files side by side, line by line
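The contrast between cat and paste, and the use of cut to separate the fields again, can be sketched with two small files. The file names and contents here are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Lettuce\nSpinach\n' > vegetables
printf 'Sourdough\nWhite bread\n' > breads

# cat writes one file after the other; > redirects the result to a new file.
cat vegetables breads > stacked

# paste joins line 1 with line 1, line 2 with line 2, separated by a tab.
paste vegetables breads > side_by_side

# cut pulls a field back out; -f2 selects the second tab-separated field.
cut -f2 side_by_side
```

Here stacked has four lines, while side_by_side has two lines of two fields each.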
Sorting
• Sorts a file’s contents alphabetically or numerically.
Syntax: sort [-options] [filename]
Useful options include:
-k n sort on the key field specified by n
-t specifies the character that separates fields
-m merge input files that have already been sorted
-o write the output to the specified file
-d sort in dictionary order (only blanks, letters, and digits are compared)
-g sort in general numeric order
-r sort in reverse order
Sorting Example
$ sort -k 3 food > sortedfood
$ cat sortedfood
Lettuce Sourdough Beef
Spinach White bread Chicken
Beans Pumpernickel Mutton
Carrots Whole Wheat Turkey
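The -t, -k, -o, and -r options can be combined as in the sketch below. It uses a hypothetical colon-delimited version of the food file, because an explicit separator keeps a two-word entry such as "White bread" inside a single field.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical colon-delimited version of the food file.
printf '%s\n' \
    'Lettuce:Sourdough:Turkey' \
    'Spinach:White bread:Beef' \
    'Beans:Pumpernickel:Mutton' \
    'Carrots:Whole Wheat:Chicken' > food

# -t: sets the field separator; -k 3 sorts on the third field;
# -o writes the result to the named file instead of the screen.
sort -t: -k 3 -o sortedfood food
cat sortedfood

# -r reverses the order.
sort -t: -k 3 -r food
```

The -o option is an alternative to output redirection and, unlike redirection, can safely name the input file as the output file.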
Sorting -n Example
• The -n option to sort specifies that the first field on the line is to be treated as a number, so lines are ordered by numeric value instead of character by character.
$ cat data
5 27
2 12
3 33
23 2
-5 11
15 6
14 -9
$ sort data
-5 11
14 -9
15 6
2 12
23 2
3 33
5 27
$ sort -n data
-5 11
2 12
3 33
5 27
14 -9
15 6
23 2
Sorting +1n Example
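• The +1 says to skip the first field. Similarly, +5n would mean to skip the first five fields. The n requests a numeric sort. This is the older form of the key syntax; current versions of sort may reject it, and the equivalent is sort -k 2n.
$ cat data
5 27
2 12
3 33
23 2
-5 11
15 6
14 -9
• Skip the first field in the sort
$ sort +1n data
14 -9
23 2
15 6
-5 11
2 12
5 27
3 33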
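Since the +1n form is not accepted by every version of sort, the same result can be sketched with the -k option. The data file is recreated from the slide; the variable name by_second is hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '5 27' '2 12' '3 33' '23 2' '-5 11' '15 6' '14 -9' > data

# Numeric sort on the first field.
sort -n data

# Numeric sort on the second field only: the -k form of "+1n".
by_second=$(sort -k 2,2n data)
echo "$by_second"
```

Writing the key as 2,2 limits the comparison to the second field; a plain -k 2 would extend the key to the end of the line.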
Creating Script Files
• UNIX/Linux users create shell script files that contain commands to be run sequentially as a set. This makes it possible to automate commands and reuse them.
• A script file can be created with the vi editor (or any text editor) and then made executable using the chmod command with the x permission, for example chmod u+x scriptname
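The create, chmod, and run cycle can be sketched as below. The script name sortveg and the vegetables file are hypothetical, and a here-document stands in for typing the script in vi so that the example can run unattended.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Lettuce\nBeans\nCarrots\n' > vegetables

# Write a two-command script; a here-document stands in for typing it in vi.
# The script name "sortveg" is hypothetical.
cat > sortveg <<'EOF'
#!/bin/sh
# Sort the vegetables file and save the result.
sort vegetables > sortedveg
cat sortedveg
EOF

chmod u+x sortveg   # give the owner execute permission
./sortveg           # run the commands in the script as a set
```

Without the chmod step, the shell would report that permission is denied when the script is run by name.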
Using the join Command on Two Files
• Sometimes you want to link the information in two files
• The join command is often used in relational database processing
• The join command associates information in two different files on the basis of a common field or key in those files
The join Command
• Syntax: join [-options] file1 file2
• file1 and file2 are two input files that must be sorted on the join field.
Useful options include:
-1 fieldnum specifies the common join field in file 1
-2 fieldnum specifies the common join field in file 2
-o specifies a list of fields to output
-t char specifies the field separator character (by default, fields are separated by blanks)
-a filenum also produces a line for each unpairable line in file filenum (1 or 2)
-e str replaces missing output fields with the string specified by str
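A minimal sketch of join on a common key field. The two files and their contents are hypothetical; both are already sorted on the first field, which join requires.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical data: both files are already sorted on the employee ID.
printf '%s\n' '101:Smith' '102:Jones' '103:Lee' > names
printf '%s\n' '101:Accounting' '103:Sales' > depts

# Match lines whose first field is equal; -t: sets the separator.
join -t: -1 1 -2 1 names depts

# -a 1 also prints lines from file 1 that have no partner;
# -e with -o fills the missing field with a placeholder string.
join -t: -a 1 -e NONE -o 1.1,1.2,2.2 names depts
```

The first command drops employee 102, who has no department record; the second keeps that line and marks the missing department as NONE.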
sed
• sed is a program used for editing data. It stands for stream editor.
• Example: To change the first occurrence of “Unix” to “UNIX” on every line of intro
$ cat intro
The Unix operating system was pioneered by Ken
Thompson. Main goal of Unix was to create Unix
environment for efficient program development
$ sed 's/Unix/UNIX/' intro
The UNIX operating system was pioneered by Ken
Thompson. Main goal of UNIX was to create Unix
environment for efficient program development
sed with g option
• Example: To change all occurrences of “Unix” to “UNIX” on every line of intro
$ cat intro
The Unix operating system was pioneered by Ken
Thompson. Main goal of Unix was to create Unix
environment for efficient program development
$ sed 's/Unix/UNIX/g' intro
The UNIX operating system was pioneered by Ken
Thompson. Main goal of UNIX was to create UNIX
environment for efficient program development
sed with -n option
• Print just the first 2 lines
$ cat intro
The Unix operating system was pioneered by Ken
Thompson. Main goal of Unix was to create Unix
environment for efficient program development
$ sed -n '1,2p' intro
The Unix operating system was pioneered by Ken
Thompson. Main goal of Unix was to create Unix
sed with -n option
• Print just the lines containing Unix
$ cat intro
The Unix operating system was pioneered by Ken
Thompson. Main goal of Unix was to create Unix
environment for efficient program development
$ sed -n '/Unix/p' intro
The Unix operating system was pioneered by Ken
Thompson. Main goal of Unix was to create Unix
sed with the d command
• Delete lines 1 and 2
$ cat intro
The Unix operating system was pioneered by Ken
Thompson. Main goal of Unix was to create Unix
environment for efficient program development
$ sed '1,2d' intro
environment for efficient program development
• Delete all lines containing Unix
$ sed '/Unix/d' intro
environment for efficient program development
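The sed examples can be tried with the sketch below, which recreates intro as a three-line file. The exact position of the line breaks is an assumption, and intro.new is a hypothetical output name. The sketch also shows that sed writes its result to standard output and leaves the input file untouched.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Assumed layout of the intro file: three lines.
printf '%s\n' \
    'The Unix operating system was pioneered by Ken' \
    'Thompson. Main goal of Unix was to create Unix' \
    'environment for efficient program development' > intro

# sed writes the edited text to standard output; intro itself is unchanged.
sed 's/Unix/UNIX/g' intro > intro.new

grep -c 'Unix' intro       # lines in the original that still contain Unix
grep -c 'UNIX' intro.new   # lines in the copy that now contain UNIX

# Commands can be restricted to a line range: substitute on line 2 only.
sed '2s/Unix/UNIX/g' intro
```

To keep the changes, redirect the output to a new file as shown, then replace the original once the result has been checked.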
A Brief Introduction to the Awk Program
• Awk, a pattern-scanning and processing language, helps to produce professional-looking reports
• Awk provides a powerful programming environment that can perform actions on files that are difficult to duplicate with a combination of other commands
A Brief Introduction to the Awk Program (continued)
• Awk checks to see if the input records in specified files satisfy a pattern
• If so, awk executes a specified action
• If no pattern is provided, awk applies the action to every record
awk command
• Syntax:
awk [-Fsep] ['pattern {action} ...'] [filename]
Useful options include:
-F: means the field separator is a colon
$ awk 'BEGIN { print "This is an awk print line." }'
This is an awk print line.
$ awk -F: '{ printf "%s\t %s\n", $1, $2 }' datafile
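A pattern, an action, and the BEGIN and END blocks can be combined into a small report, as in the sketch below. The contents of datafile (name, department, salary) are hypothetical, as is the choice of the Sales department.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical colon-delimited data: name, department, salary.
printf '%s\n' 'Smith:Accounting:52000' 'Jones:Sales:48000' 'Lee:Sales:61000' > datafile

# BEGIN runs before any input is read, END after the last record.
# The /Sales/ pattern limits the main action to matching records.
awk -F: '
BEGIN   { printf "%-10s %s\n", "Name", "Salary" }
/Sales/ { printf "%-10s %d\n", $1, $3; total += $3 }
END     { printf "%-10s %d\n", "Total", total }
' datafile
```

The printf widths (%-10s) line the columns up, which is what gives the output the look of a formatted report.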