Chapter Five Advanced File Processing Guide To UNIX Using Linux Fourth Edition Chapter 5 Unix (34 slides) 1 CTEC 110

Upload: christina-parks

Post on 13-Jan-2016


Chapter Five

Advanced File Processing

Guide To UNIX Using Linux Fourth Edition

Chapter 5 Unix (34 slides) 1CTEC 110


Objectives

• Use the pipe operator to redirect the output of one command to another command

• Use the grep command to search for a specified pattern in a file

• Use the uniq command to remove duplicate lines from a file

• Use the comm and diff commands to compare two files

• Use the wc command to count words, characters, and lines in a file


Objectives (continued)

• Use manipulation and transformation commands, which include sed, tr, and pr

• Design a new file-processing application by creating, testing, and running shell scripts


Advancing Your File-Processing Skills

• Selection commands focus on extracting specific information from files


Advancing Your File-Processing Skills (continued)

• Manipulation and transformation commands alter and transform extracted information into useful and appealing formats


Advancing Your File-Processing Skills (continued)


Using the Selection Commands

• Using the Pipe Operator

– The pipe operator (|) redirects the output of one command to the input of another

– An example would be to redirect the output of the ls command to the more command

– The pipe operator can connect several commands on the same command line
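The pipe behavior described above can be sketched as follows. This is only an illustration: the scratch directory and the three file names are made up, and sort and head stand in for any commands you might chain.

```shell
# Scratch directory with three made-up file names, so the listing is predictable.
workdir=$(mktemp -d)
touch "$workdir/notes.txt" "$workdir/report.txt" "$workdir/todo.txt"

# At a terminal you would typically page a long listing with:  ls -l /etc | more

# Several commands on one line: list names, sort them in reverse, keep the first two.
last_two=$(ls "$workdir" | sort -r | head -n 2)
printf '%s\n' "$last_two"

rm -rf "$workdir"
```

Each command reads what the command to its left writes, so no temporary files are needed between the steps.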


Using the Pipe Operator

Using pipe operators and connecting commands is useful when viewing directory information


Using the grep Command

• Used to search for a specific pattern in a file, such as a word or phrase

• grep’s options and regular-expression (wildcard-style) patterns allow for powerful search operations

• You can increase grep’s usefulness by combining it with other commands, such as head or tail
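A minimal sketch of these grep uses, assuming a small colon-separated sample file whose names and fields are invented for illustration:

```shell
# Hypothetical sample data: name:city:job
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
Anderson:Denver:programmer
Baker:Boston:analyst
Chen:Denver:programmer
Diaz:Austin:manager
Evans:Denver:analyst
EOF

# Search for a word: print every line containing "Denver".
grep 'Denver' "$datafile"

# Options: -i ignores case, -c prints a count of matching lines instead of the lines.
denver_count=$(grep -ic 'denver' "$datafile")

# Pattern: "$" anchors the match to the end of the line.
analysts=$(grep 'analyst$' "$datafile")

# Combined with head: keep only the first matching line.
first_denver=$(grep 'Denver' "$datafile" | head -n 1)

rm -f "$datafile"
```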


Using the uniq Command

• Removes duplicate lines from a file

• Compares only consecutive lines, therefore uniq requires sorted input

• uniq has an option that allows you to generate output that contains a copy of each line that has a duplicate
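The need for sorted input can be seen in a short sketch. The fruit list is made up, and the option shown for listing duplicated lines is -d.

```shell
# Made-up list in which the duplicates are NOT next to each other.
list=$(mktemp)
printf '%s\n' pear apple pear fig apple pear > "$list"

# Without sorting, no two equal lines are adjacent, so uniq drops nothing.
unsorted_lines=$(uniq "$list" | wc -l)

# Sorting first makes duplicates consecutive, so uniq can collapse them.
unique_items=$(sort "$list" | uniq)

# -d prints one copy of each line that occurs more than once.
repeated_items=$(sort "$list" | uniq -d)

printf '%s\n' "$unique_items"
rm -f "$list"
```

Note that uniq writes its result to standard output; the input file itself is left unchanged.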


Using the uniq Command (continued)


Using the uniq Command (continued)


Using the comm Command

• Used to identify duplicate lines in sorted files

• Unlike uniq, it does not remove duplicates, and it works with two files rather than one

• It compares lines common to file1 and file2, and produces three-column output

– Column one contains lines found only in file1

– Column two contains lines found only in file2

– Column three contains lines found in both files
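The three columns can be explored with a small sketch. The file contents are invented; the options -1, -2, and -3 each suppress the corresponding column.

```shell
# Two made-up files, already sorted as comm expects.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$dir/file1"
printf '%s\n' banana cherry date  > "$dir/file2"

# Full three-column output (columns are separated by tabs).
comm "$dir/file1" "$dir/file2"

# Suppress columns to isolate one of them.
only_in_file1=$(comm -23 "$dir/file1" "$dir/file2")   # column one only
only_in_file2=$(comm -13 "$dir/file1" "$dir/file2")   # column two only
in_both=$(comm -12 "$dir/file1" "$dir/file2")         # column three only

rm -rf "$dir"
```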


Using the diff Command

• Attempts to determine the minimal changes needed to convert file1 to file2

• The output displays the line(s) that differ

• Codes in the output (a for add, d for delete, c for change) indicate which lines must be added, deleted, or changed for the files to match
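A small sketch of reading diff output, using two invented color lists. Lines prefixed with "<" come from file1 and lines prefixed with ">" come from file2.

```shell
# Two made-up files that differ by one deleted and one added line.
dir=$(mktemp -d)
printf '%s\n' red green blue  > "$dir/file1"
printf '%s\n' red blue yellow > "$dir/file2"

# diff exits with a nonzero status when the files differ, so guard it with "|| true".
changes=$(diff "$dir/file1" "$dir/file2" || true)
printf '%s\n' "$changes"

# The exit status alone answers "are these files the same?"
if diff "$dir/file1" "$dir/file2" > /dev/null; then
    same=yes
else
    same=no
fi

rm -rf "$dir"
```

Here "green" must be deleted from file1 and "yellow" must be added for the two files to match.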


Using the wc Command

• Used to count the number of lines, words, and bytes or characters in text files

• You may specify all three options in a single command

• If you don’t specify any options, you see counts of lines, words, and characters (in that order)


Using the wc Command (continued)

The options for the wc command:

-l for lines

-w for words

-c for characters (counted as bytes)
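The options can be tried on a tiny sample, sketched here with made-up text. Reading the file through "<" keeps the file name out of the output.

```shell
# Made-up two-line sample.
f=$(mktemp)
printf 'one two three\nfour five\n' > "$f"

# No options: lines, words, and bytes, followed by the file name.
wc "$f"

# One count at a time.
lines=$(wc -l < "$f")
words=$(wc -w < "$f")
bytes=$(wc -c < "$f")

rm -f "$f"
```

For plain ASCII text the byte count and the character count are the same.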


Using Manipulation and Transformation Commands

• These commands are: sed, tr, pr

• Used to edit and transform the appearance of data before it is displayed or printed


Introducing the sed Command

• sed is a UNIX/Linux editor that allows you to make global changes to large files

• Minimum requirements are an input file and a command that lets sed know what actions to apply to the file

• sed commands have two general forms

– Specify an editing command on the command line

– Specify a script file containing sed commands
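Both forms can be sketched on a small input. The file names input.txt and edits.sed and their contents are hypothetical; -f is the option that names a script file.

```shell
# Made-up input file.
dir=$(mktemp -d)
printf '%s\n' 'unix is fun' 'linux and unix' 'bye' > "$dir/input.txt"

# Form 1: the editing command is given on the command line.
# s/old/new/g substitutes every occurrence on each line.
inline=$(sed 's/unix/UNIX/g' "$dir/input.txt")

# Form 2: the commands live in a script file, one per line.
cat > "$dir/edits.sed" <<'EOF'
s/unix/UNIX/g
/^bye$/d
EOF
scripted=$(sed -f "$dir/edits.sed" "$dir/input.txt")

printf '%s\n' "$scripted"
rm -rf "$dir"
```

In both forms sed writes the edited text to standard output and leaves the input file untouched.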


Translating Characters Using the tr Command

• tr copies data from the standard input to the standard output, substituting or deleting characters specified by options and patterns

• The patterns are strings and the strings are sets of characters

• A popular use of tr is converting lowercase characters to uppercase
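A brief sketch of both behaviors, substituting and deleting, on made-up input strings:

```shell
# Substitute: each character in the first set maps to the matching one in the second.
upper=$(printf 'hello, unix\n' | tr 'a-z' 'A-Z')

# Delete: -d removes every character found in the set.
no_digits=$(printf 'a1b2c3\n' | tr -d '0-9')

printf '%s\n' "$upper" "$no_digits"
```

Because tr reads only standard input, a file is supplied with a pipe or with "<" rather than as an argument.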


Using the pr Command to Format Your Output

• pr prints specified files on the standard output in paginated form

• By default, pr formats the specified files into single-column pages of 66 lines

• Each page has a five-line header containing the file name, its latest modification date, and current page, and a five-line trailer consisting of blank lines
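The page header can be seen in a short sketch. The three-line file and the title "Sample Report" are invented; -h replaces the file name in the header with the text you supply.

```shell
# Made-up three-line file.
f=$(mktemp)
printf '%s\n' 'line one' 'line two' 'line three' > "$f"

# Paginate with a custom header title in place of the file name.
# Command substitution trims the blank trailer lines, which keeps the display short.
paged=$(pr -h 'Sample Report' "$f")
printf '%s\n' "$paged"

rm -f "$f"
```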


Designing a New File-Processing Application

• The most important phase in developing a new application is the design

• The design defines the information an application needs to produce

• The design also defines how to organize this information into files, records, and fields, which are called logical structures


Designing Records

• The first task is to define the fields in the records and produce a record layout

• A record layout identifies each field by name and data type (numeric or nonnumeric)

• Design the file record to store only those fields relevant to the record’s primary purpose


Linking Files with Keys

• Multiple files are joined by a key: a common field that each of the linked files shares

• Another important task in the design phase is to plan a way to join the files

• The flexibility to gather information from multiple files composed of simple, short records is the essence of a relational database system
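One way to picture a key is with the join command, which is used here only as an illustration and is not named on this slide. The two files, their colon-separated layouts, and the ID values are all hypothetical.

```shell
dir=$(mktemp -d)

# Hypothetical layout: programmer_id:name
printf '%s\n' '101:Anderson' '102:Baker' '103:Chen' > "$dir/programmers"

# Hypothetical layout: programmer_id:project
printf '%s\n' '101:Payroll' '103:Inventory' > "$dir/projects"

# Both files are sorted on the key (field 1); -t sets the field separator.
# join prints one combined record for each key present in both files.
joined=$(join -t: "$dir/programmers" "$dir/projects")
printf '%s\n' "$joined"

rm -rf "$dir"
```

Each file stays short and single-purpose, and the shared ID field is what lets the records be combined when a report needs both.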


Creating the Programmer and Project Files

• With the basic design complete, you now implement your application design

• UNIX/Linux file processing predominantly uses flat files

• Working with these files is easy, because you can create and manipulate them with text editors like vi and Emacs


Creating the Programmer and Project Files (continued)


Formatting Output

• The awk command is used to prepare formatted output

• For the purposes of developing a new file-processing application, we will focus primarily on the printf action of the awk command

Awk provides a shortcut to other UNIX/Linux commands
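A minimal sketch of awk's printf action, assuming a made-up flat file with colon-separated name, ID, and city fields:

```shell
# Hypothetical flat file: name:id:city
f=$(mktemp)
printf '%s\n' 'Anderson:101:Denver' 'Baker:102:Boston' > "$f"

# -F: sets the field separator.
# %-10s left-justifies a string in 10 columns; %5d right-justifies a number in 5.
report=$(awk -F: '{ printf "%-10s %5d %s\n", $1, $2, $3 }' "$f")
printf '%s\n' "$report"

rm -f "$f"
```

The fixed widths make the fields line up in columns regardless of how long each name is.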


Using a Shell Script to Implement the Application

• Shell scripts should contain:

– The commands to execute

– Comments to identify and explain the script so that users or programmers other than the author can understand how it works

• Use the pound (#) character to mark comments in a script file
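A commented script might look like the sketch below. The script name count_lines.sh, its purpose, and the data file are invented for illustration.

```shell
dir=$(mktemp -d)

# Write a small, commented script to a file (the name is hypothetical).
cat > "$dir/count_lines.sh" <<'EOF'
#!/bin/sh
# count_lines.sh: report how many lines a file contains.
# Usage: sh count_lines.sh FILE

file=$1                       # first argument: the file to inspect
total=$(wc -l < "$file")      # count lines without printing the file name
echo "Lines: $((total))"      # arithmetic expansion trims any padding from wc
EOF

# Made-up data file with three lines, then run the script on it.
printf 'a\nb\nc\n' > "$dir/data.txt"
result=$(sh "$dir/count_lines.sh" "$dir/data.txt")
printf '%s\n' "$result"

rm -rf "$dir"
```

Everything after a # on a line is ignored by the shell, so comments can sit on their own lines or follow a command.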


Running a Shell Script

• You can run a shell script in virtually any shell that you have on your system

• The Bash shell accepts more variations in command structures than other shells

• Run the script by typing sh followed by the name of the script, or make the script executable and type ./ prior to the script name
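Both ways of running a script can be sketched with a trivial example. The script name hello.sh is hypothetical, and chmod +x is one common way to make a file executable.

```shell
dir=$(mktemp -d)

# A trivial script (the name is made up).
cat > "$dir/hello.sh" <<'EOF'
#!/bin/sh
# hello.sh: print a greeting
echo "hello from a script"
EOF

# Way 1: hand the script to a shell explicitly.
via_sh=$(sh "$dir/hello.sh")

# Way 2: mark it executable, then run it with ./ from its own directory.
chmod +x "$dir/hello.sh"
via_exec=$(cd "$dir" && ./hello.sh)

printf '%s\n' "$via_sh" "$via_exec"
rm -rf "$dir"
```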


Putting It All Together to Produce the Report

• An effective way to develop applications is to combine many small scripts in a larger script file

• Have the last script added to the larger script print a report indicating script functions and results


Chapter Summary

• UNIX/Linux file-processing commands are (1) selection and (2) manipulation and transformation commands

• uniq removes duplicate lines from a sorted file

• comm compares lines common to file1 and file2

• diff tries to determine the minimal set of changes needed to convert file1 into file2


Chapter Summary (continued)

• tr copies data read from the standard input to the standard output, substituting or deleting the characters specified

• sed is a file editor designed to make global changes to large files

• pr prints files to the standard output in paginated form


Chapter Summary (continued)

• The design of a file-processing application reflects what the application needs to produce

• Use a record layout to identify each field by name and data type

• Shell scripts should contain commands to execute programs and comments to identify and explain the programs


Chapter 5 Unix Exercises

• Work through Hands-on Projects at end of chapter 5

• Canvas: Review Questions 5

– (Do not do questions 22, 23, 24, and 25)

• Quiz 5 Unix…
