
Introduction to BASH, AWK, and PERL

Victor Anisimov, NCSA

FIU / SSERCA / XSEDE Workshop, Apr 4-5, 2013, Miami, FL


MOTIVATION

• Increase Productivity of Research & Development

Scripting languages require less effort than conventional (compiled) programming languages when implementing small computational projects

Scripts are more portable than binary code

Scripts are easy to maintain

Lab materials: /home/anisimov/labs.tgz on FIU cluster

Important: type "module add make" after logging in to the FIU cluster

BASH, AWK, and PERL

BASH is a Linux shell

AWK is a language for data post-processing

PERL is a versatile programming language

Common feature:

interpreted programming languages

How to decide which one I will need:

project complexity dictates which language to use


Objective of the Course

As of now:
• No prerequisites are necessary
• No change in the way you think
• No need to memorize abstract concepts

At the end of the day:
• You will learn three programming languages
• You will improve your project organization skills
• You will increase your productivity

Every Project Works with Data

• Data generation by computation
• Extraction of data from text files
• Data format conversion
• Data computation
• Data analysis and reporting
• Data archival and retrieval

Scripting languages can handle this work without turning the data processing into a major programming project

Projects have Complex Processing Flows

• Input to a program depends on the result of another program
• The process includes many steps that need to be automated
• The process is not standard and has to be created
• The process needs to be optimized

Scripting Languages are perfect for automation of repetitive processes


Elements of Programming Language

• Data types
• Conditional statements
• Loops
• Functions / procedures
• Input / Output

Our first guide to this virtual world is the BASH shell.

BASH Data Types

• BASH treats all variables as text strings
• Limited support for integer arithmetic

#!/bin/bash
greetings="Hello ${USER}!"   # example of string
today=`date`                 # run a program by enclosing it in grave accents (backticks)
echo "${greetings} Today is ${today}"
N=1; let N=N+2; echo "Integer math: 1+2=${N}"
R=0.1; R=`echo "$R+1.2" | bc -l`; echo "FP math: 0.1+1.2=${R}"

$ chmod 755 01-hello.sh
$ ./01-hello.sh
Hello victor! Today is Thu Apr 4 13:37:02 EST 2013
Integer math: 1+2=3
FP math: 0.1+1.2=1.3
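The same two ideas (running a program to fill a variable, and doing arithmetic) can also be written with `$(...)` command substitution and `$((...))` arithmetic expansion, both standard bash syntax. This is a minimal sketch, not part of the lab materials: the function name `describe_math` is hypothetical, and awk stands in for `bc` on the assumption that `bc` may not be installed everywhere.

```shell
#!/bin/bash
# Hypothetical helper, not one of the workshop scripts.
describe_math() {
    local n=1
    n=$(( n + 2 ))                        # integer arithmetic expansion
    echo "Integer math: 1+2=${n}"
    local r
    r=$(awk 'BEGIN { print 0.1 + 1.2 }')  # command substitution; awk does the floating-point math
    echo "FP math: 0.1+1.2=${r}"
}

describe_math
```

The backtick form on the slide and `$(...)` behave the same here; `$(...)` is simply easier to nest and to read.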


BASH Conditional Statements

One more data type: built-in constants

$# - number of arguments; $0 - self name; $1, $2, … - command-line arguments

#!/bin/bash
# supported string comparison conditions: == !=
# supported arithmetic conditions: -eq (==) -ne (!=) -lt (<) -le (<=) -gt (>) -ge (>=)
if [ $# != 2 ] ; then
  echo "USAGE $0 argument1 argument2" ; exit
fi
if [ $1 -gt $2 ] ; then
  echo "True: $1 -gt $2"
else
  echo "False: $1 -gt $2"
fi

$ ./02-conditions.sh
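To see both branches without creating a script file, the same tests can be wrapped in a function. This is a sketch under assumptions: `compare_args` is a hypothetical name, and the arguments are quoted so that an empty argument does not break the `[ ... ]` test.

```shell
#!/bin/bash
# Hypothetical wrapper around the logic of 02-conditions.sh.
compare_args() {
    if [ $# -ne 2 ] ; then
        echo "USAGE compare_args argument1 argument2"
        return 1
    fi
    if [ "$1" -gt "$2" ] ; then
        echo "True: $1 -gt $2"
    else
        echo "False: $1 -gt $2"
    fi
}

compare_args 5 3    # takes the "then" branch
compare_args 2 7    # takes the "else" branch
```

Called with the wrong number of arguments, the function prints the usage line and returns a non-zero status.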


BASH Loops

• Loop over list (example: 03-loops.sh)

LIST="01 02 03 04 05"
for job in ${LIST} ; do
  echo "job number ${job}"
done

• Conditional loop

N=1
while [ ${N} -le 5 ] ; do
  echo ${N}
  let N=N+1
done

• C-style loop

for ((a=1; a <= LIMIT ; a++))
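The C-style loop header needs a `do ... done` body to run. A minimal complete sketch follows; the value of `LIMIT` and the variable `total` are assumptions chosen only for illustration.

```shell
#!/bin/bash
# LIMIT is an assumed value; the slide leaves it undefined.
LIMIT=5
total=0
for (( a = 1; a <= LIMIT; a++ )) ; do
    total=$(( total + a ))     # accumulate 1 + 2 + ... + LIMIT
done
echo "Sum of 1..${LIMIT} is ${total}"
```

Inside `(( ... ))` variables are written without the `$` prefix, which is what makes this form read like C.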


BASH Procedures / Functions

• Functions contain the repetitive part of the code

#!/bin/bash
# declaration of function
filenameGenerator()
{
  echo "$1.out"
}
# call the function and supply arguments
filenameGenerator 1
filenameGenerator 2

$ ./04-functions.sh
1.out
2.out
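A bash function returns text by printing it, so the caller usually captures that text with command substitution. The sketch below reuses `filenameGenerator` from the slide; the variable name `outfile` and the job number 7 are assumptions for illustration.

```shell
#!/bin/bash
filenameGenerator()
{
    echo "$1.out"
}

# Capture the function's output in a variable instead of printing it directly
outfile=$(filenameGenerator 7)
echo "Results go to ${outfile}"
```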


BASH Input / Output

• I/O is extremely simple in BASH

cat file.out                  # send file content to standard output
mycode.sh | mytool.sh         # send output to another program
mycode.sh > /dev/null         # get rid of unwanted output
mycode.sh &> log.out &        # run in the background, logging stdout and stderr to log.out
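The redirections above can be tried safely with a stand-in for `mycode.sh`. This is a sketch under assumptions: the function `mycode` and the log file names are hypothetical, and the files are written to a temporary directory created with `mktemp -d`.

```shell
#!/bin/bash
# Hypothetical stand-in for mycode.sh: one line to stdout, one to stderr.
mycode() {
    echo "result"
    echo "warning" >&2
}

tmpdir=$(mktemp -d)
mycode > "${tmpdir}/out.log" 2> "${tmpdir}/err.log"   # keep the two streams in separate files
mycode > "${tmpdir}/all.log" 2>&1                     # merge both streams into one file, like &>
cat "${tmpdir}/all.log"
```

`> file 2>&1` is the portable spelling of bash's `&> file`.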


Sample BASH Project

• Perform text replacement in a text file (05-project.sh)

#!/bin/bash
if [ $# -ne 1 ] ; then
  echo "Usage: $0 file.coor"
else
  # create name for output file
  outfile=`echo $1 | sed 's/\.coor/\.pdb/'`
  # replace "HETATM" by "ATOM  " in the text (padded to six characters so the columns stay aligned)
  cat $1 | sed 's/HETATM/ATOM  /' > $outfile
  # count number of processed lines
  wc -l $outfile
fi
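The output file name can also be built without calling `sed`, using bash parameter expansion. A small sketch, assuming a hypothetical helper name `pdb_name`:

```shell
#!/bin/bash
# Hypothetical helper: derive the .pdb name from a .coor name.
pdb_name() {
    echo "${1%.coor}.pdb"    # strip a trailing ".coor", then append ".pdb"
}

pdb_name protein.coor
```

Unlike the `sed` expression, which replaces the first ".coor" anywhere in the string, `${1%.coor}` only removes the suffix, so a directory such as `run.coor/protein.coor` keeps its name.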


AWK

• Although simple and powerful, BASH code can quickly become bulky because of limited structural constructs

• AWK is designed to simplify data extraction and post-processing, and thus it nicely complements BASH when computational projects become a little more involved

Developed by Aho, Weinberger, and Kernighan


The Power of AWK in Action

• Compute the sum of numbers in one line of code

#!/bin/bash
awk 'BEGIN{sum=0} {for (i = 1; i <= NF; i++) sum += $i} END{print sum}'

$ echo "1.2 2.3 3.4" | ./01-sum.sh
6.9

AWK logistics:
• Section BEGIN{…} is executed once at the beginning
• Standard input is processed by the main program body, i.e. by the second {…} block, which runs once per input line
• NF is a built-in variable equal to the number of fields in the current input line
• $1, $2, … are the individual input fields
• i is the loop index, so we can address each field as $i
• Input fields are processed in the C-style for-loop and their values are summed up
• Section END{…} is executed once at the end of execution
• Variable type is automatically recognized by awk based on the operation type
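Because the main block runs once per input line, a small change turns the grand total into one sum per line. This is a hypothetical variant of the one-liner, not one of the lab scripts: the function name `row_sums` is an assumption, and it uses NR, awk's built-in counter of the current line number, which the slide does not mention.

```shell
#!/bin/bash
# Hypothetical variant of 01-sum.sh: print one sum per input line.
row_sums() {
    awk '{ sum = 0                                  # reset for every line
           for (i = 1; i <= NF; i++) sum += $i      # add up the fields of this line
           print "line " NR ": " sum }'
}

printf '1.2 2.3 3.4\n10 20\n' | row_sums
```

For the two input lines above this prints `line 1: 6.9` and `line 2: 30`.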

AWK: Input Field Separator (option -F)

• AWK accepts custom field separators

#!/bin/bash

awk -F"$1" '{for (i = 1; i <= NF; i++) print $i}'

Use comma as field separator

$ echo "1,a,3,b:5" | ./02-inpfields.sh ,

1

a

3

b:5

Challenge: Try using different field separators
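One way to approach the challenge is to run the same input through different separators. The sketch below assumes a hypothetical function `split_fields` that does what 02-inpfields.sh does; the second call relies on awk treating a multi-character separator as a regular expression.

```shell
#!/bin/bash
# Hypothetical wrapper equivalent to 02-inpfields.sh; the separator is passed as $1.
split_fields() {
    awk -F"$1" '{ for (i = 1; i <= NF; i++) print $i }'
}

echo "1,a,3,b:5" | split_fields :        # colon only: two fields
echo "1,a,3,b:5" | split_fields '[,:]'   # comma or colon: five fields
```

With the colon, the commas stay inside the first field (`1,a,3,b`); with the bracket expression, every comma and colon splits the line.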


AWK: PDB-to-XYZ Format Conversion

#!/bin/bash
# Convert PDB file to XYZ format
if [ $# -ne 1 ] ; then
  echo "Usage: $0 input.pdb"
else
  cat $1 |
  awk 'BEGIN {n=0}
  { if($1 == "ATOM") {n=n+1; a[n]=$3; x[n]=$5; y[n]=$6; z[n]=$7} }
  END {
    printf "%d\n\n", n;
    for (i=1; i<=n; i++)
      printf "%-5s %7.3f %7.3f %7.3f\n", a[i], x[i],y[i],z[i];
  }'
fi

03-convert.sh: arrays in AWK are super easy!

AWK: Column Block-average

#!/bin/bash
# compute block-average for data from loan.out
if [ $# -ne 2 ] ; then
  echo "USAGE: $0 blocksize column" ; exit
fi
cat loan.out | awk -v blocksize=$1 -v column=$2 '
BEGIN{n=0; j=0}
{ if(NF==10) {x[n]=$column; n++} } # read all data
END{
  nblocks = n / blocksize;
  for(i=0; i<nblocks; i++){ # loop over blocks
    aver=0.0; # compute average for each block
    for(nRecs=0; nRecs<blocksize && j<n; nRecs++) { aver += x[j]; j++ }
    printf "%4d %9.3f %d\n", i+1, aver/nRecs, nRecs;
  }
}'

04-blockaverage.sh


AWK: Multiple Input Files

• Alternative processing of input data from a file

#!/bin/bash
# alternative way of handling input files
inpfile="loan.out" # input file to be processed
nlines=`wc -l ${inpfile} | awk '{print $1}'` # get number of lines
awk -v inpfile=${inpfile} -v size=${nlines} '
BEGIN{
  command = "cat " inpfile; # string concatenation
  for(i=0; i<size; i++) {
    command | getline; # getting a line from the file
    if(NF==10) print $0; # print entire line
  }
}'

05-nfiles-demo.sh, 06-nfiles-full.sh

AWK: Functions – Return Absolute Value

• Compute absolute value

#!/bin/sh
awk 'function abs(x){return ((x+0.0 < 0.0) ? -x : x)} {print abs($1)}'

$ echo -23.11 | ./07-function.sh

23.11


AWK: Writing to File

• AWK writes to file by using the mechanism of output redirection

#!/bin/sh
# redirecting output to a file
if [ $# -ne 1 ] ; then
  echo "Usage $0 input.pdb" ; exit
fi
output=`echo $1 | sed 's/\.pdb/\.txt/'`
cat $1 | awk -v fname=${output} '{print $0 > fname}'

08-file.sh
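Because the redirection target in awk is an ordinary string expression, the file name can be computed per line, which lets one pass split the input into several files. This is a sketch under assumptions: the function `split_by_first_field`, the temporary directory, and the three sample records are all invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: write each input line to <dir>/<first field>.txt
split_by_first_field() {
    awk -v dir="$1" '{ print $0 > (dir "/" $1 ".txt") }'
}

outdir=$(mktemp -d)
# Made-up sample records, not taken from the lab files
printf 'ATOM 1 N\nHETATM 2 O\nATOM 3 C\n' | split_by_first_field "${outdir}"
cat "${outdir}/ATOM.txt"
```

The parentheses around the concatenated file name keep the expression unambiguous across awk implementations.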

Exercise

The program "NCSA Loan Simulator (copy left) FIU Workshop 2013" will be our computational kernel

Input:

Starting balance = $ 1000.00

Annual interest = % 7.20

Minimum payment = % 1.00

Output:

month: 1 balance: 1006.00 charge: 6.00 payment: 259.00 interest: 6.00

month: 2 balance: 751.48 charge: 4.48 payment: 259.00 interest: 10.48

month: 3 balance: 495.43 charge: 2.95 payment: 259.00 interest: 13.43

month: 4 balance: 237.85 charge: 1.42 payment: 237.85 interest: 14.85

Simulation results:

Borrowed 1000.00

Paid 1014.85 in 4 months

Finance charge 14.85


Write a script to optimize the loan duration

The program is not flexible enough, so how do we get the answer we need?
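The arithmetic behind the sample output can be reproduced in a few lines, which helps when checking a script written for this exercise. This is only a sketch of what the printed numbers imply, not the simulator's code: the function name `loan_table` is hypothetical, the monthly rate is assumed to be the annual rate divided by 12, amounts are assumed to be rounded to cents at every step, and the payment of 259.00 is read off the output because the rule the simulator uses to choose it is not shown.

```shell
#!/bin/bash
# Hypothetical re-creation of the table above; NOT the NCSA simulator itself.
# Arguments: starting balance, annual interest in percent, fixed monthly payment.
loan_table() {
    awk -v bal="$1" -v apr="$2" -v pay="$3" 'BEGIN {
        rate = apr / 100 / 12                          # assumed monthly interest rate
        while (bal > 0) {
            month++
            charge = sprintf("%.2f", bal * rate) + 0   # interest for this month, rounded to cents
            bal    = sprintf("%.2f", bal + charge) + 0 # balance after the charge
            total += charge                            # cumulative interest
            p = (bal < pay) ? bal : pay                # the last payment clears the balance
            printf "month: %d balance: %.2f charge: %.2f payment: %.2f interest: %.2f\n", month, bal, charge, p, total
            bal = sprintf("%.2f", bal - p) + 0
        }
    }'
}

loan_table 1000.00 7.20 259.00
```

With these inputs the four printed lines carry the same values as the sample output, ending with a final payment of 237.85 and 14.85 of total interest.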

PERL

• Full-fledged (interpreted) programming language
• Highly optimized and amazingly fast
• Ideal for data processing and data extraction
• Lots of reusable plug-ins available for download
• Fast learning curve
• If you know the C language, you already know Perl

Practical Extraction and Report Language, by Larry Wall

PERL: Program Structure

#!/usr/bin/perl -w
# -w enables warnings; every statement ends with a mandatory semicolon
my $inpFileName = "";   # string
my $sum = 0.0;          # floating point
if (@ARGV != 1) {       # number of command-line arguments
  printf " USAGE %s loan.out\n", $0;   # $0 is the program's own name
  exit;
}
else {
  $inpFileName = $ARGV[0];   # read 1st command-line argument
  # open file descriptor for reading (<)
  unless (open INP, "<$inpFileName") { die "Error: Cannot open input file $inpFileName" }
  readData();
  close INP;                 # close file descriptor after reading is done
  print "All Done\n";
}
sub readData {
  # do the work here (will be described later)
}

PERL: Pattern Matching

Extracting specific parts from text files is often a non-trivial task

# Patterns

my $ap = "\\S+"; # Any pattern

my $lp = "\\w+\\d*"; # Label (text) pattern

my $ip = "-?\\d+"; # Integer pattern

my $rp = "-?\\d*\\.?\\d*"; # Real pattern

my $ep = "[+-]?\\d+\\.?\\d*[DE]?[+-]?\\d*"; # Exponential pattern (scientific format)

Mask:
\s – whitespace character
\S – non-space
\w – word character (a-zA-Z0-9_)
\W – anything but a word character
\d – numeric character (0-9)
\D – anything except numeric
\. – literal dot (an unescaped . matches any character)

Multiplier:
[+-]? – either + or - or neither (inside brackets no | is needed)
+ – one or more instances
? – optional instance
* – any number of instances, including none
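The mask-plus-multiplier idea carries over to bash's `[[ string =~ regex ]]` test, with the caveat that bash uses POSIX extended regular expressions, so `[0-9]` replaces `\d`. This is a sketch under assumptions: the function names are hypothetical, the patterns are anchored with `^` and `$` so that the whole string must match, and the real-number pattern is slightly stricter than `$rp` above because it requires at least one leading digit.

```shell
#!/bin/bash
# Hypothetical helpers; patterns are stored in variables so the backslash survives quoting.
is_integer() {
    local re='^-?[0-9]+$'               # optional sign, one or more digits
    [[ $1 =~ $re ]]
}
is_real() {
    local re='^-?[0-9]+\.?[0-9]*$'      # digits, optional literal dot, optional fraction
    [[ $1 =~ $re ]]
}

for value in -42 3.14 abc ; do
    if is_real "${value}" ; then echo "${value}: real" ; else echo "${value}: not a number" ; fi
done
```

Every integer also passes `is_real`, since the dot and the fraction are optional.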

PERL: Arrays

@ARGV # built-in array for command-line arguments
my @array = (); # array declaration
# accessing array elements
for(my $i=0; $i < $nRecords; $i++) {
  printf "%9.3f \n", $array[$i];
}
# returning and passing arrays
($nRecords, $total) = readData( $ARGV[1], \@array ); # \@array passes a reference to the array
sub readData {
  my ($column, $data) = @_;
  $$data[$i] = $substring; # $data is a reference (pointer) to the array, so it must be dereferenced
}

Exercise: Data Extraction Project

• Use the data from loan.out
• Read a specified column
• Sum up the values

• Extra credit: make sure that the values to be summed up have type real

01-parser.pl


Useful Internet Resources

• BASH: http://tldp.org/LDP/abs/html/

• AWK: http://www.gnu.org/software/gawk/manual/gawk.html

• PERL: http://www.perl.org

book: Learning Perl, Author: Randal L. Schwartz, O’Reilly

Let us know your opinion

http://www.bitly.com/fiuworkshop

Thank you !!!
