Science on a (LINUX) computer in three parts. Introductory short course, Part I. Thorsten Becker, University of Southern California, Los Angeles. September 2007


Science on a (LINUX) computer in three parts

Introductory short course, Part I

Thorsten Becker, University of Southern California, Los Angeles

September 2007

Contents

● Part one: UNIX and computers

● Part two: Scientific computing and data analysis

● Part three: Matlab practical

Purpose of this short course

● Introduce UNIX-based (e.g. LINUX, Mac OSX) computing and scientific work flow environments by providing pointers for further information

● Describe what I think are best practices in moderate to high-performance computing
– I will make judgments and provide specific recommendations
– I cannot possibly provide a comprehensive, fair, or entirely up to date overview

Typographic conventions

● Most important:

– links to web based information are blue

● UNIX (or shell) commands and program names you might type at the command line are written in bold

Contents part I

● UNIX (or LINUX, used synonymously here): what and why

● The file system and Window managers
● Shell environment
● Editing files
● Command line tools
● Scripts and GUIs
● Typesetting, publishing, layout

UNIX: What is UNIX?

● an operating system that originated in the 1970s
● built for multi-user, multi-tasking, scalable use (that was new way back then)
● runs on all computing hardware, including the iPod
● many flavours, free: LINUX, BSD (a version made it into OSX), Solaris (SUN)
● they are all kind of the same thing, your mileage may vary (e.g. directory structure)
● there is convergence between LINUX, Mac OS, and Windows look and feel

UNIX: Why use UNIX?

● can use same tools and programs on laptop, workstation, and supercomputer (less important if virtualization is available)

● flexible, modular, powerful
● seamless integration of C and F90 programs, shell commands, and post-processing (UNIX is written in C)

● all important numerical tools and libraries are available

● LINUX is open (security!), and ubiquitous

UNIX: Why (not) use UNIX?

[Figure: sketch with axes labeled time, power, and effort, and curves labeled UNIX and WINDOWS]

UNIX: This overview

● describes a typical scientific workplace set-up in the natural sciences, circa 2005
● fairly low-level, close to the machine
● tries not to
– spend a lot of time on point-and-click GUIs
– discuss vapor ware
– discuss cutting-edge programs (with unclear support situation and user base)
● will be out of date tonight

File system: Gnome

File system: Graphical Window managers and tools

● GNOME
● KDE
● Provide support and interface with other apps (search, web, access files on other servers, etc.)

File system: Hardcore: The actual file system

● user versus super-user (administrator) setup
● tree structure of files within directories:
– /usr/local has software
– /dev has devices
– /home/$USER has all the user's files, which might be subdivided into folders
– /mnt/data/ might hold shared data/storage

File system: But where is it? The shell

● open a shell to get a command line
● type commands, such as ls to list the contents of a directory

Even if you regularly use Mac OS-X or GNOME, some knowledge of the background can save the day!

File system: Naming conventions

● suffixes indicate the type of file: file.dat, file.c, file.f, file.f90, file.awk, file.txt, file.tex, file.ps (and determine helper applications)
● UNIX is case sensitive
● normally, use lower case for files and directories
● some symbols (e.g.: *, %, ?) are special; if you want those literally, you have to escape them with a backslash (\*, \%, \?)
● different quotes (", ', `) have different meanings
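The different meanings of the quote characters can be seen in a short sketch. It uses sh/bash syntax, and the variable name station and its value are made up for illustration.

```shell
# Hypothetical variable, used only to show how each quote type behaves.
station="PAS"

double="code: $station"      # double quotes: variables are expanded
single='code: $station'      # single quotes: everything is kept literally
backtick=`echo $station`     # backquotes: replaced by the output of the command
escaped=$(echo \*)           # backslash: echo receives a literal *, not a list of file names

echo "$double"
echo "$single"
echo "$backtick"
echo "$escaped"
```

Without the backslash, echo * would print the names of all files in the current directory.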

File system: ls: list contents of directories

becker@jackie:~ > ls
calendar  data     dokumente  idl_gmt  mail    plates  public_html  RCS             subduct   TEX  unison.log
CITCOM    Desktop  evolution  ioffice  mylibs  progs   quakes       Screenshot.png  teaching  tmp
becker@jackie:~ > ls -F -l
total 6500
-rw-r--r--   1 becker users    1638 Jun 17 07:39 calendar
drwxrwxr-x   4 becker users    4096 Jun 17 07:39 CITCOM/
drwxr-xr-x  35 becker users    4096 Jul 12 15:22 data/
drwx------   2 becker users    4096 Jul 26 17:20 Desktop/
drwxr-xr-x  25 becker users    4096 Jul 12 07:48 dokumente/
drwx------   7 becker users    4096 Jun 17 11:53 evolution/
drwxr-xr-x   3 becker users   20480 Jul 27 15:00 idl_gmt/
drwxr-xr-x   3 becker users    4096 Jun 17 07:39 ioffice/
drwx------   2 becker users    4096 Jul  7 12:21 mail/
drwxr-xr-x  15 becker users    4096 Jun 17 07:46 mylibs/
drwxr-xr-x  12 becker users    4096 Jun 17 07:46 plates/
drwxr-xr-x  12 becker users    4096 Jun 17 07:46 progs/
drwxr-xr-x  27 becker users   12288 Jul 18 19:20 public_html/
drwxr-xr-x   4 becker users    4096 Jun 17 07:47 quakes/
drwxrwxr-x   2 becker users    4096 Jun 17 07:39 RCS/
-rw-r--r--   1 becker users   35775 Jul 27 16:15 Screenshot.png
drwxrwxr-x   5 becker users    4096 Jun 17 07:47 subduct/
lrwxrwxrwx   1 becker users      19 Jun 17 07:39 teaching -> dokumente/teaching//
drwxr-xr-x  29 becker users    4096 Jul 26 17:39 TEX/
lrwxrwxrwx   1 becker users      12 Jun 16 17:28 tmp -> /mnt/dos/tmp/
-rw-------   1 becker users 6508582 Jul 27 15:01 unison.log

File system: Commands have options

● command output and workings can be modified by adding -x (or x for tar)

● ls:
– ls -F
– ls -la
● usually, you can do “command --help” to learn more
● often, there are long versions: ls --all --full
● man pages (RTFM): “man command”

File system: File system commands I

● cp: copy files (will normally overwrite!)
– cp filea fileb
● rm: remove files (for real!)
– rm goneforever.dat
– rm -i goneforever.dat
● mkdir: make directories
– mkdir new_dir/
● cd: change directories (cd ..; cd -; cd ~)
● pwd: print current directory
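The commands above can be tried safely in a scratch directory. The following is a minimal sketch in sh/bash syntax; the file and directory names are placeholders, and mktemp -d is assumed to be available to create the scratch directory.

```shell
# Work in a throwaway directory so nothing real is overwritten or deleted.
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

echo "1 2 3" > filea          # create a small test file
cp filea fileb                # copy; an existing fileb would be overwritten
mkdir new_dir                 # make a directory
cp filea new_dir/             # copy the file into it
cd new_dir                    # change into the new directory
here=$(pwd)                   # remember where we are
cd ..                         # go back up one level
rm fileb                      # remove the copy (there is no undo)
listing=$(ls)                 # what is left: filea and new_dir
echo "$listing"

cd "$start"                   # leave the scratch directory before deleting it
rm -r "$workdir"
```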

File system: File system commands II

● scp: copy files across machines
– scp filea [email protected]:~/directory/fileb
● more: display files
– more filea.dat
● ln: create (symbolic) links (shortcuts in Windows)
– cd new_dir
– ln -s ../old_dir/script .
– soft vs. hard: deleting a soft link never touches the file; the file itself is deleted once its last hard link is removed
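The difference between soft and hard links shows up when the original name is removed. This is a small sketch in sh/bash syntax with made-up file names, run in a scratch directory.

```shell
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

echo "data" > original.dat
ln -s original.dat soft.dat   # symbolic link: a pointer to the name original.dat
ln original.dat hard.dat      # hard link: a second name for the same data

rm original.dat               # remove the original name

hard_content=$(cat hard.dat)  # still readable through the hard link
if cat soft.dat > /dev/null 2>&1; then
    soft_status="ok"
else
    soft_status="dangling"    # the soft link now points at nothing
fi
echo "hard link: $hard_content, soft link: $soft_status"

cd "$start"
rm -r "$workdir"
```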

File system: Using wildcards (shell patterns, a simple relative of regular expressions)

● * (all): cp *.dat new_dir/

● [pat] (pattern): cp file[1-5].dat new_dir

● ? (any single character): cp file??.dat new_dir

● rm -rf * (DON'T TRY IT, IT WORKS)
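A harmless way to see what a pattern matches before handing it to cp or rm is to give it to echo first. The sketch below (sh/bash syntax, hypothetical file names, scratch directory) does this for the three patterns above.

```shell
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical file names for the demonstration.
touch file1.dat file2.dat file7.dat file12.dat notes.txt

all_dat=$(echo *.dat)          # * matches any string, so all four .dat files
range=$(echo file[1-5].dat)    # [1-5] matches exactly one character in that range
two=$(echo file??.dat)         # ?? matches exactly two characters

echo "$all_dat"
echo "$range"
echo "$two"

cd "$start"
rm -r "$workdir"
```

The shell expands the pattern before the command runs, so echo shows exactly the file names that cp or rm would receive.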

File system: Permissions

● first character: - (file), d (directory), l (link)
● r: read, w: write, x: execute (files) or enter (directories)
● u: user, g: group, o: other, a: all
– chmod u+x file
– chmod a+r *.dat
– chmod -R o-rwx my_stuff

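● whoami, id: output of user and group

-rw-r--r--   1 becker users    1638 Jun 17 07:39 calendar

Reading this line from left to right: file type (-), then the permissions for user (rw-), group (r--), and other (r--); after the link count come the user, group, size, modification time, and filename.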
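The effect of chmod can be checked by looking at the first ten characters of the ls -l output before and after. This is a sketch in sh/bash syntax; the script name hello.sh is hypothetical.

```shell
workdir=$(mktemp -d)
script="$workdir/hello.sh"     # hypothetical script name

printf '#!/bin/sh\necho hello\n' > "$script"

chmod u=rw,go=r "$script"      # start from a known state: rw-r--r--
before=$(ls -l "$script" | cut -c1-10)

chmod u+x "$script"            # let the user (owner) execute the file
chmod o-rwx "$script"          # take all rights away from others
after=$(ls -l "$script" | cut -c1-10)

echo "$before -> $after"

rm -r "$workdir"
```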

Shells: The environment

● shells: interpret your commands when logged in and using a terminal session

● csh, tcsh: nice for interactive stuff, syntax close to C, command completion, auto correction

● bash, ksh: nice for programming
● shells use mostly same commands, but there are differences in the script languages, e.g.
– export var=100 (bash)
– setenv var 100 (csh)
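In bash, only exported variables are passed on to programs started from the shell. The sketch below illustrates this with a child shell; the variable names var and region are placeholders, and region is assumed not to be set in the environment already.

```shell
region="0/360/-90/90"          # plain shell variable: visible in this shell only
export var=100                 # exported: also visible to programs started from here

# sh -c starts a child shell; the single quotes delay expansion until the child runs.
child_var=$(sh -c 'echo "$var"')
child_region=$(sh -c 'echo "$region"')

echo "child sees var=$child_var and region=[$child_region]"
```

In csh and tcsh the same distinction exists between set (shell only) and setenv (environment).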

Shells: Can use variables, and many are predefined (csh example)

becker@jackie:~ > setenv region 0/360/-90/90

becker@jackie:~ > echo $region
0/360/-90/90

becker@jackie:~ > echo $HOME
/home/becker

becker@jackie:~ > echo $USER
becker

becker@jackie:~ > env
BIBINPUTS=.:/home/becker/TEX//:
CFLAGS_DEBUG=-g -DDEBUG -DDEBUG -DLINUX_SUBROUTINE_CONVENTION
LDFLAGS=-posixlib -nofor_main  -Vaxlib  -L/usr/lib/gcc/i386-redhat-linux/3.4.3/ -lg2c -lm
MANPATH=/home/becker/progs/man/:/home/becker/progs/man/:
DVIPSHEADERS=/home/becker/TEX//:
SUPPORTED=en_US.UTF-8:en_US:en
SSH_AGENT_PID=31881
HOSTNAME=jackie.usc.edu
DXROOT=
DXMEMORY=128
CONFC=ifort
HOST=jackie.usc.edu
SHELL=/usr/bin/tcsh
FFLAGS_DEBUG=-g -DDEBUG -fpp    -nofor_main  -DDEBUG

....

Shells: Source startup scripts

● ~/.login (= $HOME/.login) at startup
● ~/.cshrc every time you start a shell
● those scripts are where you define environment variables and aliases you want to use in all sessions
– alias rm 'rm -i'
– setenv F77 ifort; setenv FFLAGS "-O3 -ipo"

● see references on UNIX and dotfiles.com

Shells: A few lines from my .tcshrc

# set architecture flag, e.g., ip27 for IRIX  and i686 for Pentium
#
setenv ARCH `uname -m | gawk '{print(tolower($1))}'`
#
# hostname without domain
#
setenv myhostname `hostname | gawk '{split($1,a,".");print(a[1])}'`
#
if ( $ARCH == "i686" ) then
    #
    # Pentium/Xeon Linux system
    #
    # GMT etc
    setenv GMT_VERSION GMT3.4.5
    #setenv GMT_VERSION GMT4.0
    set local_gmt_path = /usr/local/src/${GMT_VERSION}/
    set local_netcdf_dir = /usr/local/src/netcdf-3.5.0/

...
set rmstar
set correct=cmd
set autocorrect
set nobeep
set prompt="%B%n@%m:%b%~\n> "

....

Shells: Command history and other features that save typing

● use up, down, left, right arrows to navigate and edit commands on the command line

● use a bunch of tricks to access and modify last commands, e.g.
– !n: execute last command that starts with “n”
● auto-completion (TAB key)
● auto-correction
● many more tricks

Shells: Job control

● ps: list currently running processes
● jobs: list current jobs (processes started from shell in background)
● running commands in background
– emacs & (or: emacs; CTRL-Z; bg)
– nohup mybigjob.exe & (keeps running after the shell quits)
– kill %2 (kill the second job running, % are job IDs)
– kill -9 12344 (kill process with PID 12344)

● top: show machine load
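Starting, checking, and killing a background process can be sketched as follows (sh/bash syntax). The sleep command stands in for a long-running program; inside a script the process ID from $! is used, since job IDs such as %2 are mainly an interactive feature.

```shell
sleep 30 &                     # start a long-running command in the background
pid=$!                         # $! holds the process ID of the last background command

# Signal 0 sends nothing; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
    state_before="running"
else
    state_before="gone"
fi

kill "$pid"                    # ask the process to terminate (SIGTERM)
status=0
wait "$pid" 2>/dev/null || status=$?   # collect it; a non-zero status reflects the kill

echo "before: $state_before, exit status after kill: $status"
```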

Shells: Job control example

becker@jackie:~ > ps
  PID TTY          TIME CMD
 1758 pts/5    00:00:00 tcsh
 2500 pts/5    00:00:00 ps

becker@jackie:~ > ps aux | tail
becker    1413  0.0  0.0  4348 1004 ?        S    15:10   0:00 /bin/sh /usr/bin/realplay /tmp/youfm_cms.ram
becker    1418  3.2  1.1 79664 12000 ?       Sl   15:10   3:24 /usr/local/RealPlayer/realplay.bin /tmp/youfm_cms.ram
becker    1420  0.0  0.5 25720 5260 ?        S    15:10   0:00 /usr/local/RealPlayer/realplay.bin /tmp/youfm_cms.ram
becker    1421  0.0  0.5 25720 5260 ?        S    15:10   0:00 /usr/local/RealPlayer/realplay.bin /tmp/youfm_cms.ram
becker    1642  0.0  0.1  5200 1776 pts/3    Ss+  16:03   0:00 -csh
becker    1703  0.0  0.1  6624 1764 pts/4    Ss+  16:08   0:00 -csh
becker    1758  0.0  0.1  5328 1984 pts/5    Ss   16:14   0:00 -csh
becker    1888  0.0  1.6 27332 17228 ?       S    16:26   0:01 /usr/lib/acroread/Reader/intellinux/bin/acroread -display :0.0 -name main -visual default +useFrontEndProgram -xrm *useNullDoc:false -progressPipe 3 -xrm *noPrivateColormap:true -xrm *exitPipe:4
becker    2501  0.0  0.0  3032  772 pts/5    R+   16:54   0:00 ps aux
becker    2502  0.0  0.0  4212  532 pts/5    R+   16:54   0:00 tail

becker@jackie:~ > kill 1413
...

Shells: Cluster job control

● on large, parallel machines one typically runs batch schedulers or queuing systems

● this allows distributing jobs and utilizing resources efficiently

● PBS
– qsub myjob.exe -tricky_options -q large
– qstat | grep $USER
– pbstop
– qdel job-ID

Editors: Editing text or ASCII data files

● vi: old school: fast, efficient, bizarre
– controlled by typing commands like :w, /text
– good for minor editing tasks, required for admins
● emacs: best overall tool
– GUI, menus
– flexible, expandable
– bizarre
● tons of others, but don't use Word or such, since UNIX tools expect plain ASCII text

Editors: What (x)EMACS looks like

UNIX tools: Command line tools for file management

● more, less: display files page by page interactively

● cat: display file
● head: display first few lines of file
● tail: guess
● paste: align files with columns row by row
– paste file1.dat file2.dat
● wc: count words, lines, and bytes of file

UNIX tools: Pipes and redirection (ksh example)

● >: redirect stdout, <: stdin, 2>: stderr
● >>: append, |: pipe
– cat file1.dat > combined.dat
– cat file2.dat >> combined.dat
– cat file1.dat | wc
– myconvectioncode.exe < input.dat
– echo Whatever! > /dev/null
– mycode > log.dat 2> error.dat
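The redirection operators can be combined as in this sketch (sh/bash syntax, scratch directory). The file names follow the examples above; an ls call on a missing file stands in for a program that writes to both stdout and stderr.

```shell
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

printf '1\n2\n' > file1.dat            # > creates (or overwrites) a file
printf '3\n' > file2.dat

cat file1.dat > combined.dat           # start combined.dat from file1.dat
cat file2.dat >> combined.dat          # >> appends file2.dat to it

# | feeds the output of one command into the next; tr strips padding some wc versions add
nlines=$(cat combined.dat | wc -l | tr -d ' ')

# stdout and stderr go to different files; the missing file triggers an error message
ls file1.dat no_such_file.dat > log.dat 2> error.dat || true
logged=$(cat log.dat)
[ -s error.dat ] && have_error="yes" || have_error="no"

echo "$nlines lines combined, log: $logged, error captured: $have_error"

cd "$start"
rm -r "$workdir"
```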

UNIX tools: grep and sort

● grep: find patterns in file
– grep my_function *.c | more
– grep -ni my_function *.c (disregard case and list line numbers)
● sort: sort row data
– sort -n -k 3 file.dat (numeric sort starting at the third column; written sort -n +2 in the older syntax)
● uniq: only print unique lines
– sort -n splitting.dat | uniq > stations.dat
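A small sketch of grep, sort, and uniq on a made-up data file (sh/bash syntax). The station names and numbers are invented for illustration, and the sort key is given as -k 3 because current versions of sort generally do not accept the older +2 form.

```shell
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical station list: name, latitude, measurement (one duplicate on purpose).
cat > splitting.dat << 'EOF'
PAS 34.1 1.2
BKS 37.9 0.8
PAS 34.1 1.2
ISA 35.7 10.5
EOF

matches=$(grep -ni pas splitting.dat)                      # -i: ignore case, -n: line numbers
smallest=$(sort -n -k 3 splitting.dat | head -1)           # numeric sort on column 3
n_unique=$(sort splitting.dat | uniq | wc -l | tr -d ' ')  # uniq only drops adjacent duplicates

echo "$matches"
echo "$smallest"
echo "$n_unique"

cd "$start"
rm -r "$workdir"
```

Because uniq only compares neighbouring lines, the input is sorted first.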

UNIX tools: awk and sed

● awk: (or gawk) powerful language for ASCII data and text manipulations
– like C, interpreted at run time
– the best thing since sliced bread
– cat file.dat | gawk '{print($2,cos($5))}' or
– gawk '{print($2,cos($5))}' file.dat
● sed: stream editor
– sed 's/Bush/Kerry/g' file.dat > new_file.dat

● perl: more powerful, more complex
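Typical one-liners look like the following sketch (sh/bash syntax). The data file is made up, and plain awk is called instead of gawk on the assumption that gawk may not be installed everywhere; the programs shown use no gawk-specific features.

```shell
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

printf 'a 1.0\nb 2.0\nc 6.0\n' > file.dat    # hypothetical two-column data

# awk: for every line, print column 1 and twice column 2
doubled=$(awk '{print($1, 2*$2)}' file.dat)

# awk: accumulate column 2 and print the mean after the last line (NR = number of lines)
mean=$(awk '{sum += $2} END {print(sum/NR)}' file.dat)

# sed: replace an "a" at the start of a line with "A"
renamed=$(sed 's/^a/A/' file.dat | head -1)

echo "$doubled"
echo "$mean"
echo "$renamed"

cd "$start"
rm -r "$workdir"
```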

UNIX tools: compression and dealing with big files

● gzip: compress ASCII files, which can be huge, to binary
– compress: gzip file
– uncompress: gunzip file.gz
● can write gzipped files from within C, can use gunzip on the fly (zcat): this allows using nice ASCII tools such as awk while storing things compactly

● bzip2: smaller files, takes longer
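Storing data compressed and decompressing it on the fly can be sketched like this (sh/bash syntax, scratch directory). The file big.dat is generated for the example, and gunzip -c is used in place of zcat because zcat behaves differently on some systems.

```shell
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

# Generate a hypothetical ASCII data file: 1000 lines of "i i*i".
awk 'BEGIN {for (i = 1; i <= 1000; i++) print(i, i*i)}' > big.dat
size_plain=$(wc -c < big.dat | tr -d ' ')

gzip big.dat                              # replaces big.dat with big.dat.gz
size_gz=$(wc -c < big.dat.gz | tr -d ' ')

# Decompress to stdout and feed awk, without ever storing the uncompressed file.
total=$(gunzip -c big.dat.gz | awk '{sum += $1} END {print(sum)}')

echo "plain: $size_plain bytes, gzipped: $size_gz bytes, sum of column 1: $total"

cd "$start"
rm -r "$workdir"
```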

UNIX tools: storage and backup

● tar: package multiple files into one file or tape drive (e.g. to back up or send across internet)
– pack: tar cvf package.tar file_dir/*
– display contents: tar tf package.tar
– expand: tar xvf package.tar
● Unison file synchronizer (to sync laptop and workstation)
● back up all the time (it's easy to screw up big time), or have your admin back up for you
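A full pack, list, and restore cycle with tar can be sketched as follows (sh/bash syntax, scratch directory, made-up file names). The v option is left out here to keep the output short.

```shell
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

mkdir file_dir
echo "one" > file_dir/a.dat
echo "two" > file_dir/b.dat

tar cf package.tar file_dir               # pack the directory into one file
contents=$(tar tf package.tar)            # list what is inside the archive

rm -r file_dir                            # simulate losing the originals
tar xf package.tar                        # expand the archive again
restored=$(cat file_dir/b.dat)

echo "$contents"
echo "restored b.dat contains: $restored"

cd "$start"
rm -r "$workdir"
```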

Please don't forget to back up

UNIX tools: Getting smart: a few tricks

● unpacking on the fly:
– gunzip -c newsoftware.tgz | tar xvf -
● interpreting commands on the fly:
– echo $variable_a `cat file.dat | gawk -f mean.awk`
● tcsh interactive functionality:
– foreach f ( *.ps )
      convert $f $f:r.gif
  end
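For comparison, the tcsh foreach loop has a direct counterpart in sh/bash. In this sketch the file names are made up, and cp stands in for convert so that the example runs without ImageMagick installed.

```shell
start=$(pwd)
workdir=$(mktemp -d)
cd "$workdir"

touch fig1.ps fig2.ps          # hypothetical PostScript files

made=""
for f in *.ps; do
    out="${f%.ps}.gif"         # strip the .ps suffix and add .gif, like $f:r.gif in tcsh
    cp "$f" "$out"             # stand-in for: convert "$f" "$out"
    made="$made $out"
done
echo "created:$made"

cd "$start"
rm -r "$workdir"
```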

Scripts: Scripts and GUIs

● scripts are the opposite of point-and-click
● need to work hard once to generate template
● benefit forever if you want to produce more products (e.g. plots) using different parameters, or if the data has changed
● automate research galore (needed to explore parameter space)
● scripts can serve as documentation of steps taken to analyse data and produce results

Scripts: An example script

#!/bin/bash
#
# run run_fstrack for different models
#
models=${1-"pmDsmean_nt pmDnngrand_nt saf1 saf2 saf3 "}
strains=${2-"2 1 0.5"}

# PBS queues to use
q1="becker64";q2="scec"

c=0
for m in $models;do
    cd $m
    for s in $strains ;do

        if [ $c -eq 1 ];then
            queue=$q1;c=0
        else
            queue=$q2;c=1
        fi
        # regional
        ../run_fstrack 1 0 0 0 $s 1 2 1 -14 0 60 "" 1 $queue

    done
    cd -
done
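Two idioms in this script are worth isolating: ${1-default} supplies a default when an argument is missing, and the counter c alternates jobs between two queues. The sketch below (sh/bash syntax) reproduces just that logic; plan_jobs, the model names, and the queue names q1 and q2 are hypothetical, and it prints the plan instead of submitting anything.

```shell
# Hypothetical helper: print which queue each (model, strain) job would go to.
plan_jobs() {
    models=${1-"saf1 saf2"}        # first argument, or this default if it is absent
    strains=${2-"2 1"}             # second argument, or this default if it is absent
    c=0
    for m in $models; do
        for s in $strains; do
            if [ $c -eq 1 ]; then
                queue="q1"; c=0    # every second job goes to the first queue
            else
                queue="q2"; c=1
            fi
            echo "$m $s $queue"
        done
    done
}

default_plan=$(plan_jobs)                  # no arguments: defaults are used
custom_plan=$(plan_jobs "saf3" "0.5")      # both arguments given

echo "$default_plan"
echo "$custom_plan"
```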

Scripts: scripting languages

● csh, tcsh
● bash
● perl
● python
● Tk
● Scripted programs
– gnuplot
– Matlab

● Script whenever you can (because you'll want to reproduce things exactly)

Scripts: Visual scripting languages: Tcl/TK (e.g. iGMT), GTK, Tkinter, Qt

Scripting

● Can be the way to go if individual processing steps are not time-sensitive

● If speed is an issue, you need to use a compiled language such as C or Fortran instead of a script

● For basic LINUX automation tasks, bash and awk are very useful

● Python seems to be a nice middle ground for more advanced projects

Virtualization: NX export your workstation to, like, wherever

Virtualization: run any OS (e.g. Windows) in, like, whatever

USC geosys Wiki I

USC geosys Wiki II

The next lecture will be on

● Programming
– common languages
– philosophy
– compiling, debugging, make, version control
– C and F77 interfacing
– libraries and packages
● Number crunching
● Scientific visualization
● Scientific typesetting (why not to use Word, yet, if you need equations)