linux command line basics ii: downloading data and...

Post on 01-Feb-2018

224 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Linux command line basics II: downloading data and

controlling filesYanbinYin

1

2

Learningprogramminghastogothroughthehands-onpractice,alotofpractice

HearingwhatIdescribeaboutacommandoraprogramhelps,butyouwillnotbeabletodoitunlessyoutypeinthecodesandrunittoseewhathappens

Readingothers’codeshelpsbutoften isharderthanwritingitbyyourself fromscratch

Althoughpainfulandfrustrating,trouble-shootingisnormalandpartofthelearningexperience(askexperiencedpeopleorgoogle)

Toavoiderrors,youhavetofollowrules;mosterrorsoccurredinprogrammingarebecauseofnotknowingrulesorforgettingrules

Usecommentsincaseyouforgetwhatyou’vewrittenmeans

write->run->errors->edit->errors->………………………………….. ->run->success

Goodnews:finishedscriptscouldbereusedoreditedforlateruse

Thingsyoushouldknowaboutprogramming

Homework#71.Createafolderunderyourhomecalledhw7

2.Changedirectorytohw7

3.GotoNCBIftpsite,findthegenome,bacteria,ecoli MG1655folder,anddownloadtheptt fileandthefaa fileinthere

4.Createacopyoftheppt file,iftheoriginalfileiscalledA.ptt,namethecopiedfileA.ptt.bakDothesamethingforthefaa file

5.Readthechapter5ofhttp://edu.isb-sib.ch/pluginfile.php/2878/mod_resource/content/3/couselab-html/content.html andfinishallquizzesinthere

6.Usewhatyoulearnedinchapter5tocounthowmanyproteinsequencesareinthefaafle ofstep4.

3

Writeareport(inwordorppt)toincludealltheoperations/commands andscreenshots.

DueonNov10(sendbyemail)Officehour:Tue,ThuandFri2-4pm,MO325AOremail:yyin@niu.edu

Whatwelearnedlastclass:

filesystem,relative/absolutepaths,workingfolder,homefolder

ssh,pwd,lscd,mkdir,rmdir,rm,mancp,mv

Ifthingsgowrong, try:

Ctrl+c (sometimesmultiple times)

qtoexitfrommanpage

4

http://korflab.ucdavis.edu/Unix_and_Perl/unix_and_perl_v3.1.1.pdf5

Viewfiles:more, less, head, tail

HowyouuseTabkeytoautocomplete

less /home/ thenhittabtwice,youwillseeallfiles/foldersunder /home/less /home/yyin/ thenhittabtwice,youwillsee…

less /home/yyin/U thenhittabonce,Unix_and_Perl_course willbeautocompleted

less /home/yyin/Unix_and_Perl_course/ keepdoing thisuntilyouget

less /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_proteins.fasta

See next page for screen shots

q:quitviewing ↑or↓:moveupordownalinespace:nextpage />:searchfortext‘>’BorPgUp:backapage ForPgDn: forwardapagen:findnextoccurrenceof‘abc’G:gototheend ?:findpreviousoccuence of ‘abc’

6

HowyouuseTabkeytoautocomplete

7

more /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_genes.gff

moreissimilartoless,butcandolessthanless

head /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/chr1.fasta

head -20 /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/chr1.fasta

headtodumpthetopfewlinestothescreen

tail /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/intron_IME_data.fasta

tail -20 /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/intron_IME_data.fasta

tailtodumpthelastfewlinestothescreen

more, less,head, taildonot loadallfilecontenttothememoryYoucaneditthefilecontenteither, theyarejustviewers 8

9

CreateoreditfilesTexteditors:nanopicovi

Supposeyouareatyourhome:

WritethetoppartoftheintAt_genes.gff filetoanewfilehead -20 /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_genes.gff > head

Trynano (Intuitiveuserinterface)nano head

Tryvi(command-driven interface,butmuchmorepower)vi head

Createafilefromscratchusingvi.1) youtypevi filename andhitenter2) afteryouareinvi,typei togetintoeditmodeandcopy&pastecontentinvi3) hitEsc toexiteditmodeandthen:x tosavethefileandexitvi.

10

Inputandoutputredirection:thegreater-thansign

Unixhasaspecialwaytodirectinputandoutputfromcommandsorprograms.

Bydefault,theinput isfromkeyboard (calledstandardinput,stdin):youtypeinacommandandShelltakesthecommandandexecutesit.

Thestandardoutputbydefaultistotheterminalscreen(stdout);

ifthecommandorprogramfailed,youwillalsohavestandarderrorsdumped totheterminalscreen(stderr).

However, ifyoudonotwanttheoutputdumped tothescreen,youcanuse“>”toredirect/writetheoutput intoafile.Forexample,try

ls /home/yyinls /home/yyin > listls /home/yyimls /home/yyim 2> err

“2>”todumptheerrormessageNospacehere!

11

vibasics

commandmode editmodeiEsc

Thefollowing commandsoperateincommandmode(hit Esc beforeusingthem)x deleteonecharacteratcursorpositionu undodd deletethecurrentlineG gotoendoffile1G gotobeginning offile10G gotoline10$ gotoendofline1 gotobeginning ofline:q! exitwithoutsaving:w save(butnotexit):wq or:x saveandexitArrowkeys: movecursoraround (inbothmodes)

http://cbsu.tc.cornell.edu/ww/1/Default.aspx?wid=36

12

Searchandsubstitutioninvi

Incommandmode,youcandoanumber offancythings.Themostusefulare:

- Search: hitslash(“/”)togetthecursortotheleft-bottomcorner;youcantypeanywordorlettertosearchit;typentogotothenextinstance

- Replace:hitEsc(atanytime,hittingEsctogetbacktothedefaultstatusisthesafestthing todo)andtype“:1,$s/+/pos/g”andthenenterwillreplaceall“+”to“pos”.

Trythisinvi head

:1,$s/+/pos/gReady to type in command

From the first line to the last

Substitution

The first field: to be replaced

The second field: to replace with

all instances in a row

1) hitEsc toexiteditmodeandthen:q!toNOTsavethefileandexitvi.

Wildcardsandregularexpression

Regularexpression(regexorregexp)isaverypowerfultoolfortextprocessingandwidelyusedintexteditors(e.g.vi)andprogramminglanguages(e.g.Shellcommands:sed,awk,grep andperl,python,PHP)toautomaticallyedit(matchandreplacestrings)texts.

Findingandreplacingexactwordsorcharactersaresimple,e.g.theviexampleshownabove

However,ifyouwanttomatchmultiplewordsorcharacters,youwillneedwildcardsorpatterns.

13

alistofcommonlyusedwildcardsandpatterns:

* anynumbersofletters,numbersandcharactersexceptforspacesandspecialcharacters,e.g.()[]+\/$@#%;,?

. anysingleletter,numberandcharacterincludingspecialcharacters^ startofaline$ endofaline^$ anemptyline,i.e.nothingbetween̂ and$[] createyourownpattern,e.g.[ATGC]matchesoneofthefourlettersonly,

[ATGC]{2}matchestwosuchletters;[0-9]:anynumbers

\w anyletter(a-zandA-Z)\d anynumber(0-9)+ previousitemsatleastonetimes,e.g.\w+matcheswordsofanysizes{n} previousitemsntimes,e.g.\w{5}matcheswordswithexactlyfiveletters\s space\t tabularspace\n newline

caret

http://www.bsd.org/regexintro.html

Curlybrackets

14

This overwrite the head file:head -20 /home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_proteins.fasta > head

vi head

Inside vi, try :1,$s/ *//g

Hit u to undo

What about :1,$s/ .*//g

1) hitEsc toexiteditmodeandthen:x tosavethefileandexitvi.

Useregex insidevi

15

16

Getdatafromremoteftp/httpwebsiteftplftpsftpncftp

lftp addr command to connect to a remote ftp servercd dir change to the directorycd .. change to the upper folder (..)ls list files and folders in the current directory at oncels dir list files and folders in dir at oncels | less list page by page (good if the list is too long)get file get a filemirror dir get a folderzless file view the file contentby or bye exit lftp

NCBIftpsite:

ConnecttoNCBIftpsite:lftp ftp.ncbi.nih.gov

Thepromptwillchangeto:lftp ftp.ncbi.nih.gov:/>

After‘>’youcantypeincommandandhitenter:lftp ftp.ncbi.nih.gov:/>ls

Theftpsitecanalsobeaccessedthroughawebbrowser

17

ls command:

listfilesandfolders

18

Wherebacterialgenomesareintheftpsite?

19

Theendofthepageafterls

20

cdne

Thenpresstabkeytoauto-completeorlist

21

Howtotransferfilebetweenalinux andawindowsmachine?UseSSHsecurefiletransferclient

OpenthesoftwareHitenter

PutIPaddress(10.157.217.4)PutusernameHitconnect

Chooseyes

PutpasswordHitok

22

Iftransferfrom localtoremote:locateyourfileanddragtotherightIftransferfromremotetolocal:locateyourfileanddragtotheleft

23

TransferfilesbetweentwoLinuxmachines(ormac andlinux)

scp:securecopyfiles/foldersbetweenhostsonanetwork

YouareataLinuxorMacmachine,e.g.yourlaptopwithUbuntu installedandyouwanttocopysomefilefromser

Openaterminalinyourmachine

scp yyin@10.157.217.87:/home/yyin/Unix_and_Perl_course/Data/Arabidopsis/At_genes.gff .

scp username@IP:/path .

Youwillbeaskedforpasswordonser

24

25

wget isaprogramuseful fordownloading filesfrombothFTPandHTTPsites.

wget isnon-interactive:yousimplyenterthenecessaryoptions andargumentsonthecommandlineandthefileisdownloaded foryou.

Youmustidentify thelinksfirst:browseahttpwebpageoraftpsiteandlocatetheremotefiles/foldersyouwanttodownload andthengototheterminalandtype

wget -q ftp.ncbi.nih.gov/blast/db/FASTA/yeast.aa.gz

wget -r -q ftp://ftp.ncbi.nih.gov/genomes/archive/old_refseq/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779

wget –q ftp.ncbi.nih.gov:/blast/executables/LATEST/ncbi-blast-2.2.27+-x64-linux.tar.gz

wget ftp://emboss.open-bio.org/pub/EMBOSS/emboss-latest.tar.gz

wget

-qquiet-rrecursive(for folders)

IttaketimetodownloadPut& attheendofcommand linetoputthejobtothebackground

26

Archiveandcompressfiles/folders

Tosavediskspace,wecancompresslargefilesifwedonotintendtousethemforawhile.Alotoffilesdownloaded fromthewebarecompressedandneedtobeuncompressedbeforeanyprocessingcantakeplace.

Commoncompressedformats:•gzip (gz)

gzip my_file (compressesfilemy_file,producing itscompressedversion,my_file.gz)

gzip –dmy_file.gz(decompressmy_file.gz,producing itsoriginalversionmy_file)

•bzip2bzip2my_file (compressesfilemy_file,producing itscompressedversion,

my_file.bz2)bunzip2my_file.bz2(decompressmy_file.bz2,producing itsoriginal

versionmy_file)

zless toviewzipped files

27

Commoncompressedformats(continued):•zip

zipmy_file.zipmy_file1my_file2my_file3(createacompressedarchivecalledmy_files.zip, containing threefiles:my_file1,my_file2,

my_file3)zip-rmy_file.zipmy_file1my_dir (ifmy_dir isadirectory,createan

archivemy_file.zipcontaining thefilemy_file1andthedirectorymy_dir

withallitscontent)zip–lmy_file.zip(listcontentsoftheziparchivemy_file.zip)unzipmy_files.zip(decompressthearchiveintotheconstituentfilesand

directories•tar

tar-cvf my_file.tarmy_file1my_file2my_dir (createacompressedarchivecalledmy_files.tar,containing filesmy_file1,my_file2

andthedirectorymy_dir withallitscontent)

tar–tvf my_file.tar(listcontentsofthetararchivemy_file.tar)tar-xvf my_files.tar(decompressthearchiveintotheconstituentfiles

anddirectories)

Usemantartolearnmore

28

Commoncompressedformats(continued):•tgz (also,tar.gz – essentiallyacomboof“tar”and“gzip”)

tar-czvf my_file.tgzmy_file1my_file2my_dir (createacompressedarchivecalledmy_files.tgz,containing filesmy_file1,my_file2

andthedirectorymy_dir withallitscontent)

tar–tzvf my_file.tgz(listcontentsofthetararchivemy_file.tar)tar-xzvf my_files.tgz(decompress thearchiveintotheconstituentfiles

anddirectories)

Wget thebookmaterialsofUnixandPerlPrimerforBiologistshttp://korflab.ucdavis.edu/Unix_and_Perl/

mkdir book

cd bookwget http://korflab.ucdavis.edu/Unix_and_Perl/current.zip

unzip current.zip

Unpackage theembosspackage

cdmkdir toolscd toolsmv ../emboss-latest.tar.gz toolstar –zxf emboss-latest.tar.gz &

29

30

CheckdiskusageDiskspaceisalimited resource,andyouwanttofrequentlymonitorhowmuchdiskspaceyouhaveused.Tocheckthediskspaceusageforafolder,usethedu(diskusage)commandyyin@ser:~$ du -hs .318M .yyin@ser:~$ du -hs Unix_and_Perl_course/131M Unix_and_Perl_course/

Tocheckhowmuchspaceleftontheentirestoragefilesystem,usethedf command

31

- Savehistoryofyourcommands:history > hist1less hist1

- Sendmessagetootheronlineuserswrite username(ctrl+c toexit)

- Changeyourpasswordpasswd

Ctrl+c totellShelltostopcurrentprocessCtrl+z tosuspendbg tosendtobackgroundCtrl+d toexittheterminal(logout)

top related