assignment 12: substitution rates and identifying...
Post on 05-Jul-2020
Assignment 12: Substitution Rates and Identifying Selection
4/13/17
Modified from slides 2015
Detecting selection using the nucleotide substitution rate
• dN or Ka = nonsynonymous substitution rate = (# nonsyn. changes) / (# nonsyn. sites)
• dS or Ks = synonymous substitution rate = (# syn. changes) / (# syn. sites)
• dN/dS ratio is a measure of the selective pressure on a protein-coding gene:

dN/dS   Interpretation
= 1     No constraint on protein sequence, i.e., nonsyn. changes are neutral (neutral selection)
< 1     Functional constraint on the protein sequence, i.e., nonsyn. mutations are deleterious (purifying selection)
> 1     Change in the function of the protein sequence, i.e., nonsyn. mutations are adaptive (positive selection)

Adapted from Dr. Fay's lecture slides
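As a rough illustration of the definitions above, the sketch below computes the two rates from raw counts and labels the resulting ratio. The counts are the Scer vs. Spar values from the example dnds file later in these slides. The function names and the `tolerance` cutoff for "close to 1" are hypothetical choices made for this example, not part of the assignment.

```python
def substitution_rate(changes, sites):
    """Return the proportion of sites that changed (changes / sites)."""
    return changes / sites


def interpret_dnds(ratio, tolerance=0.05):
    """Label a dN/dS ratio following the interpretation table.

    `tolerance` is an arbitrary, illustrative cutoff for "close to 1".
    """
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    if ratio < 1.0:
        return "purifying selection"
    return "positive selection"


# Counts from the Scer vs. Spar row of the example dnds file.
dn = substitution_rate(37.00, 3177.50)   # nonsynonymous changes / sites
ds = substitution_rate(291.00, 896.50)   # synonymous changes / sites
ratio = dn / ds
print(round(dn, 4), round(ds, 4), interpret_dnds(ratio))
```

These are the uncorrected proportions (the pn and ps columns of the dnds file), so the ratio computed here differs slightly from the corrected dn/ds value that SNAP reports.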
Assignment 12: Substitution Rates and Identifying Selection
• Goal
  • Investigate synonymous and nonsynonymous substitution rates across the genomes of several yeast species
• Input
  • Alignments of 5k genes from 4 yeast species
  • Synonymous Non-synonymous Analysis Program (SNAP)
  • Gene annotation file for S. cerevisiae
• Output
  • dN/dS ratio for every gene, summary statistics of dN/dS distribution, visualization of dN/dS distribution, average dN/dS ratio for all GO terms
Input files
• Alignments of 5k genes from 4 yeast species: S. cerevisiae, S. paradoxus, S. mikatae, & S. bayanus
  • Fasta format
  • Data comes from Kellis et al. Nature (2003)

Example nucleotide alignment file:
>Scer
ATGTCAAAAGCTGTCGGGCTCCA-----------CCGTTGAAGAAGTTGAT
>Smik
ATGTCAAAAGCTGTCGGGCTCCAGGAGCTGCTCCCTGTTGAAGAAGTTGAT

Adapted from Dr. Cohen's lecture slides
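To inspect an alignment before handing it to SNAP, a small FASTA reader can help. The sketch below is a minimal one that assumes each record is a `>name` header followed by one or more sequence lines, as in the example above. `parse_fasta` is a hypothetical helper, not something provided with the assignment.

```python
def parse_fasta(lines):
    """Return {sequence_name: aligned_sequence} from FASTA-formatted lines."""
    sequences = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()       # header line: start a new record
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in sequences.items()}


example = """>Scer
ATGTCAAAAGCTGTCGGGCTCCA-----------CCGTTGAAGAAGTTGAT
>Smik
ATGTCAAAAGCTGTCGGGCTCCAGGAGCTGCTCCCTGTTGAAGAAGTTGAT
"""
alignment = parse_fasta(example.splitlines())
for name, seq in alignment.items():
    print(name, len(seq), seq.count("-"))
```

Printing the aligned length and gap count per species is a quick way to spot the badly formatted files mentioned in the tips for run_SNAP.py.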
Synonymous Non-synonymous Analysis Program (SNAP)
• Perl script that calculates the syn. and nonsyn. substitution rates in a nucleotide alignment of a gene
• Usage
$ perl /home/assignments/assignment10/SNAP.pl <nucleotide alignment fasta> <output directory>
• Creates an output file (*.dnds) containing substitution metrics for each pair of species in the alignment
• Output file is written to the output directory
dnds file format
• Whitespace-separated text file
• Table of substitution metrics for each pairwise comparison
• Contains header
• See assignment for complete description of file format

Example dnds file:
Compare  Sequences_names  Sd      Nd     S       N        ps      pn      ds      dn      dn/ds
0 1      Scer Spar        291.00  37.00  896.50  3177.50  0.3246  0.0116  0.4253  0.0117  0.0276
0 2      Scer Smik        424.50  71.50  891.33  3170.67  0.4763  0.0226  0.7559  0.0229  0.0303
1 2      Spar Smik        369.33  77.67  891.83  3170.17  0.4141  0.0245  0.6025  0.0249  0.0413

Table of abbreviations:
Sd = Synonymous differences
Nd = Nonsynonymous differences
S  = Synonymous sites
N  = Nonsynonymous sites
ps = Synonymous rate (Sd/S)
pn = Nonsynonymous rate (Nd/N)
ds = Synonymous rate (corrected)
dn = Nonsynonymous rate (corrected)
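One possible way to read this table in Python is sketched below. It assumes the layout of the example only: a header line, then rows of 13 whitespace-separated fields in which fields 3 and 4 are the sequence names and the last field is dn/ds. Note that the header has fewer words than each row, because "Compare" and "Sequences_names" each cover two fields. `parse_dnds` and `jukes_cantor` are hypothetical helpers. The Jukes-Cantor formula is included only because the corrected values in the example are numerically consistent with it; the slides do not say which correction SNAP applies.

```python
import math


def parse_dnds(lines):
    """Return {(name_a, name_b): dn/ds} for each pairwise comparison."""
    ratios = {}
    for line in lines[1:]:                # skip the header line
        fields = line.split()
        if len(fields) != 13:             # blank or malformed row
            continue
        try:
            ratio = float(fields[-1])
        except ValueError:                # ratio field is not a number
            continue
        ratios[(fields[2], fields[3])] = ratio
    return ratios


def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion of differences p.

    Assumption: inferred from the example numbers, not stated in the slides.
    """
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


example = """Compare Sequences_names Sd Nd S N ps pn ds dn dn/ds
0 1 Scer Spar 291.00 37.00 896.50 3177.50 0.3246 0.0116 0.4253 0.0117 0.0276
0 2 Scer Smik 424.50 71.50 891.33 3170.67 0.4763 0.0226 0.7559 0.0229 0.0303
1 2 Spar Smik 369.33 77.67 891.83 3170.17 0.4141 0.0245 0.6025 0.0249 0.0413
"""
ratios = parse_dnds(example.splitlines())
print(ratios.get(("Scer", "Spar")))
print(round(jukes_cantor(291.00 / 896.50), 4))
```

Looking up the pair with `ratios.get(...)` returns `None` when the Scer vs. Spar comparison is absent, which is one way to detect genes that should be left out of the output file.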
Assignment TODOs
• Write run_SNAP.py
  • Run SNAP.pl on every alignment file
  • Create an output file of dN/dS ratios
  • Calculate dN/dS summary statistics
• Write plot_gene_length_vs_dnds.py
  • Create scatter plot of gene length vs. dN/dS ratio
• Write calc_average_go_dnds.py
  • Calculate the average dN/dS ratio for each GO term in a GFF
• Answer follow-up questions
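The slides do not say which summary statistics are expected, so the sketch below simply shows one way to compute a few common ones with the standard library. The `summarize` helper and the choice of statistics are assumptions. The three input values are placeholders borrowed from the example dnds file, not per-gene results.

```python
import statistics


def summarize(ratios):
    """Return a few common summary statistics for a list of dN/dS ratios."""
    return {
        "n": len(ratios),
        "mean": statistics.mean(ratios),
        "median": statistics.median(ratios),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(ratios) if len(ratios) > 1 else 0.0,
        "min": min(ratios),
        "max": max(ratios),
    }


example_ratios = [0.0276, 0.0303, 0.0413]  # placeholder values
summary = summarize(example_ratios)
for key, value in summary.items():
    print(key, value)
```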
Executing external commands in Python
• Use subprocess.call to execute an external command from within a Python script
• Python will wait until the command completes before moving to the next line of code
• See http://stackoverflow.com/questions/89228/calling-an-external-command-in-python for alternatives
Shell command:
$ SNAP.pl YAL003W.fasta dnds_out

Python code template:
import subprocess
subprocess.call(<list_of_arguments>)
# <list_of_arguments> is a list of the words in the command
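A minimal sketch of how the shell command might be turned into a subprocess.call argument list follows. `build_snap_command` is a hypothetical helper, and the default script path is copied from the SNAP usage line earlier in the slides. Because SNAP.pl is only available on the class server, the call that actually runs here uses the Python interpreter as a stand-in.

```python
import subprocess
import sys


def build_snap_command(alignment_file, output_dir,
                       script="/home/assignments/assignment10/SNAP.pl"):
    """Return the command as the list of words that subprocess.call expects."""
    return ["perl", script, alignment_file, output_dir]


command = build_snap_command("YAL003W.fasta", "dnds_out")
print(command)
# On the class server, the command would be run with:
#     subprocess.call(command)

# Runnable stand-in: subprocess.call blocks until the command finishes
# and returns its exit status (0 conventionally means success).
return_code = subprocess.call([sys.executable, "-c", "print('done')"])
print(return_code)
```

Passing a list rather than a single string avoids going through the shell, so file names with spaces or special characters need no extra quoting.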
File and directory manipulation in Python
• Use os.listdir to list all files and subdirectories in a given directory:

Code template:
import os
<list_of_files> = os.listdir(<directory>)

Example output:
['YAL002W.fasta', 'YAL008W.fasta']

• os.listdir returns the name of the file/dir, not the path
• Use os.path.join to construct the path:

Code template:
import os
<file_path> = os.path.join(<directory>, <file>)

Example output:
../alignments/YAL002W.fasta
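The two templates can be combined as in the sketch below. It builds a throwaway directory with two empty files so that it runs anywhere; on the class server the directory would be the real alignments folder. The file names are the ones from the example output.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as alignment_dir:
    # Create two empty placeholder files to list.
    for name in ("YAL002W.fasta", "YAL008W.fasta"):
        open(os.path.join(alignment_dir, name), "w").close()

    # os.listdir returns names in arbitrary order, so sort them.
    file_names = sorted(os.listdir(alignment_dir))

    # Names alone cannot be opened from another working directory;
    # join each one with its directory to get a usable path.
    file_paths = [os.path.join(alignment_dir, n) for n in file_names]
    all_exist = all(os.path.exists(p) for p in file_paths)

print(file_names)
```

Sorting is optional, but it makes the processing order, and therefore the output file, reproducible between runs.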
File and directory manipulation in Python
• Use os.path.isfile to check if a file exists:

Code template:
import os
if os.path.isfile(<filename>):
    # Do something

Example output:
The file exists!

• Where to learn more
  • Python.org documentation: https://docs.python.org/3.4/library/os.html
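As an illustration of os.path.isfile, the sketch below sorts gene names into those that have a dnds file and those that do not. It assumes SNAP output is named `<gene>.dnds`; the slides only say `*.dnds`, so check the real file names. `existing_dnds_files` is a hypothetical helper.

```python
import os
import tempfile


def existing_dnds_files(output_dir, gene_names):
    """Split gene names into those with and without a <gene>.dnds file."""
    found, missing = [], []
    for gene in gene_names:
        path = os.path.join(output_dir, gene + ".dnds")  # assumed naming
        if os.path.isfile(path):
            found.append(gene)
        else:
            missing.append(gene)
    return found, missing


with tempfile.TemporaryDirectory() as output_dir:
    # Pretend SNAP produced output for only one of the two genes.
    open(os.path.join(output_dir, "YAL002W.dnds"), "w").close()
    found, missing = existing_dnds_files(output_dir, ["YAL002W", "YAL008W"])

print(found, missing)
```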
Tips for writing run_SNAP.py
• Running SNAP.pl on 5k files takes a long time
  → Before you start, make a tester folder w/ 20 alignment files. Run run_SNAP.py on this directory to save time when writing/debugging.
• The class server may get very slow if everyone runs their script at the same time
  → Don't wait until Thursday night to start
• Some of the input alignment files are not formatted correctly. (Data can be messy.)
  → SNAP.pl may/may not produce a dnds file for these genes. Ignore these in your output file since they have no dN/dS ratio.
• Hints
  • Use os.path.isfile
  • Check if the file contains the Scer vs. Spar comparison
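One way to set up the tester folder suggested above is sketched here. `make_tester_folder`, the `.fasta` filter, and the directory names are assumptions for illustration. The demonstration uses empty placeholder files in a temporary directory rather than the real alignments.

```python
import os
import shutil
import tempfile


def make_tester_folder(source_dir, tester_dir, n_files=20):
    """Copy the first n_files .fasta files from source_dir into tester_dir."""
    os.makedirs(tester_dir, exist_ok=True)
    names = sorted(f for f in os.listdir(source_dir) if f.endswith(".fasta"))
    copied = []
    for name in names[:n_files]:
        source_path = os.path.join(source_dir, name)
        if os.path.isfile(source_path):   # skip subdirectories
            shutil.copy(source_path, os.path.join(tester_dir, name))
            copied.append(name)
    return copied


with tempfile.TemporaryDirectory() as work_dir:
    source_dir = os.path.join(work_dir, "alignments")
    tester_dir = os.path.join(work_dir, "tester")
    os.makedirs(source_dir)
    for i in range(25):                   # 25 empty placeholder alignments
        open(os.path.join(source_dir, "gene_%02d.fasta" % i), "w").close()

    copied = make_tester_folder(source_dir, tester_dir)
    n_in_tester = len(os.listdir(tester_dir))

print(len(copied), n_in_tester)
```

Copying rather than moving leaves the full alignment directory untouched for the final run.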
General Feature Format (GFF)
• See https://www.sanger.ac.uk/resources/software/gff/spec.html#t_2 for a complete description
• Attribute column contains additional information about the feature, e.g., GO IDs
  • Semicolon-separated list of key-value pairs

Example gff file:
chrI SGD gene 335 649 . + . Name=YAL069W;Ontology_term=GO:0003674,GO:0005575
chrI SGD gene 7235 9016 . - . Name=YAL067C;Ontology_term=GO:0005215
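The attribute column can be unpacked as in the sketch below, which then averages dN/dS per GO term. Several things here are assumptions: that the real file is tab-separated (as GFF normally is), that the gene name is stored under `Name` and GO IDs under `Ontology_term` as in the example, and that gene length may be taken as end - start + 1. The dN/dS values are made up for illustration, and both functions are hypothetical helpers.

```python
from collections import defaultdict


def parse_gff_line(line):
    """Return (gene_name, gene_length, go_terms) for one GFF feature line."""
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[3]), int(fields[4])
    attributes = {}
    for pair in fields[8].split(";"):     # key=value pairs
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    go_field = attributes.get("Ontology_term", "")
    go_terms = go_field.split(",") if go_field else []
    return attributes.get("Name"), end - start + 1, go_terms


def average_dnds_by_go(gff_lines, dnds_by_gene):
    """Return {GO term: mean dN/dS of the annotated genes that have a ratio}."""
    values = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue                      # skip comments and blank lines
        name, _length, go_terms = parse_gff_line(line)
        if name in dnds_by_gene:
            for term in go_terms:
                values[term].append(dnds_by_gene[name])
    return {term: sum(v) / len(v) for term, v in values.items()}


gff_lines = [
    "chrI\tSGD\tgene\t335\t649\t.\t+\t.\tName=YAL069W;Ontology_term=GO:0003674,GO:0005575",
    "chrI\tSGD\tgene\t7235\t9016\t.\t-\t.\tName=YAL067C;Ontology_term=GO:0005215",
]
dnds_by_gene = {"YAL069W": 0.10, "YAL067C": 0.30}  # made-up values
averages = average_dnds_by_go(gff_lines, dnds_by_gene)
print(parse_gff_line(gff_lines[0]))
print(averages)
```

Genes without a dN/dS ratio are skipped rather than counted as zero, so they do not pull the GO-term averages down.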
Assignment 12: requirements
• TIP: start early!
• Due: 21 April 2017, 10am
• Comment your code
  • Write docstrings for scripts (include usage)
  • Write docstrings for functions
• Submission directory should contain
  • README.txt
  • All scripts
    • run_SNAP.py
    • plot_gene_length_vs_dnds.py
    • calc_average_go_dnds.py
  • Most* output files
    • alignments_all_dnds.txt
    • alignments.err
    • gene_length_vs_dnds.png
    • average_go_dnds.txt
    • *You do not need to turn in the dnds output files