Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences
D. Lee Taylor
Shawn Houston
Institute of Arctic Biology, University of Alaska Fairbanks
Photograph: Roger Ruess
Coupling Diversity with Function: Metagenomics of Boreal Forest Fungi
USDA-NSF Microbial Genome Sequencing Program, 2003-2007
IPY: A Community Genomics Investigation of Fungal Adaptation to Cold
NSF OPP International Polar Year, 2007-2011
Major Clone Datasets To Date:
• Upland successional stages, Bonanza Creek LTER site: 30,000
• Various black spruce community types, Interior Alaska: 40,000
• Two individual floodplain black spruce soil cores: 20,000
• Seasonal study in single white spruce site, Interior Alaska: 9,200
• Moist sites along North American Arctic Transect, bioclimatic subzones A-E: 9,200
• Moist sites at Svalbard, subzones B, C: 3,000
• Snow addition experiment at Toolik Lake LTER tundra site: 3,800
Bioinformatic Processing of Fungal ITS Sequences from the Environment
I. Initial sequence cleanup
-> Quality Scores
-> Masking if Sanger Sequences
-> Orienting to fix direction
II. Bar-coding/tagging
-> Long, bias tested, edit distance
-> Tag-finder script
III. Chimeras
-> Uclust
IV. Defining OTUs
• Introns -> TGICL/Cap3 Genome Assemblers
• Percent Identity Thresholds -> Phylobinning
• Pseudogenes -> ???
V. Identifying OTUs
-> Curated databases with masking of conserved seqs
• SIMPLE DEMO/TUTORIAL
ftp://folders.inbre.alaska.edu/FMP/
http://www.borealfungi.uaf.edu/pipeline/
V Kunin, A Engelbrektson, H Ochman and P Hugenholtz. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental microbiology 12:118–123.
Design of Pig-Tagged Primers
Taylor DL, Booth MG, McFarland JW, Herriott IC, Lennon NJ, Nusbaum C & Marr TG. 2008. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach. Molecular Ecology Resources 8(4): 742 - 752.
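Tags separated by a large pairwise edit distance are less likely to be converted into one another by a sequencing error. The sketch below shows one way such a check could be run on a candidate tag list. It is only an illustration: the function names and the example tags are hypothetical and are not taken from the tag set or the Tag-finder script described here.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-base
    substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a base from a
                curr[j - 1] + 1,           # insert a base into a
                prev[j - 1] + (ca != cb),  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def min_pairwise_distance(tags) -> int:
    """Smallest edit distance between any two tags in the set."""
    return min(edit_distance(a, b) for a, b in combinations(tags, 2))


# Hypothetical 4-base tags, for illustration only.
example_tags = ["AAAA", "AATT", "TTTT"]
print(min_pairwise_distance(example_tags))
```

A tag set would be accepted only if the minimum pairwise distance is at least as large as the number of errors one wants to tolerate without confusing two samples.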
4 Taxon Test for Biases
Taylor DL, Booth MG, McFarland JW, Herriott IC, Lennon NJ, Nusbaum C & Marr TG. 2008. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach. Molecular Ecology Resources 8(4): 742 - 752.
Soil Sample Tests for Biases
          Soil 1            Soil 2
OTU   Tag064  Tag102    Tag067  Tag126    Grand Total
1       13      14        37      35          99
34       9       8        10       3          30
32       7       7         5       4          23
29      12      10         0       0          22
25       9       9         0       0          18
14       2       0         7       5          14
26       6       4         0       3          13
4        2       2         5       3          12
30       0       0         8       4          12
17       3       3         2       2          10
2        0       2         1       5           8
5        2       0         2       3           7
19       5       2         0       0           7
23       0       0         3       4           7
40       2       0         2       3           7
3        0       0         3       3           6
31       4       1         1       0           6
7        1       2         0       2           5
18       0       0         1       4           5
11       1       1         1       1           4
12       2       0         1       1           4
21       1       3         0       0           4
41       1       3         0       0           4
Challenges: Chimeras
Reports in the literature of up to 30% chimeras in clone datasets
3% in our earliest clone libraries
<1% in a 30,000 clone black spruce dataset*
Currently used detection methods depend upon global MSA and/or library of clean reference sequences
STEP 1: Identify 97% contigs that are represented in multiple libraries. Sequences belonging to these contigs are deemed to be real and non-chimeric.
STEP 2: BLAST sequences against all known databases of fungi (including GenBank and lab databases) and identify passing matches (queries)
STEP 3: BLAST ITS1 and ITS2 of remaining sequences against curated database hunting for 97+% matches of both sides to same species. Sequences for which both the ITS1 and ITS2 regions match the same species at 97+% over 200+ bp are considered real and non-chimeric.
STEP 4: BLAST ITS1 and ITS2 sequences against database from which they came (including all libraries), hunting for matches to possible chimera parents
STEP 5: Align full length queries against best ITS1 and ITS2 matches, examine by eye
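The Step 3 rule can be sketched as a small filter over parsed BLAST hits. This is a minimal illustration under stated assumptions: the `Hit` record, its field names and the function names are hypothetical, and the BLAST parsing itself is left out.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Hit:
    """One parsed BLAST hit for one region of a query (hypothetical record)."""
    species: str         # species name of the database match
    pct_identity: float  # percent identity of the alignment
    aln_length: int      # alignment length in bp


def passing_species(hits: Iterable[Hit],
                    min_identity: float = 97.0,
                    min_length: int = 200) -> Set[str]:
    """Species with a hit at >= min_identity over >= min_length bp."""
    return {h.species for h in hits
            if h.pct_identity >= min_identity and h.aln_length >= min_length}


def is_non_chimeric(its1_hits: Iterable[Hit], its2_hits: Iterable[Hit]) -> bool:
    """Step 3: ITS1 and ITS2 must both match the same species."""
    return bool(passing_species(its1_hits) & passing_species(its2_hits))


# Illustrative values only, not real BLAST output.
its1 = [Hit("Piloderma fallax", 99.1, 230)]
its2 = [Hit("Piloderma fallax", 98.4, 215)]
print(is_non_chimeric(its1, its2))
```

Sequences that fail this test are not declared chimeric; they simply move on to Steps 4 and 5 for comparison against possible parents.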
First Uclust test runs on fungal ITS sequences:
1) dataset of 45 OTUs with ITS plus 600bp LSU
• 10 out of 10 synthetic chimeras detected, including intrageneric
• only 2 real sequences suggested as possible chimeras, with low probability
2) examined another dataset of 547 real, relatively reliable sequences
• spits out 3 way alignments that can be examined
• Bellerophon suggested 53% chimeric, Uclust found ZERO
Challenges: Arbitrary % Identity Thresholds
- Multicopy
- Intra-individual variation (including pseudogenes)
- Intra-specific variation
- Different rates of evolution in different lineages
- How does 97% identity threshold perform?
rDNA region layout: SSU, ITS-1, 5.8S, ITS-2, LSU
Groupings Differ Depending on Alignment Program and Parameter Settings
X. Huang & A. Madan. 1999. Genome Research 9: 868-877
ITS phylogram of Lactarius
ML tree, thick branches have >0.95 Bayesian Posterior Probability
Clades labeled on the tree: OTU 13, OTU 19, OTU 12 (each grouped at 97% ITS sim.)
Geml J, Laursen GA, Timling I, McFarland J, Booth MG, Lennon N, Nusbaum HC, Taylor DL. 2009. Molecular Ecology 18: 2213–2227.
Our Phylobinning Approach:
- cluster with Cap3 at low % identity (90%)
- extract sequences from clusters
- find related sequences in GenBank (one search against everything, one with uncultured sequences excluded)
- generate alignments for each cluster using Muscle
- feed alignments to RAxML
- use fast-bootstrapping method and find best tree using maximum likelihood
- parse tree to determine phylobins
If branch length > 0.001 AND bootstrap >= 98, then name new phylobin
If branch length < 0.01 AND bootstrap < 98, move to next cluster
If branch length >= 0.01 AND bootstrap < 70, then move to next cluster
If branch length >= 0.01 AND bootstrap >= 70, then name new phylobin
If branch length >= 0.03, then name new phylobin (even if individual sequence)
All sequences from a contig that are not assigned to a phylobin at this point go into a last, default phylobin
RAxML version 7.0.4, released by Alexandros Stamatakis in April 2008. Systematic Biology 57(5): 758–771, 2008.
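The branch length and bootstrap rules above can be read as a single decision function. The sketch below copies the thresholds from the slide, but two points are assumptions: the rule for branch length >= 0.03 is taken to override the low-bootstrap rule, and a branch of length <= 0.001 (not covered by any rule) is treated as not forming a new phylobin. The function name is hypothetical.

```python
def names_new_phylobin(branch_length: float, bootstrap: float) -> bool:
    """Return True if a branch should be split off as a new phylobin.

    Thresholds follow the rules listed on the slide; the order in which
    they are tested here is an assumption, not the pipeline's own code.
    """
    # Very long branches are split regardless of support,
    # even when they hold a single sequence.
    if branch_length >= 0.03:
        return True
    # Strongly supported branches are split once they exceed 0.001.
    if branch_length > 0.001 and bootstrap >= 98:
        return True
    # Moderately long branches need bootstrap support of at least 70.
    if branch_length >= 0.01 and bootstrap >= 70:
        return True
    # All remaining cases: move on to the next cluster.
    return False


# Illustrative values only.
for bl, bs in [(0.005, 99), (0.005, 80), (0.02, 60), (0.02, 75), (0.05, 10)]:
    print(bl, bs, names_new_phylobin(bl, bs))
```

Sequences on branches for which the function returns False would end up in the default phylobin for their contig.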
[Figure: RAxML tree for one cluster, with phylobins 18 to 22 marked. Tip labels: Meliniomyces bicolor; Cistella acuum; Hyalodendriella betulae; Uncultured fungus clones TD9_OTU5, G20_OTU5 and IH_Tag102_3331; Mycorrhizal fungal sp. pkc09, pkc12, pkc18, pkc22, pkc33 and pkc38; clones TKN7_3179P2, TKN9_3238J10, TKN10_3235I22, TKN11_3260O3, TKN12_3249H16, TKN12_3255J11 and TKN12_3258A12]
Phylobin assignments (* = GenBank sequence):
TKN7_3179P22                                     phylobin18
*gi|133753088| Uncultured fungus clone G20_OT    phylobin18
*gi|133753170| Uncultured fungus clone TD9_OT    phylobin18
TKN12_3255J11                                    phylobin19
TKN12_3258A12                                    phylobin20
TKN9_3238J10                                     phylobin21
*gi|37624773| Mycorrhizal fungal sp. pkc18 1     phylobin21
*gi|37624772| Mycorrhizal fungal sp. pkc33 1     phylobin21
*gi|37624762| Mycorrhizal fungal sp. pkc38 1     phylobin21
TKN10_3235I22                                    phylobin21
TKN11_3260O3                                     phylobin21
*gi|37624759| Mycorrhizal fungal sp. pkc12 1     phylobin21
*gi|37624763| Mycorrhizal fungal sp. pkc22 1     phylobin21
*gi|162311725| Uncultured fungus clone IH_Tag    phylobin22
TKN12_3249H16                                    phylobin22
Challenges: Pseudogenes
- Sequences from cultures and fruitbodies for phylogenetics are rarely cloned - usually averages of variants that equate with the dominant sequence type
- Pseudogenes found in ITS clone libraries of Zooxanthellae
Thornhill, Lajeunesse & Santos. 2007. Molecular Ecology 16: 5326-5340.
“Based on these results, we conclude that artefacts due to Taq polymerase and cloning error only account for a small percentage of our clones while the remaining sequence diversity and divergence originates from ribosomal operon variation within the Symbiodinium Genome.”
Challenges: Non-fungal Sequences
30,000 black spruce clones
Primers ITS1-F and TW13
[Figure panels: 5.8S, LSU]
V. Identifying OTUs -> Curated databases with masking of conserved seqs
>TKN14_3314_P9CTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATTGAAATTATAGGTGAGGGTTGTAGCTGGCCTCTCGGGGCATGTGCACGCCCGAGCCCTTAATCCACACACACCTGTGAACCTATTGTAAGGGCCCTTAAAAAAGGCCTTTACGTCTTATCATCAACCCATCGTATGTCTCATAGAATGTAAATATATGTCCTCGCCTTAAAAAGCGTTGATAAACTTATACAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGATTTTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATGCCTGTTTGAGTGTCATTAAATTCTCAACTCTGATCGATTTGTTTCGACTTCGGAGCTTGGATTTGGAGCGTGCTGGCGTCGGTCGGCTCCTCTTAAATGCATCAGCGGAATCTAACGTTTCGGACGTCAGTGTGATAATCATGTTGCGCTGTCTGCCTGATCTGAAAGCCCGCTCACAATGGTCTTCGGACAACTTCATATCAAATTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCGGTCTTGCGGCCGTCCGAGTTGTAATCTGGAGAAGCGTTTATCCGCGTCGGACCGTGTACAAGTCTTCTGGAAGGGAGCGTCGTAGAGGGTGAGAATCCCGTCTTTGACACGGACAACCGGTGCTTTTGTGATGCGCTCTCGAAGAGTCGAGTTGTTTGGGAATGCAGCTCAAAATGGGTGGTAAATTCCATCTAAAGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAGATGAAAAGCACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGTTGAAAGGGAAACGTTTGAAGTCAGTCGCGTCGGCCGAGACTCAACCTTGCTTCTGCTCGGTGCACTTCTCGGTTGACGGGTCAGCATCAATTTTGACCGCCGGATAAAGGTCGGGGGAATGTGGCATCCTTCGGGATGTGTTATAGACCTCGATTCGGATACGGCGATTGGGATTGAGGAACTCGGCGCTTTGCGTCCAGGATGCTGGCATAATGGCTTTAAGCGACCCGTCTTGAAACACGGANC
>TKN14_3314_P9CTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATTGAAATTATAGGTGAGGGTTGTAGCTGGCCTCTCGGGGCATGTGCACGCCCGAGCCCTTAATCCACACACACCTGTGAACCTATTGTAAGGGCCCTTAAAAAAGGCCTTTACGTCTTATCATCAACCCATCGTATGTCTCATAGAATGTAAATATATGTCCTCGCCTTAAAAAGCGTTGATAAACTTATACAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGATTTTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATGCCTGTTTGAGTGTCATTAAATTCTCAACTCTGATCGATTTGTTTCGACTTCGGAGCTTGGATTTGGAGCGTGCTGGCGTCGGTCGGCTCCTCTTAAATGCATCAGCGGAATCTAACGTTTCGGACGTCAGTGTGATAATCATGTTGCGCTGTCTGCCTGATCTGAAAGCCCGCTCACAATGGTCTTCGGACAACTTCATATCAAATTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGTCTTGAAACACGGANC
query          match                                             bit score   E value
--------       --------                                          ----------  ----------
TKN14_3314_P9  gi|56126498|gb|AY822743.1| Uncultured ectomycor   1168        0.0
TKN14_3314_P9  gi|296184581|gb|AY884238.2| Ectomycorrhizal fun   1168        0.0
TKN14_3314_P9  gi|299778250|gb|HM069482.1| Uncultured fungus c   1128        0.0
TKN14_3314_P9  gi|13470319|gb|AY010281.1| Piloderma fallax iso   1100        0.0
TKN14_3314_P9  gi|104295534|gb|DQ474631.1| Uncultured ectomyco   1100        0.0

query          match                                             bit score   E value
--------       --------                                          ----------  ----------
TKN14_3314_P9  gi|296184581|gb|AY884238.2| Ectomycorrhizal fun   1168        0.0
TKN14_3314_P9  gi|13470319|gb|AY010281.1| Piloderma fallax iso   1100        0.0
TKN14_3314_P9  gi|13470320|gb|AY010282.1| Piloderma fallax iso   1066        0.0
TKN14_3314_P9  gi|86610857|gb|DQ365660.1| Piloderma fallax iso   1025        0.0
TKN14_3314_P9  gi|86610864|gb|DQ365667.1| Piloderma fallax iso   1025        0.0

query: TKN14_3314_P9
The best scores are:
gi|296184581|gb|AY884238.2| Ectomycorrhizal fun
  cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; unclassified Fungi; ectomycorrhizal fungal sp. AR-Ny2
gi|13470319|gb|AY010281.1| Piloderma fallax iso
  cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax
gi|13470320|gb|AY010282.1| Piloderma fallax iso
  cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax
gi|86610857|gb|DQ365660.1| Piloderma fallax iso
  cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax
gi|86610864|gb|DQ365667.1| Piloderma fallax iso
  cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax
Funding Sources and Supporting Agencies
Thanks!
Michelle Augustyn, Michael Booth, Dan Cardin, József Geml, Hope Gray, Ian Herriott,
Scott Hillard, Teresa Hollingsworth, Sarah Hopkins, Jason Hunt, Tom Marr, Jack McFarland,
Chad Nusbaum, Gary Laursen, Niall Lennon, Jim Long, Mitali Patil, Ina Timling
Mask (marking low quality base calls)
Tag-Finder (identifying primer bar-codes)
Orient (fixing sequence directions)
Trim-Seq (removing low quality bases at ends)
Purge (removing low quality sequences)
Flag Non-Fungals (BLAST + Organism Lookup)
Flag Chimeras (Uclust)
Prepare_contigs (TGICL/Cap3 broad clusters; BLAST to add close relatives; Muscle cluster alignments)
Phylo_table (RAxML bootstrap trees; tree parsing)
Final Phylobin Table (Closest BLAST Relatives; Abundances of Phylobins across Samples; Any Flags)
Upload sequences in .fasta format here
Upload quality file here (.qual file)
Place phred threshold here (phred = 20 is conservative)
>R_UP1_3168_P8CAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTCTCCGTTGGTGAACCAGCGGAGGGATCATTACCGAGTTTACAAACTCCCAAACCCTTTGTGAACCTTACCTATCGTTGCTTCGGCGGGACCGCCCCGACGGCCACCTCGGTGGTCCCGGAACCAGGCGCCCGCCGAAGGCCCCAAACTCTTTGTTTCCTATGGTTTTCTCCTCTGAGTGGAAAATAAACAAATAAATAAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAGGCCCCCAGTGCCTGGCGTTGGGGATCGGCCGCTGGCGTCCTTCGGGGGCGCCTGGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGTAGTCCTCCTCTGCGTAGTAGCACAACCTCGCAGTTGGAACGCGGCGGTGGGCCATGCCGTTAAACACCCCACTTCTGAAAGTTGACCTCGGATCAGG
>R_UP1_3168_P836 22 41 51 51 51 51 45 20 26 22 21 27 33 43 41 61 61 61 45 61 61 61 51 57 61 61 61 61 57 45 61 57 61 61 57 49 61 61 61 57 61 61 61 61 61 57 52 61 2751 57 61 61 55 52 45 61 61 61 61 39 25 61 61 43 33 61 61 61 61 61 61 61 61 61 61 61 61 55 57 61 51 61 61 61 61 61 61 43 41 61 61 61 61 61 61 61 61 6157 61 61 57 51 61 61 61 61 61 61 55 61 61 61 61 51 61 61 61 61 61 61 61 43 42 61 61 61 45 39 61 61 61 61 61 61 61 47 47 61 52 61 61 51 61 61 61 61 6161 32 55 61 51 61 61 61 61 61 61 51 42 61 52 61 61 52 61 61 61 61 52 61 61 61 61 52 42 51 51 52 51 47 49 61 61 51 61 61 52 61 61 61 42 61 47 47 51 4761 55 61 52 51 61 61 61 61 51 61 61 51 52 52 61 52 47 49 61 51 51 61 61 47 31 55 61 51 49 40 61 55 47 61 61 52 41 61 61 52 61 55 52 61 61 61 52 41 6144 47 52 47 61 49 51 40 51 55 51 61 43 61 40 32 55 51 49 61 52 34 49 51 61 61 47 51 61 47 52 47 61 61 40 51 49 49 49 51 51 52 61 52 38 55 39 61 55 3161 51 51 46 61 61 45 47 25 52 43 24 25 55 43 27 47 40 55 46 39 51 49 29 49 47 55 51 51 37 51 34 49 55 49 49 52 39 51 46 55 47 40 44 55 47 51 46 51 4941 51 55 52 47 51 49 43 41 37 30 40 39 52 37 49 39 55 43 51 55 51 32 55 51 49 39 51 44 49 38 44 27 52 30 32 40 44 51 41 43 51 23 39 31 55 49 49 32 2337 46 41 35 47 40 47 31 32 33 52 41 41 44 35 45 27 40 35 47 34 47 47 31 45 15 24 39 37 36 38 39 20 28 44 21 26 39 40 29 28 24 20 29 47 26 25 27 40 3926 29 34 35 36 8 24 31 32 47 30 21 10 8 36 36 35 26 47 41 29 34 47 29 35 30 45 29 46 27 19 44 11 16 39 31 27 45 36 40 30 20 31 31 30 45 31 21 32 2222 22 27 33 24 23 28 27 17 24 29 37 28 29 5 6 20 7 7 14 24 30 29 29 28 28 30 25 7 10 21 29 35 32 14 24 22 22 27 23 29 21 16 27 26 8 22 30 23 2929 10 16 28 26 29 29 29 23 22 26 26 30 24 17 15 18 18 25 21 20 25 23 40 28 17 18 19 23 28 18 28 17 24 28 22 29 31 30 8 11 26
>R_UP1_3168_P8_OriginalCAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTCTCCGTTGGTGAACCAGCGGAGGGATCATTACCGAGTTTACAAACTCCCAAACCCTTTGTGAACCTTACCTATCGTTGCTTCGGCGGGACCGCCCCGACGGCCACCTCGGTGGTCCCGGAACCAGGCGCCCGCCGAAGGCCCCAAACTCTTTGTTTCCTATGGTTTTCTCCTCTGAGTGGAAAATAAACAAATAAATAAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAGGCCCCCAGTGCCTGGCGTTGGGGATCGGCCGCTGGCGTCCTTCGGGGGCGCCTGGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGTAGTCCTCCTCTGCGTAGTAGCACAACCTCGCAGTTGGAACGCGGCGGTGGGCCATGCCGTTAAACACCCCACTTCTGAAAGTTGACCTCGGATCAGG
>R_UP1_3168_P8_MaskedCAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTCTCCGTTGGTGAACCAGCGGAGGGATCATTACCGAGTTTACAAACTCCCAAACCCTTTGTGAACCTTACCTATCGTTGCTTCGGCGGGACCGCCCCGACGGCCACCTCGGTGGTCCCGGAACCAGGCGCCCGCCGAAGGCCCCAAACTCTTTGTTTCCTATGGTTTTCTCCTCTGAGTGGAAAATAAACAAATAAATAAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAGGCCCCCAGTGCCTGGCGNTGGGGATCGGCCGCTGGCGTCCTTCGGGGNCGCCTGNNCGGCCCCGAAATCTAGNGNNGGTCTCGCTGTAGTCCTCCTCTGCNTAGTANNANNNCCTCGCAGNNGGAANGCGGCGGNGGNCCATGNNGTTAAACACCCNNNNTCTGAAANNNGANCNCGGATCNNG
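The masking step shown above, in which low quality base calls are replaced by N, can be sketched in a few lines. This is an illustration rather than the Mask script itself: the function name is hypothetical, and treating a score exactly equal to the threshold as acceptable is an assumption.

```python
from typing import Sequence


def mask_low_quality(seq: str, quals: Sequence[int], threshold: int = 20) -> str:
    """Replace every base whose phred score is below threshold with N."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list differ in length")
    return "".join(base if q >= threshold else "N"
                   for base, q in zip(seq, quals))


# Toy read and scores, for illustration only.
print(mask_low_quality("ATCAGG", [29, 31, 30, 8, 11, 26]))
```

With the default threshold of 20, the two bases scored 8 and 11 in the toy read are masked and the rest are kept unchanged.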
primer = TTTCTT
pigtail = TTGGTC
Upload your sequences here
Upload list of tags as text file here
Upload sequences here
Upload text file “Orient_Motifs” here
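One way an orientation step could use a motif file is sketched below. It assumes that the file lists short conserved motifs expected on the forward strand, and that a read is reverse-complemented when more motifs are found in its reverse complement than in the read as given. The function names and the motif used in the example are assumptions; the actual Orient script and the contents of "Orient_Motifs" are not shown in these slides.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(seq: str, forward_motifs) -> str:
    """Return seq in the orientation that contains more forward motifs."""
    rc = reverse_complement(seq)
    forward_hits = sum(motif in seq for motif in forward_motifs)
    reverse_hits = sum(motif in rc for motif in forward_motifs)
    # Ties (including no hits at all) leave the read unchanged.
    return rc if reverse_hits > forward_hits else seq


# Hypothetical motif list and toy read, for illustration only.
motifs = ["CATTTAGAGGAAGTAA"]
read = "CTTGGTCATTTAGAGGAAGTAAAAGTCG"
print(orient(reverse_complement(read), motifs) == read)
```

Exact string matching is used here for brevity; a real implementation would probably need to tolerate mismatches and masked bases.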
Challenges: Introns
- best fungal-selective primer is ITS1F, but it is 5’ of intron insertion site in 3’ end of SSU for many Ascomycetes
Note on the processing steps: quality-based cleanup such as masking is needed only for Sanger sequences; for 454 and Illumina data the platform software already performs these steps.