Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences. D. Lee Taylor and Shawn Houston, Institute of Arctic Biology, University of Alaska Fairbanks. Photograph: Roger Ruess


Post on 11-Jan-2016


Page 1: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences

D. Lee Taylor

Shawn Houston

Institute of Arctic Biology, University of Alaska Fairbanks

Photograph: Roger Ruess

Page 2: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Coupling Diversity with Function: Metagenomics of Boreal Forest Fungi
USDA-NSF Microbial Genome Sequencing Program, 2003-2007

IPY: A Community Genomics Investigation of Fungal Adaptation to Cold
NSF OPP International Polar Year, 2007-2011

Major Clone Datasets To Date (number of clones):

• Upland successional stages, Bonanza Creek LTER site: 30,000
• Various black spruce community types, Interior Alaska: 40,000
• Two individual floodplain black spruce soil cores: 20,000
• Seasonal study in single white spruce site, Interior Alaska: 9,200
• Moist sites along North American Arctic Transect, bioclimatic subzones A-E: 9,200
• Moist sites at Svalbard, subzones B, C: 3,000
• Snow addition experiment at Toolik Lake LTER tundra site: 3,800

Page 3: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Bioinformatic Processing of Fungal ITS Sequences from the Environment

I. Initial sequence cleanup
-> Quality Scores
-> Masking if Sanger Sequences

II. Bar-coding/tagging
-> Long, bias tested, edit distance
-> Tag-finder script

III. Chimeras
-> Uclust

IV. Defining OTUs
• Introns -> TGICL/Cap3 Genome Assemblers
• Percent Identity Thresholds -> Phylobinning
• Pseudogenes -> ???

V. Identifying OTUs
-> Curated databases with masking of conserved seqs

• SIMPLE DEMO/TUTORIAL

Page 6: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

V Kunin, A Engelbrektson, H Ochman and P Hugenholtz. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology 12: 118–123.

Page 9: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Design of Pig-Tagged Primers

Taylor DL, Booth MG, McFarland JW, Herriott IC, Lennon NJ, Nusbaum C & Marr TG. 2008. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach. Molecular Ecology Resources 8(4): 742 - 752.

Page 10: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

4 Taxon Test for Biases

Taylor DL, Booth MG, McFarland JW, Herriott IC, Lennon NJ, Nusbaum C & Marr TG. 2008. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach. Molecular Ecology Resources 8(4): 742 - 752.

Page 11: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Soil Sample Tests for Biases

            Soil 1            Soil 2
OTU     Tag064  Tag102    Tag067  Tag126    Grand Total
1         13      14        37      35          99
34         9       8        10       3          30
32         7       7         5       4          23
29        12      10         0       0          22
25         9       9         0       0          18
14         2       0         7       5          14
26         6       4         0       3          13
4          2       2         5       3          12
30         0       0         8       4          12
17         3       3         2       2          10
2          0       2         1       5           8
5          2       0         2       3           7
19         5       2         0       0           7
23         0       0         3       4           7
40         2       0         2       3           7
3          0       0         3       3           6
31         4       1         1       0           6
7          1       2         0       2           5
18         0       0         1       4           5
11         1       1         1       1           4
12         2       0         1       1           4
21         1       3         0       0           4
41         1       3         0       0           4


Page 14: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Challenges: Chimeras

Reported in the literature: up to 30% of sequences in clone datasets

3% in our earliest clone libraries

<1% in a 30,000 clone black spruce dataset*

Currently used detection methods depend upon a global multiple sequence alignment (MSA) and/or a library of clean reference sequences

Page 15: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

STEP 1: Identify 97% contigs that are represented in multiple libraries. Sequences belonging to these contigs are deemed to be real and non-chimeric.

STEP 2: BLAST sequences against all known databases of fungi (including GenBank and lab databases) and identify passing matches (queries)

STEP 3: BLAST ITS1 and ITS2 of remaining sequences against curated database hunting for 97+% matches of both sides to same species. Sequences for which both the ITS1 and ITS2 regions match the same species at 97+% over 200+ bp are considered real and non-chimeric.

STEP 4: BLAST ITS1 and ITS2 sequences against database from which they came (including all libraries), hunting for matches to possible chimera parents

STEP 5: Align full length queries against best ITS1 and ITS2 matches, examine by eye
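
The acceptance rule in STEP 3 can be sketched in a few lines of Python. This is only an illustration of the stated criterion (both ITS1 and ITS2 matching the same species at 97% or more over 200 bp or more). The `Hit` record and the function name `passes_step3` are hypothetical and not part of the lab's pipeline, and the best BLAST hit for each region is assumed to have been parsed already.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Best database match for one region (hypothetical record, not a BLAST API)."""
    species: str      # species name of the matched reference sequence
    identity: float   # percent identity of the alignment
    length: int       # alignment length in bp


def passes_step3(its1: Optional[Hit], its2: Optional[Hit],
                 min_identity: float = 97.0, min_length: int = 200) -> bool:
    """Return True when ITS1 and ITS2 both match the same species
    at >= min_identity percent over >= min_length bp."""
    if its1 is None or its2 is None:
        return False  # a region with no hit cannot confirm the sequence
    for hit in (its1, its2):
        if hit.identity < min_identity or hit.length < min_length:
            return False
    return its1.species == its2.species


# Both halves agree on one species: treated as real and non-chimeric.
print(passes_step3(Hit("Piloderma fallax", 98.5, 230),
                   Hit("Piloderma fallax", 97.0, 210)))
# Halves match different species: stays in the pool for STEP 4.
print(passes_step3(Hit("Species A", 99.0, 250), Hit("Species B", 99.0, 250)))
```

Sequences that fail this check are not declared chimeric. They simply move on to STEP 4 and STEP 5 for closer inspection.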

Page 17: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

First Uclust test runs on fungal ITS sequences:

1) dataset of 45 OTUs with ITS plus 600bp LSU
• 10 out of 10 synthetic chimeras detected, including intrageneric
• only 2 real sequences suggested as possible chimeras, with low probability

2) examined another dataset of 547 real, relatively reliable sequences
• spits out 3 way alignments that can be examined
• Bellerophon suggested 53% chimeric, Uclust found ZERO

Page 18: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Bioinformatic Processing of Fungal ITS Sequences from the Environment

I. Initial sequence cleanup
-> Quality Scores
-> Masking if Sanger Sequences
-> Orienting to fix direction

II. Bar-coding/tagging
-> Long, bias tested, edit distance
-> Tag-finder script

III. Chimeras
-> Uclust

IV. Defining OTUs
• Introns -> TGICL/Cap3 Genome Assemblers
• Percent Identity Thresholds -> Phylobinning
• Pseudogenes -> ???

V. Identifying OTUs
-> Curated databases with masking of conserved seqs

Page 19: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Challenges: Arbitrary % Identity Thresholds

- Multicopy
- Intra-individual variation (including pseudogenes)
- Intra-specific variation
- Different rates of evolution in different lineages
- How does 97% identity threshold perform?

[Figure: rDNA map with regions SSU, ITS-1, 5.8S, ITS-2, LSU]

Page 20: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Groupings Differ Depending on Alignment Program and Parameter Settings
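
One reason groupings shift with alignment settings is that percent identity itself depends on how gapped columns are counted. The following is a minimal sketch of that effect, not the method used by Cap3 or any other program named in this talk. The function name `percent_identity` and the toy sequences are made up for illustration.

```python
def percent_identity(a: str, b: str, count_gaps: bool) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    count_gaps=True  : columns with a gap in one sequence count as mismatches.
    count_gaps=False : columns with a gap in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # column is empty in both sequences
        if (x == "-" or y == "-") and not count_gaps:
            continue  # skip gapped column when gaps are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


seq1 = "ACGT-ACGTA"
seq2 = "ACGTTACG-A"
print(percent_identity(seq1, seq2, count_gaps=True))   # gaps as mismatches
print(percent_identity(seq1, seq2, count_gaps=False))  # gaps ignored
```

The same pair of sequences falls below or above a fixed threshold such as 97% depending only on the gap convention, so a threshold is only meaningful together with the alignment settings that produced it.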

Page 21: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

X. Huang & A. Madan. 1999. Genome Research 9: 868-877

Page 22: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

ITS phylogram of Lactarius

ML tree, thick branches have >0.95 Bayesian Posterior Probability

OTU 13 (97% ITS sim.)

Geml J, Laursen GA, Timling I, McFarland J, Booth MG, Lennon N, Nusbaum HC, Taylor DL. 2009. Molecular Ecology 18: 2213–2227.

Page 23: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

OTU 19 (97% ITS sim.)

OTU 12 (97% ITS sim.)

Geml J, Laursen GA, Timling I, McFarland J, Booth MG, Lennon N, Nusbaum HC, Taylor DL. 2009. Molecular Ecology 18: 2213–2227.

Page 24: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Our Phylobinning Approach:

- cluster with Cap3 at low % identity (90%)
- extract sequences from clusters
- find related sequences in GenBank (everything, and with uncultured excluded)
- generate alignments for each cluster using Muscle
- feed alignments to RAxML
- use fast-bootstrapping method and find best tree using maximum likelihood
- parse tree to determine phylobins

If branch length > 0.001 AND bootstrap >= 98, then name new phylobin

If branch length < 0.01 AND bootstrap < 98, move to next cluster

If branch length >= 0.01 AND bootstrap < 70, then move to next cluster

If branch length >= 0.01 AND bootstrap >= 70, then name new phylobin

If branch length >= 0.03, then name new phylobin (even if individual sequence)

All sequences from a contig that are not assigned to a phylobin at this point go into a last, default phylobin
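
The branch-length and bootstrap rules above can be combined into a single decision function. The sketch below is one possible reading of the slide, not the lab's tree-parsing code. The name `names_new_phylobin` is hypothetical, and the handling of branches no longer than 0.001 is an assumption, since the slide does not cover that case when support is 98 or higher.

```python
def names_new_phylobin(branch_length: float, bootstrap: float) -> bool:
    """Decide whether a branch defines a new phylobin.

    Thresholds are taken from the slide; evaluation order is an
    interpretation that makes the five rules mutually consistent.
    """
    if branch_length >= 0.03:
        return True               # long branch: new phylobin regardless of support
    if branch_length >= 0.01:
        return bootstrap >= 70    # moderate branch: needs bootstrap >= 70
    if branch_length > 0.001:
        return bootstrap >= 98    # short branch: needs bootstrap >= 98
    # Assumption: branches of length <= 0.001 never define a phylobin.
    return False


for length, support in [(0.05, 10), (0.02, 75), (0.02, 60), (0.005, 99), (0.005, 90)]:
    print(length, support, names_new_phylobin(length, support))
```

Branches for which the function returns False correspond to the "move to next cluster" cases. Sequences left unassigned afterwards would fall into the default phylobin described above.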

Page 25: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Systematic Biology 57(5): 758–771, 2008 (RAxML version 7.0.4, released by Alexandros Stamatakis in April 2008)

Page 29: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

[Figure: tree with clades labelled as phylobins 18, 19, 20, 21 and 22. Tip labels: Meliniomyces bicolor; Cistella acuum; Hyalodendriella betulae; Uncultured fungus clones TD9_OTU5, G20_OTU5 and IH_Tag102_3331; Mycorrhizal fungal sp. pkc09, pkc12, pkc18, pkc22, pkc33 and pkc38; clones TKN7_3179P2, TKN9_3238J10, TKN10_3235I22, TKN11_3260O3, TKN12_3249H16, TKN12_3255J11 and TKN12_3258A12]

Sequence                                            Phylobin
TKN7_3179P22                                        phylobin18
*gi|133753088| Uncultured fungus clone G20_OT       phylobin18
*gi|133753170| Uncultured fungus clone TD9_OT       phylobin18
TKN12_3255J11                                       phylobin19
TKN12_3258A12                                       phylobin20
TKN9_3238J10                                        phylobin21
*gi|37624773| Mycorrhizal fungal sp. pkc18 1        phylobin21
*gi|37624772| Mycorrhizal fungal sp. pkc33 1        phylobin21
*gi|37624762| Mycorrhizal fungal sp. pkc38 1        phylobin21
TKN10_3235I22                                       phylobin21
TKN11_3260O3                                        phylobin21
*gi|37624759| Mycorrhizal fungal sp. pkc12 1        phylobin21
*gi|37624763| Mycorrhizal fungal sp. pkc22 1        phylobin21
*gi|162311725| Uncultured fungus clone IH_Tag       phylobin22
TKN12_3249H16                                       phylobin22


Page 31: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Challenges: Pseudogenes

- Sequences from cultures and fruitbodies for phylogenetics are rarely cloned - usually averages of variants that equate with the dominant sequence type

- Pseudogenes found in ITS clone libraries of Zooxanthellae

Page 32: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Thornhill, Lajeunesse & Santos. 2007. Molecular Ecology 16: 5326-5340.

Page 33: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

“Based on these results, we conclude that artefacts due to Taq polymerase and cloning error only account for a small percentage of our clones while the remaining sequence diversity and divergence originates from ribosomal operon variation within the Symbiodinium Genome.”

Page 34: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Challenges: Non-fungal Sequences

30,000 black spruce clones

Primers ITS1-F and TW13

[Figure labels: 5.8S, LSU]

Page 37: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

>TKN14_3314_P9CTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATTGAAATTATAGGTGAGGGTTGTAGCTGGCCTCTCGGGGCATGTGCACGCCCGAGCCCTTAATCCACACACACCTGTGAACCTATTGTAAGGGCCCTTAAAAAAGGCCTTTACGTCTTATCATCAACCCATCGTATGTCTCATAGAATGTAAATATATGTCCTCGCCTTAAAAAGCGTTGATAAACTTATACAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGATTTTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATGCCTGTTTGAGTGTCATTAAATTCTCAACTCTGATCGATTTGTTTCGACTTCGGAGCTTGGATTTGGAGCGTGCTGGCGTCGGTCGGCTCCTCTTAAATGCATCAGCGGAATCTAACGTTTCGGACGTCAGTGTGATAATCATGTTGCGCTGTCTGCCTGATCTGAAAGCCCGCTCACAATGGTCTTCGGACAACTTCATATCAAATTTGACCTCAAATCAGGTAGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAATTTAAAATCTGGCGGTCTTGCGGCCGTCCGAGTTGTAATCTGGAGAAGCGTTTATCCGCGTCGGACCGTGTACAAGTCTTCTGGAAGGGAGCGTCGTAGAGGGTGAGAATCCCGTCTTTGACACGGACAACCGGTGCTTTTGTGATGCGCTCTCGAAGAGTCGAGTTGTTTGGGAATGCAGCTCAAAATGGGTGGTAAATTCCATCTAAAGCTAAATATTGGCGAGAGACCGATAGCGAACAAGTACCGTGAGGGAAAGATGAAAAGCACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGTTGAAAGGGAAACGTTTGAAGTCAGTCGCGTCGGCCGAGACTCAACCTTGCTTCTGCTCGGTGCACTTCTCGGTTGACGGGTCAGCATCAATTTTGACCGCCGGATAAAGGTCGGGGGAATGTGGCATCCTTCGGGATGTGTTATAGACCTCGATTCGGATACGGCGATTGGGATTGAGGAACTCGGCGCTTTGCGTCCAGGATGCTGGCATAATGGCTTTAAGCGACCCGTCTTGAAACACGGANC

>TKN14_3314_P9CTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATTGAAATTATAGGTGAGGGTTGTAGCTGGCCTCTCGGGGCATGTGCACGCCCGAGCCCTTAATCCACACACACCTGTGAACCTATTGTAAGGGCCCTTAAAAAAGGCCTTTACGTCTTATCATCAACCCATCGTATGTCTCATAGAATGTAAATATATGTCCTCGCCTTAAAAAGCGTTGATAAACTTATACAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGATTTTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATGCCTGTTTGAGTGTCATTAAATTCTCAACTCTGATCGATTTGTTTCGACTTCGGAGCTTGGATTTGGAGCGTGCTGGCGTCGGTCGGCTCCTCTTAAATGCATCAGCGGAATCTAACGTTTCGGACGTCAGTGTGATAATCATGTTGCGCTGTCTGCCTGATCTGAAAGCCCGCTCACAATGGTCTTCGGACAACTTCATATCAAATTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCGTCTTGAAACACGGANC

Page 39: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

query            match                                               bit score   E value
--------         --------                                            ----------  ----------
TKN14_3314_P9    gi|56126498|gb|AY822743.1| Uncultured ectomycor     1168        0.0
TKN14_3314_P9    gi|296184581|gb|AY884238.2| Ectomycorrhizal fun     1168        0.0
TKN14_3314_P9    gi|299778250|gb|HM069482.1| Uncultured fungus c     1128        0.0
TKN14_3314_P9    gi|13470319|gb|AY010281.1| Piloderma fallax iso     1100        0.0
TKN14_3314_P9    gi|104295534|gb|DQ474631.1| Uncultured ectomyco     1100        0.0

query            match                                               bit score   E value
--------         --------                                            ----------  ----------
TKN14_3314_P9    gi|296184581|gb|AY884238.2| Ectomycorrhizal fun     1168        0.0
TKN14_3314_P9    gi|13470319|gb|AY010281.1| Piloderma fallax iso     1100        0.0
TKN14_3314_P9    gi|13470320|gb|AY010282.1| Piloderma fallax iso     1066        0.0
TKN14_3314_P9    gi|86610857|gb|DQ365660.1| Piloderma fallax iso     1025        0.0
TKN14_3314_P9    gi|86610864|gb|DQ365667.1| Piloderma fallax iso     1025        0.0

query: TKN14_3314_P9
The best scores are:

gi|296184581|gb|AY884238.2| Ectomycorrhizal fun
    cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; unclassified Fungi; ectomycorrhizal fungal sp. AR-Ny2

gi|13470319|gb|AY010281.1| Piloderma fallax iso
    cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax

gi|13470320|gb|AY010282.1| Piloderma fallax iso
    cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax

gi|86610857|gb|DQ365660.1| Piloderma fallax iso
    cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax

gi|86610864|gb|DQ365667.1| Piloderma fallax iso
    cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax

Page 40: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Funding Sources and Supporting Agencies

Page 41: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Thanks!

Michelle Augustyn, Michael Booth, Dan Cardin, József Geml, Hope Gray, Ian Herriott

Scott Hillard, Teresa Hollingsworth, Sarah Hopkins, Jason Hunt, Tom Marr, Jack McFarland

Chad Nusbaum, Gary Laursen, Niall Lennon, Jim Long, Mitali Patil, Ina Timling

Page 43: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Mask (marking low quality base calls)

Tag-Finder (identifying primer bar-codes)

Orient (fixing sequence directions)

Trim-Seq (removing low quality bases at ends)

Purge (removing low quality sequences)

Flag Non-Fungals (BLAST + Organism Lookup)

Flag Chimeras (Uclust)

Prepare_contigs (TGICL/Cap3 broad clusters; BLAST to add close relatives; Muscle cluster alignments)

Phylo_table (RAxML bootstrap trees; tree parsing)

Final Phylobin Table (closest BLAST relatives; abundances of phylobins across samples; any flags)

Page 47: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

ftp://folders.inbre.alaska.edu/FMP/

http://www.borealfungi.uaf.edu/pipeline/

Upload sequences in .fasta format here

Upload quality file here (.qual file)

Place phred threshold here (phred = 20 is conservative)

Page 48: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

>R_UP1_3168_P8CAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTCTCCGTTGGTGAACCAGCGGAGGGATCATTACCGAGTTTACAAACTCCCAAACCCTTTGTGAACCTTACCTATCGTTGCTTCGGCGGGACCGCCCCGACGGCCACCTCGGTGGTCCCGGAACCAGGCGCCCGCCGAAGGCCCCAAACTCTTTGTTTCCTATGGTTTTCTCCTCTGAGTGGAAAATAAACAAATAAATAAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAGGCCCCCAGTGCCTGGCGTTGGGGATCGGCCGCTGGCGTCCTTCGGGGGCGCCTGGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGTAGTCCTCCTCTGCGTAGTAGCACAACCTCGCAGTTGGAACGCGGCGGTGGGCCATGCCGTTAAACACCCCACTTCTGAAAGTTGACCTCGGATCAGG

>R_UP1_3168_P836 22 41 51 51 51 51 45 20 26 22 21 27 33 43 41 61 61 61 45 61 61 61 51 57 61 61 61 61 57 45 61 57 61 61 57 49 61 61 61 57 61 61 61 61 61 57 52 61 2751 57 61 61 55 52 45 61 61 61 61 39 25 61 61 43 33 61 61 61 61 61 61 61 61 61 61 61 61 55 57 61 51 61 61 61 61 61 61 43 41 61 61 61 61 61 61 61 61 6157 61 61 57 51 61 61 61 61 61 61 55 61 61 61 61 51 61 61 61 61 61 61 61 43 42 61 61 61 45 39 61 61 61 61 61 61 61 47 47 61 52 61 61 51 61 61 61 61 6161 32 55 61 51 61 61 61 61 61 61 51 42 61 52 61 61 52 61 61 61 61 52 61 61 61 61 52 42 51 51 52 51 47 49 61 61 51 61 61 52 61 61 61 42 61 47 47 51 4761 55 61 52 51 61 61 61 61 51 61 61 51 52 52 61 52 47 49 61 51 51 61 61 47 31 55 61 51 49 40 61 55 47 61 61 52 41 61 61 52 61 55 52 61 61 61 52 41 6144 47 52 47 61 49 51 40 51 55 51 61 43 61 40 32 55 51 49 61 52 34 49 51 61 61 47 51 61 47 52 47 61 61 40 51 49 49 49 51 51 52 61 52 38 55 39 61 55 3161 51 51 46 61 61 45 47 25 52 43 24 25 55 43 27 47 40 55 46 39 51 49 29 49 47 55 51 51 37 51 34 49 55 49 49 52 39 51 46 55 47 40 44 55 47 51 46 51 4941 51 55 52 47 51 49 43 41 37 30 40 39 52 37 49 39 55 43 51 55 51 32 55 51 49 39 51 44 49 38 44 27 52 30 32 40 44 51 41 43 51 23 39 31 55 49 49 32 2337 46 41 35 47 40 47 31 32 33 52 41 41 44 35 45 27 40 35 47 34 47 47 31 45 15 24 39 37 36 38 39 20 28 44 21 26 39 40 29 28 24 20 29 47 26 25 27 40 3926 29 34 35 36 8 24 31 32 47 30 21 10 8 36 36 35 26 47 41 29 34 47 29 35 30 45 29 46 27 19 44 11 16 39 31 27 45 36 40 30 20 31 31 30 45 31 21 32 2222 22 27 33 24 23 28 27 17 24 29 37 28 29 5 6 20 7 7 14 24 30 29 29 28 28 30 25 7 10 21 29 35 32 14 24 22 22 27 23 29 21 16 27 26 8 22 30 23 2929 10 16 28 26 29 29 29 23 22 26 26 30 24 17 15 18 18 25 21 20 25 23 40 28 17 18 19 23 28 18 28 17 24 28 22 29 31 30 8 11 26

Page 49: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

>R_UP1_3168_P8_OriginalCAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTCTCCGTTGGTGAACCAGCGGAGGGATCATTACCGAGTTTACAAACTCCCAAACCCTTTGTGAACCTTACCTATCGTTGCTTCGGCGGGACCGCCCCGACGGCCACCTCGGTGGTCCCGGAACCAGGCGCCCGCCGAAGGCCCCAAACTCTTTGTTTCCTATGGTTTTCTCCTCTGAGTGGAAAATAAACAAATAAATAAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAGGCCCCCAGTGCCTGGCGTTGGGGATCGGCCGCTGGCGTCCTTCGGGGGCGCCTGGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGTAGTCCTCCTCTGCGTAGTAGCACAACCTCGCAGTTGGAACGCGGCGGTGGGCCATGCCGTTAAACACCCCACTTCTGAAAGTTGACCTCGGATCAGG

>R_UP1_3168_P8_MaskedCAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTCTCCGTTGGTGAACCAGCGGAGGGATCATTACCGAGTTTACAAACTCCCAAACCCTTTGTGAACCTTACCTATCGTTGCTTCGGCGGGACCGCCCCGACGGCCACCTCGGTGGTCCCGGAACCAGGCGCCCGCCGAAGGCCCCAAACTCTTTGTTTCCTATGGTTTTCTCCTCTGAGTGGAAAATAAACAAATAAATAAAAACTTTCAACAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCAACCCTCAGGCCCCCAGTGCCTGGCGNTGGGGATCGGCCGCTGGCGTCCTTCGGGGNCGCCTGNNCGGCCCCGAAATCTAGNGNNGGTCTCGCTGTAGTCCTCCTCTGCNTAGTANNANNNCCTCGCAGNNGGAANGCGGCGGNGGNCCATGNNGTTAAACACCCNNNNTCTGAAANNNGANCNCGGATCNNG
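
The Original and Masked records above differ only where base calls were replaced by N. A minimal sketch of that masking step follows. It assumes one phred score per base and that bases scoring below the threshold are masked. The function name `mask_low_quality` is hypothetical, and the exact cutoff convention of the Mask tool is not stated on the slides.

```python
from typing import Sequence


def mask_low_quality(seq: str, quals: Sequence[int], threshold: int = 20) -> str:
    """Replace every base whose phred score is below `threshold` with N.

    Assumption: scores equal to the threshold are kept.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have the same length")
    return "".join(base if q >= threshold else "N"
                   for base, q in zip(seq, quals))


# Toy example (not taken from the slide data).
print(mask_low_quality("ACGTAC", [36, 22, 8, 20, 19, 41]))
```

Masking keeps sequence length and coordinates intact, which is convenient when downstream steps such as tag finding or BLAST rely on positions within the read.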

Page 50: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

primer = TTTCTT

pigtail = TTGGTC

Upload your sequences here

Upload list of tags as text file here
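
The outline lists edit distance as a design criterion for tags, and the same measure can be used to assign a read to a tag while tolerating a sequencing error. The sketch below is an illustration only, not the Tag-Finder script. The tag names and sequences are invented, and `assign_tag` assumes the tag region has already been cut from the read.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def assign_tag(read_prefix: str, tags: Dict[str, str],
               max_errors: int = 1) -> Optional[str]:
    """Return the name of the closest tag, or None if no tag is within
    max_errors edits or if the closest match is tied (ambiguous)."""
    if not tags:
        return None
    scored = sorted((edit_distance(read_prefix, seq), name)
                    for name, seq in tags.items())
    best_dist, best_name = scored[0]
    if best_dist > max_errors:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None
    return best_name


# Hypothetical tags, far apart in edit distance.
tags = {"TagA": "ACGTACGTAC", "TagB": "TGCATGCATG"}
print(assign_tag("ACGTACGAAC", tags))  # one substitution away from TagA
print(assign_tag("GGGGGGGGGG", tags))  # too far from every tag
```

Tags that are long and mutually distant in edit distance leave room for this kind of error-tolerant matching without confusing one sample for another.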

Page 51: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Upload sequences here

Upload text file “Orient_Motifs” here
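
Orienting can be sketched as a motif search on both strands. This is a guess at the general idea, not the Orient tool itself, and the format of the "Orient_Motifs" file is not shown on the slides. Here the motifs are assumed to be a plain list of forward-strand sequences, and the example motif is the ITS1-F primer sequence that opens the reads shown earlier in the talk.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(seq: str, forward_motifs: List[str]) -> Tuple[str, bool]:
    """Return (sequence in forward orientation, whether it was flipped).

    The read is flipped only when more forward motifs are found on its
    reverse complement than on the read as given.
    """
    upper = seq.upper()
    flipped = reverse_complement(upper)
    forward_hits = sum(motif in upper for motif in forward_motifs)
    reverse_hits = sum(motif in flipped for motif in forward_motifs)
    if reverse_hits > forward_hits:
        return reverse_complement(seq), True
    return seq, False


motifs = ["CTTGGTCATTTAGAGGAAGTAA"]  # ITS1-F primer sequence
read = "CTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGG"
print(orient(reverse_complement(read), motifs))  # flipped back to forward
print(orient(read, motifs))                      # left unchanged
```

Reads with no motif on either strand are returned unchanged in this sketch. A real tool would probably report them separately.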

Page 59: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Challenges: Introns

- best fungal-selective primer is ITS1F, but it is 5’ of intron insertion site in 3’ end of SSU for many Ascomycetes

Page 60: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Challenges: Introns

Page 61: Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee  Taylor

Mask (marking low quality base calls)

Tag-Finder (identifying primer bar-codes)

Orient (fixing sequence directions)

Trim-Seq (removing low quality bases at ends)

Purge (removing low quality sequences)

Flag Non-Fungals (BLAST + Organism Lookup)

Flag Chimeras (Uclust)

Prepare_contigs (TGICL/Cap3 broad clusters; BLAST to add close relatives; Muscle cluster alignments)

Phylo_table (RAxML bootstrap trees; tree parsing)

Final Phylobin Table (closest BLAST relatives; abundances of phylobins across samples; any flags)

Only for Sanger; 454 & Illumina software do these steps