2015 pag-chicken
C. Titus Brown
Associate Professor
School of Veterinary Medicine
UC Davis
Jan 2015
Adventures in improving the chicken genome &
transcriptome
Current state of chicken genome
● galGal2 (2004)
o Sanger sequencing (6.6X)
o Physical and genetic linkage maps
● galGal3 (2006)
o 198K additional reads (contig ends, regions of poor quality)
o SNP mapping
o chrZ and chrW
● galGal4 (2011)
o 454 (12X)
o -10Mb artifactual duplications
o +15Mb mapped to chromosomes
o increases in N50 contig size
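For readers unfamiliar with the N50 metric mentioned above, here is a minimal sketch of how it is usually computed: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The contig lengths are hypothetical and are not taken from any galGal release.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths, for illustration only.
example_contigs = [100, 200, 300, 400, 500]
example_n50 = n50(example_contigs)
print(example_n50)
```

A larger N50 means that half of the assembled bases sit in longer contigs, which is why it is reported as an improvement between releases.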
2. Microchromosomes...
● 10 macrochromosomes
● 28 microchromosomes
o GC rich
o high recombination rate
o high gene density
o low intron size
● not sequencing friendly!
Moleculo vs PacBio
Moleculo
● Cheaper
o High throughput
● Low error rate
o ~0%
● Same problems as Illumina…
PacBio
● No 3' bias
● No PCR
● High error rate
o ~15%
● Lower throughput
● "$$-plated genome"
Moleculo library preparation
Kuleshov et al (2014), Nature Biotechnology 32, 261–266
Exploring Moleculo
● 1,578,022 reads
● Covers 88% of galGal4
● 326 reads unmapped to galGal4 (0.02%)
o Searched 5 random in ENA (exonerate)
o 3 matched Sediminibacterium sp...
Luiz Irber
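As a quick check of the arithmetic on this slide, the unmapped percentage follows directly from the two read counts. The function name is only illustrative.

```python
def percent_unmapped(unmapped_reads, total_reads):
    """Return unmapped reads as a percentage of all reads."""
    return 100 * unmapped_reads / total_reads


# Read counts as quoted on the slide.
pct = percent_unmapped(326, 1_578_022)
print(f"{pct:.2f}% of Moleculo reads did not map to galGal4")
```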
But Moleculo does not contain
missing genes… ;(
Search for de novo-assembled UniProt orthologs
from chicken in (a) galGal4 genome, and (b)
Moleculo data.
Luiz Irber
The missing exons are not in
Moleculo data. Might be in
PacBio.
So, now working with PacBio.
● Dealing with PacBio data
o Most tools break horribly
(It's getting better)
● Assembling PacBio data
o High error rate (~15%)
o Most assemblers target short reads
o PacBio recommended assemblers interact poorly
with MSU HPCC
Would like to produce a step-by-step protocol to
do genome improvement or assembly with
PacBio…
Luiz Irber
2) Evaluating effects of gene models
on pathway prediction
Likit Preeyanon
Vertically integrated comparison.
GIMME: Software for Merging Gene Models
[Diagram labels: Assembly-based; Local Assembly; Reference-guided; GIMME (in-house software); Merged Models; ENSEMBL; Cufflinks can incorporate ENSEMBL]
Exon Graph approach (“Gimme”)
[Diagram: exon graph with labels exon1, exon2, exon3, intron1, intron2, Exon3.a, Exon3.b]
Likit Preeyanon
https://github.com/ged-lab/gimme.git
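To make the exon graph idea concrete, here is a generic splice-graph sketch. It is not Gimme's actual code, and every name and coordinate in it is hypothetical. Exons are nodes, introns are edges, and gene models from different sources are merged by adding their exon chains to one graph. Each path from a start exon to an end exon is then a candidate isoform, so alternatives such as Exon3.a and Exon3.b fall out as separate paths. The sketch assumes each transcript lists its exons in genomic order, so the graph has no cycles.

```python
from collections import defaultdict


def build_exon_graph(transcripts):
    """Merge transcripts into one graph: exons are nodes, introns are edges."""
    graph = defaultdict(set)
    for exons in transcripts:
        for exon in exons:
            graph[exon]  # touch the key so single-exon transcripts become nodes
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
    return graph


def enumerate_isoforms(graph):
    """List every path from an exon with no incoming edge to one with no outgoing edge."""
    has_incoming = {d for targets in graph.values() for d in targets}
    starts = sorted(exon for exon in graph if exon not in has_incoming)
    paths = []

    def walk(exon, path):
        path = path + [exon]
        if not graph[exon]:  # no outgoing intron: this exon ends the transcript
            paths.append(path)
            return
        for nxt in sorted(graph[exon]):
            walk(nxt, path)

    for start in starts:
        walk(start, [])
    return paths


# Hypothetical (start, end) exon coordinates: two models that share exon1 and
# exon2 but disagree on where exon3 ends.
model_a = [(100, 200), (300, 400), (500, 600)]
model_b = [(100, 200), (300, 400), (500, 650)]

isoforms = enumerate_isoforms(build_exon_graph([model_a, model_b]))
for isoform in isoforms:
    print(isoform)
```

Enumerating all paths can blow up combinatorially on complex loci, so a real tool would need some way of limiting the set of reported isoforms.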
Ensembl Enriched KEGG Pathway
Term                                          Count  Benjamini
Cytokine-cytokine receptor interaction        36     6.2E-02
Lysosome                                      25     1.2E-01
Apoptosis                                     19     3.5E-01
Arginine and proline metabolism               12     3.1E-01
Starch and sucrose metabolism                 9      3.4E-01
Toll-like receptor signaling pathway          19     3.7E-01
Natural killer cell mediated cytotoxicity     17     3.4E-01
Cytosolic DNA-sensing pathway                 9      4.2E-01
Valine, leucine and isoleucine degradation    11     4.1E-01
Glutathione metabolism                        10     4.3E-01
NOD-like receptor signaling pathway           11     4.6E-01
Intestinal immune network for IgA production  9      5.6E-01
VEGF signaling pathway                        14     5.6E-01
PPAR signaling pathway                        13     6E-01
Gimme Enriched KEGG Pathway
Term                                          Count  Benjamini
Cytokine-cytokine receptor interaction        34     3.7E-02
Toll-like receptor signaling pathway          22     2.7E-02
Jak-STAT signaling pathway                    28     3.4E-02
Arginine and proline metabolism               13     4.5E-02
Lysosome                                      22     1.3E-01
Natural killer cell mediated cytotoxicity     17     1.6E-01
Alanine, aspartate and glutamate metabolism   9      1.8E-01
Amino sugar and nucleotide sugar metabolism   10     3.6E-01
Cysteine and methionine metabolism            9      4E-01
ECM-receptor interaction                      16     3.7E-01
Apoptosis                                     16     3.7E-01
Glycolysis / Gluconeogenesis                  11     4E-01
DNA replication                               8      3.8E-01
Cell adhesion molecules (CAMs)                19     4.6E-01
PPAR signaling pathway                        12     6E-01
Intestinal immune network for IgA production  8      6.1E-01
Compared Enriched KEGG Pathway
Term
Common:
Cytokine-cytokine receptor interaction
Toll-like receptor signaling pathway
Lysosome
Apoptosis
Arginine and proline metabolism
Natural killer cell mediated cytotoxicity
Intestinal immune network for IgA production
PPAR signaling pathway
Ensembl only:
Starch and sucrose metabolism
Valine, leucine and isoleucine degradation
Glutathione metabolism
NOD-like receptor signaling pathway
VEGF signaling pathway
Gimme only:
Jak-STAT signaling pathway
Alanine, aspartate and glutamate metabolism
Amino sugar and nucleotide sugar metabolism
ECM-receptor interaction
Cell adhesion molecules (CAMs)
DNA replication
Glycolysis / Gluconeogenesis
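The comparison is plain set arithmetic over the term columns of the two enrichment tables. A minimal sketch, with the pathway names transcribed from those tables (spelling normalized) and variable names that are only illustrative:

```python
# Terms from the Ensembl enrichment table.
ensembl_terms = {
    "Cytokine-cytokine receptor interaction",
    "Lysosome",
    "Apoptosis",
    "Arginine and proline metabolism",
    "Starch and sucrose metabolism",
    "Toll-like receptor signaling pathway",
    "Natural killer cell mediated cytotoxicity",
    "Cytosolic DNA-sensing pathway",
    "Valine, leucine and isoleucine degradation",
    "Glutathione metabolism",
    "NOD-like receptor signaling pathway",
    "Intestinal immune network for IgA production",
    "VEGF signaling pathway",
    "PPAR signaling pathway",
}

# Terms from the Gimme enrichment table.
gimme_terms = {
    "Cytokine-cytokine receptor interaction",
    "Toll-like receptor signaling pathway",
    "Jak-STAT signaling pathway",
    "Arginine and proline metabolism",
    "Lysosome",
    "Natural killer cell mediated cytotoxicity",
    "Alanine, aspartate and glutamate metabolism",
    "Amino sugar and nucleotide sugar metabolism",
    "Cysteine and methionine metabolism",
    "ECM-receptor interaction",
    "Apoptosis",
    "Glycolysis / Gluconeogenesis",
    "DNA replication",
    "Cell adhesion molecules (CAMs)",
    "PPAR signaling pathway",
    "Intestinal immune network for IgA production",
}

common = ensembl_terms & gimme_terms        # enriched with both sets of gene models
ensembl_only = ensembl_terms - gimme_terms  # enriched only with Ensembl models
gimme_only = gimme_terms - ensembl_terms    # enriched only with Gimme models

for label, terms in [("Common", common),
                     ("Ensembl only", ensembl_only),
                     ("Gimme only", gimme_only)]:
    print(f"{label}: {len(terms)}")
    for term in sorted(terms):
        print(f"  {term}")
```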
RNAseq: your models matter
Our methods for generating hypotheses from mRNAseq data are sensitive to references & technical details of the approaches.
(This is expected but Bad.)
More RNAseq data coming every day.
…but we are not regularly updating gene models…
… and the genome that we have is Not Great.
Follow on Smith & Burt (2014) to continually regenerate gene models for differential expression use.
A general model for vet/ag animals?