targeted rna sequencing, urban metagenomics, and astronaut genomics
TRANSCRIPT
104 genes, 25ng of RNA
Targeted RNA Sequencing, Urban Metagenomics, and Astronaut Genomics
Christopher E. Mason
Associate Professor
Department of Physiology and Biophysics & The Institute for Computational Biomedicine at Weill Cornell Medicine and the Tri-Institutional Program on Computational Biology and Medicine
Fellow of the Information Society Project, Yale Law School
February 11th, 2016
@mason_lab
Exome: 2% of the genome?
Human Genome Organization
From Mason and Bozinoski, Kaplan & Sadock's Comprehensive Textbook of Psychiatry, 2016
ENCODE active elements!
These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions.
Some disagree
The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
Can RNA-Seq replace microarrays?
Marioni and Mason et al., Genome Research, 2008: "RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays" (original title: "The Death Knell to Microarrays")
More DEGs found with RNA-seq
DE in Solexa only: 4,959 (37%)
DE in Affy only: 1,579 (12%)
DE in both: 6,534 (50%)
Marioni and Mason et al., Genome Research, 2008
Whilst this is interesting, a more relevant measure might be the number of genes that overlap between the two technologies.
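The Venn counts are easier to compare as shares of the union of all DE genes, and the overlap measure suggested in the note can be computed from the same three numbers. A small sketch using only the counts on the slide; the function names are illustrative, and "a" is Solexa while "b" is Affy. The printed shares agree with the slide's percentages to within rounding.

```python
# Shares of each Venn region and per-platform overlap, from the slide's counts.
def venn_shares(only_a: int, only_b: int, both: int) -> dict[str, float]:
    """Percentage of the union of DE genes that falls in each region."""
    union = only_a + only_b + both
    return {
        "only_a": 100 * only_a / union,
        "only_b": 100 * only_b / union,
        "both": 100 * both / union,
    }


def overlap_fraction(only_x: int, both: int) -> float:
    """Fraction of one platform's DE genes that the other platform also calls DE."""
    return both / (only_x + both)


shares = venn_shares(only_a=4959, only_b=1579, both=6534)  # a = Solexa, b = Affy
for region, pct in shares.items():
    print(f"{region}: {pct:.1f}%")
print(f"Affy DE genes also DE in Solexa: {overlap_fraction(1579, 6534):.1%}")
print(f"Solexa DE genes also DE in Affy: {overlap_fraction(4959, 6534):.1%}")
```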
More DE genes found with RNA-Seq
Liu et al., 2010
Estimated log2 fold-changes from Solexa (y axis) and Affymetrix (x axis) are highly correlated. We consider only genes that were interrogated using both platforms and genes where the mean number of counts across lanes was greater than 0 for both the liver and kidney samples. Red and green dots represent genes called as differentially expressed based on the Solexa data at an FDR of 0.1%, with a mean number of counts greater than (red) or less than (green) 250 reads. Black dots represent genes not called as differentially expressed based on the Solexa data. The set of differentially expressed genes that show the strongest correlation between the two technologies seem to be those mapped to by many reads (red), while the correlation is weaker for differentially expressed genes mapped to by fewer reads (green).
RNA-seq gives many views of biology. RNA-seq = Love. (Li S, Nature Biotechnology, 2014)
Inputs: sequencing data; references (hg19, RefSeq, miRBase, rRNA, adapters); GENCODE annotation
Reads → alignment on HPC nodes → sorted BAMs; queries of reads/bp
Genetic variation (SNVs and indels). Algorithms: STAR/GATK, r-make
Differential expression by gene, exon, splice isoform, allele, & transcript. Algorithms: STAR, r-make, ASE, limma-voom, RSEM
Find ncRNAs and new genes. Algorithms: r-make, AceView
Gene fusion detection. Algorithms: r-make, Snowshoes
Predict polyA sites & gain/loss of miRNA binding sites. Algorithms: r-make, BAGET, AlexaSeq, TargetScan
Viruses/bacteria/other in the remaining reads. Use: BLAST, MetaPhlAn
Many ways to sequence RNA
Sept. 9, 2014
http://www.nature.com/nbt/focus/seqc/index.html
All technologies have varying strengths
GENCODE annotation v24
Coding genes: 19,815
Noncoding genes: 25,823
Pseudogenes: 14,505
IgG/TcR/other genes: 411
60,554 total
http://www.gencodegenes.org/stats.html
But the lncRNAs hide at low levels
Derrien et al., Genome Research, 2012
http://genome.cshlp.org/content/22/9/1775.full
What else is at low levels? A clinical example: Using RNA-Seq to find chemo-resistant clones in ALL
Meyer et al., Nature Genetics, 2013
Only Significantly Associated Clinical Variable Was Early Relapse
Meyer et al., Nature Genetics, 2013
7 of 40
We see functional mutation clustering within the protein
NT5C2: 5'-nucleotidase (purine), cytosolic type II (Meyer et al., Nature Genetics, 2013)
5'-nucleotidase, cytosolic II
NT5C2 Mutants Confer Chemoresistance to Purine Nucleoside Analogue Treatment (6-MP, 6-TG)
Reh cells transiently infected with lentiviral WT, GFP, and mutant constructs (Meyer et al., Nature Genetics, 2013)
Meyer et al., Nature Genetics, 2013
Many mutations hide at low frequency: how do we find them?
Global RNA-sequencing has wiggles
Wang et al., 2011, Nat Rev Genetics
QIAseq Targeted RNA Panels for gene expression profiling using Digital RNA sequencing
Molecular barcodes enabling Digital RNAseq
Rapid Targeted RNA Panel Design (Life Moves Pretty Fast)
AML recurrent, relapse-specific dysregulated genes
140-patient cohort of diagnosis-relapse pairs of AML: WES, RRBS, and RNA-seq on all
Differentially expressed genes (DEGs) were found with DESeq2, along with inverse correlation with methylation in gene promoters: 25% methylation difference, q-value 0.01, and >1.5-fold FC
Out of 140 patients, 104 genes were found in at least 30% of them
Only 10 ng of total RNA left from patient samples
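The recurrence filter on this slide can be sketched as follows. This is a minimal sketch, not the pipeline actually used: it assumes per-patient DESeq2 results have already been joined to promoter methylation differences, that `delta_meth` is a fraction (0.25 = 25%), that the q-value cutoff is strict (q < 0.01), that "inverse correlation" means expression and methylation change in opposite directions, and that a gene must pass both filters in a patient to count toward the 30% threshold. The function and field names are hypothetical.

```python
import math
from collections import Counter


def recurrent_genes(per_patient, n_patients, min_fraction=0.30,
                    min_fold=1.5, max_q=0.01, min_meth_diff=0.25):
    """Genes passing both DE and inverse-methylation filters in >= min_fraction of patients.

    per_patient maps a (hypothetical) patient ID to a list of
    (gene, log2fc, qvalue, delta_meth) tuples, where log2fc and delta_meth
    are relapse-vs-diagnosis changes and delta_meth is a fraction (0.25 = 25%).
    """
    hits_per_gene = Counter()
    for records in per_patient.values():
        hits = set()
        for gene, log2fc, q, delta_meth in records:
            is_de = abs(log2fc) > math.log2(min_fold) and q < max_q
            # "Inverse": promoter methylation and expression move in opposite directions
            is_inverse = abs(delta_meth) >= min_meth_diff and log2fc * delta_meth < 0
            if is_de and is_inverse:
                hits.add(gene)
        hits_per_gene.update(hits)  # count each gene at most once per patient
    return sorted(g for g, n in hits_per_gene.items() if n / n_patients >= min_fraction)
```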
QIAseq Targeted RNA Panels Leverage Barcodes (a.k.a. UMIs)
Digital sequencing using molecular barcode technology:
Tagging each cDNA template with a unique barcode
Counting the number of barcodes to correct any amplification artifacts
Providing unparalleled value: accurate and unbiased gene expression analysis with NGS
QIAseq Targeted RNA Panels use a digital sequencing method, whereby each transcript (after mRNA is converted to cDNA) is tagged with a unique 12-base random molecular barcode prior to any amplification step. Thus, enrichment and amplification events yield a unique combination of molecular barcode and target sequence. At the end of sequencing, the relative amount of each mRNA target is determined by counting the number of unique molecular barcode-target combinations instead of reads, thereby eliminating PCR duplicates and amplification bias and resulting in more accurate, unbiased gene expression analysis.
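The barcode-counting idea can be illustrated with a toy sketch: each read is reduced to a hypothetical (UMI, target) pair, and expression is the number of distinct UMIs per target rather than the number of reads. It assumes exact UMI matching; production tools typically also merge UMIs that differ by a sequencing error, which this sketch does not do.

```python
from collections import defaultdict


def count_molecules(reads):
    """Return {target: (read_count, unique_umi_count)} from (umi, target) pairs.

    Every read carrying the same UMI on the same target is treated as a PCR
    copy of one original cDNA molecule, so only distinct UMIs are counted.
    """
    reads_per_target = defaultdict(int)
    umis_per_target = defaultdict(set)
    for umi, target in reads:
        reads_per_target[target] += 1
        umis_per_target[target].add(umi)
    return {t: (reads_per_target[t], len(umis_per_target[t])) for t in reads_per_target}


# Hypothetical reads: one GAPDH molecule amplified 5x, another 2x; one MYC molecule
reads = ([("ACGTACGTACGT", "GAPDH")] * 5
         + [("TTGCAAGCTAGC", "GAPDH")] * 2
         + [("GGATCCGGATCC", "MYC")])
print(count_molecules(reads))  # {'GAPDH': (7, 2), 'MYC': (1, 1)}
```

Counting by reads would report GAPDH at 7x MYC; counting by UMIs reports 2x, which is the amplification-bias correction the slide describes.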
6-hour library prep procedure
Very good performance metrics
ERCCs
Accuracy (vs. qPCR): R² = 0.90
Specificity (on-target reads): >97%
Uniformity (20% of mean): >97%
Reproducibility (lab 1 vs. lab 2): R² = 0.99
Sensitivity: detect ~0.2 copies of RNA per cell
[Figure: 128 copies; 10 tags]
Easy-to-use online custom builder
Choose your own gene content from 20,000+ human genes and lncRNAs
Easy-to-use online Custom Panel Builder to tailor a panel specific to your research needs
Input: list of genes
Select proper controls (genomic DNA contamination control, HKGs)
Output: list of genomic coordinates for primers designed specific to genes of interest
We used a custom panel consisting of 104 genes of interest + GDCs (genomic DNA contamination controls) + 10 HKGs (housekeeping genes)
Targeted RNA Capture Panel ran well on the NextSeq (1x150)
UHU-24-1-E9_S57 vs. NBM (normalized to top 3 HKGs)
Easy to see switch of leukemia genes from normal bone marrow (NBM)
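"Normalized to Top3 HKGs" is not spelled out on the slide. One plausible reading, sketched here under stated assumptions: pick the three housekeeping genes with the highest mean count across samples, divide each sample by the geometric mean of those three, and report log2 fold-change against NBM. The selection rule, the pseudocount, and all gene names and counts are hypothetical; HKG counts must be positive for the geometric mean.

```python
import math
from statistics import fmean


def top_hkgs(samples, hkgs, k=3):
    """Assumed rule: the k housekeeping genes with the highest mean count."""
    return sorted(hkgs, key=lambda g: fmean(s[g] for s in samples), reverse=True)[:k]


def normalize(sample, ref_genes):
    """Divide every gene by the geometric mean of the reference genes' counts."""
    scale = math.exp(fmean(math.log(sample[g]) for g in ref_genes))
    return {gene: count / scale for gene, count in sample.items()}


def log2_fc(case, control, genes, pseudocount=1e-6):
    """log2(case / control) on normalized values; pseudocount guards against zeros."""
    return {g: math.log2((case[g] + pseudocount) / (control[g] + pseudocount)) for g in genes}


# Hypothetical counts for one patient sample and normal bone marrow
nbm = {"HKG1": 100, "HKG2": 200, "HKG3": 400, "HKG4": 1, "GENE_X": 50}
aml = {"HKG1": 200, "HKG2": 400, "HKG3": 800, "HKG4": 2, "GENE_X": 400}
refs = top_hkgs([nbm, aml], ["HKG1", "HKG2", "HKG3", "HKG4"])
fc = log2_fc(normalize(aml, refs), normalize(nbm, refs), ["GENE_X"])
print(refs, fc)
```

Here the patient sample has twice the sequencing depth of NBM, so a raw 8x difference in GENE_X shrinks to about 4x (log2 FC near 2) after normalization.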
UHU-51-1-H11_S95 (group 1) and UHU-51-2-H12_S96 (group 2) vs. NBM (normalized to top 3 HKGs)
Differences from Diagnosis to Relapse relative to normal bone marrow (NBM)
HIST1H1C, HIST1H1D, HIST1H2BD, HIST2H2BE
R² = 0.92; All = 0.86
The Others
But! There is more than one genome: in your body's cellular democracy, YOU are a minority party:
1.3-10X bacterial:human cells (Zhu et al., 2010; Sender et al., 2016)
150:1 bacterial:human active transcripts in the gut microbiome (Qin et al., 2010)
http://biorxiv.org/content/early/2016/01/06/036103
Jessica Lee Green
Flatworms, cnidarians, Ecteinascidia turbinata, invertebrates, vertebrates, ganesh
Open-Source GIS Cloud App (iOS and Android)
City-scale metagenomics
First city-scale metagenome profile
1. Swab (3 min), then data entry: 2. Annotate; 3. GPS-tag/timestamp
Extract DNA (n = 1,457 samples) → 96-plex TruSeq/Qiagen libraries → 10.2 billion 125x125 DNA seqs. → quality trim (Q20) → MegaBLAST-LCA alignment → confirm with MetaPhlAn → upload
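The MegaBLAST-LCA step assigns each read to the lowest common ancestor (LCA) of its best hits. A minimal sketch, assuming hits have already been reduced to (reference, bitscore) pairs and that a hit is kept if it scores within 10% of the best, a MEGAN-style "top percent" rule chosen here for illustration rather than taken from the study. The reference IDs and toy taxonomy are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Deepest taxon shared by all root-to-leaf lineages (None if nothing is shared)."""
    lca = None
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            lca = ranks[0]
        else:
            break
    return lca


def assign_read(hits, taxonomy, top_percent=10.0):
    """hits: [(reference_id, bitscore)]; keep hits within top_percent of the best score."""
    best = max(score for _, score in hits)
    kept = [taxonomy[ref] for ref, score in hits if score >= best * (1 - top_percent / 100)]
    return lowest_common_ancestor(kept)


# Toy taxonomy (hypothetical reference IDs), lineages listed root to leaf
TAXONOMY = {
    "ref_putida": ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales",
                   "Pseudomonadaceae", "Pseudomonas", "Pseudomonas putida"],
    "ref_aeruginosa": ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales",
                       "Pseudomonadaceae", "Pseudomonas", "Pseudomonas aeruginosa"],
    "ref_ecoli": ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
                  "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
}
print(assign_read([("ref_putida", 200), ("ref_aeruginosa", 195), ("ref_ecoli", 100)], TAXONOMY))
# Pseudomonas
```

A read that hits two Pseudomonas species almost equally well is only called at genus level, which is why LCA-based tools are conservative about species calls.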
Pseudomonas Density
Half of the world under our fingertips is unknown
Power to the Soil Gave Us All Kingdoms
http://www.wsj.com/articles/big-data-and-bacteria-mapping-the-new-york-subways-dna-1423159629
http://graphics.wsj.com/patho-map/?sel=stn_311
Pseudomonas putida can help absorb chemicals
HMP Comparison Shows That the Subway Looks Like Skin
[Figure: log2 ratio (observed/expected) by associated body region for Staphylococcus epidermidis, Staphylococcus aureus, Acinetobacter radioresistens, Propionibacterium acnes]
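The enrichment axis on the HMP comparison is a log2 ratio of observed to expected counts. A sketch of that formula, assuming expected counts come from a null model that distributes the total in proportion to some reference fractions (the expectation actually used in the study is not given here), with an optional pseudocount to avoid taking the log of zero. The category names and counts are hypothetical.

```python
import math


def expected_counts(total, reference_fractions):
    """Null model (assumed): distribute the total in proportion to reference fractions."""
    return {k: total * f for k, f in reference_fractions.items()}


def log2_obs_exp(observed, expected, pseudocount=0.5):
    """log2((O + pc) / (E + pc)) for each category in `expected`."""
    return {k: math.log2((observed.get(k, 0) + pseudocount) / (e + pseudocount))
            for k, e in expected.items()}


# Hypothetical: 35 species assigned to body regions, null says 50/50
observed = {"skin": 30, "gut": 5}
expected = expected_counts(35, {"skin": 0.5, "gut": 0.5})
print(log2_obs_exp(observed, expected))
```

Positive values mark regions over-represented relative to the null (here, skin); negative values mark under-represented ones.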
>600 species ride the subway with you!
Mostly harmless.
Pathogenicity markers absent.
NY Magazine, November 5th, 2013
molecular echoes
Hourly dynamics of a populated kiosk are far more heterogeneous
[Figure: hourly samples, 8:15 to 17:00]
Some areas are more stable: Gowanus Canal
blogs.ei.columbia.edu
Apr. 22, 2015
Gowanus Canal was a methanogen heaven
Hurricane-flooded:
Staphylococcus aureus, Enterococcus faecium
All species: 57552; specific species: 10; Pseudoalteromonas haloplanktis
Species diversity varies by area of the city
A persistent molecular echo of the cold, ocean water
Shewanella frigidimarina
An Antarctic species with the ability to produce eicosapentaenoic acid (EPA). It grows anaerobically by dissimilatory Fe(III) reduction. Its cells are motile and rod-shaped.
Frolova, G. M.; Gumerova, P. A.; Romanenko, L. A.; Mikhailov, V. V. (2011). "Characterization of the lipids of psychrophilic bacteria Shewanella frigidimarina isolated from sea ice of the Sea of Japan". Microbiology 80 (1): 30-36.
EPA is obtained in the human diet by eating oily fish or fish oil, e.g. cod liver, herring, mackerel, salmon, menhaden and sardine, and various types of edible seaweed.
The Hygiene Hypothesis
The hygiene hypothesis states that a lack of early childhood exposure to infectious agents, symbiotic microorganisms (e.g. gut flora or probiotics), and parasites increases susceptibility to allergic diseases by suppressing the natural development of the immune system.
"Infants born by cesarean delivery are at increased risk of asthma, obesity and type 1 diabetes, whereas breastfeeding is variably protective against these and other disorders." - Rob Knight
http://blog.ted.com/how-microbes-could-cure-disease-rob-knight-at-ted2014/
http://j-humphries.deviantart.com/art/Forcefield-397075521
Genotype data can predict your birthplace
Genes mirror geography within Europe (Novembre et al., 2008)
http://www.cnn.com/2013/09/04/tech/innovation/dna-face-sculptures/
From Heather Dewey-Hagborg's Stranger Visions at Genspace (Brooklyn)
http://demographics.coopercenter.org/DotMap/index.html
Machine-learning Image Segmentation (BIS)
Human Ancestry Prediction
[Figure: demographic data vs. ancestry prediction (Ancestry Mapper genetic match, 0-100) at collection site P01461; reference populations: Yoruba, Luhya, African American, Puerto Rican, Spanish, Tuscan, European-Utah, British, Finnish, Han Chinese, Japanese, Colombian, Mexican]
Predicted ancestry of DNA left behind can mirror census data in White areas (Afshinnekoo E, Meydan C, et al., Cell Systems, 2015).
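Ancestry Mapper's own scoring is not reproduced here; the sketch below swaps in a simpler technique to illustrate the general idea of ranking reference populations. It computes a Hardy-Weinberg genotype log-likelihood for one individual's alt-allele counts under each population's allele frequencies. Population names and frequencies are hypothetical, and real subway swabs are mixtures of DNA from many people, which this sketch ignores.

```python
import math


def genotype_loglik(genotypes, freqs, eps=1e-3):
    """Hardy-Weinberg log-likelihood of alt-allele counts (0/1/2) given allele frequencies."""
    ll = 0.0
    for snp, g in genotypes.items():
        if snp not in freqs:
            continue  # skip SNPs the reference panel does not cover
        p = min(max(freqs[snp], eps), 1 - eps)  # clamp so no genotype has probability 0
        ll += math.log({0: (1 - p) ** 2, 1: 2 * p * (1 - p), 2: p ** 2}[g])
    return ll


def rank_populations(genotypes, pop_freqs):
    """Reference populations ordered from most to least likely."""
    scores = {pop: genotype_loglik(genotypes, f) for pop, f in pop_freqs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical reference panel: alt-allele frequency per SNP
POP_FREQS = {"pop_A": {"rs1": 0.9, "rs2": 0.9}, "pop_B": {"rs1": 0.1, "rs2": 0.1}}
print(rank_populations({"rs1": 2, "rs2": 2}, POP_FREQS))
```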
Alleles appear more Hispanic and more Asian in downtown Manhattan
[Figure: demographic data vs. ancestry prediction (Ancestry Mapper genetic match, 0-100) at collection site P00951, Chinatown; same 13 reference populations]
Afshinnekoo E, Meydan C, et al., Cell Systems, 2015.
North Harlem and Washington Heights show more Yoruban alleles and Puerto Rican alleles
[Figure: demographic data vs. ancestry prediction (Ancestry Mapper genetic match, 0-100) at collection site P00166; same 13 reference populations]
We can detect humans' molecular echo (Afshinnekoo E, Meydan C, et al., Cell Systems, 2015).
You can choose what DNA to leave behind
https://www.google.com/patents/US8073628
Should I ride the subway?
Yes! With Ice Cream
Washington D.C.
www.metasub.org
Now a Global Effort: 45 cities
http://www.metasub.org/interactive-map.html
Optimized PCR cycles for QIAseq FX with environmental DNA samples (10-25 ng)
QIAseq FX generates mechanical-quality DNA fragmentation
Sample-to-sample fragmentation reproducibility
Customized fragment size from any input or G/C content (100 ng, 1 µg)
In our hands, fragmentation profiles generated using QIAGEN's FX technology are highly comparable to Covaris in both reproducibility and fragment size tunability. In the top two figures, you can see that fragmentation of multiple samples using the same FX reaction condition is highly reproducible, as reported by both Bioanalyzer and the insert size calculated by downstream analysis.
Likewise, in the bottom left figure, we target three different fragment sizes with each of two input amounts, demonstrating both the tunability of the kit and also the flexibility to accommodate a range of DNA inputs.
Finally, in the lower right, we've treated human genomic DNA and a bacterial DNA mixture designed to have broad G/C content with either a 5-minute or 10-minute reaction to demonstrate consistent results regardless of sample origin.
The Olympiome Rio 2016
Before, During, After
Olympiome: Rio 2016, Tokyo 2020
Extreme Context
http://www.extrememicrobiome.org
Extreme Microbiomes for New Biology and Drug Discovery
Biosynthetic Gene Clusters show new drugs right under our fingertips
South Ferry station
PAB03: metal payphone; PAB07: plastic sign; PAB09: metal stairway rail
[Figure: biosynthetic gene clusters from PAB03, PAB07, and PAB09; scale bar 1 kb]
Mohamed Donia
Clinical Context: Precision Metagenomics
Evidence of live & antibiotic resistant bacteria
Afshinnekoo E, Meydan C, et al., Cell Systems, 2015.
Metagenomics reveals the likely source of Tetracycline resistance (TetK) on both media
Antibiotic resistance genes
Afshinnekoo E, Meydan C, et al., Cell Systems, 2015.
Examine hospital settings at Chicago (Jack Gilbert) and now at WCMC/MSK
http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1005413
Genomic Classification gives more granularity of species present
Waiting three weeks for a culture is unethical.
"Any sufficiently advanced technology is indistinguishable from magic." - Arthur C. Clarke
Any sufficiently advanced ignorance is indistinguishable from malice.
Still challenges in the informatics: organism is different from pathogen.
http://read-lab-confederation.github.io/nyc-subway-anthrax-study/
Nonsense mutation
Free computational tool for anyone with suspicious metagenomic samples
https://science.onecodex.com/bacillus-anthracis-panel/
Regardless of source, we should be able to detect what is inside. MetaSUB 2.0 collaborating with CLC
Agreement of tools is close to real number of species in a sample
All Kingdoms Deserve Love In Study and Clinical Practice
All Kingdoms Deserve Pulverizationfor Study and Clinical Practice
MGRG polyzyme summary
Optimizing MoBio kit for better extraction
Mutanolysin, Achromopeptidase, Chitinase, Lysozyme, Lysostaphin, Lyticase
@StationCDRKelly: Superb Twitter Feed!
Longitudinal, Integrative Systems Biology
Participatory Medicine with Twin Astronauts
Cells only 36 hours after being in orbit
Blood collection processing protocol
Plasma; PBMCs (Ficoll plug); mononuclear cells^; frozen tubes*
CD4+ cells#, CD8+ cells#, CD19+ cells#; lymphocyte-depleted cells (LD)
* All collections (ground and ISS); one date missed in pre-flight collections
^ Performed once on pre-flight samples
# Magnetic bead-based positive selection
Purity validation: On 1/15/15, a CPT tube was obtained from a volunteer donor and processed in parallel with the flight subject sample.
Isolation efficiency assessment by flow cytometry performed by Dr. Brian Crucian at JSC (CD4, CD8, and CD19 staining)
Antibodies: CD8-FITC, human (clone: BW135/80); CD19-PE, human (clone: LT19); CD4-APC, human (clone: M-T466)
[Figure: flow cytometry of PBMCs and lymphocyte-depleted cells]
CD4+ cells (91%)
CD8+ cells (88%)
CD19+ cells (72%)
Nucleic Acid Extraction: Qiagen AllPrep Kit for DNA & RNA (#80204)
https://www.qiagen.com/us/products/catalog/sample-technologies/rna-sample-technologies/dna-rna-protein/allprep-dnarna-mini-kit
[Figure: DNA and RNA gels (10 kb ladder) for CD4, CD8, CD19, and LD fractions. Flight subject: baseline (10/16/14), post-vaccination (10/30/14), 1/15/15. Ground subject: baseline (12/3/14), post-vaccination (12/13/14), 1/20/15]
Pre-flight collections yielded high-quality nucleic acids for study: RIN average = 9.85; range = 8.1-10
28S/18S rRNA Quality: Good Going In
QIAGEN MagAttract DNA extraction kit
Leveraging single molecules with the 10X Chromium System
Whole Genome Sequencing (WGS) with phased reads: 20-100 kb molecules
38-45.3X sequencing depth; 22-23K mean molecule length; 1.4-1.5 M GEMs detected; 2 lanes, 2x150, HiSeq 4000
NA12878 HMW control; 24X increase in N50 phase block length; AllPrep #1; NA12878
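Phase block N50 is the length L such that phase blocks of length L or longer cover at least half of the total phased length. A short sketch of that standard definition; the input lengths in the example are toy values, not the study's blocks.

```python
def n50(lengths):
    """Largest L such that blocks of length >= L cover at least half the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no blocks


# Toy block lengths (arbitrary units): total 54, so we need >= 27 covered
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

A fold increase in N50, like the one on the slide, is then just the ratio of two such values.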
We can see drops in coverage when structural variants (SV) appear
TDG gene transposition
Spliced TDG inserted here; exons; exonic barcode signals
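A drop in coverage is one signal of a structural variant. A minimal sketch of flagging such drops, assuming a per-base depth list for one region, fixed 100 bp windows, and a cutoff of half the median window depth; all parameters are illustrative, not those used for the TDG call, and adjacent low windows are merged into one region.

```python
from statistics import median


def low_coverage_regions(depths, window=100, min_ratio=0.5):
    """Return merged (start, end) regions whose window mean depth < min_ratio * median."""
    if not depths:
        return []
    windows = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        windows.append((start, start + len(chunk), sum(chunk) / len(chunk)))
    baseline = median([mean for _, _, mean in windows])
    regions = []
    for start, end, mean_depth in windows:
        if mean_depth < min_ratio * baseline:
            if regions and regions[-1][1] == start:
                regions[-1] = (regions[-1][0], end)  # extend a contiguous low region
            else:
                regions.append((start, end))
    return regions


# Toy depth track: 40X background with a 200 bp drop to 10X
depths = [40] * 500 + [10] * 200 + [40] * 300
print(low_coverage_regions(depths))  # [(500, 700)]
```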
Scott Kelly: ISS for one year; Mark Kelly: Earth control
Telomere Length
DNA Mutations & Structural Variation
DNA Hydroxymethylation
Chromatin
(Small & Large) RNA Expression & RNA Methylation
Proteomics
Antibody Titers
Cytokines
DNA Methylation
B-cells / T-cells
Targeted and Global Metabolomics
Microbiome
Cognition
Vasculature
These People are Awesome
@mason_lab
Thanks to the Swabbing Teams! www.pathomap.org/people/
Gratitude to Many People and Places
Illumina: Gary Schroth, Marc Van Oene
Univ. Chicago: Yoav Gilad
Yale University: Nenad Sestan, Sherman Weissman
FDA/SEQC/Fudan Univ.: Leming Shi
NIH/UDP/NCBI: Jean & Danielle Thierry-Mieg
Baylor: Jeff Rogers
MSKCC: Danwei Huangfu, Christina Leslie, Ross Levine
HudsonAlpha: Shawn Levy
Mason Lab: Ebrahim Afshinnekoo, Sofia Ahsanuddin, Noah Alexander, Pradeep Ambrose, Marjan Bozinoski, Dhruva Chandramohan, Sagar Chhangawala, Shanin Chowdhury, Jorge Gandara, Francine Garrett-Bakelman, Elizabeth Hénaff, Sheng Li, Alexa McIntyre, Cem Meydan, Lenore Pipes, Darryl Reeves, Yogesh Saletore, Priyanka Vijay
Cornell/WCMC: Jason Banfelder, Scott Blanchard, Selina Chen-Kiang, Olivier Elemento, Yariv Houvras, Samie Jaffrey, Ari Melnick, Margaret Ross, Adam Siepel, Epigenomics Core
Horner Lab: Stacy Horner
Icahn/MSSM: Eric Schadt, Andrew Kasarskis, Joel Dudley, Ali Bashir, Bobby Sebra
ABRF: George Grills, Scott Tighe, Don Baldwin
UMMS: Maria E. Figueroa
AMNH: George Amato, Mark Siddall
NYU: Jane Carlton, Julia Maritz
@mason_lab