RNA-sequencing: Taking Advantage of this Measurement Revolution

October 1, 2013
Anne Deslattes Mays
Wellstein/Riegel Laboratory
Mentor: Anton Wellstein, MD, PhD

Talk Outline

• On the Shoulders of Giants
• Timelines
• Personal Genome Project
• RNA-Sequencing
• Causality
• Messenger Therapeutics

Rosalind Franklin

“pioneered use of x-rays to create images of unorganized matter – such as large biological molecules – not just single crystals”

http://www.pbs.org/wgbh/aso/databank/entries/bofran.html

“Franklin made equipment adjustments to produce an extremely fine beam of x-rays. She extracted finer DNA fibers than ever before and arranged them in parallel bundles. Studied fibers’ reactions to humid conditions. … allowed her to discover crucial keys to DNA’s structure…. Wilkins shared this with Watson & Crick at Cambridge without her knowledge…”

Computer Architecture Advances (64 bit)

• 1961: IBM 7030 Stretch supercomputer; 64-bit data words, 32/64-bit instructions
• 1976: Cray-1 supercomputer; 64-bit word architecture
• 1989: Intel i860 RISC processor; “64-bit microprocessor”; 32-bit architecture; 3D graphics unit capable of 64-bit integer operations
• 1991: R4000, a 64-bit microprocessor; SGI graphics workstations used this CPU
• 1992: DEC introduces the pure 64-bit Alpha architecture
• 1997: IBM releases RS64, a 64-bit PowerPC (partial)
• 1999: Intel releases the instruction set for IA-64
• 2003: AMD Opteron and Athlon 64 processors (AMD64, the first x86-based 64-bit processors); Apple ships the “G5” PowerPC CPU
• 2013: Apple announces the iPhone 5s, the first 64-bit smartphone in the world; A7 ARMv8 system on a chip

Computer Operating Systems (64 bit)

• 1976: Cray-1 supercomputer; 64-bit word architecture
• 1985: Cray releases UNICOS, a 64-bit implementation of Unix
• 1991: R4000, a 64-bit microprocessor; SGI graphics workstations used this CPU
• 1993: DEC releases DEC OSF/1 AXP, a Unix-like OS later named Tru64 UNIX
• 1996: IRIX operating system supports 64 bit
• 1999: Intel releases the instruction set for IA-64
• 2001: Linux is the first OS to support x86-64 (on a simulator; the chip wasn’t there yet)
• 2003: Mac OS X 10.3 adds 64-bit integer arithmetic support
• 2013: iOS 7 on AArch64 processors; 64-bit kernel supporting 64-bit applications

Celera Infrastructure Choice 1998

• 1998: Brian Reid, Palo Alto IX visit
• 1998: Benchmarked the TIGR assembler on available architectures: SGI, Sun SPARC, IBM RISC, DEC Tru64 Alpha
• 1998: DEC’s Tru64 architecture won out
• 1998: COMPAQ buys DEC

http://fora.tv/2013/04/25/Harvard_Professor_George_Church_Opens_the_GET_Conference

Cancer Systems Biology
Taking advantage of measurement revolution

Declining sequencing costs, decreasing computing costs
How do you leverage all this data?

GEO May 25, 2012

GEO June 25, 2013

Here is an example RNA-Seq Workflow

• Experimental Design
• Sample Collection
• Sequencing
• Quality Control / Read Trimming
• Transcript Identification
• Differential Analysis
• Feature Discovery
• Pathway Analysis

http://rnaseq.uoregon.edu/index.html

Replicates: Type I and Type II errors

Detecting Signal vs. Noise

RNA-seq

What is unique about RNA-Seq?

• Allows you to discover and profile the entire transcriptome of any organism

• No probes or primers to design
• Novel transcripts
• Novel isoforms
• Alternative splice sites
• Rare transcripts
• cSNPs

All of this in one experiment

RNA Alternative Splicing: Why you need gapped aligners
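
A toy sketch of why splicing calls for gapped alignment: a read taken from spliced mRNA that spans an exon-exon junction has no contiguous match in the genome, but it matches once a gap the size of the intron is allowed. The sequences and the helper `find_spliced` are made up for illustration; real spliced aligners use indexed, mismatch-tolerant search rather than this exact-match scan.

```python
def find_spliced(genome: str, read: str, min_anchor: int = 4):
    """Return (left_start, split, right_start) for an exact two-part match.

    Tries every split point of the read so that the prefix and the suffix
    each match the genome exactly, with at least one base between them.
    Returns None when no such split exists.
    """
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        left = genome.find(prefix)
        while left != -1:
            right = genome.find(suffix, left + split + 1)
            if right != -1:
                return left, split, right
            left = genome.find(prefix, left + 1)
    return None


# Hypothetical sequences: two exons separated by an intron (lowercase).
exon1 = "ATGGCCAAGT"
intron = "gtaagtccccttttctcag"
exon2 = "TGCTGAACGG"
genome = exon1 + intron + exon2

# A read from the spliced transcript, spanning the exon-exon junction.
read = exon1[-6:] + exon2[:6]

ungapped_hit = genome.find(read)          # -1: no contiguous match exists
spliced_hit = find_spliced(genome, read)  # found once a gap is allowed
intron_gap = spliced_hit[2] - (spliced_hit[0] + spliced_hit[1])
```

Here the gap recovered between the two halves of the read equals the intron length, which is the information an ungapped aligner cannot report.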

How much RNA-sequencing data, how much computation power and where do you go to compute?

How much RNA-sequencing data?
1. 20 million paired-end reads ~ 2 GB of data
2. 100 million paired-end reads ~ 10 GB of data

How much computation power?
3. More memory and more processors mean less time to compute
4. If you outsource the analysis, you will still need to store the results somewhere

• Amazon Web Services: S3 storage; EC2 (Elastic Compute Cloud) on-demand computational facility
• Georgetown University High Performance Computer Core: matrix.georgetown.edu
• UPENN Galaxy services
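
The two data points on this slide imply roughly 100 bytes per read pair (2 GB / 20 million). A minimal sketch that scales that figure linearly to plan storage; the per-pair constant is derived only from the slide's round numbers and will vary with read length and compression, and `estimate_storage_gb` and the 12-sample experiment are hypothetical.

```python
# Derived from the slide: 20 million read pairs ~ 2 GB, i.e. ~100 bytes per pair.
BYTES_PER_READ_PAIR = 2e9 / 20e6


def estimate_storage_gb(read_pairs: float, samples: int = 1) -> float:
    """Rough raw-data storage estimate in GB, scaling the slide's figure linearly."""
    return read_pairs * BYTES_PER_READ_PAIR * samples / 1e9


one_deep_sample = estimate_storage_gb(100e6)            # matches the slide's ~10 GB
small_experiment = estimate_storage_gb(20e6, samples=12)  # assumed 12-sample design
```

Intermediate files (trimmed reads, alignments, assemblies) usually need additional space on top of this raw-data estimate.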

Galaxy is a web-based tool committed to enabling researchers (for more than just RNA-Seq)

How to visualize mapped results?

• UCSC Genome Browser (Gbrowse)
• Integrated Genome Browser (IGB)
• Integrative Genomics Viewer (IGV)

Many shared formats: the browsers read many of the outputs generated by the analysis programs and let you generate your own tracks
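
One common way to generate your own track is a BED file, a tab-delimited text format that these browsers read. A minimal sketch, assuming six-column BED; the track name, feature labels, and coordinates are placeholders, not real annotations, and `bed_track` is a hypothetical helper.

```python
def bed_track(name: str, features) -> str:
    """Format (chrom, start, end, label, score, strand) tuples as a BED track.

    BED start coordinates are 0-based and end coordinates are exclusive.
    """
    lines = [f'track name="{name}"']
    for chrom, start, end, label, score, strand in features:
        if not 0 <= start < end:
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        fields = (chrom, start, end, label, score, strand)
        lines.append("\t".join(map(str, fields)))
    return "\n".join(lines) + "\n"


# Placeholder features for illustration only.
features = [
    ("chr12", 1000, 1500, "feature_A", 0, "+"),
    ("chr12", 2000, 2300, "feature_B", 0, "-"),
]
track_text = bed_track("my_reads", features)
```

Writing `track_text` to a `.bed` file gives something that can be opened in a genome browser alongside the reference annotation.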

What do RNA-Seq reads look like for GAPDH?

Repeat-masked reads aligned with BLAT (allowing 1/2 mismatched bases), viewed in IGB 6.7.2

RNA-Seq Differential Expression analysis

What does GAPDH look like in terms of quantitation?

Columns: TOTAL BM (RPKM, 3SEQ Counts, BLAT Reads), then HPP (RPKM, 3SEQ Counts, BLAT Reads)

CD34: 0.7 340 230 8 8 14
BST1: 19.7 5374 31 31
CD133: 0.2 173 176 16 16 33
THY1: 0 7 4 4
A12: 1 0
A5: 0 0
ALK: 0 9 24 0 0 3
B9: 0 0
C1: 0 0
C2: 0 0
C7: 0 0
E7: 0 0
E9: 2 0
F6: 0 0
G12: 0 0
GAPDH: 3013.2 727831 356289 120.8 5559 2670
H3: 0 0

BLAT read raw counts ratio == 3SEQ counts ratio ~= 130 to 1
RPKM ratio ~= 24.3
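
RPKM divides raw counts by gene length and by library size, which is why the RPKM ratio between two samples differs from the raw-count ratio. A minimal sketch using the GAPDH row above and the standard RPKM definition; the gene length and library sizes in the second half are made-up values for illustration. Note that, from the values as transcribed here, the RPKM ratio works out near 24.9 rather than 24.3.

```python
def rpkm(read_count: float, gene_length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# GAPDH values transcribed from the table (Total BM vs. HPP).
counts_ratio = 727831 / 5559   # 3SEQ counts, roughly 131 to 1
blat_ratio = 356289 / 2670     # BLAT reads, roughly 133 to 1
rpkm_ratio = 3013.2 / 120.8    # roughly 25 to 1

# If both RPKM values were computed from these counts for the same gene,
# the gap between the two ratios reflects the difference in library size.
implied_library_ratio = counts_ratio / rpkm_ratio

# Hypothetical numbers: same gene, a 5x difference in library size.
gene_length = 1300               # assumed length in bp
big_lib, small_lib = 50e6, 10e6  # assumed total mapped reads
ratio_raw = 1300 / 10
ratio_rpkm = rpkm(1300, gene_length, big_lib) / rpkm(10, gene_length, small_lib)
# ratio_rpkm equals ratio_raw * (small_lib / big_lib)
```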

Given a list of differentially expressed genes, enrichment analysis should now be performed

• Enrichment analysis lets the researcher leverage documented experiments that provide evidence for genes’ roles in pathways and functions, which helps the researcher interpret the results and significance of their own experiments
• DAVID
– Gene ontology
– Functional ontology
• REVIGO
– Output of DAVID may be placed in REVIGO for further interpretation and statistical exploration of the significance of discovered sets of genes
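
The idea behind enrichment analysis can be sketched as an over-representation test: is a pathway's share of the differentially expressed genes larger than chance would give? The block below is a plain hypergeometric tail probability, not necessarily the exact statistic DAVID reports, and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, selected: int, overlap: int) -> float:
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, of which `annotated` belong to the term."""
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, 200 annotated to a pathway,
# 100 differentially expressed, 8 of those in the pathway (1 expected by chance).
p = enrichment_p_value(population=20000, annotated=200, selected=100, overlap=8)
```

In practice many terms are tested at once, so the resulting p-values need a multiple-testing correction before they are interpreted.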

Using differentially expressed genes, biological pathways should be explored

• Differentially expressed genes are put into programs such as Pathway Studio or Ingenuity
• Shortest path programs and canonical pathway analysis enable a researcher to reverse engineer the pathways expressed along the course from a healthy response to a diseased response
• Ideally a pathway explains the observed phenotype by connecting genotype to gene expression program to phenotype
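
A shortest-path query over an interaction network can be sketched with a breadth-first search. This is a generic illustration, not the algorithm used by Pathway Studio or Ingenuity, and the network and node names below are placeholders rather than real interactions.

```python
from collections import deque


def shortest_path(graph: dict, source: str, target: str):
    """Breadth-first search; returns one shortest list of nodes, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical directed interaction network; names are placeholders.
network = {
    "GENE_A": ["GENE_B", "GENE_C"],
    "GENE_B": ["GENE_D"],
    "GENE_C": ["GENE_D", "GENE_E"],
    "GENE_D": ["PHENOTYPE"],
    "GENE_E": [],
}
path = shortest_path(network, "GENE_A", "PHENOTYPE")
```

The returned chain of nodes is a candidate route from a differentially expressed gene to the observed phenotype, to be checked against the literature.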

http://bayes.cs.ucla.edu/home.htm

RNA-Sequencing: What is it good for?

• Transcript Annotation
– Mutation identification
– Isoform determination
– Alternative splice variation
• Differential Gene Expression
– Phenotypically segregating experiments
– Allows us to get at the How by looking at the response of an organism within a particular cell population to events
– Good and careful design will allow us to unfold the dynamics of this response and identify targets for altering disease responses to improve one’s chances of surviving

Thank-you

Dr. Anton Wellstein
Dr. Anna Riegel
Dr. Marcel Schmidt
Dr. Elena Tassi
The entire lab: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Eric, the entire Wellstein/Riegel laboratory

My Committee
Dr. Yuri Gusev
Dr. Anatoly Dritschilo
Dr. Michael Johnson
Dr. Christopher Loffredo
Dr. Habtom Ressom
Dr. Terry Ryan (external committee member)

High Performance Core Group: Steve Moore, especially Woonki Chung
Amazon Cloud Services
Dr. Ann Loraine, UNC, IGB Developer
Brian Haas, Author, Trinity Suite
Keygene

Some Resources

• http://personalgenome.org
• http://rnaseq.uoregon.edu/index.html
• http://dx.doi.org/10.1038/npre.2010.4282.1 (DESeq)
• http://galaxy.psu.edu/
• http://seqanswers.com/
• http://www.broadinstitute.org/igv/
• http://bioviz.org/igb/index.html
• http://www.illumina.com
• http://www.otogenetics.com
• http://www.dnanexus.com
• http://bioconductor.org/packages/2.12/bioc/html/limma.html
• http://trinityrnaseq.sourceforge.net/
• http://trinityrnaseq.sourceforge.net/genome_guided_trinity.html
• http://cufflinks.cbcb.umd.edu/
• http://brb.nci.nih.gov/BRB-ArrayTools.html
• http://www.modernatx.com/

Systems Biology History (wikipedia)

• Systems biology roots found in
– Quantitative modeling of enzyme kinetics
– Mathematical modeling of population growth
– Simulations to study neurophysiology
– Control theory and cybernetics
• Theorists
– Ludwig von Bertalanffy – General Systems Theory
– Alan Lloyd Hodgkin and Andrew Fielding Huxley – constructed a mathematical model that explained the action potential propagating along the axon of a neuron
– Denis Noble – first computer model of the heart pacemaker

Scientific knowledge is limited (and advanced) by the limits (and advancements) of measurement

• Ilya Shmulevich, Genomic Signal Processing: “Validity of the model involves observation and measurement, scientific knowledge is limited by the limits of measurement”

• Erwin Schrödinger, Science Theory and Man: “It really is the ultimate purpose of all schemes and models to serve as scaffolding for any observations that are at all means observable”