TRANSCRIPT
![Page 1: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/1.jpg)
Genomics: Understanding the Blueprint of Life
Sujay Datta
Statistical Center for HIV/AIDS Research & Prevention
(Vaccines & Infectious Diseases Institute)
Fred Hutchinson Cancer Research Center
![Page 2: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/2.jpg)
Gregor Mendel
![Page 3: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/3.jpg)
Figure 10.1
Genes: The Unit of Inheritance
![Page 4: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/4.jpg)
The Structure of DNA
![Page 5: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/5.jpg)
The person who made a silent contribution to this great discovery
![Page 6: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/6.jpg)
The DNA Double Helix:
![Page 7: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/7.jpg)
Figure 10.10a & b
The Double Helix
![Page 8: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/8.jpg)
Figure 10.10a & b
Watson-Crick Base Pairs
![Page 9: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/9.jpg)
The DNA Chains Are Anti-Parallel
![Page 10: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/10.jpg)
“It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.” Watson and Crick, Nature 171, 737-738 (1953)
![Page 11: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/11.jpg)
Central Dogma of Molecular Biology
![Page 12: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/12.jpg)
A single amino acid change in hemoglobin causes sickle cell anemia
![Page 13: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/13.jpg)
Levels of chromosomal organization
![Page 14: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/14.jpg)
A busy person reading a very, very long newspaper column!
![Page 15: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/15.jpg)
Figure 11.2
Information Flow in the Cell
![Page 16: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/16.jpg)
Transcription of DNA to RNA
![Page 17: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/17.jpg)
Genes can be expressed at different levels in different tissues or organisms
![Page 18: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/18.jpg)
Figure 11.23
Intervening Sequences
![Page 19: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/19.jpg)
Figure 11.23
RNA splicing removes introns from the pre-mRNA and converts it to mRNA
![Page 20: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/20.jpg)
Translation of spliced mRNAs to proteins occurs at the ribosomes
![Page 21: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/21.jpg)
Figure 11.41
The Genetic Code
![Page 22: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/22.jpg)
Measuring gene expression
• So the expression levels of genes in different tissues, organs or individuals can be measured by quantifying the amounts of mRNAs they produce. The totality of all mRNAs produced from an organism’s genome is its transcriptome
• Expression levels of genes in different tissues, organs or individuals can also be measured by measuring the amounts of proteins they code for (the totality of all proteins coded for by an organism’s genome is its proteome)
• The first method (quantifying mRNAs) is a bit indirect but more manageable, as the proteome is much larger than the transcriptome
![Page 23: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/23.jpg)
Microarrays: A revolution in gene expression profiling
• Until the 1990s, we could only measure the expression levels of a few genes at a time, and at great expense
• In came microarrays that, for the first time, enabled us to measure the expression of thousands of genes at once (the current versions of them can handle entire genomes)
• Types of microarrays: Short oligonucleotide (Affymetrix), long oligonucleotide (Agilent), nylon bead arrays (Illumina), cDNA arrays
• Other technologies: SAGE, MPSS
![Page 24: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/24.jpg)
Source: Affymetrix website
![Page 25: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/25.jpg)
Raw Microarray Image
![Page 26: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/26.jpg)
Custom software: getting representative value of a probe cell
![Page 27: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/27.jpg)
Microarray data: Promises & Problems
• Data: log_2 (fluorescence intensity)
• Which genes are significantly differentially expressed between 2 individuals or conditions (at a particular time-instant)?
• Which genes show significant changes in expression over time?
• Which genes have expression levels that are correlated with some external variable?
• For a given pathway, which of the genes in our collection are most likely to be involved?
• For a diffuse disease, which genes are associated with different outcomes?
Problems: Many sources of noise and variation
• Spatial artifacts due to manufacturing defects
• Contamination by RNA and other substances from unwanted sources
• Batch-to-batch variation
• Experimenter-to-experimenter variation
• Day-to-day variation
Problem: Cost. They have become cheaper over the years. Still difficult to afford > 250-300
![Page 28: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/28.jpg)
Steps to extract meaningful information from microarrays
– preprocessing of the images
– normalization of the data from the images
– Probe-level modeling to extract expression level data
– gene-filtering
– clustering
– relating to biologic data from other sources such as a pathway database or an annotation database
• Need replicates of each gene-individual combination to ensure a good estimate of the random error (i.e., the variation that still remains after getting rid of all known sources of variation)
• Need good models to take into account all systematic sources of variation
• Avoid confounding the different experimental conditions (treatment/control or cancer/normal) with the systematic sources of variation
![Page 29: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/29.jpg)
Steps in image analysis
1. Addressing. Estimate location of spot centers.
2. Segmentation. Classify pixels as foreground (signal) or background.
3. Information extraction. For each spot on the array and each dye:
• signal intensities;
• background intensities;
• quality measures.
![Page 30: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/30.jpg)
Segmentation
Adaptive segmentation, SRG (seeded region growing)
Fixed circle segmentation
Spots usually vary in size and shape.
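A minimal sketch of the fixed circle approach, assuming the spot center and radius are already known from the addressing step. The function names, the 5x5 pixel patch and the use of the median as the summary are illustrative assumptions, not the method of any particular image-analysis package. Because real spots vary in size and shape, a fixed circle can misclassify pixels, which is the motivation for adaptive methods.

```python
from statistics import median

def fixed_circle_segment(image, center_row, center_col, radius):
    """Classify each pixel as foreground (inside the circle) or background (outside)."""
    foreground, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            (foreground if inside else background).append(value)
    return foreground, background

def spot_summary(image, center_row, center_col, radius):
    """Extract signal and background intensities for one spot (median is an assumed choice)."""
    fg, bg = fixed_circle_segment(image, center_row, center_col, radius)
    return {"signal": median(fg), "background": median(bg), "n_foreground": len(fg)}

# Hypothetical 5x5 patch of pixel intensities around one spot.
patch = [
    [10, 11, 10, 12, 10],
    [11, 50, 60, 50, 11],
    [10, 60, 90, 60, 10],
    [12, 50, 60, 50, 12],
    [10, 11, 10, 12, 10],
]
summary = spot_summary(patch, center_row=2, center_col=2, radius=1)
print(summary)
```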
![Page 31: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/31.jpg)
Normalization is needed to minimize non-biological variation between arrays. It basically means an ‘intensity alignment’ between arrays
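One common way to carry out such an intensity alignment is quantile normalization, which forces every array to share the same intensity distribution. The slide does not say which method is used, so the sketch below is an assumption for illustration only; the function name and the toy values are hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (equal-length lists of log2 intensities).

    Each value is replaced by the mean, across arrays, of the values that share
    its rank. Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value over all arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, smallest value first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Two hypothetical arrays measuring the same three probes.
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

After normalization both arrays contain the same set of values, and only the ranking of probes within each array is preserved.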
![Page 32: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/32.jpg)
Probe-level model for the normalized and background-corrected data
• The LHS is the normalized log expression value for the i-th probe, j-th array, k-th dye (in the case of a 2-color array) and l-th replicate.
• The fit from this model is the final summary output of the preprocessing steps. It is a huge matrix and is called an expression-set
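$y_{ijkl} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + \varepsilon_{ijkl}$, where the single-subscript terms are the main effects of probe i, array j and dye k, the double-subscript terms are their two-way interactions, and $\varepsilon_{ijkl}$ is the random error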
![Page 33: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/33.jpg)
Finding “interesting” genes
• Now, for each array, we have obtained expression-level data as columns of a huge matrix (expression-set)
• The next step is to select those genes that have “interesting” expression levels.
• “Interesting” is interpreted in many different ways:
– high levels of expression in a subgroup of interest
– lack of expression in a subgroup of interest
– pattern of expression that correlates well with experimental conditions or certain covariates
– pattern of expression that correlates well with time
![Page 34: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/34.jpg)
Methods to detect “interesting” genes
1. Fold-changes – ratio of expression levels between two groups (biologists are most comfortable with it)
2. t-tests – now statistical variation comes into play
3. Other statistical models: ANOVA, Cox Model, etc.
4. For large enough samples we can tailor the test to the distribution (which may be different in the 2 groups)
5. Whenever inference is being drawn about many, many genes simultaneously, a multiplicity adjustment to the p-values must be made
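The first two methods can be sketched for a single gene as follows. Since the data are on the log_2 scale, the fold-change is a difference of group means. The t-statistic shown is the unequal-variance (Welch) form, which is an assumed choice; the group values are hypothetical, and turning the statistic into a p-value would additionally need the t distribution from a statistics library.

```python
from math import sqrt
from statistics import mean, variance

def log2_fold_change(group_a, group_b):
    """Difference of mean log2 expression (group_b minus group_a)."""
    return mean(group_b) - mean(group_a)

def welch_t(group_a, group_b):
    """Welch t-statistic: the mean difference scaled by its estimated standard error."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_b) - mean(group_a)) / se

# Hypothetical log2 expression values for one gene in two groups.
control = [7.0, 7.5, 6.5, 7.0]
treated = [9.0, 9.5, 8.5, 9.0]

lfc = log2_fold_change(control, treated)
fold = 2 ** lfc          # fold-change on the original intensity scale
t_stat = welch_t(control, treated)
print(lfc, fold, round(t_stat, 3))
```

A gene with the same fold-change but noisier replicates would get a smaller t-statistic, which is the sense in which statistical variation comes into play.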
![Page 35: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/35.jpg)
Why do we need multiplicity-adjusted tests?
• Let A,B and C be events with P(A)=a, P(B)=b, P(C)= c. Then P(A* and B* and C*) >= 1-[P(A)+P(B)+P(C)], where A*, B* and C* are the complements of A,B,C
• If we are testing 3 hypotheses, each at level .05, let A={type-I error is made in hypo. #1}, B={type-I error made in hypo. # 2}, C={type I error made in hypo# 3}
• Then all we can say is: P(All 3 decisions are correct) is at least 1-(.05+.05+.05) or .85 (i.e., the chances of at least one type-I error can be as high as 15%)
• Now imagine testing 5000 or 10000 hypotheses!
• That’s why the cut-off values for the tests should be adjusted so that P(1 or more type-I error) <= .05
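The arithmetic behind this argument can be checked with a short sketch. The upper bound follows the inequality above; the exact value under independent tests and the Bonferroni cut-off (level divided by the number of tests) are added here as assumptions for illustration, since the slide does not name a specific adjustment.

```python
def fwer_upper_bound(m, alpha):
    """Upper bound on P(1 or more type-I error) for m tests, each at level alpha."""
    return min(1.0, m * alpha)

def fwer_if_independent(m, alpha):
    """Exact P(1 or more type-I error) if the m tests were independent (an assumption)."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_cutoff(m, alpha):
    """Per-test level that keeps the upper bound at alpha (Bonferroni adjustment)."""
    return alpha / m

for m in (3, 5000, 10000):
    print(m, fwer_upper_bound(m, 0.05), fwer_if_independent(m, 0.05), bonferroni_cutoff(m, 0.05))
```

With 3 tests the bound is .15, as on the slide; with 5000 tests the bound is uninformative (it reaches 1) unless the per-test cut-off is lowered.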
![Page 36: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/36.jpg)
Filtering genes
• A filter is a mechanism for removing a gene from further consideration
• Want to reduce the number of genes under consideration so that we can concentrate on those that are more interesting (it is a waste of resources to study genes that are not likely to be of interest)
Non-specific filtering:
• at least k (or a proportion p) of the samples must have expression values larger than some specified amount, A.
• the gene should show sufficient variation to be interesting:
– either a gap of size A in the central portion of the data
– or an interquartile range of at least B
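A minimal sketch of two of these non-specific filters, assuming expression values are stored per gene as a list over samples. Requiring both criteria at once, the quartile convention of Python's `statistics.quantiles`, and the gene names and thresholds are all illustrative assumptions.

```python
from statistics import quantiles

def passes_k_over_a(values, k, a):
    """True if at least k samples have expression larger than a."""
    return sum(v > a for v in values) >= k

def passes_iqr(values, b):
    """True if the interquartile range is at least b."""
    q1, _, q3 = quantiles(values, n=4)
    return (q3 - q1) >= b

def nonspecific_filter(expression, k, a, b):
    """Keep genes that pass both filters (combining them with 'and' is an assumption)."""
    return [gene for gene, values in expression.items()
            if passes_k_over_a(values, k, a) and passes_iqr(values, b)]

# Hypothetical log2 expression values for three genes across six samples.
expression = {
    "geneA": [8.0, 9.0, 10.0, 11.0, 12.0, 13.0],  # expressed and variable
    "geneB": [3.0, 3.1, 2.9, 3.0, 3.2, 3.1],      # barely expressed
    "geneC": [9.0, 9.1, 9.0, 9.1, 9.0, 9.1],      # expressed but nearly constant
}
kept = nonspecific_filter(expression, k=3, a=7.0, b=1.0)
print(kept)
```

Neither filter looks at the sample labels, which is what makes the filtering non-specific.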
![Page 37: Genomics: Understanding the Blueprint of Life Sujay Datta Statistical Center for HIV/AIDS Research & Prevention (Vaccines & Infectious Diseases Institute)](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ca75503460f9496949e/html5/thumbnails/37.jpg)
Gene-set enrichment analysis
• Suppose you have a short-list of genes that have significant p-values from a statistical test (say, for differential expression or temporal change in profile)
• Now you want to see if a given set of genes (that are known to be of interest to biologists or belong to a crucial pathway, etc.) is over-represented in this list
• Ex: In a list of ~ 5000 significantly differentially expressed genes between acutely HIV-infected and healthy individuals, are the NK-cell KIR genes over-represented (compared to those not differentially expressed)?
• 2X2 contingency table (differentially expressed or not, NK-cell KIR or not)
• Fisher’s exact test (or hypergeometric test)
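The over-representation question can be sketched as a one-sided hypergeometric tail probability, which is the one-sided Fisher's exact test on the 2X2 table. The function name and the toy counts below are hypothetical and chosen only so the result can be checked by hand; they are not data from the study.

```python
from math import comb

def hypergeom_upper_tail(total, in_set, selected, overlap):
    """P(X >= overlap), where X counts gene-set members among `selected` genes
    drawn without replacement from `total` genes, `in_set` of which are in the set."""
    denom = comb(total, selected)
    upper = min(in_set, selected)
    numer = sum(comb(in_set, x) * comb(total - in_set, selected - x)
                for x in range(overlap, upper + 1))
    return numer / denom

# Hypothetical toy counts: 10 genes in all, 4 in the gene set of interest,
# 5 differentially expressed, and all 4 set genes are among those 5.
p_value = hypergeom_upper_tail(total=10, in_set=4, selected=5, overlap=4)
print(p_value)
```

A small tail probability means an overlap this large would be unlikely if the short-list were drawn without regard to gene-set membership.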