
Functional Genomics I

MED263: Bioinformatics Applications to Human Disease

Jason Young | Email: [email protected] | MED 263 | Winter 2015

What You Will Learn Today...

• Functional genomic methods for gene expression analysis

• Typical workflow for a gene expression study

• Aspects of microarray data analysis

• Kicic et al. (2010): Example of a differential expression microarray study


The Central Dogma of Biology



Genomics

Transcriptomics

Proteomics


Functional Genomics

“Fishing Expeditions”


Functional genomic studies are often labeled “descriptive” rather than “hypothesis-driven” research.


Functional Genomics



Use functional genomics data to generate hypotheses that can then be tested with further experimentation.

“Without speculation there is no good and original observation” - Charles Darwin


History of Transcript Analysis

Pre-Functional Genomics: One transcript at a time

- Northern blotting (1977)
- Reverse Transcriptase PCR (RT-PCR)
- RNase protection

* Highly quantitative, still essential for validation of functional genomic gene expression results.

Functional Genomics: Many transcripts at a time

- cDNA libraries (early 1990s)
- Serial Analysis of Gene Expression (SAGE) (mid 1990s)
- DNA Microarrays (late 1990s)
- RNA-Seq (late 2000s)

* Less quantitative, but provide a rapid, broad overview of genome-wide transcript abundance.







cDNA Libraries

cDNA Library Construction

1. Isolate mRNA from organism, cell type, developmental stage, or physiological condition
2. Reverse transcribe to cDNA
3. Clone into a vector for propagation in bacteria
4. Sequence cDNA inserts to produce Expressed Sequence Tags (ESTs)
5. ESTs represent a sampling of the expression repertoire of the original samples

Shotgun Single-Pass Approach Adams et al. 1991, 1993


cDNA Libraries

Caveats of cDNA libraries and ESTs

1. Time consuming and laborious
2. Depth of sequencing of the library determines how well rare transcripts are represented (counting)
3. Incomplete transcripts often present (5’ end missing)
4. Clones can be used to express protein products in addition to measuring ESTs

Shotgun Single-Pass Approach Adams et al. 1991, 1993








Serial Analysis of Gene Expression (SAGE)

SAGE Library Construction

1. Isolate mRNA from organism, cell type, developmental stage, or physiological condition
2. Reverse transcribe to cDNA (with biotin tag)
3. Cleave with anchoring enzyme (AE) and attach to beads
4. Divide into two pools and ligate distinct linkers (A & B)
5. Cleave using blunt end tagging enzyme (TE)
6. Perform blunt ligation to generate ditags
7. Concatenate, clone and sequence

Velculescu et al. 1995


Serial Analysis of Gene Expression (SAGE)

Caveats of SAGE libraries

1. Still time consuming and laborious (and requires specialized skill), but shorter tags make SAGE more cost efficient than EST libraries
2. Only detects the 3’ end of transcripts and relies on the presence of an appropriately spaced anchoring enzyme (AE) site
3. Relies on counting, like ESTs; counts for genes expressed at low levels are difficult to reproduce
4. Like cDNA libraries, no need for knowledge of genome sequence to obtain tags (not true of microarrays)

Velculescu et al. 1995










DNA Microarrays

Sequences need to be known for probe design

Relies on hybridization rather than sequence counting

Fast: Can obtain genome-wide transcript levels in days

Comprehensive: Entire transcriptomes can be represented on one array

Flexible: Probes against any gene can be represented on a chip.

Affordable: Technology is >10 years old.


Types of DNA Microarrays

Spotted Array
• Generally 60-80 nucleotides
• Spotted mechanically
• Generally <10k features
• Pros: flexibility
• Cons: low density, reproducibility
• Dual color (intra array)

In Situ Synthesized
• 25-80 nucleotides
• Generated using photolithography
• >1 million features, static
• Pros: high density, reproducibility
• Cons: flexibility
• Single or dual color (inter or intra array)

Affymetrix / Nimblegen (Roche) / Agilent / Illumina

Dye labels: Cy5 (Red), Cy3 (Green)



Affymetrix GeneChips

• Traditionally have dominated the market
• 11-20 distinct 25nt probes measure expression of each gene
• Attempted to account for non-specific hybridization to perfect match (PM) probes using mismatch (MM) probes



Nimblegen - Madison, WI



Nimblegen (Roche)



Agilent



Illumina



Illumina

23 & Me


But Wait!?! Who uses microarrays anymore?!?


Microarrays vs NGS

Q1: Do you know what transcripts you’re looking for?

Microarrays NGS

Yes No

Microarrays vs NGS

Q2: Do you have a lot of money to spend on experiments?


Microarrays NGS

YesNo



Microarrays vs NGS

Q3: Do you want to rely on the most well-tested and developed methods?

From: David M. Rocke, UC-Davis

Microarrays NGS

Yes No

Other Microarray Applications

• Genotyping arrays

• Methylation arrays

• Target enrichment (pre-sequencing)

• Rapid pathogen detection in-the-field (Influenza sub-typing)

• Protein arrays (parallelized ELISA)

• Antibody arrays

• High-throughput standardized testing (drug development) ($$$)


1 Knife = 1 Knife


Microarrays NGS

Microarray Gene Expression Workflow

1. Experimental Design

2. RNA Isolation and Labeling

3. Hybridization

4. Preprocessing

5. Data Analysis

6. Biological Confirmation


Experimental Design

Define Biological Question and Samples Needed
• Tissue comparison. Ex. Regions of the brain
• Time course. Ex. Pathogen life cycle
• +/- Treatment. Ex. Drug treatment

Determine Appropriate Array Platform and Labeling Procedures
• Are arrays commercially available for your purpose?
• 1 or 2 color labeling needed? (2-color requires reverse labeling)
• Amount of material needed? (1 to 5 ug total RNA/sample)
• Make sure probes are randomized on an array

Plan entire workflow ahead of time to maximize experimental control
• Prepare a well defined sample preparation procedure!
• Do all steps for samples in parallel if possible, from RNA isolation, to labeling, to hybridization, to scanning (same person and machine too).








RNA Isolation and Labeling

Isolate RNA
• Total RNA with Trizol
• Further isolation of mRNA if needed
• Assess quality of RNA

Agilent 2100 Bioanalyzer
• Calculates RNA Integrity Number (RIN)
• Examines the entire electrophoretic trace of the RNA sample, including the presence/absence of degradation products


RNA and Probe Preparation

Direct Labeling

Indirect Labeling
• Improved efficiency of nucleotide incorporation

Affymetrix Protocol: Indirect labeling w/ Amplification (1 color)

1. Reverse Transcription
2. In Vitro Transcription to produce cRNA (signal amplification)








Hybridization

Affymetrix Protocol

1. Pre-Hyb (10’)
2. Hyb (16hr)
3. Streptavidin-Phycoerythrin (SAPE)
4. anti-SA Ab-biotin (more signal amplification!)
5. SAPE
6. Scan

~3 days from RNA isolation to scan



Why all the signal amplification?

1-5 ug total RNA








Preprocessing

Goal: To remove the systematic bias in the data as completely as possible while preserving the variation in gene expression that occurs because of biologically relevant changes in transcription.

Steps:

• Quantitation - Convert image into a series of numbers (image analysis) (.CEL Files).
• Data import - Data must be collated from different formats housed in different files/databases.
• Quality assessment - Detects divergent measurements beyond the level of random fluctuations.
• Background adjustment - Adjustment of observed expression levels to account for non-specific hybridization (noise).
• Normalization - Allows for arrays to be compared to one another by controlling for different efficiencies of reverse transcription, labeling or hybridization reactions, physical problems with the arrays, reagent batch effects and different laboratory conditions.
• Summarization - Combines multiple probe intensities for a particular gene to produce a single expression value for that gene.


Quality Assessment

• First thing to do: obtain overview of array signal
• Box plots and histograms
• Identify outlier arrays by examining probe intensities across all arrays at once.
• Array “f” appears to stand out in box plot (Note: normalization can often correct this difference)
• Array “a” appears to have a bimodal distribution in the histogram, which usually indicates a spatial artifact, i.e. a large section of the array has abnormally high values.

[Figure: box plots of log Intensity by Arrays; histograms of log Intensity]

What do you see?


Raw Image Inspection

Crop circles

Ring of fire

Full moon

Tricolor

Thumb print

Arcs

http://plmimagegallery.bmbolstad.com/


Background Adjustment

• Background noise distribution calculated using negative controls or empty spots
• Subtract background noise from raw probe intensities
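The two bullets above can be sketched in a few lines. This is a minimal illustration, not the method of any particular array platform: the function name, the control values, and the `floor` parameter (which keeps adjusted intensities positive so they can later be log-transformed) are all assumptions made for the example.

```python
def subtract_background(intensities, background_values, floor=1.0):
    """Subtract the mean background signal, clamping results at a small floor."""
    # Estimate background as the mean of negative-control (or empty) spots.
    background = sum(background_values) / len(background_values)
    # The floor is an assumption of this sketch; it avoids zero or negative values.
    return [max(value - background, floor) for value in intensities]


# Hypothetical negative-control spots averaging 50 units of background.
negative_controls = [45.0, 50.0, 55.0]

# Hypothetical raw intensities for one gene in treated and control samples.
treated, control = subtract_background([150.0, 100.0], negative_controls)

raw_ratio = 150.0 / 100.0          # ratio before background adjustment
adjusted_ratio = treated / control  # ratio after background adjustment
print(raw_ratio, adjusted_ratio)
```

With these made-up numbers the ratio moves from 1.5 to 2.0, which shows how strongly a fold-change threshold can depend on whether background was subtracted first.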


Why do we need normalization?

• Some arrays are brighter than others.
• Not due to the biological data but to unavoidable experimental differences.
• Goal: Normalization aims to correct this kind of difference w/o altering the biological data so that cross array analyses can be conducted (differential expression, etc.).


Why do we need normalization?

[Figure panels: Before Normalization, After Normalization]





Scatter Plots

Simple to compare inter-array expression, no normalization

Genes on 45 degree angle expressed the same in both

1 - Higher expressed genes in Control
2 - Higher expressed genes in Downs
3 - Low expression genes in both
4 - High expression genes in both


Why Log Transformation?

• Experimentalists using microarrays are very often interested in fold change
• Log scale provides symmetry in expression ratios
• Example in raw ratio space: 2-fold up-regulation = 2, but 2-fold down-regulation = 0.5
• Without transformation, all down-regulated fold changes are compressed between 0 and 1

             t1   t2   t3
Raw Ratio    1    2    0.5
Log2 Ratio   0    1    -1
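The symmetry in the table above can be checked directly. A minimal sketch using only the three ratios shown on the slide; the dictionary names are chosen for this example.

```python
import math

# Raw expression ratios for t1, t2, t3, taken from the table above.
raw_ratios = {"t1": 1.0, "t2": 2.0, "t3": 0.5}

# log2 maps a 2-fold increase to +1 and a 2-fold decrease to -1,
# so up- and down-regulation become symmetric around 0.
log2_ratios = {name: math.log2(ratio) for name, ratio in raw_ratios.items()}

for name in raw_ratios:
    print(name, raw_ratios[name], log2_ratios[name])
```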


MA Plots

MA plots are used to determine whether data need normalization and to assess whether the normalization worked (sideways scatter plot).

M = log fold change for gene x
A = average log intensity for gene x

• A local regression (LOESS) curve can be fitted to the scatter plot to summarize non-linear data.
• A LOESS curve that oscillates and/or has variability of M values greater than other arrays indicates an issue.
• Instead of 1-to-1 comparisons, each array can also be compared to a “synthetic” array calculated by taking probe-wise medians

[Figure: MA plots of Array1/Array2, Before Norm. and After Norm.]
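The M and A values defined above can be computed per probe as follows. This is a sketch under assumptions: base-2 logarithms are used (the slide says only "log"), the intensities are made up, and the function names `ma_values` and `synthetic_reference` are hypothetical. The LOESS fit is not included.

```python
import math
from statistics import median


def ma_values(array1, array2):
    """Return (M, A) lists for two arrays of positive probe intensities."""
    m_values, a_values = [], []
    for x1, x2 in zip(array1, array2):
        l1, l2 = math.log2(x1), math.log2(x2)
        m_values.append(l1 - l2)        # M: log fold change
        a_values.append((l1 + l2) / 2)  # A: average log intensity
    return m_values, a_values


def synthetic_reference(arrays):
    """Probe-wise median across arrays, usable as a 'synthetic' array."""
    return [median(probe) for probe in zip(*arrays)]


# Hypothetical intensities for four probes on three arrays.
arrays = [
    [100.0, 400.0, 1600.0, 6400.0],
    [200.0, 400.0, 1600.0, 3200.0],
    [100.0, 800.0, 1600.0, 6400.0],
]

reference = synthetic_reference(arrays)
m, a = ma_values(arrays[1], reference)
print(reference)
print(m)
print(a)
```

Plotting `m` against `a` gives the MA plot for the second array versus the synthetic array; points far from M = 0 across a range of A suggest an intensity-dependent bias.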


Normalization Strategies

Simplest idea:
• Calculate median expression from all arrays
• Do global normalization by multiplying all probes by a normalization constant

Global Normalization

                               Array 1   Array 2
Median Expression              5,000     10,000
Normalization Factor           2         1
Normalized Median Expression   10,000    10,000

However...

Often there is a non-linear dependence on intensity
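The global scaling in the table above can be sketched as follows. The probe values are hypothetical and were chosen only so that the array medians are 5,000 and 10,000; scaling every array to the largest median is an assumption of this sketch that happens to reproduce the factors 2 and 1.

```python
from statistics import median


def global_median_normalize(arrays):
    """Scale each array so that its median equals the largest array median."""
    medians = [median(a) for a in arrays]
    target = max(medians)  # assumption: the brightest array sets the target
    factors = [target / m for m in medians]
    normalized = [[value * f for value in a] for a, f in zip(arrays, factors)]
    return factors, normalized


# Hypothetical probe intensities with medians 5,000 and 10,000.
array1 = [1000.0, 5000.0, 9000.0]
array2 = [2000.0, 10000.0, 30000.0]

factors, normalized = global_median_normalize([array1, array2])
print(factors)
print([median(a) for a in normalized])
```

Because one constant is applied to every probe on an array, this approach cannot correct a bias that changes with intensity.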




Global Normalization - NGS

Before

           Gene 1   Gene 2   Gene 3   Gene 4   Total Reads
Sample 1   10,000   100      150      200      10,450
Sample 2   20,000   10       150      200      20,360

After

           Gene 1   Gene 2   Gene 3   Gene 4   Total Reads
Sample 1   14,742   147      221      295      15,405
Sample 2   15,133   8        113      151      15,405
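The Before/After tables can be reproduced by scaling each sample to a common total. A minimal sketch, assuming the common total is the mean of the two sample totals (which matches the 15,405 shown); the function name is hypothetical.

```python
def scale_to_common_total(samples):
    """Scale every sample's counts so that all samples share the mean total."""
    totals = [sum(counts) for counts in samples]
    target = sum(totals) / len(totals)
    return [
        [count * target / total for count in counts]
        for counts, total in zip(samples, totals)
    ]


# Read counts for Gene 1 to Gene 4, taken from the Before table.
sample1 = [10000, 100, 150, 200]
sample2 = [20000, 10, 150, 200]

scaled1, scaled2 = scale_to_common_total([sample1, sample2])
print([round(x) for x in scaled1])
print([round(x) for x in scaled2])
```

Gene 3 and Gene 4 have identical raw counts in both samples yet differ after scaling, because the highly expressed Gene 1 dominates each total. This appears to be the caution the example is meant to raise about global normalization.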

Normalization Strategies

Parametric methods:

Force distributions (not just medians) to be the same:
• Amaratunga and Cabrera (2001)
• Bolstad et al. (2003)

Use curve estimators such as splines to adjust for the effect:
• Li and Wong (2001)
• Colantuoni et al. (2002)
• Dudoit et al. (2002)

Adjustments based on additive/multiplicative model:
• Rocke and Durbin (2003)
• Huber et al. (2002)
• Cui et al. (2003)

Quantile Normalization (non-parametric)
• Bolstad et al. (2003)
• Every probe value on any one chip is mapped to the corresponding quantile of the standard distribution; hence quantile normalization
• The average of all available arrays can be used to form an average empirical distribution
• Simple and effective!
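Quantile normalization as described above can be sketched in plain Python. This is an illustrative version rather than the Bioconductor implementation: the intensities are hypothetical, all arrays are assumed to have the same number of probes, and ties are broken by probe position instead of being averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equally sized arrays (lists of probe intensities)."""
    n_probes = len(arrays[0])
    # Sort each array, then average rank by rank to form the
    # average empirical (reference) distribution.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(column) / len(arrays) for column in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity within this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            # Replace each value with the reference value of the same rank.
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities: 4 probes on 3 arrays.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]

normalized = quantile_normalize(arrays)
for row in normalized:
    print(row)
```

After this step every array contains exactly the same set of values, only assigned to different probes, so the distributions (not just the medians) are identical.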







[Figure: quantile normalization illustrated in steps I-VI across Arrays]

Normalization Strategies

[Figure panels: Before Normalization, After Normalization]


MAS5.0, RMA, GCRMA

MAS 5.0 (Microarray Suite - Affymetrix):
• Adjusts for background noise by subtracting MM from PM signal, but this is an over-adjustment.
• MM probes also detect specific signal (specific + non-specific binding), such that a third of all MM probes are brighter than their PM counterpart.

RMA (Robust Multi-array Average):
• Increases precision but sacrifices some accuracy by using a background adjustment step that corrects PM probe intensities chip by chip but ignores MM intensities.
• Also uses quantile normalization

GCRMA (GeneChip Robust Multi-array Average):
• Similar to RMA, but corrects background using sequence data of probes to account for non-specific binding (NSB).
• MM probes not ignored, improved precision and accuracy.








How to Identify Differential Expression

1. Calculate expression ratio and rank order

Problems:
• What threshold? Background subtraction? Ex. Two-fold change, 50 background (150/100 NS, 100/50 S!)

2. Percentage

Problems:
• What threshold? What is significant? Always a top 5%.

3. Statistical Tests (t-test, ANOVA)

Null hypothesis: there is no difference in a gene’s expression between groups.
Example: 7 treated, 7 untreated cells
p-value < threshold (x) indicates that, if the null hypothesis were true, differences at least as large as those observed would arise by chance only x% of the time (assumes a normal distribution)

Multiple Testing Problem:
• 1 gene, p = 0.05 means a 5% chance of a difference by chance alone. (OK)
• 100 genes, p = 0.05 means about 5 would be expected to be false positives by chance alone. (!!!!)
• 1000 genes, p = 0.05 means about 50 would be expected to be false positives by chance alone. (!!!!)

Example: Assuming the 100 tests are statistically independent, the probability of obtaining at least one significant result is 1 - 0.95^100 = 0.994

Solution:
• Filter out non-expressed genes to limit number of tests
• Use a correction for multiple tests
• p/# tests (Bonferroni) - too stringent!
• False Discovery Rate (FDR)-adjusted p-value: FDR = # false positives / # called significant
• Benjamini-Hochberg procedure (1995) - produces an adjusted p-value
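The multiple-testing arithmetic and the Benjamini-Hochberg adjustment can be sketched as follows. The p-values are hypothetical, the function names are chosen for this example, and the Bonferroni line is included only for comparison with the "p/# tests" correction.

```python
def family_wise_error(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
p_values = [0.01, 0.04, 0.03, 0.005, 0.20]

adjusted = benjamini_hochberg(p_values)
# Bonferroni multiplies each p-value by the number of tests (capped at 1).
bonferroni = [min(1.0, p * len(p_values)) for p in p_values]

print(round(family_wise_error(0.05, 100), 3))
print(adjusted)
print(bonferroni)
```

For these five values the Benjamini-Hochberg adjusted p-values are never larger than the Bonferroni ones, which illustrates why the FDR approach is the less stringent correction.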


Clustering

Which genes are associated with each other or a particular state/condition?

Unsupervised (no prior knowledge used)
• Hierarchical (Trees)
• Non-hierarchical (K-means)
• Cluster 3.0: http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm#ctv

1. Filter out genes that are not expressed in any samples and/or are stable across samples
2. Calculate distance between samples based on gene expression
• Euclidean
• Pearson
3. Cluster samples based on these distances
• Single (Maximum similarity)
• Complete (Minimum similarity)
• Average (Centroid) (Average similarity)

[Figure: Centroid Clustering]

Supervised (prior knowledge used)
• Many methods available (machine learning, etc.)...
• Ex. Ontology-based Pattern Identification (OPI)
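The unsupervised steps above (distance, then linkage) can be sketched with a small agglomerative routine. This is an illustration, not the algorithm used by Cluster 3.0: the expression profiles are hypothetical, and the "average" option here is average linkage (mean of pairwise distances), which is related to but not the same as true centroid linkage.

```python
import math


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation; undefined for constant profiles (filter them first)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1 - cov / (sx * sy)


def agglomerate(profiles, distance, linkage="average"):
    """Repeatedly merge the two closest clusters; return the merge history."""
    combine = {
        "single": min,                        # closest pair of members
        "complete": max,                      # farthest pair of members
        "average": lambda d: sum(d) / len(d),  # mean over all pairs
    }[linkage]
    clusters = [[i] for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pair_distances = [
                    distance(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                ]
                d = combine(pair_distances)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical expression of four genes in four samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0],  # sample 0
    [1.1, 2.1, 2.9, 4.2],  # sample 1
    [4.0, 3.0, 2.0, 1.0],  # sample 2
    [4.2, 2.9, 2.2, 0.9],  # sample 3
]

merges = agglomerate(profiles, euclidean_distance, linkage="average")
for left, right, d in merges:
    print(left, right, round(d, 3))
```

The merge history is what a dendrogram draws: samples 0 and 1 join first, then samples 2 and 3, and the two pairs join last. Passing `pearson_distance` instead clusters by the shape of the profile rather than its magnitude.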









Biological Confirmation

Microarray gene expression must be confirmed using other experimental techniques.

• Northern Blot
• RT PCR
• qPCR
• Also functional confirmation
• mRNA != protein







7. Sharing of Data


Sharing of Data

Minimum Information About a Microarray Experiment (MIAME) (2001)


Kicic, et al., Decreased fibronectin production significantly contributes to dysregulated repair of asthmatic epithelium. Am J Respir Crit Care Med, 2010. 181(9): p.889-98.

AIM: Identify differentially expressed genes between disease and control group

What differences in gene expression may be responsible for differences in phenotype?

a.k.a. A Fishing Expedition!

Methods

• Epithelial cells were collected by bronchial brushing and cultured, and then classified as healthy non-atopic (pAECHNA), healthy atopic (pAECHA), or atopic asthmatic (pAECAA).
• RNA for 16 hybridizations (9 pAECHNA, 7 pAECAA) was quantified, assessed for quality using Agilent Bioanalyser, and processed for hybridization to Affymetrix Human Genome U133 Arrays.
• Data were normalized by GCRMA and differential gene expression between groups assessed using LIMMA (supervised method).
• LIMMA: fits a linear model to the expression data of each gene and uses empirical Bayes to calculate a moderated t-statistic, which smooths the standard errors across genes, giving more reliable results.

For more information see the LIMMA user guide (http://www.bioconductor.org/packages/2.5/bioc/html/limma.html)

Note: atopic = caused by a hereditary predisposition towards developing certain hypersensitivity reactions, such as asthma.
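The idea behind the moderated t-statistic can be illustrated for a simple two-group comparison. This is a sketch, not the LIMMA implementation: LIMMA estimates the prior degrees of freedom and prior variance from all genes by empirical Bayes, whereas here `prior_df` and `prior_var` are hypothetical fixed inputs, and the expression values are made up.

```python
import math
from statistics import mean, variance


def moderated_t(group1, group2, prior_df, prior_var):
    """Two-group t-statistic with the gene's variance shrunk toward prior_var."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    # Shrunken variance: weighted average of the prior and the gene's own variance.
    shrunk_var = (prior_df * prior_var + df * pooled_var) / (prior_df + df)
    std_error = math.sqrt(shrunk_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / std_error


# Hypothetical log2 expression values for one gene in two groups.
group1 = [8.0, 8.2, 7.8]
group2 = [7.0, 7.1, 6.9]

ordinary = moderated_t(group1, group2, prior_df=0, prior_var=0.1)   # plain t-statistic
moderated = moderated_t(group1, group2, prior_df=4, prior_var=0.1)  # shrunk toward the prior
print(round(ordinary, 3), round(moderated, 3))
```

With `prior_df=0` the function reduces to the ordinary pooled-variance t-statistic. A gene whose own variance is unusually small, as in this example, gets a less extreme statistic once its variance is pulled toward the prior, which is how moderation guards against spuriously large t-values with few replicates.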

Heatmap

• Figure 2. Differences in lower airway epithelial gene expression between healthy non-atopic children (HNA) and children with atopic asthma (AA). Differentially expressed genes based on false discovery rate of less than 0.25 and an absolute fold change of greater than or equal to 1.5 were arranged using unsupervised two-dimensional hierarchical clustering. Each column represents a differentially expressed gene and each row represents an individual subject. Colors represent fold change in each individual, with red indicating up-regulated genes and green indicating down-regulated genes with respect to the average of HNA subjects.

• Differentially regulated genes: 1612 (763 up, 848 down)
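The selection rule quoted in the figure legend (false discovery rate below 0.25 and absolute fold change of at least 1.5) can be written as a simple filter. The gene names and values below are hypothetical, and treating "absolute fold change" as at least 1.5-fold in either direction on the log2 scale is an assumption of this sketch.

```python
import math


def is_differentially_expressed(log2_fold_change, fdr,
                                fdr_cutoff=0.25, fold_cutoff=1.5):
    """Apply the FDR and fold-change thresholds to one gene."""
    # A fold change of 1.5 in either direction is |log2 FC| >= log2(1.5).
    return fdr < fdr_cutoff and abs(log2_fold_change) >= math.log2(fold_cutoff)


# Hypothetical genes: (name, log2 fold change AA vs HNA, FDR-adjusted p-value).
genes = [
    ("geneA", 1.2, 0.01),
    ("geneB", -0.8, 0.20),
    ("geneC", 0.3, 0.001),
    ("geneD", -1.5, 0.40),
]

selected = [name for name, lfc, fdr in genes
            if is_differentially_expressed(lfc, fdr)]
print(selected)
```

Here geneC fails the fold-change threshold despite a very small FDR, and geneD fails the FDR threshold despite a large fold change, so only geneA and geneB are kept.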


Conclusion

• Deposition of the extracellular matrix (ECM) is required to heal wounded epithelial cells.
• Kicic, et al. noted that fibronectin (FN1) was the only down regulated ECM component in asthmatic epithelial cell samples and hypothesized this was the reason for their inability to heal wounds.

Practical 1: You will reanalyze the data from this study to see if you arrive at the same conclusions as the original authors. (R - http://www.bioconductor.org)

What You Learned Today...

Evaluations!

• Functional genomic methods for gene expression analysis

• Typical workflow for a microarray gene expression study

• Aspects of microarray data analysis

• Kicic et al. (2010): Example of a differential expression microarray study

