CS 6293 Advanced Topics: Transcriptional Bioinformatics Introduction to Gene Expression Data Analysis

Upload: elfrieda-chase

Post on 27-Dec-2015



  • Outline
    - Biological background
    - Microarray
        - Basic categories of microarray
        - Computational and statistical methods involved in microarray: pre-processing; differentially expressed gene identification; clustering and classification; network / pathway modeling
    - RNA-seq

  • Genome is fixed, cells are dynamic
    A genome is static (almost): every cell in our body has a copy of the same genome.

    A cell is dynamic: it responds to internal/external conditions, most cells follow a cell cycle of division, and cells differentiate during development.

  • Gene regulation is responsible for the dynamic cell

    Gene expression (production of protein) varies according to cell type, cell cycle, external conditions, location, etc.

  • Where gene regulation takes place

    Opening of chromatin

    Transcription

    Translation

    Protein stability

    Protein modifications

  • Gene expression
    Genes have different activities at different times and in different environments.
    DNA microarrays measure gene transcription (the amount of mRNA) in a high-throughput fashion, as a surrogate of gene activity.
    In the lab, mRNA is reverse transcribed; the product is called cDNA.

  • Transcriptional regulation of genes
    [Figure, built up over four slides: DNA containing a gene and its promoter; RNA polymerase (protein); a transcription factor (TF, protein) bound to a TF binding site (cis-regulatory element); a new protein produced from the gene]

  • The cell as a regulatory network
    [Figure: genes A, B, C and D connected by rules: "If C then D", "If B then NOT D", "If A and B then D" (gene D: make D); "If D then B" (gene B: make B)]

  • Northern blot (an old technique for measuring mRNA expression)
    Source: http://www.escience.ws/b572/L13/north.html
    1. mRNA is extracted and purified.
    2. mRNA is loaded for electrophoresis (lane 1: size standards; lane 2: RNA to be tested).
    3. A voltage is applied across the gel (- to +), and the RNA migrates through the gel according to its size (weight).
    4. The mRNA is transferred from the gel to a membrane.
    5. A labeled probe specific for the RNA fragment is incubated with the blot (hybridization), so that the RNA of interest can be detected.
    Requires a relatively large amount of mRNA.

  • RT-PCR (reverse transcription-polymerase chain reaction)
    See animation of RT-PCR: http://www.bio.davidson.edu/courses/Immunology/Flash/RT_PCR.html
    See also: http://www.ambion.com/techlib/basics/rtpcr/

    RT-PCR: RNA is reverse transcribed to DNA; PCR then amplifies the DNA at an exponential rate, and the amplified product is quantified on a gel.
    ---- a semi-quantitative method. A smaller amount of sample is needed.

    Real-time RT-PCR: the PCR amplification is monitored by fluorescence in real time; the fluorescence value recorded in each cycle represents the amount of amplified product.
    ---- a quantitative method. Currently the most advanced and accurate analysis of mRNA abundance; often used to validate microarray results.
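    The exponential amplification is what makes real-time RT-PCR quantitative: a sample with more starting mRNA reaches a given fluorescence threshold in fewer cycles. A minimal sketch, assuming fluorescence is proportional to the amount of product, a constant per-cycle efficiency, and no plateau phase (real reactions saturate); the function names and the threshold value are hypothetical.

```python
import math


def amplify(initial_copies, cycles, efficiency=1.0):
    """Amount of product after each cycle, assuming a constant per-cycle
    efficiency (1.0 = perfect doubling) and no plateau phase."""
    amounts = []
    amount = initial_copies
    for _ in range(cycles):
        amount *= 1.0 + efficiency
        amounts.append(amount)
    return amounts


def threshold_cycle(initial_copies, threshold, efficiency=1.0):
    """Fractional cycle at which the product reaches `threshold`, i.e. the c
    solving initial_copies * (1 + efficiency) ** c == threshold."""
    return math.log(threshold / initial_copies) / math.log(1.0 + efficiency)


print(amplify(1, 3))  # [2.0, 4.0, 8.0]

# Twice as much starting template crosses the threshold one cycle earlier.
ct_a = threshold_cycle(1_000, 1e10)
ct_b = threshold_cycle(2_000, 1e10)
print(round(ct_a - ct_b, 6))  # 1.0
```

    Under perfect doubling, each one-cycle difference in threshold crossing corresponds to a two-fold difference in starting amount, which is why the recorded per-cycle fluorescence can be turned into a relative abundance.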

  • Limitations of the old techniques
    Labor intensive.

    Can only detect up to dozens of genes (gene-by-gene analysis).

  • What is a microarray
    A 2D array of DNA sequences from thousands of genes.

    Each spot has many copies of the same gene sequence (the probe).

    mRNAs from a sample are allowed to hybridize to the probes, forming RNA-DNA double strands; the number of hybridizations per spot is then measured.

  • What is a microarray (2)
    Conceptually similar to a (reverse) Northern blot: (many) probes, rather than mRNAs, are fixed on a surface in an ordered way.

  • Microarray categories
    cDNA microarray: each probe is the cDNA of a gene (length: hundreds to thousands of nucleotides). Stanford, Brown Lab.

    Oligonucleotide microarray: each probe is a synthesized short DNA (uniquely corresponding to a substring of a gene). Affymetrix: ~25-mers; Agilent: ~60-mers.

    Others

  • Spotted cDNA microarray

  • Array manufacturing
    Each tube contains cDNAs corresponding to a unique gene; they are pre-amplified and spotted onto a glass slide.

  • Experiment
    [Figure: samples labeled with Cy3 and Cy5 dyes]

  • Data acquisition
    Computer programs process the image into digital signals.
    Segmentation: determine the boundary between signal and background.
    Result: gene expression ratios between the two samples.

  • Affymetrix GeneChip

  • Array Design
    Multiple probes (11-16) for each gene (figure from Affymetrix Inc.)

  • Array Manufacturing
    In situ synthesis of oligonucleotides, using technology adapted from the semiconductor industry (photolithography and combinatorial chemistry). (Figure from Affymetrix Inc.)

  • GeneChip Probe Arrays
    [Figure: a 1.28 cm GeneChip probe array holding >200,000 different complementary probes; each 24 µm hybridized probe cell contains millions of copies of a specific oligonucleotide probe, to which single-stranded, labeled RNA target binds; image of a hybridized probe array]

  • Overview of the Affymetrix GeneChip technology (figure from Affymetrix Inc.)
    The probes in each probe set are combined to give an absolute expression level.
    Image segmentation is relatively easy, but how to use the mismatch (MM) probe signal is debatable.

  • Comparison of cDNA array and GeneChip

    Aspect | cDNA | GeneChip
    Probe preparation | Probes are cDNA fragments, usually amplified by PCR and spotted by robot. | Probes are short oligos synthesized using a photolithographic approach.
    Colors | Two-color (measures relative intensity) | One-color (measures absolute intensity)
    Gene representation | One probe per gene | 11-16 probe pairs per gene
    Probe length | Long, varying lengths (hundreds to 1K bp) | 25-mers
    Density | Maximum of ~15,000 probes | 38,500 genes × 11 probes = 423,500 probes

  • Affymetrix GeneChip: one-color design. cDNA microarray: two-color design. Why the difference?

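  • Affymetrix GeneChip: photolithography (the amount of oligos in each probe cell is well controlled).
    cDNA microarray: robotic spotting (the amount of cDNA spotted per probe may vary greatly, so the ratio between two samples hybridized to the same spot is more reliable than either absolute intensity).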

  • Advantage and disadvantage of cDNA array and GeneChip

    cDNA microarray | Affymetrix GeneChip
    The data can be noisy and of variable quality. | Specific and sensitive; results very reproducible.
    Cross (non-specific) hybridization often happens. | Hybridization is more specific.
    May need an RNA amplification procedure. | Can use a small amount of RNA.
    Image analysis is more difficult. | Image analysis and intensity extraction are easier.
    Need to search databases for gene annotation. | More widely used; better-quality gene annotation.
    Cheap (both initial cost and per-slide cost). | Expensive (~$400 per array, plus labeling and hybridization).
    Can be custom made for special species. | Only several popular species are available.
    Do not need to know the exact DNA sequence. | Need the DNA sequence for probe selection.

  • Typical Microarray Analysis
    Raw data → Preprocess (Normalize; Filter: Present/Absent, minimum value, fold change) → Significance, Clustering, Classification → Function (Gene Ontology), Regulation (motif finding)

    Example raw data: Affymetrix signal values (spreadsheet excerpt)
    Sample labels in the sheet header: normal, tumor, tumor, normal, normal, tumor
    Columns: ID_REF, then 22 arrays; each entry is the VALUE (signal) immediately followed by its ABS_CALL (P = present, M = marginal, A = absent)

    AFFX-BioB-5_at  210.6P 234.6P 362.5P 389P 305.6P 330.5P 316.1P 275.9P 294.6P 289.5P 319.3P 891.2P 760.9P 828.3P 751P 806.8P 772.4P 322.5P 600.6P 290.6P 719.9P 1412.5P
    AFFX-BioB-M_at  393P 327.8P 501.4P 816.5P 542P 440.8P 552P 271.3P 249.6P 329.2P 400.3P 1662.6P 1913.7P 1661P 1918.4P 1801.4P 1611.8P 464.4P 1724.4P 599P 2069.8P 3417.1P
    AFFX-BioB-3_at  264.9P 164.6P 244.7P 379.7P 261.3P 303.7P 262.7P 192.5P 210.7P 217.3P 257.9P 737.7P 711.3P 855.3P 746.6P 792.4P 752P 224.3P 667.7P 404.5P 823.5P 2227.1P
    AFFX-BioC-5_at  738.6P 676.1P 737.6P 1191.2P 917P 767.9P 992.8P 640.4P 715.5P 844.8P 842.3P 1978.7P 1794.2P 1713.8P 1854P 1877.7P 1717.5P 743P 1729.2P 1113.9P 1756.7P 3593.1P
    AFFX-BioC-3_at  356.3P 365.9P 423.4P 711.6P 560.3P 484.9P 617.4P 352.6P 494.2P 535.6P 485.9P 1563.9P 1427.6P 1514.3P 1350.6P 1441P 1491.4P 456.6P 1245.8P 568.9P 1421.6P 2608.5P
    AFFX-BioDn-5_at  566.3P 442.2P 649.7P 834.3P 599.1P 606.9P 697.3P 400.1P 381P 486.5P 632.3P 2351.5P 2411.6P 2218.1P 1732.3P 2242.9P 2018P 588.8P 1867.1P 674.7P 2045.8P 3467.6P
    AFFX-BioDn-3_at  3911.8P 3703.7P 4680.9P 6037.7P 4653.7P 4232P 4319.5P 3986.8P 3837.8P 4365.4P 4486.4P 10220.7P 10555.9P 10360.5P 11901.1P 9541.8P 9889.7P 4650.1P 10782.5P 5092.4P 11386.8P 19359.2P
    AFFX-CreX-5_at  6433.3P 5980P 7734.7P 10591P 8162.1P 8428P 9080.6P 5660.7P 5449.8P 7155.3P 7164.3P 18277.9P 19719.4P 20062.1P 17096.8P 17498.3P 18719.8P 7527P 17148.3P 9353.6P 17878.8P 31233.3P
    AFFX-CreX-3_at  11917.8P 9376.7P 11509.3P 16814.4P 13861.8P 13653.4P 13004.6P 9903.1P 11267.2P 13304.6P 13968.7P 25033.5P 24176.2P 26943.5P 29725P 27275.8P 28495.5P 11673.9P 25576.8P 13544.7P 28855.7P 44550.8P
    AFFX-DapX-5_at  12.2A 44.3M 31.2A 37.7P 33.3A 12.8A 29.7A 18.9A 25.5A 45.3M 41.9A 1.5A 18.7A 19.1A 5.7A 19.5A 16.4A 25.9A 4.9A 30.7A 5.1A 3.7A
    AFFX-DapX-M_at  57.8M 42.5A 79M 48.8P 39.5A 39.2A 45.6M 54.5A 102.5P 62.2A 44.2A 27.4A 53.6A 39.9A 63.3A 21.4A 28.8A 53.5A 53.2A 57.6A 56.5A 60.2A
    AFFX-DapX-3_at  29.8A 6.2A 23.4A 28.4A 3.2A 7.6A 9.6A 34.3A 15.6A 8.3A 4.6A 4.7A 2.7A 4.5A 3.8A 6.3A 12A 6.1A 5.3A 5.7A 7.2A 9.1A
    AFFX-LysX-5_at  15.3A 16.2A 15.6A 16.7A 3.1A 3.9A 24.9A 3.6A 32.7A 10.7A 3.1A 5.5A 48.9M 10A 24.7A 14.7A 29.5A 1.5A 3A 33M 16.6A 28.1A
    AFFX-LysX-M_at  33.2A 12A 17.7A 37.3A 49.2A 9.1A 67.8A 38.5A 16A 13.4A 39.8A 7.1A 43.9A 29.6A 61.4A 8.4A 15.2A 29.6A 11.8A 19A 48.5A 9.3A
    AFFX-LysX-3_at  40.7M 10.7A 36.2A 22.1A 22.8A 28.2A 5.2A 13.3A 41.7A 27.9A 48.5A 6.3A 14.5A 8.4A 18.2A 8.3A 20.4A 17.6A 28.4A 22.9A 53.9A 17.7A
    AFFX-PheX-5_at  7.8A 3A 7.6A 5.6A 5A 6.4A 3.6A 8.8A 8.1A 5.1A 5.5A 4.9A 6.2A 5.2A 2.8A 10A 2.8A 5.2A 9.1A 5.8A 12.4A 25.2A
    AFFX-PheX-M_at  4.2A 4.8A 6.8A 6.1A 3.7A 5.5A 9.1A 5A 25.4A 7.1A 24.4A 5A 7.5A 2.6A 31.1A 13.5A 5.7A 6.4A 5.2A 16.1A 10.1A 3.2A
    AFFX-PheX-3_at  54.2A 39.6A 19.4A 16.1A 44.7A 31.2A 31.4A 61.5A 63.9A 50A 71.7A 59.5A 65.4A 79.4A 94A 118.7A 43.7A 17.7A 82.2A 42.4A 104P 69.3A
    AFFX-ThrX-5_at  8.2A 11.2A 13.2A 9.5A 8.5A 7.5A 9.2A 23.4A 10.7A 13.8A 6.9A 9.8A 7.8A 11A 13.2A 4.5A 6.8A 8.2A 13A 10.3A 13.6A 40.1A
    AFFX-ThrX-M_at  38.1A 30.6A 37.6A 7.2A 26.9A 36.3A 24.4A 44.9A 36.9A 43.1A 9.4A 30.7A 30.1A 6.7A 34.2A 56A 13.5A 27.7A 34.9A 37.4A 9.4A 14.6A
    AFFX-ThrX-3_at  15.2A 5A 15A 8.3A 36.8A 11.5A 11.2A 14.2A 17.9A 36.7A 18.4A 4.2A 9.5A 7.3A 9.1A 7.8A 11.2A 12A 8.1A 11.1A 38.6A 14.2A
    AFFX-TrpnX-5_at  11.2A 11.8A 22.2A 22.1A 8.9A 35.6A 36.9A 41A 27.9A 58.3A 8.6A 6.6A 35.8A 5.5A 56.6A 12.3A 31.5A 8.3A 10A 9.9A 19.9A 55.3A
    AFFX-TrpnX-M_at  9A 8.1A 9.1A 8.7A 8.1A 12A 3.2A 9.1A 6.7A 9.6A 6.4A 5.4A 4.5A 20.7A 7.8A 9.3A 19.3A 4.8A 9.9A 23A 32.2A 8.1A
    AFFX-TrpnX-3_at  19.8A 12.8A 11.8A 43.2M 17.4A 10A 10.7A 4.2A 3.8A 16.3A 3.2A 31.4A 15.6A 3.2A 43.4A 2.3A 3.6A 3.1A 6.4A 18.4A 52.2A 47.5A
    AFFX-HUMISGF3A/M97935_5_at  82.7P 120.7P 92.7P 46.4P 55.9P 46.5P 75.4P 74.6P 81.1A 58.8P 107.9P 68.7P 47.4A 22.2A 56.7A 29.7A 57.4A 119P 85.2P 55.3P 113.4P 94.1A
    AFFX-HUMISGF3A/M97935_MA_at  397.6P 416.7P 244.8A 181.4A 197.5A 192.3A 208.8A 279.3A 261.5A 288.3A 368.3P 168.5P 115.1A 114.6A 209.7A 166.7M 157A 343.7M 236.4P 279.3A 232.6P 330.3A
    AFFX-HUMISGF3A/M97935_MB_at  206.2P 303P 300.8P 253.5P 195.3P 216P 150.5P 166.1P 249.4P 211.2P 245.9P 156.1P 180.5P 191P 262.5P 168.5P 158.2P 292.8P 258.3P 266.2P 191P 259.7P
    AFFX-HUMISGF3A/M97935_3_at  663.8P 723.9P 812.1P 666.1P 629.4P 754.1P 729.2P 437P 399.3P 649.6P 480.1P 1426.7P 1212P 978.2P 729P 915.1P 996.6P 1131.5P 854.8P 1152.1P 800.4P 688.9P
    AFFX-HUMRGE/M10098_5_at  547.6P 405.9P 6894.7P 3496.1P 1958.5P 5799.4P 9384.3P 244.7P 305.4P 300.8P 367.5P 9.2A 5.7A 8.9A 12.2A 47.1A 9.1A 2085.1P 9.5A 4175.9P 15.7A 496.8P
    AFFX-HUMRGE/M10098_M_at  239.1P 175.8P 3675P 1348.6P 695.9P 2428.2P 4633.4P 126.6M 162.2M 217.9P 204.6P 31.4A 16.4A 61.7A 11.7A 45.5A 14A 1043.8P 39.9A 1984.3P 84.4A 758.5P
    AFFX-HUMRGE/M10098_3_at  1236.4P 721.4P 9076.1P 7795.9P 4237.1P 7890P 12145.2P 531.6P 721.3P 707.2P 938P 195.5P 209.5P 268.4P 248.6P 210.7P 194.3A 6030.2P 127.5A 8491.8P 151.3P 1647.9P
    AFFX-HUMGAPDH/M33197_5_at  19508P 19267.1P 22892P 26584P 29666.6P 25038.1P 26635.3P 27822P 27121.1P 26708.9P 26426.7P 21883.3P 20055.1P 29317.9P 37552.6P 34318.7P 28054.2P 14526.7P 31809.6P 16585.3P 34314.4P 40743.7P
    AFFX-HUMGAPDH/M33197_M_at  18996.6P 20610.4P 21573.7P 29936P 30106.6P 22380.2P 28903.3P 25630.2P 23732.3P 26293.2P 23575.6P 22794.3P 23497P 31522.6P 39832.2P 34753.8P 27147.9P 13913P 29837.9P 16303.4P 34313.8P 35276.8P
    AFFX-HUMGAPDH/M33197_3_at  18016.4P 17463.8P 20921.3P 26908.3P 28382.2P 21885P 28202.2P 23469.5P 24297.2P 23422.3P 23797.2P 22775.6P 25217.4P 29758.1P 36482P 33120.5P 28945.5P 15374.5P 30306P 19634.6P 32865.4P 40969P
    AFFX-HSAC07/X00351_5_at  23294.6P 21783.7P 18423.3P 21858.9P 23517.1P 19450.3P 23452P 26002.5P 26794.3P 26572.2P 26252.9P 14853.4P 4712.7P 11690.9P 19340.1P 15405.7P 12542.9P 15498.4P 21155P 19970.1P 22129.2P 40315.1P
    AFFX-HSAC07/X00351_M_at  25373.1P 24922.8P 22384.2P 25760.2P 27718.5P 21401.6P 28559.3P 29511.7P 29582.7P 26612.4P 27116.9P 23689.2P 16466.7P 21382.9P 25955.9P 23031P 24866.1P 19101P 29376.9P 27792.8P 28482.6P 49154.6P
    AFFX-HSAC07/X00351_3_at  20032.8P 20251.1P 20961.7P 23494.6P 23381.2P 21173.3P 24080.2P 22098.4P 23454.5P 22287.2P 22368.1P 24910.5P 23435.9P 24370.4P 26831.5P 25788P 26660P 19272P 26867.6P 26376.6P 26232P 42005.4P
    AFFX-M27830_5_at  731.5P 614.9P 7824.9P 1992.1P 1480.5P 5937.8P 7698P 557.7P 773.2P 643.8P 722.6P 247.9P 290.6P 331.6P 217.2P 192.5P 139.9P 1087.5P 223.5P 1995.8P 241.7P 2049.8P
    AFFX-M27830_M_at  1010.2A 1188.8A 4241.2P 2780.4P 2380P 3578.7P 4191.1P 950.7A 1239.1A 992.6A 992.2A 1309.9P 1359.1P 1822.3P 898.1A 1382.4P 1193.2P 2099.6P 785.8M 3744.6P 702.5P 3301.9P
    AFFX-M27830_3_at  81A 69.3A 74.1A 65.9A 65A 76.7A 51.9A 78.5A 88.2A 67.2A 74A 34.3A 44.2A 37.9A 44.2A 49A 124.4A 78.1A 35.2A 65.8A 49.6A 103.4A
    AFFX-hum_alu_at  19621P 20396.3P 19844.2P 24283.1P 25676.5P 20397.4P 24600.4P 23703.6P 24923.8P 21932P 22025.5P 26129.1P 26736P 27473.3P 29117P 28906.2P 31201.7P 16425.5P 28380.2P 20494.3P 27279.6P 32884.9P
    AFFX-r2-Ec-bioB-5_at  243.2P 193.6P 253.7P 462.1P 326.4P 318.8P 361.7P 147.4P 188.7P 260.7P 282.5P 765P 810.2P 798P 767.2P 636.4P 685.8P 292.6P 743.6P 351.4P 763P 1131.7P
    AFFX-r2-Ec-bioB-M_at  482.6P 448.1P 594.3P 819.5P 676.9P 596.2P 766.5P 407.8P 381.2P 507.6P 508.1P 1548.6P 1744.3P 1590.7P 1406.5P 1493.9P 1344.3P 583P 1471.4P 781P 1825.8P 3043.9P
    AFFX-r2-Ec-bioB-3_at  254.3P 231.8P 345.3P 519.5P 399.6P 330.5P 308.8P 219.2P 242.5P 357.6P 310P 1070.4P 1125.9P 1052.1P 853.9P 1172.2P 803.4P 363.2P 986.4P 489.7P 1190P 2464.5P
    AFFX-r2-Ec-bioC-5_at  645.1P 595.9P 753.7P 1115.7P 776.7P 740P 888.9P 604.4P 541.2P 793.1P 782.2P 2136.7P 2216.8P 2137.6P 1839.4P 1826.3P 1784P 768.1P 1994.1P 949.6P 2271.6P 3893.9P
    AFFX-r2-Ec-bioC-3_at  848.5P 874.3P 974P 1336.2P 1035.8P 959.7P 1136.8P 769.6P 793.8P 1040.8P 1008.7P 2424.4P 2536P 2399.8P 2104.3P 2307.4P 2243.9P 842.9P 2341.2P 1271.3P 2546.5P 4499.8P
    AFFX-r2-Ec-bioD-5_at  2599.8P 2252.4P 2735.6P 4011.8P 2877.3P 2730P 3247.1P 1795.1P 1973P 2531.8P 2643.6P 9343P 9177.1P 9748.3P 9139.7P 8699.8P 9390.2P 2815.5P 9402.3P 3308.4P 10559.5P 16905.4P
    AFFX-r2-Ec-bioD-3_at  2866.7P 2618P 3150.6P 4917.6P 3857.5P 3588.1P 3817.5P 3023.2P 3102.7P 3341.9P 3540.7P 10513.7P 10451.6P 10994.6P 10189P 9660.2P 9506.8P 3271.5P 9735.3P 4674.8P 11113.8P 16322.4P
    AFFX-r2-P1-cre-5_at  12667.2P 12782.6P 14740.1P 19846.3P 16928.7P 14437.9P 16816.1P 13400.8P 13357.9P 13998.3P 15705.1P 28504.9P 29704.5P 30898.6P 31712P 31446.5P 33733.1P 13628.9P 28158P 20762.2P 30988.5P 54324.5P
    AFFX-r2-P1-cre-3_at  16690.5P 17395.8P 21243.5P 25469.7P 21833P 19222.1P 22998.4P 15822.9P 15951.7P 17769P 17911.9P 38037.8P 35454.6P 35665P 37282P 39075.7P 36027.3P 19703.5P 37179.3P 26240.9P 46345.4P 77209.3P
    AFFX-r2-Bs-dap-5_at  2.8A 3.3A 2.7A 2.4A 2.8A 2.1A 1.8A 3.7A 5.2A 5A 2.3A 2.2A 2.4A 1.5A 4.1A 2.3A 2.8A 8.9A 4.5A 7.4A 5.5A 2.5A
    AFFX-r2-Bs-dap-M_at  14.7A 27P 13.6A 17.7A 8.1A 16.6A 1.7A 16.6A 23.3A 10.1A 12.5A 20A 2.8A 2.5A 23.9A 29.2A 10.8A 11.1A 6.2A 20.2A 0.9A 56A
    AFFX-r2-Bs-dap-3_at  7.4A 3.1A 5.6A 16.2A 21.4A 2.5A 2.7A 12.4A 4.1A 2.2A 5.8A 3.3A 3.7A 6A 1.5A 2.7A 25.6A 13A 27.1A 14.8A 33A 4.4A
    AFFX-r2-Bs-lys-5_at  10.9A 6.4A 28.9A 3.4A 14.6A 5.3A 9.9A 13.2A 45A 9.5A 28.2A 24.3A 3.4A 2.7A 16.3A 29.2A 14A 31.2A 11.2A 72.5A 3.1A 54.9A
    AFFX-r2-Bs-lys-M_at  43.2P 31.4A 4.9A 21.9A 21A 23.8A 17.8P 29.1A 14.9A 20.3A 44.9A 2.2A 38.1A 34.2A 17.2A 13.9A 49.3A 35.1A 8.9A 37.5A 5.1A 55.8A
    AFFX-r2-Bs-lys-3_at  55.6P 48.7A 82P 75.2P 76.3P 70.9P 45.3A 49.8A 63.6A 98.9P 48.6A 49.6A 39.4P 42.3A 113.4P 80.8P 58.1A 72.1P 41.1A 75.9P 81.7A 91.3P
    AFFX-r2-Bs-phe-5_at  27A 3.7A 15.7A 4.8A 9.8A 16.7A 18.8A 3A 41.3A 8.3A 6.4A 23.3A 38.3A 1.4A 42.1A 52.1A 15.6A 11.5A 4.5A 28.7A 20.7A 100.7A
    AFFX-r2-Bs-phe-M_at  1.5A 31.9A 27A 24.2A 22.6M 27A 19A 18.5A 27.6A 34.1A 20.4A 6.9A 16.6A 41.3A 74.6M 29.2A 3A 25.8A 38.6A 22.7A 46.8A 3.7A
    AFFX-r2-Bs-phe-3_at  61.2A 77.5A 93.3M 87.4A 38.5A 54.2A 8.2A 68.7M 95.1A 94.7A 80.2A 32.3A 83.4A 11.4A 75.9A 70.7A 32.8A 48.6A 27.6A 13.1A 91.8A 14.8A
    AFFX-r2-Bs-thr-3_s_at  13.2A 8A 49.6A 8.2A 20.6A 34A 9.4A 7.6A 11.4A 39.5A 24.2A 7.5A 6.9A 11.8A 12.4A 14.3A 6.6A 10.1A 7.3A 4.2A 5.6A 91.5A
    AFFX-r2-Bs-thr-M_s_at  44A 79.3A 48.1A 82.3A 65.1A 83.9A 38.3A 55.2A 143.4A 91.7A 94.9A 13.5A 35.3A 27.5A 76.4A 37.4A 51.9A 68.9A 70.6A 83.2A 85.4A 23.5A
    AFFX-r2-Bs-thr-5_s_at  96.4P 102.2P 116.1P 79.1A 90.8M 83.3A 74.8P 101.8P 166.8P 88.4P 94.2P 73.4P 71.2A 71.2P 108.7P 32.3A 82.4M 120.6P 120P 93.2P 79.6A 149A
    AFFX-r2-Hs18SrRNA-3_s_at  1044.6P 719.2P 10360.7P 8882.5P 4229.4P 10168.5P 14669.6P 646.9P 793.9P 668.9P 685P 229.7P 115.8A 183.6P 157.2P 306.2P 180.3A 6805.5P 128.2A 9604.8P 210P 1296.9P
    AFFX-r2-Hs18SrRNA-M_x_at  269P 250.1P 4548.6P 1518.8P 705.8P 2899.3P 5258.5P 184.1P 226.7P 222.1P 295.8P 28.6A 73.3A 60.8A 15.5A 74.7A 6.7A 1561.2P 86.6A 2268.4P 17.5A 789P
    AFFX-r2-Hs18SrRNA-5_at  831.6P 570.8P 7851.3P 3963.4P 2585.9P 6886P 10754.2P 281.8A 397.5A 301.5A 461.7P 20.2A 20A 46.9A 19.3A 88.4A 9.4A 2079.6P 81.1A 3511.4P 69.2A 598.8P
    AFFX-r2-Hs28SrRNA-3_at  651.1P 438.1P 4422.5P 960.9P 978.6P 2685.2P 3684.4P 382.1P 415.1P 475.3P 567.6P 151.6P 205.3P 224.6P 223.8A 328.2P 203.1P 797.5P 128.6A 1052.7P 224.8A 1469.4P
    AFFX-r2-Hs28SrRNA-M_at  183.4A 221.8A 422.8P 249.7A 332A 405.2P 528.4P 139.4A 249.6A 193.9A 183.1A 73.3A 74.9A 85.2A 95.9A 82.2A 14.9A 219.3A 49A 292.4A 89.3A 375.6A
    AFFX-r2-Hs28SrRNA-5_at  126A 100.9A 1257.7P 362.7P 203.6A 722.7P 1132.4P 90.5A 109.4A 76.2A 115.1A 63.4A 9.4A 60.6A 20A 11.7A 10.8A 201.5A 17A 264.7A 22.2A 158.1A
    200000_s_at  2698.9P 2701.3P 1950.7P 2491.8P 2198.9P 2182.9P 2454.4P 2839.2P 2960.4P 2952.1P 3106.3P 1802.2P 823.3P 1249.8P 1796.6P 2167.7P 1702.4P 1659.2P 2039.4P 2557.3P 1604.3P 963.5P
    200001_at  7695.8P 6613.7P 6303.7P 7525.3P 6564P 6026P 7791.9P 9186.2P 9077.7P 8389.4P 8497.6P 3150.2P 1344.3P 2670.2P 5515P 5173.9P 2389.7P 5585.1P 6209.9P 5223.7P 5898.3P 6256.2P
    200002_at  14678.9P 14211.2P 14274P 19098.8P 20025.4P 14405.9P 16094.2P 14751P 14670.1P 15242.2P 14868.8P 19276.3P 14791.4P 19035.8P 20930.1P 18220.1P 16696P 10737.9P 17994.5P 16617.4P 20997.5P 35772.2P
    200003_s_at  12596.6P 12638.7P 14242.2P 15085.5P 15284.6P 14589.9P 17190.7P 17211.7P 16305P 16418.1P 16951.8P 16816.4P 16216.9P 20776.2P 21691.9P 20613.7P 18357.8P 11336.6P 21282.4P 13254P 23640.2P 25298.6P
    200004_at  6921.3P 7009.1P 8999.6P 9749.9P 9277.3P 9524.8P 8704.3P 7600.7P 7295.9P 8216.4P 8513.2P 6547.4P 9230.2P 8384.2P 7124.7P 8221P 8253.7P 6870.2P 6963.9P 7977.9P 6884.9P 7804.3P
    200005_at  3970.3P 3676.8P 3739.5P 4199.8P 4580.6P 4485.9P 4134.5P 3695P 3587.2P 3903P 5050.3P 6621.8P 4644.3P 5935.5P 6330.7P 5356.7P 6849.4P 3045.4P 6925.4P 5604.6P 4761.5P 6491.5P
    200006_at  5668.7P 6990P 7869.9P 10307.1P 8835.4P 9624.1P 9049.8P 6935.8P 6506.2P 7042.5P 9454.7P 7881.7P 11877.3P 12399.9P 12273P 12839.1P 10988.4P 6426.4P 9353P 5535.2P 11125.5P 7718.2P
    200007_at  5466.4P 5368P 6836.3P 6511.8P 6194.7P 6523.7P 7412.8P 5612.9P 5876.9P 5739.5P 5014.2P 6011.3P 6868P 8856.7P 7866P 7260.6P 6836.5P 6538.3P 6517.2P 6038.9P 7752.8P 5812.9P
    200008_s_at  3486.4P 3463.1P 3440.8P 3248.5P 3771.6P 3636.2P 4156P 4481.3P 4590.9P 3479.2P 4357.3P 3354.9P 3357.2P 3677.1P 3958P 3424P 4136.4P 2413.9P 4377.7P 4360.7P 4670.4P 4353.1P
    200009_at  6376P 6046.2P 6110.3P 7077.3P 7479.8P 6759.1P 7229.5P 7676.1P 7397.4P 6840.5P 7168.7P 7648.2P 7718.9P 8374.9P 7988.1P 8503.3P 9500.2P 4792.8P 8687.6P 8776.6P 7734.5P 7830.1P
    200010_at  7049.1P 8599.9P 9005.3P 10723.1P 9810.1P 11043.8P 11753.8P 9684.3P 10243.6P 11669.2P 11858.6P 14503.5P 14198.8P 15202.6P 15170.4P 20103.4P 16403.3P 5751.1P 13659.5P 7737.4P 12262.2P 18125P
    200011_s_at  1974.8P 2032.8P 1928.8P 1830.6P 1932.7P 1824.1P 1500.2P 2058.1P 1897.7P 1768.7P 1392P 1225.7P 912.6P 1878.5P 2022.8P 1530.6P 870.5P 2238.3P 1192.3P 1514.4P 1196.2P 2264.4P
    200012_x_at  12603P 10529.8P 10120.3P 15197.4P 13708.3P 12065.6P 11129.4P 12639.4P 13067.6P 11607P 13167P 13244.7P 13148.4P 15525.1P 16047.9P 14986.2P 17245.3P 8138.4P 13084.1P 12324.2P 12907P 23086.3P
    200013_at  11760P 12193.6P 13798.6P 16020.6P 15651.5P 18009.6P 14743.8P 14949.9P 16461P 16746.1P 17119P 20341.9P 16699.3P 18906.1P 22338.7P 21768.7P 20664.7P 10449.5P 19617.3P 15068.3P 19634.4P 25811.7P
    200014_s_at  3027.4P 3039.3P 3435.4P 3842.4P 3959.8P 3774.4P 3811.5P 2948.2P 2516.4P 3364.7P 3302.6P 4480.9P 5257.4P 5030.3P 4858.9P 5153.7P 4794.7P 3214.3P 3744.6P 3569.3P 5074.8P 2975.4P
    200015_s_at  3506.3P 3077.4P 3540.3P 3859.9P 3492.5P 3608.8P 3840.7P 3209.8P 2684.6P 2820.9P 3502.9P 2923P 4090.4P 3158.2P 2899.4P 3259.7P 3196P 3086.4P 2683.2P 3368P 2608P 2746.3P
    200016_x_at  11676.9P 12482.5P 14783.4P 14912.2P 15202P 15227.5P 16421.3P 16338.8P 14119.9P 13396.5P 12704.2P 19013.6P 20567.1P 21593.3P 19210P 24843.3P 22072.2P 12549.6P 17605.8P 14268.9P 18341.5P 15506.1P
    200017_at  10892.5P 11641P 10098.1P 10144P 9526P 10727.2P 9698.9P 7603.5P 9095.8P 10211.9P 12683.3P 14505.1P 14496.6P 14116.7P 12785.6P 12383.6P 8531.7P 8430.3P 12450.9P 10291.3P 14569.2P 16990.1P
    200018_at  13280.7P 14560.6P 14254.3P 17077.1P 17871.8P 15079.2P 14832.5P 17399.6P 16936.5P 16315.3P 16391.8P 18798.4P 17706.3P 22482.7P 25047P 21623.3P 22552.9P 13239P 24635.7P 14951.3P 22171.9P 31188.5P
    200019_s_at  12991.2P 14190.6P 12924.8P 12764.3P 13177.1P 12817.3P 16293.3P 13471.6P 14331.1P 12978.4P 14614.3P 19836P 14637.9P 17103.7P 19446.4P 19697P 16468.7P 10647.9P 19791.8P 13968.6P 20127.5P 19282.4P
    200020_at  2725.1P 2580.9P 2771.1P 3179.2P 3114.1P 3323.5P 2699.6P 2695P 2555.5P 2641.4P 3003.4P 2425.6P 2723.1P 2759.6P 2389.2P 2553.7P 2918.3P 2467.8P 2529.4P 3491.1P 2429P 2242.5P
    200021_at  19554.4P 18966P 18954.2P 22107P 22943.7P 20117.3P 23250.1P 22762P 26079.5P 22885.2P 20789.3P 23848.2P 19320.5P 22326.8P 24994.6P 29913.9P 27299.2P 18742.8P 25421.2P 22694.8P 31207.3P 36887.5P
    200022_at  13014.4P 14211.7P 12634.4P 13979.7P 15497.1P 11837.4P 14235.4P 13900.3P 15637.6P 13592.7P 13770.9P 17928.5P 10219.1P 18441.9P 19174.2P 17280P 12239P 8893P 19802P 12706.4P 22594.3P 18500.5P
    200023_s_at  5866.9P 5762.9P 6222.2P 5527P 6110.2P 5875.9P 6533.2P 6387.1P 6268.4P 6636.3P 7208.4P 9839.9P 9817.8P 9526.8P 8538.5P 10560.7P 10136.5P 5307.1P 9122.4P 6450.7P 9931.8P 7735.4P
    200024_at  11106.8P 9545.6P 12951.8P 15217P 13505.2P 12439.8P 18462.3P 12616.9P 13133.4P 13547P 12973.1P 14255.2P 10136.1P 16813.6P 17510.4P 12312.6P 15476P 8160.3P 15238.2P 8546.1P 17914.2P 21122.2P
    200025_s_at  14197.3P 13379.4P 14306.1P 17247.2P 18349P 15409.8P 15833.1P 15787P 15246.4P 15859.5P 17849.9P 22058P 20495.3P 22793.8P 24061.7P 23531.9P 22458.3P 12192P 21611.3P 15739.1P 20520.6P 26440.9P
    200026_at  10156.6P 9799P 12391.1P 13172.5P 13003.9P 12811.4P 13454.2P 13276.9P 13102P 12815.4P 11729.7P 19068.2P 15233.5P 19439P 20797.2P 17738.2P 23095.4P 9828.3P 15705.4P 8393.5P 18702P 25659.7P
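    The filtering step of the workflow (Present/Absent calls, a minimum value, fold change) can be sketched in a few lines. This is an illustration under assumptions, not the course's procedure: the data are hypothetical toy values rather than rows of the table, the assignment of arrays to "normal" and "tumor" groups is invented, and the thresholds (`min_present`, `min_value`, `min_fold`) are arbitrary.

```python
from statistics import mean

# Hypothetical toy data (not rows from the table): probe -> list of
# (VALUE, ABS_CALL) pairs, one per array. Arrays 0-2 are assumed to be
# "normal" and arrays 3-5 "tumor", purely for illustration.
data = {
    "probe_up": [(100, "P"), (120, "P"), (110, "P"), (450, "P"), (500, "P"), (480, "P")],
    "probe_flat": [(900, "P"), (950, "P"), (920, "P"), (910, "P"), (940, "P"), (930, "P")],
    "probe_absent": [(12, "A"), (30, "A"), (8, "A"), (25, "A"), (40, "M"), (15, "A")],
}
NORMAL, TUMOR = [0, 1, 2], [3, 4, 5]


def passes_filters(row, normal_idx, tumor_idx,
                   min_present=3, min_value=50.0, min_fold=2.0):
    values = [v for v, _ in row]
    calls = [c for _, c in row]
    # 1. Present/Absent: keep only probes called "P" on enough arrays.
    if calls.count("P") < min_present:
        return False
    # 2. Minimum value: at least one measurement above a brightness floor.
    if max(values) < min_value:
        return False
    # 3. Fold change between the two group means, in either direction.
    a = mean(values[i] for i in normal_idx)
    b = mean(values[i] for i in tumor_idx)
    fold = max(a, b) / max(min(a, b), 1e-9)  # guard against division by zero
    return fold >= min_fold


kept = [probe for probe, row in data.items() if passes_filters(row, NORMAL, TUMOR)]
print(kept)  # ['probe_up']
```

    The three filters are applied in order so that cheap checks on calls and intensity discard unreliable probes before any fold change is computed; real analyses usually tune these cutoffs per data set.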

  • Preprocessing
    - Background subtraction: account for non-specific hybridization.
    - Transformation (e.g., to log scale): for convenience, and to convert the data into a distribution (e.g., normal) assumed by many statistical procedures.
    - Normalization: remove systematic biases; make data from different samples comparable.
    - Filtering, averaging, etc.: remove random noise.
    The order may differ, and steps may be combined.
    Garbage in => garbage out.

  • Background subtraction
    For cDNA arrays, relatively straightforward:
      - Raw data contain foreground and background values.
      - Foreground values are obtained from detected spots; background values from the surrounding area.
      - The background may be larger than the foreground.
    For oligo arrays, probes are densely packed, so the surrounding area cannot be used directly.
      - Hope: MM (mismatch) probes capture non-specific hybridization?
      - Recent studies suggest that PM (perfect match) and MM signals are correlated; better to ignore MM entirely or use it with caution.
    Available software tools: MAS 5 (by Affymetrix), dChip, GCRMA.
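    For a cDNA array, local background subtraction amounts to foreground minus background in each channel. The sketch below assumes one foreground and one background value per spot and channel; clipping to a small positive `floor` when the background exceeds the foreground is only one simple choice (flagging or discarding such spots is another), and the function names are hypothetical.

```python
import math


def subtract_background(foreground, background, floor=1.0):
    """Local background correction for one spot in one channel.
    If the background exceeds the foreground, clip the result to `floor`
    (an assumed small positive value) so it can still be log-transformed."""
    return max(foreground - background, floor)


def corrected_log_ratio(cy5_fg, cy5_bg, cy3_fg, cy3_bg, floor=1.0):
    """log2(Cy5/Cy3) after background subtraction in each channel."""
    cy5 = subtract_background(cy5_fg, cy5_bg, floor)
    cy3 = subtract_background(cy3_fg, cy3_bg, floor)
    return math.log2(cy5 / cy3)


print(corrected_log_ratio(1200, 200, 600, 100))  # 1.0  (1000 / 500)
print(subtract_background(150, 300))             # 1.0  (background > foreground)
```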

  • Normalization
    Where could errors come from?
      - Random noise: repeating the same experiment twice gives different results. Using multiple replicates reduces the problem.
      - Systematic errors:
          - arrays manufactured at different times;
          - on the same array, probes printed with different printer tips may have different biases;
          - dye effect: difference between Cy5 and Cy3 labeling.
      - Experimental factors:
          - more mRNA applied to array A than to array B;
          - the sample preparation procedure;
          - experiments carried out at different times, by different users, etc.

  • cDNA microarray data preprocessing

  • Typical experiments
    Wild-type cells vs. mutated cells; diseased cells vs. normal cells; cells under normal growth conditions vs. cells treated with chemicals.
    Typically repeated several times.
    [Figure: table of ratios, one row per probe (gene)]

  • Transforming cDNA microarray data
    Data: Cy5/Cy3 ratios as well as raw intensities.
    The most common transformation is log2:
      2-fold increase => log2(2) = 1
      2-fold decrease => log2(1/2) = -1
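    A quick illustration of why log2 is convenient: on the raw scale a 2-fold increase (2.0) and a 2-fold decrease (0.5) are not symmetric around 1, but after log2 they become +1 and -1. The ratios are made up.

```python
import math

# Hypothetical Cy5/Cy3 ratios for five genes.
ratios = [2.0, 0.5, 1.0, 4.0, 0.25]

# log2 makes up- and down-regulation symmetric around 0:
# a 2-fold increase and a 2-fold decrease have the same magnitude.
log_ratios = [math.log2(r) for r in ratios]
print(log_ratios)  # [1.0, -1.0, 0.0, 2.0, -2.0]
```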

  • Dye effect
    cDNA microarray experiments using two identical samples.
    Observation: Cy5 is consistently lower than Cy3 (mean log(Cy5/Cy3) < 0).
    Solution: dye swapping.

  • Dye swapping
    Chip 1: label test with Cy5 and control with Cy3.
    Chip 2: label test with Cy3 and control with Cy5.
    Ideally, Cy5/Cy3 on chip 1 equals Cy3/Cy5 on chip 2 (both are test/control), but this does not hold because of the dye effect.
    Compute the average log ratio: [log2(Cy5/Cy3 on chip 1) + log2(Cy3/Cy5 on chip 2)] / 2.
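    The dye-swap average can be written out directly. Assuming the dye effect acts as a constant multiplicative factor on one channel, it enters the two log ratios with opposite signs and cancels in the average. The function name and the simulated intensities (a true 2-fold change, with Cy5 reading 20% low) are hypothetical.

```python
import math


def dye_swap_log_ratio(chip1_cy5, chip1_cy3, chip2_cy5, chip2_cy3):
    """Average test/control log2 ratio over a dye-swapped pair of chips.

    Chip 1: test labeled Cy5, control Cy3  -> test/control = Cy5/Cy3.
    Chip 2: test labeled Cy3, control Cy5  -> test/control = Cy3/Cy5.
    """
    r1 = math.log2(chip1_cy5 / chip1_cy3)
    r2 = math.log2(chip2_cy3 / chip2_cy5)
    return (r1 + r2) / 2


# Simulated spot: true test/control ratio 2 (log2 = 1); Cy5 reads 20% low.
bias = 0.8
test, control = 2000.0, 1000.0
chip1 = (test * bias, control)   # (Cy5, Cy3) on chip 1
chip2 = (control * bias, test)   # (Cy5, Cy3) on chip 2
print(round(math.log2(chip1[0] / chip1[1]), 3))      # 0.678 (chip 1 alone, biased)
print(round(dye_swap_log_ratio(*chip1, *chip2), 6))  # 1.0
```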

  • Total intensity normalization
    Even after dye swapping, systematic biases may remain.
    Assume the total amount of mRNA should not change between the two samples, and rescale so that the two colors have the same total intensity.
    This assumption is not necessarily true; alternatively, rescale according to a subset of genes:
      - house-keeping genes
      - the middle 90% (for example) of genes
      - spike-in genes
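    Total intensity normalization reduces to one scale factor per channel pair. A sketch, assuming Cy5 is the channel rescaled to match Cy3 (the other direction works equally well); the intensities are made up, and passing `subset` mimics normalizing on house-keeping or spike-in genes, with spot 3 here playing an outlier that is excluded from the factor.

```python
def total_intensity_normalize(cy5, cy3, subset=None):
    """Rescale the Cy5 channel so both channels have the same total
    intensity. If `subset` (a list of spot indices, e.g. house-keeping or
    spike-in genes) is given, only those spots determine the factor.
    Returns (rescaled Cy5 values, scale factor)."""
    idx = range(len(cy5)) if subset is None else subset
    factor = sum(cy3[i] for i in idx) / sum(cy5[i] for i in idx)
    return [v * factor for v in cy5], factor


cy5 = [100.0, 200.0, 300.0, 50.0]
cy3 = [210.0, 390.0, 600.0, 1000.0]
scaled, f = total_intensity_normalize(cy5, cy3, subset=[0, 1, 2])
print(f)       # 2.0  (1200 / 600, using spots 0-2 only)
print(scaled)  # [200.0, 400.0, 600.0, 100.0]
```

    The same factor multiplies every spot, so on the log scale the normalization only shifts all log ratios by a constant.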

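  • M-A plot
    Also known as the ratio-intensity plot.
    $M = \log_2(\mathrm{Cy5}/\mathrm{Cy3}) = \log_2 \mathrm{Cy5} - \log_2 \mathrm{Cy3}$
    $A = \tfrac{1}{2}\log_2(\mathrm{Cy5}\cdot\mathrm{Cy3}) = (\log_2 \mathrm{Cy5} + \log_2 \mathrm{Cy3})/2$
    Ideal: M centered at zero, and the variance of M does not depend on A.

    However, in practice there is a systematic dependence between M and A, and a high variance of M for smaller A.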
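    The M and A coordinates are computed per spot; plotting M (y-axis) against A (x-axis) gives the M-A plot. The intensities below are illustrative powers of two so the results are easy to check by hand.

```python
import math


def ma_values(cy5, cy3):
    """M = log2(Cy5/Cy3) and A = (log2 Cy5 + log2 Cy3) / 2 for each spot."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(cy5, cy3)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(cy5, cy3)]
    return m, a


m, a = ma_values([256.0, 64.0, 1024.0], [64.0, 256.0, 1024.0])
print(m)  # [2.0, -2.0, 0.0]
print(a)  # [7.0, 7.0, 10.0]
```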

  • Lowess normalization
    Lowess: locally weighted regression.
    Fit local polynomial functions of M against A, then adjust M according to the fitted line.
    [Figure: M-A plots before and after adjustment]
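    A minimal local-regression sketch of the idea: for each spot, fit a weighted straight line of M on A using its nearest neighbors in A (tricube weights), then subtract the fitted value from M. This is a simplified lowess (degree-1 local fits, no robustness iterations, quadratic-time neighbor search); real analyses would use an existing implementation such as those in R/Bioconductor. The function names and the `frac` value are assumptions.

```python
import math


def lowess_fit(x, y, frac=0.5):
    """Simplified lowess: local linear fits with tricube weights over the
    nearest ceil(frac * n) points; no robustness iterations."""
    n = len(x)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for xi in x:
        # k nearest neighbours of xi as (distance, index) pairs.
        nbrs = sorted((abs(xj - xi), j) for j, xj in enumerate(x))[:k]
        h = nbrs[-1][0] or 1e-12  # bandwidth = distance to the k-th neighbour
        w = {j: (1 - (d / h) ** 3) ** 3 for d, j in nbrs}
        sw = sum(w.values())
        mx = sum(wj * x[j] for j, wj in w.items()) / sw
        my = sum(wj * y[j] for j, wj in w.items()) / sw
        sxx = sum(wj * (x[j] - mx) ** 2 for j, wj in w.items())
        sxy = sum(wj * (x[j] - mx) * (y[j] - my) for j, wj in w.items())
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(m, a, frac=0.5):
    """Remove the intensity-dependent trend: M_adjusted = M - lowess(M ~ A)."""
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


# Toy data: M drifts linearly with A (a pure intensity-dependent bias).
a = [6.0 + i for i in range(10)]
m = [0.1 * ai - 1.0 for ai in a]
adjusted = lowess_normalize(m, a)
print(max(abs(v) for v in adjusted) < 1e-9)  # True
```

    On real data the fitted curve is nonlinear, and the adjusted M values scatter around zero at every intensity level A.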

  • Replicate filtering
    Experiments are repeated; genes with very high variability between replicates are questionable.
    [Figure: log2(ratio 1) vs. log2(ratio 2) for two replicates]
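    One simple way to flag questionable genes is to compare the log ratios from two replicate experiments and mark genes whose replicates disagree by more than a cutoff. The cutoff (`max_diff`), the function name, and the values are hypothetical; with more replicates, a per-gene variance or standard deviation would play the same role.

```python
def inconsistent_replicates(log_ratio_1, log_ratio_2, max_diff=1.0):
    """Indices of genes whose two replicate log2 ratios differ by more than
    `max_diff` (1.0 corresponds to a two-fold disagreement)."""
    return [i for i, (r1, r2) in enumerate(zip(log_ratio_1, log_ratio_2))
            if abs(r1 - r2) > max_diff]


rep1 = [1.0, -0.5, 2.0, 0.1]
rep2 = [0.9, -0.4, -0.5, 0.2]
print(inconsistent_replicates(rep1, rep2))  # [2]
```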

  • Oligo microarray data preprocessing (Affymetrix chip)

  • Typical experiments
    Multiple microarrays: n samples (from different times, locations, conditions, treatments, etc.), with k replicates for each sample.
    For example:
      - samples collected from 100 healthy people and 100 cancer patients;
      - cells treated with some drug, with samples taken every 10 minutes.
    Each sample is repeated on 3-5 microarrays to improve the reliability of the results; the replicates are often averaged after some preprocessing.

  • Main characteristics
    For each gene, there are multiple PM and MM probes (11-16 pairs): how do we obtain an overall intensity from these probe-level intensities?
    Array outputs are absolute values rather than ratios, so cross-array normalization is important to make arrays comparable.
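    One widely used answer to the probe-level question is the summarization step of RMA, which ignores the MM probes and fits an additive model (probe effect + array effect) to log2 PM intensities with Tukey's median polish; the array effects then serve as the probe set's expression values. The sketch below implements median polish for one probe set under that assumption; it omits RMA's background correction and quantile normalization, the convergence test is simplified, and the input values are made up.

```python
from statistics import median


def median_polish(x, max_iter=10, tol=1e-6):
    """Tukey median polish of a probes-by-arrays matrix (list of rows,
    log2 scale). Fits x[i][j] ~ overall + row[i] + col[j] + residual and
    returns overall + col[j]: one summary value per array."""
    nrow, ncol = len(x), len(x[0])
    resid = [list(row) for row in x]
    overall = 0.0
    row_eff = [0.0] * nrow
    col_eff = [0.0] * ncol
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals.
        rmed = [median(row) for row in resid]
        for i in range(nrow):
            row_eff[i] += rmed[i]
            resid[i] = [v - rmed[i] for v in resid[i]]
        delta = median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Sweep column (array) medians out of the residuals.
        cmed = [median(resid[i][j] for i in range(nrow)) for j in range(ncol)]
        for j in range(ncol):
            col_eff[j] += cmed[j]
            for i in range(nrow):
                resid[i][j] -= cmed[j]
        delta = median(row_eff)
        row_eff = [r - delta for r in row_eff]
        overall += delta
        if max(abs(v) for v in rmed + cmed) < tol:
            break
    return [overall + c for c in col_eff]


# Hypothetical log2 PM values for one probe set: 3 probes x 3 arrays,
# built as 5 + probe effect (0, 1, -1) + array effect (0, 2, 1).
pm = [
    [5.0, 7.0, 6.0],
    [6.0, 8.0, 7.0],
    [4.0, 6.0, 5.0],
]
print(median_polish(pm))  # [5.0, 7.0, 6.0]
```

    Medians rather than means make the summary robust: a single aberrant probe measurement has limited influence on the array effects.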

  • Transformation
    Log transformation for one-color arrays.
    When you get a data set from someone, be careful with the scale (check whether the values have already been log-transformed).

  • Normalization
    Ideas are similar to those for cDNA microarrays:
      - for cDNA microarrays, normalize the log ratios; there may be one or more arrays;
      - here, normalize absolute expression values; there are usually multiple arrays.
    Total intensity normalization: each array gets the same mean intensity. This can be based on all genes or on a selected subset of genes (house-keeping genes, the middle 90% (for example) of genes, spike-in genes).
    Lowess: using a common reference, or cyclic.
    Many useful tools are implemented in R (Bioconductor).

  • Quantile normalization
    Normalizes multiple arrays, assuming that the distribution of values obtained from each array is the same or similar.
    [Figure: quantile normalization]

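  • Quantile normalization
    Sort the values within each array (column); at each rank, take the mean across arrays and assign it to all 3 arrays (×3); restore each array's original order.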
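    The sort / mean / restore-order steps translate directly into code. A minimal sketch for a few equal-length arrays; ties are broken by position instead of being averaged (a simplification), and the three input arrays are hypothetical.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (equal-length lists of values).
    Ties are broken by position rather than averaged (a simplification)."""
    n = len(arrays[0])
    # 1. Sort: the order of indices that sorts each array.
    orders = [sorted(range(n), key=arr.__getitem__) for arr in arrays]
    # 2. Mean across arrays of the k-th smallest value, for each rank k.
    rank_means = [sum(arr[order[k]] for arr, order in zip(arrays, orders)) / len(arrays)
                  for k in range(n)]
    # 3. Restore order: put each rank mean back where that rank came from.
    result = []
    for order in orders:
        out = [0.0] * n
        for k, i in enumerate(order):
            out[i] = rank_means[k]
        result.append(out)
    return result


# Three hypothetical arrays measuring the same four genes.
x1 = [5.0, 2.0, 3.0, 4.0]
x2 = [4.0, 1.0, 6.0, 2.0]
x3 = [3.0, 4.0, 6.0, 8.0]
for normalized in quantile_normalize([x1, x2, x3]):
    print([round(v, 2) for v in normalized])
# [6.33, 2.0, 3.0, 4.67]
# [4.67, 2.0, 6.33, 3.0]
# [2.0, 3.0, 4.67, 6.33]
```

    After normalization every array contains exactly the same set of values; only their assignment to genes differs, which preserves each array's within-array ranking.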

  • An example data set
    J. DeRisi, V. Iyer, and P. Brown, "Exploring the Metabolic and Genetic Control of Gene Expression on a Genomic Scale," Science, 278: 680-686, 1997.
    Yeast cells grow in glucose medium; when glucose is depleted, the cells change their metabolic pathways.
    cDNA microarray. Test: 2, 4, 6, 8, 10, 12, 14 hours after growth; control: 0 hours.
    Total data points: ~6000 × 7.
    No replicates! No normalization! Fold change was used to identify differentially expressed genes!

  • Histogram of log ratios
    Median = -0.27. Two possible causes: a dye effect, or a real difference between the samples.

  • Total intensity normalization
    mean(Cy3) = 3141, mean(Cy5) = 2838; 3141 / 2838 = 1.11

    Other options: use the median instead of the mean, or use a subset of genes (exclude the 10% most extreme, house-keeping genes, spike-in genes, etc.).

    Net effect: a constant factor for every gene. Median after normalization = -0.1.
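    The numbers on this slide can be checked by hand. Assuming the histogram shows log2 ratios and the Cy5 channel is multiplied by the scale factor, every log2(Cy5/Cy3) shifts by the same constant, and so does the median:

```latex
\frac{\operatorname{mean}(\mathrm{Cy3})}{\operatorname{mean}(\mathrm{Cy5})} = \frac{3141}{2838} \approx 1.107,
\qquad \log_2\!\left(\frac{3141}{2838}\right) \approx 0.146

\operatorname{median}_{\text{after}} \approx -0.27 + 0.146 \approx -0.12
```

    This agrees with the reported median of about -0.1 up to rounding.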

  • Intensity-intensity plot
    Total intensity normalization worked well here.
    [Figure panel: total intensity normalization]

  • Intensity-intensity plot
    Total intensity normalization did not work well for this experiment; dye swapping can probably help.
    [Figure panel: total intensity normalization]

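  • M-A plot
    $A = \tfrac{1}{2}\log_2(\mathrm{Cy5}\cdot\mathrm{Cy3}) = (\log_2 \mathrm{Cy5} + \log_2 \mathrm{Cy3})/2$
    $M = \log_2(\mathrm{Cy5}/\mathrm{Cy3}) = \log_2 \mathrm{Cy5} - \log_2 \mathrm{Cy3}$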

  • M-A plot
    Dependency of M on A

  • Box plot

  • Conclusions
    Microarrays provide a way to measure thousands of genes simultaneously, making global monitoring of cellular activities possible.

    The method produces noisy data, so normalization is crucial.

    Real-time RT-PCR is used to validate a small number of genes.

  • Limitations
    Measures mRNA instead of proteins: actual protein abundance and post-translational modifications cannot be detected.

    Suitable for global monitoring; results should be used to generate further hypotheses or be combined with other carefully designed experiments.

  • Mechanisms in microarrays
    Important mechanisms that make microarrays work:
    Reverse transcription: mRNA => cDNA. This is usually also the step where the dyes are attached. (Proteins cannot be reverse translated into mRNA or into another form, so they are difficult to label with dyes.)

    Double-strand binding of complementary DNA sequences. (Proteins do not have such a convenient property: there are 20 amino acids, with no complementary binding.)


    AFFX-HUMGAPDH/M33197_5_at19508P19267.1P22892P26584P29666.6P25038.1P26635.3P27822P27121.1P26708.9P26426.7P21883.3P20055.1P29317.9P37552.6P34318.7P28054.2P14526.7P31809.6P16585.3P34314.4P40743.7P

    AFFX-HUMGAPDH/M33197_M_at18996.6P20610.4P21573.7P29936P30106.6P22380.2P28903.3P25630.2P23732.3P26293.2P23575.6P22794.3P23497P31522.6P39832.2P34753.8P27147.9P13913P29837.9P16303.4P34313.8P35276.8P

    AFFX-HUMGAPDH/M33197_3_at18016.4P17463.8P20921.3P26908.3P28382.2P21885P28202.2P23469.5P24297.2P23422.3P23797.2P22775.6P25217.4P29758.1P36482P33120.5P28945.5P15374.5P30306P19634.6P32865.4P40969P

    AFFX-HSAC07/X00351_5_at23294.6P21783.7P18423.3P21858.9P23517.1P19450.3P23452P26002.5P26794.3P26572.2P26252.9P14853.4P4712.7P11690.9P19340.1P15405.7P12542.9P15498.4P21155P19970.1P22129.2P40315.1P

    AFFX-HSAC07/X00351_M_at25373.1P24922.8P22384.2P25760.2P27718.5P21401.6P28559.3P29511.7P29582.7P26612.4P27116.9P23689.2P16466.7P21382.9P25955.9P23031P24866.1P19101P29376.9P27792.8P28482.6P49154.6P

    AFFX-HSAC07/X00351_3_at20032.8P20251.1P20961.7P23494.6P23381.2P21173.3P24080.2P22098.4P23454.5P22287.2P22368.1P24910.5P23435.9P24370.4P26831.5P25788P26660P19272P26867.6P26376.6P26232P42005.4P

    AFFX-M27830_5_at731.5P614.9P7824.9P1992.1P1480.5P5937.8P7698P557.7P773.2P643.8P722.6P247.9P290.6P331.6P217.2P192.5P139.9P1087.5P223.5P1995.8P241.7P2049.8P

    AFFX-M27830_M_at1010.2A1188.8A4241.2P2780.4P2380P3578.7P4191.1P950.7A1239.1A992.6A992.2A1309.9P1359.1P1822.3P898.1A1382.4P1193.2P2099.6P785.8M3744.6P702.5P3301.9P

    AFFX-M27830_3_at81A69.3A74.1A65.9A65A76.7A51.9A78.5A88.2A67.2A74A34.3A44.2A37.9A44.2A49A124.4A78.1A35.2A65.8A49.6A103.4A

    AFFX-hum_alu_at19621P20396.3P19844.2P24283.1P25676.5P20397.4P24600.4P23703.6P24923.8P21932P22025.5P26129.1P26736P27473.3P29117P28906.2P31201.7P16425.5P28380.2P20494.3P27279.6P32884.9P

    AFFX-r2-Ec-bioB-5_at243.2P193.6P253.7P462.1P326.4P318.8P361.7P147.4P188.7P260.7P282.5P765P810.2P798P767.2P636.4P685.8P292.6P743.6P351.4P763P1131.7P

    AFFX-r2-Ec-bioB-M_at482.6P448.1P594.3P819.5P676.9P596.2P766.5P407.8P381.2P507.6P508.1P1548.6P1744.3P1590.7P1406.5P1493.9P1344.3P583P1471.4P781P1825.8P3043.9P

    AFFX-r2-Ec-bioB-3_at254.3P231.8P345.3P519.5P399.6P330.5P308.8P219.2P242.5P357.6P310P1070.4P1125.9P1052.1P853.9P1172.2P803.4P363.2P986.4P489.7P1190P2464.5P

    AFFX-r2-Ec-bioC-5_at645.1P595.9P753.7P1115.7P776.7P740P888.9P604.4P541.2P793.1P782.2P2136.7P2216.8P2137.6P1839.4P1826.3P1784P768.1P1994.1P949.6P2271.6P3893.9P

    AFFX-r2-Ec-bioC-3_at848.5P874.3P974P1336.2P1035.8P959.7P1136.8P769.6P793.8P1040.8P1008.7P2424.4P2536P2399.8P2104.3P2307.4P2243.9P842.9P2341.2P1271.3P2546.5P4499.8P

    AFFX-r2-Ec-bioD-5_at2599.8P2252.4P2735.6P4011.8P2877.3P2730P3247.1P1795.1P1973P2531.8P2643.6P9343P9177.1P9748.3P9139.7P8699.8P9390.2P2815.5P9402.3P3308.4P10559.5P16905.4P

    AFFX-r2-Ec-bioD-3_at2866.7P2618P3150.6P4917.6P3857.5P3588.1P3817.5P3023.2P3102.7P3341.9P3540.7P10513.7P10451.6P10994.6P10189P9660.2P9506.8P3271.5P9735.3P4674.8P11113.8P16322.4P

    AFFX-r2-P1-cre-5_at12667.2P12782.6P14740.1P19846.3P16928.7P14437.9P16816.1P13400.8P13357.9P13998.3P15705.1P28504.9P29704.5P30898.6P31712P31446.5P33733.1P13628.9P28158P20762.2P30988.5P54324.5P

    AFFX-r2-P1-cre-3_at16690.5P17395.8P21243.5P25469.7P21833P19222.1P22998.4P15822.9P15951.7P17769P17911.9P38037.8P35454.6P35665P37282P39075.7P36027.3P19703.5P37179.3P26240.9P46345.4P77209.3P

    AFFX-r2-Bs-dap-5_at2.8A3.3A2.7A2.4A2.8A2.1A1.8A3.7A5.2A5A2.3A2.2A2.4A1.5A4.1A2.3A2.8A8.9A4.5A7.4A5.5A2.5A

    AFFX-r2-Bs-dap-M_at14.7A27P13.6A17.7A8.1A16.6A1.7A16.6A23.3A10.1A12.5A20A2.8A2.5A23.9A29.2A10.8A11.1A6.2A20.2A0.9A56A

    AFFX-r2-Bs-dap-3_at7.4A3.1A5.6A16.2A21.4A2.5A2.7A12.4A4.1A2.2A5.8A3.3A3.7A6A1.5A2.7A25.6A13A27.1A14.8A33A4.4A

    AFFX-r2-Bs-lys-5_at10.9A6.4A28.9A3.4A14.6A5.3A9.9A13.2A45A9.5A28.2A24.3A3.4A2.7A16.3A29.2A14A31.2A11.2A72.5A3.1A54.9A

    AFFX-r2-Bs-lys-M_at43.2P31.4A4.9A21.9A21A23.8A17.8P29.1A14.9A20.3A44.9A2.2A38.1A34.2A17.2A13.9A49.3A35.1A8.9A37.5A5.1A55.8A

    AFFX-r2-Bs-lys-3_at55.6P48.7A82P75.2P76.3P70.9P45.3A49.8A63.6A98.9P48.6A49.6A39.4P42.3A113.4P80.8P58.1A72.1P41.1A75.9P81.7A91.3P

    AFFX-r2-Bs-phe-5_at27A3.7A15.7A4.8A9.8A16.7A18.8A3A41.3A8.3A6.4A23.3A38.3A1.4A42.1A52.1A15.6A11.5A4.5A28.7A20.7A100.7A

    AFFX-r2-Bs-phe-M_at1.5A31.9A27A24.2A22.6M27A19A18.5A27.6A34.1A20.4A6.9A16.6A41.3A74.6M29.2A3A25.8A38.6A22.7A46.8A3.7A

    AFFX-r2-Bs-phe-3_at61.2A77.5A93.3M87.4A38.5A54.2A8.2A68.7M95.1A94.7A80.2A32.3A83.4A11.4A75.9A70.7A32.8A48.6A27.6A13.1A91.8A14.8A

    AFFX-r2-Bs-thr-3_s_at13.2A8A49.6A8.2A20.6A34A9.4A7.6A11.4A39.5A24.2A7.5A6.9A11.8A12.4A14.3A6.6A10.1A7.3A4.2A5.6A91.5A

    AFFX-r2-Bs-thr-M_s_at44A79.3A48.1A82.3A65.1A83.9A38.3A55.2A143.4A91.7A94.9A13.5A35.3A27.5A76.4A37.4A51.9A68.9A70.6A83.2A85.4A23.5A

    AFFX-r2-Bs-thr-5_s_at96.4P102.2P116.1P79.1A90.8M83.3A74.8P101.8P166.8P88.4P94.2P73.4P71.2A71.2P108.7P32.3A82.4M120.6P120P93.2P79.6A149A

    AFFX-r2-Hs18SrRNA-3_s_at1044.6P719.2P10360.7P8882.5P4229.4P10168.5P14669.6P646.9P793.9P668.9P685P229.7P115.8A183.6P157.2P306.2P180.3A6805.5P128.2A9604.8P210P1296.9P

    AFFX-r2-Hs18SrRNA-M_x_at269P250.1P4548.6P1518.8P705.8P2899.3P5258.5P184.1P226.7P222.1P295.8P28.6A73.3A60.8A15.5A74.7A6.7A1561.2P86.6A2268.4P17.5A789P

    AFFX-r2-Hs18SrRNA-5_at831.6P570.8P7851.3P3963.4P2585.9P6886P10754.2P281.8A397.5A301.5A461.7P20.2A20A46.9A19.3A88.4A9.4A2079.6P81.1A3511.4P69.2A598.8P

    AFFX-r2-Hs28SrRNA-3_at651.1P438.1P4422.5P960.9P978.6P2685.2P3684.4P382.1P415.1P475.3P567.6P151.6P205.3P224.6P223.8A328.2P203.1P797.5P128.6A1052.7P224.8A1469.4P

    AFFX-r2-Hs28SrRNA-M_at183.4A221.8A422.8P249.7A332A405.2P528.4P139.4A249.6A193.9A183.1A73.3A74.9A85.2A95.9A82.2A14.9A219.3A49A292.4A89.3A375.6A

    AFFX-r2-Hs28SrRNA-5_at126A100.9A1257.7P362.7P203.6A722.7P1132.4P90.5A109.4A76.2A115.1A63.4A9.4A60.6A20A11.7A10.8A201.5A17A264.7A22.2A158.1A

    200000_s_at2698.9P2701.3P1950.7P2491.8P2198.9P2182.9P2454.4P2839.2P2960.4P2952.1P3106.3P1802.2P823.3P1249.8P1796.6P2167.7P1702.4P1659.2P2039.4P2557.3P1604.3P963.5P

    200001_at7695.8P6613.7P6303.7P7525.3P6564P6026P7791.9P9186.2P9077.7P8389.4P8497.6P3150.2P1344.3P2670.2P5515P5173.9P2389.7P5585.1P6209.9P5223.7P5898.3P6256.2P

    200002_at14678.9P14211.2P14274P19098.8P20025.4P14405.9P16094.2P14751P14670.1P15242.2P14868.8P19276.3P14791.4P19035.8P20930.1P18220.1P16696P10737.9P17994.5P16617.4P20997.5P35772.2P

    200003_s_at12596.6P12638.7P14242.2P15085.5P15284.6P14589.9P17190.7P17211.7P16305P16418.1P16951.8P16816.4P16216.9P20776.2P21691.9P20613.7P18357.8P11336.6P21282.4P13254P23640.2P25298.6P

    200004_at6921.3P7009.1P8999.6P9749.9P9277.3P9524.8P8704.3P7600.7P7295.9P8216.4P8513.2P6547.4P9230.2P8384.2P7124.7P8221P8253.7P6870.2P6963.9P7977.9P6884.9P7804.3P

    200005_at3970.3P3676.8P3739.5P4199.8P4580.6P4485.9P4134.5P3695P3587.2P3903P5050.3P6621.8P4644.3P5935.5P6330.7P5356.7P6849.4P3045.4P6925.4P5604.6P4761.5P6491.5P

    200006_at5668.7P6990P7869.9P10307.1P8835.4P9624.1P9049.8P6935.8P6506.2P7042.5P9454.7P7881.7P11877.3P12399.9P12273P12839.1P10988.4P6426.4P9353P5535.2P11125.5P7718.2P

    200007_at5466.4P5368P6836.3P6511.8P6194.7P6523.7P7412.8P5612.9P5876.9P5739.5P5014.2P6011.3P6868P8856.7P7866P7260.6P6836.5P6538.3P6517.2P6038.9P7752.8P5812.9P

    200008_s_at3486.4P3463.1P3440.8P3248.5P3771.6P3636.2P4156P4481.3P4590.9P3479.2P4357.3P3354.9P3357.2P3677.1P3958P3424P4136.4P2413.9P4377.7P4360.7P4670.4P4353.1P

    200009_at6376P6046.2P6110.3P7077.3P7479.8P6759.1P7229.5P7676.1P7397.4P6840.5P7168.7P7648.2P7718.9P8374.9P7988.1P8503.3P9500.2P4792.8P8687.6P8776.6P7734.5P7830.1P

    200010_at7049.1P8599.9P9005.3P10723.1P9810.1P11043.8P11753.8P9684.3P10243.6P11669.2P11858.6P14503.5P14198.8P15202.6P15170.4P20103.4P16403.3P5751.1P13659.5P7737.4P12262.2P18125P

    200011_s_at1974.8P2032.8P1928.8P1830.6P1932.7P1824.1P1500.2P2058.1P1897.7P1768.7P1392P1225.7P912.6P1878.5P2022.8P1530.6P870.5P2238.3P1192.3P1514.4P1196.2P2264.4P

    200012_x_at12603P10529.8P10120.3P15197.4P13708.3P12065.6P11129.4P12639.4P13067.6P11607P13167P13244.7P13148.4P15525.1P16047.9P14986.2P17245.3P8138.4P13084.1P12324.2P12907P23086.3P

    200013_at11760P12193.6P13798.6P16020.6P15651.5P18009.6P14743.8P14949.9P16461P16746.1P17119P20341.9P16699.3P18906.1P22338.7P21768.7P20664.7P10449.5P19617.3P15068.3P19634.4P25811.7P

    200014_s_at3027.4P3039.3P3435.4P3842.4P3959.8P3774.4P3811.5P2948.2P2516.4P3364.7P3302.6P4480.9P5257.4P5030.3P4858.9P5153.7P4794.7P3214.3P3744.6P3569.3P5074.8P2975.4P

    200015_s_at3506.3P3077.4P3540.3P3859.9P3492.5P3608.8P3840.7P3209.8P2684.6P2820.9P3502.9P2923P4090.4P3158.2P2899.4P3259.7P3196P3086.4P2683.2P3368P2608P2746.3P

    200016_x_at11676.9P12482.5P14783.4P14912.2P15202P15227.5P16421.3P16338.8P14119.9P13396.5P12704.2P19013.6P20567.1P21593.3P19210P24843.3P22072.2P12549.6P17605.8P14268.9P18341.5P15506.1P

    200017_at10892.5P11641P10098.1P10144P9526P10727.2P9698.9P7603.5P9095.8P10211.9P12683.3P14505.1P14496.6P14116.7P12785.6P12383.6P8531.7P8430.3P12450.9P10291.3P14569.2P16990.1P

    200018_at13280.7P14560.6P14254.3P17077.1P17871.8P15079.2P14832.5P17399.6P16936.5P16315.3P16391.8P18798.4P17706.3P22482.7P25047P21623.3P22552.9P13239P24635.7P14951.3P22171.9P31188.5P

    200019_s_at12991.2P14190.6P12924.8P12764.3P13177.1P12817.3P16293.3P13471.6P14331.1P12978.4P14614.3P19836P14637.9P17103.7P19446.4P19697P16468.7P10647.9P19791.8P13968.6P20127.5P19282.4P

    200020_at2725.1P2580.9P2771.1P3179.2P3114.1P3323.5P2699.6P2695P2555.5P2641.4P3003.4P2425.6P2723.1P2759.6P2389.2P2553.7P2918.3P2467.8P2529.4P3491.1P2429P2242.5P

    200021_at19554.4P18966P18954.2P22107P22943.7P20117.3P23250.1P22762P26079.5P22885.2P20789.3P23848.2P19320.5P22326.8P24994.6P29913.9P27299.2P18742.8P25421.2P22694.8P31207.3P36887.5P

    200022_at13014.4P14211.7P12634.4P13979.7P15497.1P11837.4P14235.4P13900.3P15637.6P13592.7P13770.9P17928.5P10219.1P18441.9P19174.2P17280P12239P8893P19802P12706.4P22594.3P18500.5P

    200023_s_at5866.9P5762.9P6222.2P5527P6110.2P5875.9P6533.2P6387.1P6268.4P6636.3P7208.4P9839.9P9817.8P9526.8P8538.5P10560.7P10136.5P5307.1P9122.4P6450.7P9931.8P7735.4P

    200024_at11106.8P9545.6P12951.8P15217P13505.2P12439.8P18462.3P12616.9P13133.4P13547P12973.1P14255.2P10136.1P16813.6P17510.4P12312.6P15476P8160.3P15238.2P8546.1P17914.2P21122.2P

    200025_s_at14197.3P13379.4P14306.1P17247.2P18349P15409.8P15833.1P15787P15246.4P15859.5P17849.9P22058P20495.3P22793.8P24061.7P23531.9P22458.3P12192P21611.3P15739.1P20520.6P26440.9P

    200026_at10156.6P9799P12391.1P13172.5P13003.9P12811.4P13454.2P13276.9P13102P12815.4P11729.7P19068.2P15233.5P19439P20797.2P17738.2P23095.4P9828.3P15705.4P8393.5P18702P25659.7P


  • Identify differentially expressed genes

    Two samples, one normal and one cancer: which genes have significantly different expression levels between the two samples?

    Naive approach: a fold-change threshold (e.g., two-fold)
    log2(cy5 / cy3) > 1: up-regulated / induced
    log2(cy5 / cy3) < -1: down-regulated / repressed
    Still widely used because it is very simple.

    Main problem: genes with low expression levels may show a large fold change by chance.
    From 10 to 100: ten-fold
    From 1000 to 3000: three-fold
    However, low intensity means relatively high variance.
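The naive two-fold rule above can be sketched as a small function. This is a minimal illustration, assuming cy5 is the sample channel and cy3 the reference (swap them if your arrays use the opposite convention); the function name is hypothetical.

```python
import math


def fold_change_call(cy5: float, cy3: float, fold: float = 2.0) -> str:
    """Naive fold-change call on one spot of a two-channel array.

    Assumption: cy5 is the sample channel and cy3 the reference.
    """
    log_ratio = math.log2(cy5 / cy3)
    threshold = math.log2(fold)  # 1.0 for the usual two-fold cutoff
    if log_ratio > threshold:
        return "up"  # up-regulated / induced
    if log_ratio < -threshold:
        return "down"  # down-regulated / repressed
    return "unchanged"


# Both examples from the slide pass the cutoff, even though the 10 -> 100
# change sits in the noisy low-intensity range.
print(fold_change_call(100, 10))     # -> up (ten-fold)
print(fold_change_call(3000, 1000))  # -> up (three-fold)
```

The rule only looks at the ratio, so it cannot tell a reliable three-fold change at high intensity from a noisy ten-fold change at low intensity.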

  • Problem with fold change

    The most differentially expressed genes (by fold change) are the ones with the lowest average expression levels.

  • More robust estimation of differential expression

    Estimate the variance as a function of average expression, then compute a Z-score that depends on location (average intensity):

    $Z(x) = \frac{x - \mu(x)}{\sigma(x)}$

    x: log2(R/G) value; μ(x): local mean; σ(x): local standard deviation.
    Reference: Quackenbush, Nat Genet, 2002
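A minimal sketch of the intensity-dependent Z-score, assuming the "local" mean and standard deviation are taken over a sliding window of the `window` genes with the most similar average log intensity A = 0.5 · log2(R · G). The window size and the function name are illustrative assumptions, not taken from the reference.

```python
import math
import statistics


def intensity_z_scores(red, green, window=5):
    """Z(x) = (x - mu(x)) / sigma(x) with mu, sigma from a sliding intensity window.

    x is log2(R/G); the window holds the `window` genes with the closest
    average log intensity A = 0.5 * log2(R * G). Window size is an assumption.
    """
    n = len(red)
    if len(green) != n or window < 2 or n < window:
        raise ValueError("need matching channels and at least `window` >= 2 genes")
    x = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    order = sorted(range(n), key=lambda i: a[i])  # genes ranked by intensity
    half = window // 2
    z = [0.0] * n
    for rank, i in enumerate(order):
        # Centre the window on this gene, clamped to the ends of the ranking.
        start = max(0, min(rank - half, n - window))
        local = [x[j] for j in order[start:start + window]]
        mu = statistics.mean(local)
        sigma = statistics.stdev(local)
        z[i] = (x[i] - mu) / sigma if sigma > 0 else 0.0
    return z


# Hypothetical spots: only the last one has R/G = 2. With five genes the
# window spans all of them, so this just shows the arithmetic.
red = [100, 200, 400, 800, 3200]
green = [100, 200, 400, 800, 1600]
print([round(v, 2) for v in intensity_z_scores(red, green)])
# -> [-0.45, -0.45, -0.45, -0.45, 1.79]
```

On real data the window would be much smaller than the number of genes, so low-intensity genes are judged against the (larger) spread of their low-intensity neighbours.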

  • SAM (Significance Analysis of Microarrays)

    Tusher et al., PNAS 2001, 98:5116-5121. Available as a free Excel add-in.
    One of the most cited methods of microarray data analysis.
    Example: Test, 3 replicates; Control, 3 replicates.

    Which one is more significantly differentially expressed?

    Gene     T1     T2     T3     C1     C2     C3    Ratio
    Gene1    1000   2000   1500   200    300    250   6
    Gene2    1000   2000   3000   1000   1500   500   2
    Gene3    100    1000   100    20     80     50    8
    Gene4    1800   1700   1900   1000   800    900   2

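  • Gene 2: ratio = 2000/1000 = 2
    Gene 4: ratio = 1800/900 = 2
    Same ratio, but Gene 4's replicates (1800, 1700, 1900 vs. 1000, 800, 900) are far more consistent than Gene 2's.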

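  • SAM (Significance Analysis of Microarrays)

    Basic idea: compute a test statistic for each gene (e.g., Student's t-statistic):

    $t = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{s_T^2/n_T + s_C^2/n_C}}$

    SAM's own statistic adds a small positive constant s0 to the denominator so that genes with tiny variance do not get inflated scores.

    A larger |t| means higher significance. The p-value can be computed directly for the t-test or estimated with a permutation test.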

    Gene     T1     T2     T3     C1     C2     C3    Ratio   t
    Gene1    1000   2000   1500   200    300    250   6       4.3
    Gene2    1000   2000   3000   1000   1500   500   2       1.5
    Gene3    100    1000   100    20     80     50    8       1.2
    Gene4    1800   1700   1900   1000   800    900   2       11.0
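The Ratio and t columns of the table can be reproduced with the standard library. This sketch uses the two-sample t with denominator sqrt(s_T²/n_T + s_C²/n_C); with equal group sizes it gives the same value as the pooled-variance Student's t. It does not include SAM's s0 constant.

```python
import math
import statistics

# Replicate values from the example table (three test, three control).
GENES = {
    "Gene1": ([1000, 2000, 1500], [200, 300, 250]),
    "Gene2": ([1000, 2000, 3000], [1000, 1500, 500]),
    "Gene3": ([100, 1000, 100], [20, 80, 50]),
    "Gene4": ([1800, 1700, 1900], [1000, 800, 900]),
}


def t_statistic(test, control):
    """Difference of means over sqrt(s_T^2/n_T + s_C^2/n_C)."""
    diff = statistics.mean(test) - statistics.mean(control)
    se = math.sqrt(statistics.variance(test) / len(test)
                   + statistics.variance(control) / len(control))
    return diff / se


for name, (test, control) in GENES.items():
    ratio = statistics.mean(test) / statistics.mean(control)
    print(name, round(ratio, 1), round(t_statistic(test, control), 1))
# -> Gene1 6.0 4.3 / Gene2 2.0 1.5 / Gene3 8.0 1.2 / Gene4 2.0 11.0
```

Gene 3 has the largest ratio but the smallest t, because one replicate (1000) dominates its mean; Gene 4's consistent replicates give it the largest t.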

  • Permutation test to determine significance

    Number of unique permutations: C(6, 3) = 20, so the smallest possible p-value is 1/20 = 0.05.
    With 5 samples on each side: C(10, 5) = 252.
    With 10 samples on each side: C(20, 10) = 184,756 (about 200k).
    For small sample sizes: pool the permuted statistics of all genes.

    Row      T1     T2     T3     C1     C2     C3     t
    Gene1    1000   2000   1500   200    300    250    4.3
    Perm1    1500   300    1000   250    2000   200    0.17
    Perm2    1000   300    200    1500   2000   250    -1.3
    Perm-n   2000   300    1000   1500   200    250    0.7
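An exact permutation test for Gene1 enumerates all C(6, 3) = 20 ways of choosing which three columns are labelled "test". This minimal sketch reports a one-sided p-value (the fraction of labellings with t at least as large as the observed one); the function names are illustrative.

```python
import itertools
import math
import statistics


def t_statistic(test, control):
    diff = statistics.mean(test) - statistics.mean(control)
    se = math.sqrt(statistics.variance(test) / len(test)
                   + statistics.variance(control) / len(control))
    return diff / se


def permutation_p_value(values, n_test):
    """One-sided exact permutation p-value; the first n_test values are 'test'."""
    t_obs = t_statistic(values[:n_test], values[n_test:])
    t_perm = []
    for idx in itertools.combinations(range(len(values)), n_test):
        chosen = set(idx)
        test = [values[i] for i in idx]
        control = [values[i] for i in range(len(values)) if i not in chosen]
        t_perm.append(t_statistic(test, control))
    # The observed labelling is one of the permutations, so p >= 1/len(t_perm).
    extreme = sum(1 for t in t_perm if t >= t_obs)
    return len(t_perm), extreme / len(t_perm)


gene1 = [1000, 2000, 1500, 200, 300, 250]  # T1..T3, C1..C3
print(permutation_p_value(gene1, 3))        # -> (20, 0.05)
print(math.comb(10, 5), math.comb(20, 10))  # -> 252 184756
```

Gene1 already has the most extreme labelling, yet its p-value cannot go below 0.05: with three replicates per group the permutation test alone has very little resolution, which is why pooling across genes helps.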

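  • Permutation test

    [Figure: sorted t-statistics. Columns: real t; t1, t2, ..., tn (one per permutation); t_avg, the average of the sorted permuted values; and T_real - t_avg.]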
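One way to build the t_avg column: compute t for every gene under each relabelling, sort each column, and average position by position, giving SAM's expected order statistics. This sketch uses the plain t-statistic (no s0) and enumerates all 20 labellings of the four-gene example exactly; the function name is hypothetical.

```python
import itertools
import math
import statistics


def t_statistic(test, control):
    diff = statistics.mean(test) - statistics.mean(control)
    se = math.sqrt(statistics.variance(test) / len(test)
                   + statistics.variance(control) / len(control))
    return diff / se if se > 0 else 0.0


def sam_expected_order_stats(matrix, n_test):
    """Return (sorted real t, t_avg) for a genes x samples matrix.

    The first n_test columns are the test group. t_avg[k] is the k-th smallest
    t averaged over every relabelling of the columns.
    """
    n_cols = len(matrix[0])

    def sorted_t(cols):
        rest = [c for c in range(n_cols) if c not in cols]
        return sorted(t_statistic([row[c] for c in cols], [row[c] for c in rest])
                      for row in matrix)

    real = sorted_t(tuple(range(n_test)))
    perms = [sorted_t(cols) for cols in itertools.combinations(range(n_cols), n_test)]
    t_avg = [statistics.mean(p[k] for p in perms) for k in range(len(matrix))]
    return real, t_avg


matrix = [
    [1000, 2000, 1500, 200, 300, 250],
    [1000, 2000, 3000, 1000, 1500, 500],
    [100, 1000, 100, 20, 80, 50],
    [1800, 1700, 1900, 1000, 800, 900],
]
real, t_avg = sam_expected_order_stats(matrix, 3)
for r, e in zip(real, t_avg):
    print(round(r, 2), round(e, 2), round(r - e, 2))  # real t, t_avg, T_real - t_avg
```

With equal group sizes every labelling has a mirror image with all t values negated, so t_avg comes out symmetric around zero.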

  • SAM

  • False Discovery Rate (FDR)

    Multiple testing problem: with a p-value cutoff of 0.05 and 10,000 genes tested, we would expect 500 genes to pass by chance alone. If we find 600 genes with p < 0.05, many of them may be due to noise.

    Bonferroni correction: use a p-value cutoff of 0.05 / 10,000. This controls the probability of at least one false positive among all genes selected (the family-wise error rate).
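The arithmetic on this slide, as a small sketch. The "crude FDR" here is an assumption-laden estimate: it treats every tested gene as truly null, so it overstates the FDR when many genes really do change.

```python
def multiple_testing_summary(n_tests, alpha, n_found):
    """Expected chance hits, a crude FDR estimate, and the Bonferroni cutoff.

    Assumes every gene is null when estimating the FDR (worst case).
    """
    expected_by_chance = n_tests * alpha
    crude_fdr = min(1.0, expected_by_chance / n_found) if n_found else 0.0
    bonferroni_cutoff = alpha / n_tests
    return expected_by_chance, crude_fdr, bonferroni_cutoff


expected, fdr, cutoff = multiple_testing_summary(10_000, 0.05, 600)
print(round(expected), round(fdr, 2), f"{cutoff:.0e}")  # -> 500 0.83 5e-06
```

Under this worst-case assumption, roughly 83% of the 600 hits could be false positives; the Bonferroni cutoff removes almost all of them, but at the cost of missing many real changes.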
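  • FDR in SAM

    Sort the statistics; a gene is called significant when T_real - t_avg deviates from zero by more than a threshold Δ.
    FDR = (median number of genes called significant in the permuted columns) / (number of genes called significant in the real data)
    Small Δ: more genes selected; higher FDR.
    Large Δ: fewer genes selected; lower FDR.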

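A simplified sketch of SAM's FDR estimate for one threshold Δ, roughly following Tusher et al.: starting where t_avg crosses zero, walk outwards to the first gene whose T_real - t_avg exceeds Δ (upward) or falls below -Δ (downward); everything beyond those points is called. The same cutoffs are applied to each permuted column, and the median count becomes the numerator. Assumptions: no s0 constant, ties ignored, and the function name and toy statistics are hypothetical.

```python
import math
import statistics


def sam_fdr(real_t, perm_t, delta):
    """Return (number of genes called, estimated FDR) for threshold delta.

    real_t: per-gene statistics from the real labelling.
    perm_t: one list of per-gene statistics for each permutation.
    """
    n = len(real_t)
    real = sorted(real_t)
    perms = [sorted(p) for p in perm_t]
    t_avg = [statistics.mean(p[k] for p in perms) for k in range(n)]
    diff = [r - e for r, e in zip(real, t_avg)]
    # Start where the expected statistics cross zero and walk outwards.
    origin = next((k for k in range(n) if t_avg[k] >= 0), n)
    upper = next((k for k in range(origin, n) if diff[k] > delta), None)
    lower = next((k for k in range(origin - 1, -1, -1) if diff[k] < -delta), None)
    hi = real[upper] if upper is not None else math.inf
    lo = real[lower] if lower is not None else -math.inf
    n_called = (n - upper if upper is not None else 0) + (lower + 1 if lower is not None else 0)
    if n_called == 0:
        return 0, 0.0
    # Permuted statistics beyond the same cutoffs count as false positives.
    false_counts = [sum(1 for t in p if t >= hi or t <= lo) for p in perms]
    return n_called, statistics.median(false_counts) / n_called


real_t = [-0.5, 0.1, 0.2, 5.0]  # hypothetical statistics for four genes
perm_t = [[-0.3, 0.0, 0.4, 0.6], [-0.6, -0.1, 0.2, 0.5], [-0.4, 0.1, 0.3, 0.7]]
print(sam_fdr(real_t, perm_t, 1.0))   # large delta: 1 gene called, FDR 0.0
print(sam_fdr(real_t, perm_t, 0.05))  # small delta: 4 genes called, FDR 0.75
```

The two calls mirror the slide's trade-off: shrinking Δ calls more genes but lets more permuted (null) statistics past the cutoffs.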

  • FDR in SAM (two thresholds)

    Smaller Δ: FDR = 1855 / 5065 ≈ 36.6%
    Larger Δ: FDR = 1.5 / 209 ≈ 0.7%
  • Typical Microarray Analysis

    [Flowchart: Raw data; Normalize; Filter (Present/Absent call, minimum value, fold change); Significance, Clustering, Classification; Function (Gene Ontology); Regulation (motif finding).]
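The three Filter criteria in the flowchart (Present/Absent call, minimum value, fold change) can be sketched as one pass over a VALUE / ABS_CALL table like the raw data shown earlier. All three thresholds below are illustrative assumptions, not recommended values, and the function name is hypothetical.

```python
def filter_probes(data, min_present=0.5, min_value=100.0, min_fold=2.0):
    """Keep probes that pass Present/Absent, minimum-value and fold-change filters.

    data maps a probe ID to a list of (VALUE, ABS_CALL) pairs, one per array.
    Thresholds are illustrative assumptions.
    """
    kept = []
    for probe, row in data.items():
        values = [value for value, _ in row]
        present = sum(1 for _, call in row if call == "P") / len(row)
        if present < min_present:
            continue  # mostly Absent (A) or Marginal (M) calls
        if max(values) < min_value:
            continue  # never rises clearly above background
        floored = [max(v, min_value) for v in values]  # avoid huge ratios from tiny values
        if max(floored) / min(floored) < min_fold:
            continue  # too flat across arrays to be informative
        kept.append(probe)
    return kept


# First six arrays of three probes from the raw data shown earlier.
data = {
    "AFFX-BioB-M_at": [(393, "P"), (327.8, "P"), (501.4, "P"), (816.5, "P"), (542, "P"), (440.8, "P")],
    "AFFX-DapX-3_at": [(29.8, "A"), (6.2, "A"), (23.4, "A"), (28.4, "A"), (3.2, "A"), (7.6, "A")],
    "200004_at": [(6921.3, "P"), (7009.1, "P"), (8999.6, "P"), (9749.9, "P"), (9277.3, "P"), (9524.8, "P")],
}
print(filter_probes(data))  # -> ['AFFX-BioB-M_at']
```

AFFX-DapX-3_at fails the Present/Absent filter, and 200004_at is highly expressed but varies less than two-fold across these arrays.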


  • Source: Practical Microarray Analysis, Presentation by Benedikt Brors, German Cancer Research Center

  • Classification (supervised learning)

    (Clustering, by contrast, is unsupervised learning.)
    Classification: separate items into groups based on their features, using a training set of previously labeled items.
    Many classification algorithms: decision trees, SVM, naive Bayes, nearest neighbors, neural networks, etc.
    Some tell you how the classification is made, which may help biologists understand the molecular mechanisms; others are black boxes.
    In most cases, different algorithms perform similarly. Having the right features (predictor variables) is the key.

  • Golub et. al., Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring, Science 286: 531 537, 1999Method: weighted vote (similar to centroid classifier)AML: acute myeloid leukemia ALL: acute lymphoblastic leukemia

    Classification is critical for successful treatment.

    Clinical distinction involves an experienced hematopathologist's interpretation of tumor morphology, histochemistry, immunophenotyping, and cytogenetic analysis, each performed in a separate, highly specialized laboratory.

    Still imperfect and errors do occur.
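
    The weighted vote can be sketched in Python as it is commonly described for Golub et al. Each gene g gets a signal-to-noise weight a_g = (mu_ALL - mu_AML) / (sd_ALL + sd_AML) and a boundary b_g halfway between the two class means. For a new sample x, each gene then casts the vote a_g * (x_g - b_g). Positive votes favour ALL and negative votes favour AML. This is a minimal sketch under stated assumptions: the informative-gene selection step is omitted, and the use of population standard deviations and the small `eps` guard are choices made here. The names `train_weighted_vote` and `predict_weighted_vote` are hypothetical.

```python
import statistics

def train_weighted_vote(samples, labels, eps=1e-12):
    """Return one (a_g, b_g) pair per gene for a two-class (ALL vs. AML) weighted vote."""
    all_rows = [s for s, y in zip(samples, labels) if y == "ALL"]
    aml_rows = [s for s, y in zip(samples, labels) if y == "AML"]
    params = []
    for g in range(len(samples[0])):
        x_all = [row[g] for row in all_rows]
        x_aml = [row[g] for row in aml_rows]
        mu_all, mu_aml = statistics.mean(x_all), statistics.mean(x_aml)
        sd_all, sd_aml = statistics.pstdev(x_all), statistics.pstdev(x_aml)
        a = (mu_all - mu_aml) / (sd_all + sd_aml + eps)  # signal-to-noise weight
        b = (mu_all + mu_aml) / 2                        # boundary halfway between class means
        params.append((a, b))
    return params

def predict_weighted_vote(params, x):
    """Return (predicted label, prediction strength) for expression vector x."""
    votes = [a * (xg - b) for (a, b), xg in zip(params, x)]
    v_all = sum(v for v in votes if v > 0)   # positive votes favour ALL
    v_aml = -sum(v for v in votes if v < 0)  # negative votes favour AML
    label = "ALL" if v_all >= v_aml else "AML"
    total = v_all + v_aml
    # prediction strength: (V_win - V_lose) / (V_win + V_lose)
    strength = abs(v_all - v_aml) / total if total > 0 else 0.0
    return label, strength

# Toy data (made up): gene 1 is high in ALL, gene 2 is high in AML.
params = train_weighted_vote([[5.0, 1.0], [6.0, 1.5], [1.0, 5.0], [1.5, 6.0]],
                             ["ALL", "ALL", "AML", "AML"])
print(predict_weighted_vote(params, [5.5, 1.2]))
```

    Like a centroid classifier, the rule compares a sample against class means, but it weights each gene by how well it separates the two classes.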

  • Centroid-based classifier

    Model training: using the training data, calculate the centroid of each class.

    Classification: given a data point, calculate the distance between the point and each of the class centroids, and assign the point to the class with the closest centroid.

    [Figure: training samples of two classes plotted on genes G1 and G2, with class centroids c1 and c2 (ALL centroid, AML centroid); a new point x lies at distances d1 and d2 from them.]
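
    Below is a minimal sketch of the centroid-based classifier in plain Python. It assumes each sample is a list of expression values (one per gene) and that distance is Euclidean. `train_centroids` and `classify_centroid` are hypothetical names, and the toy data are made up for illustration.

```python
import math

def train_centroids(samples, labels):
    """Model training: return {label: centroid}, each centroid being the per-gene class mean."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [sum(col) / len(rows) for col in zip(*rows)] for y, rows in groups.items()}

def classify_centroid(centroids, x):
    """Classification: assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

centroids = train_centroids([[1, 1], [1, 2], [8, 8], [9, 8]], ["ALL", "ALL", "AML", "AML"])
print(classify_centroid(centroids, [2, 2]))  # ALL
```

    Other distances, such as one minus the Pearson correlation, can be swapped into the `key` function without changing the structure.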

  • K-nearest-neighbour classifier

    Model training: none.

    Classification: given a data point x, locate the K nearest training points and return the most common class label among them.

    We usually set K > 1 so that a single outlying neighbour cannot decide the label.

    Variations:
    Use a radius threshold rather than a fixed K.
    Weight each neighbour by how far it is from the query point.
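
    The K-nearest-neighbour rule and both variations can be sketched in plain Python, assuming Euclidean distance. `knn_classify` and `radius_classify` are hypothetical names. Inverse-distance weighting, 1 / (distance + a small epsilon), is just one common choice of neighbour weight.

```python
import math
from collections import Counter, defaultdict

def knn_classify(train_x, train_y, x, k=3, weighted=False):
    """Predict the label of x from its k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(x, pair[0]))[:k]
    if not weighted:
        # plain majority vote among the k neighbours
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    # distance-weighted vote: each neighbour contributes 1 / (distance + eps)
    scores = defaultdict(float)
    for point, label in nearest:
        scores[label] += 1.0 / (math.dist(x, point) + 1e-9)
    return max(scores, key=scores.get)

def radius_classify(train_x, train_y, x, radius):
    """Variation: majority vote among all training points within `radius` of x (None if none)."""
    votes = Counter(label for point, label in zip(train_x, train_y)
                    if math.dist(x, point) <= radius)
    return votes.most_common(1)[0][0] if votes else None

train_x = [[0, 0], [3, 0], [3.2, 0]]
train_y = ["B", "A", "A"]
print(knn_classify(train_x, train_y, [0.1, 0], k=3))                 # A
print(knn_classify(train_x, train_y, [0.1, 0], k=3, weighted=True))  # B
```

    With this toy data, the unweighted 3-NN vote returns "A" because two of the three neighbours are A. The distance-weighted vote returns "B" because the single B neighbour is much closer.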

  • Cancer classification

    Many papers have been published, and many claim high accuracy. Be careful when evaluating them.

    It is very easy to overfit: there are far more genes than samples.

    Simple methods often outperform fancy ones; SVM and KNN are among the best. Simple methods are also usually more robust and easier to interpret.

    In most cases, different algorithms perform similarly. Having the right features (predictor variables) is the key.

  • Clustering microarray data

    Unsupervised learning

    Group genes into co-expressed sets: genes with similar expression patterns across multiple experiments may be co-regulated.

    Group experiments into clusters: experiments within the same group may have a similar gene expression signature, for example disease subtypes that can be classified from gene expression data.

  • Clustering microarray data

    How do we tell whether two expression vectors are similar? Define a (dis)similarity measure between two vectors.

    How do we group multiple profiles into meaningful subsets? Describe the clustering procedure.

    Are the results meaningful? Evaluate the biological meaning of a clustering.

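  • (Dis)similarity measures

    Two genes, $X = (x_1, \ldots, x_m)$ and $Y = (y_1, \ldots, y_m)$.

    Euclidean distance: $d(X, Y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$

    Pearson correlation coefficient: $r(X, Y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2}}$, where $\bar{x}$ and $\bar{y}$ are the means of the entries of $X$ and $Y$.

    Cosine similarity, mutual information, etc.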
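
    Below is a plain-Python sketch of three of these measures; the function names are hypothetical. The usage lines show why the choice matters. Two profiles with the same shape but different magnitudes are far apart in Euclidean distance, yet their Pearson correlation and cosine similarity are both 1. For clustering, a correlation-based dissimilarity such as 1 - r is a common choice.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cosine(x, y):
    """Cosine of the angle between two vectors (Pearson without mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

x, y = [1, 2, 3, 4], [2, 4, 6, 8]
print(euclidean(x, y), pearson(x, y), cosine(x, y))
```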

  • Clustering algorithms

    Hierarchical clustering
    K-means clustering
    Self-organizing maps (SOMs)
    Spectral clustering
    Model-based
    Graph-based
    Etc.

    Jiang and Zhang, Cluster Analysis for Gene Expression Data: A Survey, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11 (2004), pp. 1370-1386

  • Hierarchical clustering

    Agglomerative or divisive (less popular)

    Agglomerative, basic idea:
    Given n genes, initially put each gene in its own cluster.
    In each iteration, find the two most similar genes (or gene groups) and merge them into one cluster.
    Terminate when only one cluster is left.

    (How do we define the similarity between two groups?)

  • Hierarchical clustering

    The exact behavior depends on how the distance between two clusters is computed.

    There is no need to specify the number of clusters; a distance cutoff is often chosen to break the tree into clusters.

    [Figure: dendrogram over items a to f]

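  • Distance between clusters

    Single linkage: distance between the closest pair of members, one from each cluster. Not recommended. Can be reduced to a minimum spanning tree (MST).

    Complete linkage: distance between the farthest pair of members.

    Average linkage: mean of all pairwise distances between members (very similar to UPGMA).

    Centroid method: distance between the cluster centroids.

    http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html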
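
    Below is a naive agglomerative clustering sketch in plain Python, assuming Euclidean distance between expression vectors. `cluster_distance` implements the four linkage rules listed above. `agglomerative` records each merge together with its height, and `cut_tree` applies a distance cutoff to the recorded merges. All three names are hypothetical. Because the code recomputes every cluster distance at each step, it is only suitable for small examples.

```python
import math
from itertools import combinations

def cluster_distance(a, b, points, linkage):
    """Distance between clusters a and b, each given as a list of point indices."""
    if linkage == "centroid":
        ca = [sum(col) / len(a) for col in zip(*(points[i] for i in a))]
        cb = [sum(col) / len(b) for col in zip(*(points[j] for j in b))]
        return math.dist(ca, cb)
    pairwise = [math.dist(points[i], points[j]) for i in a for j in b]
    if linkage == "single":
        return min(pairwise)                  # closest pair
    if linkage == "complete":
        return max(pairwise)                  # farthest pair
    if linkage == "average":
        return sum(pairwise) / len(pairwise)  # mean over all pairs
    raise ValueError(f"unknown linkage: {linkage}")

def agglomerative(points, linkage="average"):
    """Naive agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(points))]  # every gene starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # find the two most similar clusters under the chosen linkage
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], points, linkage))
        d = cluster_distance(clusters[i], clusters[j], points, linkage)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def cut_tree(merges, n, cutoff):
    """Replay merges up to a distance cutoff and return the resulting clusters."""
    clusters = {frozenset([i]) for i in range(n)}
    for a, b, d in merges:
        if d > cutoff:
            break  # assumes merge heights never decrease
        clusters -= {frozenset(a), frozenset(b)}
        clusters.add(frozenset(a) | frozenset(b))
    return clusters

points = [[0, 0], [0, 1], [10, 10], [10, 12], [20, 0]]
print(cut_tree(agglomerative(points, "average"), len(points), cutoff=5.0))
```

    The cutoff logic assumes that merge heights never decrease. That holds for single, complete and average linkage, but not always for the centroid method.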

  • An example

    [Figure: expression data, genes by experiments]

  • Hierarchical clustering

    Average linkage. Cluster genes only.

  • Average linkage. Cluster both genes and experiments.

  • K-means

    Basic idea:
    Given n genes, guess the number of clusters, k.
    (Randomly) choose k genes as cluster centers.
    Assign each gene to the closest center.
    Recompute the center of each cluster.
    Repeat the last two steps until the assignment is stable.

    Similar to EM. Objective function: minimize the total squared distance from genes to their cluster centers.

    K-means may get trapped in local optima, so multiple runs with different random starting points are generally needed.

    http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
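
    Below is a minimal k-means sketch in plain Python that follows the steps above, assuming Euclidean distance. `kmeans` and `kmeans_restarts` are hypothetical names. `kmeans_restarts` deals with the local-optimum problem by running several random starts and keeping the run with the smallest total squared distance (SSE).

```python
import math
import random

def kmeans(points, k, rng, max_iter=100):
    """One k-means run from k randomly chosen points; returns (assignment, centers, SSE)."""
    centers = [list(p) for p in rng.sample(points, k)]  # k random genes as initial centers
    assign = None
    for _ in range(max_iter):
        # assign each gene to its closest center
        new_assign = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # assignment is stable
        assign = new_assign
        # recompute each center as the mean of its members (keep it if the cluster is empty)
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(math.dist(p, centers[a]) ** 2 for p, a in zip(points, assign))
    return assign, centers, sse

def kmeans_restarts(points, k, n_starts=10, seed=0):
    """Run k-means from several random starts and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(n_starts)), key=lambda run: run[2])

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
assign, centers, sse = kmeans_restarts(points, 2)
print(assign, sse)
```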

  • K-means, K = 15

  • Another view of clusters

    [Figure: cluster expression profiles, log ratio vs. experiments]

  • How to determine the number of clusters?

    An open problem.

    Larger K: more homogeneity within clusters, less separation between clusters.
    Smaller K: the opposite.

    Many heuristic methods have been proposed; none is uniformly good.

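  • Heuristics to determine the number of clusters

    Tibshirani, Walther and Hastie, Estimating the number of clusters in a dataset via the gap statistic (2000)

    Define some statistic as a function of the number of clusters.

    Gap statistic: compare $\log W_k$, the log of the (weighted) within-cluster dispersion around the cluster centers, with its expected value under a null reference distribution: $\mathrm{Gap}(k) = E^{*}[\log W_k] - \log W_k$.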
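
    Below is a rough sketch of the gap statistic in plain Python. It assumes Euclidean data and uses k-means for the clustering step. Reference data are drawn uniformly over the bounding box of the observed data, one of the reference distributions considered by Tibshirani et al. Here W_k is the within-cluster sum of squared distances to the cluster centers, which equals the pairwise-distance form weighted by 1/(2 n_r). The standard-error term and the rule for choosing k from the Gap values are omitted. `within_ss` and `gap_statistic` are hypothetical names.

```python
import math
import random

def within_ss(points, k, rng, n_starts=5, max_iter=50):
    """W_k: lowest within-cluster sum of squares over a few k-means runs."""
    best = math.inf
    for _ in range(n_starts):
        centers = [list(p) for p in rng.sample(points, k)]
        for _ in range(max_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
                groups[nearest].append(p)
            new_centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[c]
                           for c, g in enumerate(groups)]
            if new_centers == centers:
                break
            centers = new_centers
        wss = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
        best = min(best, wss)
    return best

def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return {k: Gap(k)} for k = 1..k_max, with uniform reference sets over the bounding box."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps = {}
    for k in range(1, k_max + 1):
        log_w = math.log(within_ss(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
            ref_logs.append(math.log(within_ss(ref, k, rng)))
        gaps[k] = sum(ref_logs) / n_refs - log_w  # E*[log W_k] - log W_k
    return gaps

points = [[0, 0], [0, 0.2], [0.2, 0], [10, 10], [10, 10.2], [10.2, 10]]
print(gap_statistic(points, 3))
```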

  • Evaluating clustering

    Do genes in the same cluster share similar functions? Functional enrichment analysis.

    Do genes in the same cluster share similar cis-regulatory motifs? Motif finding.

  • Gene Ontology (GO)

    Gene functions were often described using free text, which is hard to extract, transfer, revise, predict, annotate, comprehend, and manage.

    The vocabulary should be predefined and commonly agreed upon.

    Gene Ontology provides a controlled vocabulary to describe gene and gene product attributes.

  • Gene ontology

    Two parts:
    Ontology: the list of vocabulary terms to use
    Annotations: characterizing genes using ontology terms

    Three ontology categories: biological process, molecular function, cellular component

  • Part of a GO graph

    Each GO category is a directed acyclic graph

    A term can have multiple parents and multiple children.

    A gene can be annotated by multiple terms.

    If a gene is annotated by a child term, it is automatically annotated by all ancestor terms.
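
    The rule that a child-term annotation implies all ancestor terms can be sketched as a graph traversal in plain Python. The DAG and gene names below are hypothetical toy data, not actual GO terms. Note that "metabolism" is reached through two different parents but is counted only once.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {term: [parent terms]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def propagate(annotations, parents):
    """Add every ancestor term to each gene's direct annotations."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in annotations.items()}

# Hypothetical toy DAG: a term may have several parents.
parents = {
    "coenzyme biosynthesis": ["coenzyme metabolism", "biosynthesis"],
    "coenzyme metabolism": ["metabolism"],
    "biosynthesis": ["metabolism"],
}
annotations = {"geneA": ["coenzyme biosynthesis"], "geneB": ["biosynthesis"]}
full = propagate(annotations, parents)
print(sorted(full["geneA"]))
```

    After propagation, a gene counted under a specific term is also counted under every more general term. This matters when testing parent terms for enrichment.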

  • Example functional enrichment analysis

    Total number of genes in yeast: 7268
    65 genes have function in co-enzyme biosynthesis
    Cluster A: 100 genes, 20 of which have function in co-enzyme biosynthesis

    [Figure: Venn diagram of the 7268 genes, the 65 co-enzyme biosynthesis genes and the 100 cluster A genes, overlapping in 20]

    Significance can be computed using a cumulative hypergeometric test: if we randomly draw 100 genes from the genome, what's the chance that we'll see at least 20 co-enzyme biosynthesis genes?

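  • Example functional enrichment analysis

    [Figure: Venn diagram (7268 genes, 65 co-enzyme biosynthesis genes, 100 cluster A genes, overlap 20)]

    If we randomly draw 100 genes from the genome, the probability that we'll see exactly 20 co-enzyme biosynthesis genes is
    $P(X = 20) = \frac{\binom{65}{20}\binom{7268-65}{100-20}}{\binom{7268}{100}}$

    P-value of enrichment:
    $P(X \ge 20) = \sum_{i=20}^{65} \frac{\binom{65}{i}\binom{7268-65}{100-i}}{\binom{7268}{100}}$

    Correction for multiple testing is usually preferred, as many GO terms are tested.

    Besides GO, other information can also be used to test for enrichment, e.g., protein complexes, pathways, motifs, etc.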
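
    The same calculation can be done in plain Python with `math.comb` (Python 3.8+), using the numbers from the example. `enrichment_pvalue` sums the hypergeometric terms for i = k to min(K, n). Bonferroni is shown as one simple multiple-testing correction; it is an assumption here and not necessarily the correction any particular tool uses. All function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n of N genes without replacement, K of which carry the annotation."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_pvalue(k, N, K, n):
    """One-sided p-value P(X >= k) from the cumulative hypergeometric distribution."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)

def bonferroni(pvalues):
    """Bonferroni correction: multiply each p-value by the number of tests (capped at 1)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Numbers from the example: 7268 yeast genes, 65 annotated, cluster of 100 with 20 annotated.
p = enrichment_pvalue(20, 7268, 65, 100)
print(p)
```

    Exact integer arithmetic from `math.comb` avoids the overflow that factorials in floating point would hit for numbers this size.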

  • Gene Ontology tools

    geneontology.org: download ontology files and species-specific annotation files; links to many useful analysis tools

    Tools for enrichment analysis:
    GO:TermFinder. Downloadable. (Web interface available at SGD, for yeast only)
    FuncAssociate: web tool. About a dozen model organisms (human, mouse, fruit fly, C. elegans, yeast, Arabidopsis, etc.)
    DAVID Bioinformatics Resources: web tool (downloadable). Mammalian genes.

  • RNA-Seq: transcriptome analysis

    Figure 5 | Overview of RNA-Seq. An RNA fraction of interest is selected, fragmented and reverse transcribed. The resulting cDNA can then be sequenced using any of the current ultra-high-throughput technologies to obtain ten to a hundred million reads, which are then mapped back onto the genome. The reads are then analyzed to calculate expression levels.

    Shirley Pepke, Barbara Wold & Ali Mortazavi, Nature Methods 6, S22-S32 (2009). Published online: 15 October 2009. doi:10.1038/nmeth.1371
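
    RPKM (reads per kilobase of transcript per million mapped reads) and TPM (transcripts per million) are two widely used ways of turning read counts into length- and depth-normalized expression levels. This sketch illustrates them and is not necessarily the method behind the figure above. It assumes per-gene counts of mapped reads are already available. For simplicity, it uses the sum of the given counts as the total number of mapped reads. Gene names, counts and lengths are hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())  # assumption: stands in for total mapped reads
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale so values sum to 1e6."""
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    scale = sum(rates.values())
    return {g: r / scale * 1e6 for g, r in rates.items()}

counts = {"gene1": 100, "gene2": 300}     # hypothetical mapped-read counts
lengths = {"gene1": 1000, "gene2": 2000}  # hypothetical transcript lengths (bp)
print(rpkm(counts, lengths))
print(tpm(counts, lengths))
```

    Both measures correct for gene length, since longer transcripts yield more fragments at the same expression level. TPM values additionally sum to the same total in every sample.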

  • RNA-Seq: Strategies

    Figure 1 from Haas & Zody, 2010

  • RNA-Seq: Strategies

    Shirley Pepke, Barbara Wold & Ali Mortazavi, Nature Methods 6, S22-S32 (2009). Published online: 15 October 2009. doi:10.1038/nmeth.1371

    Alignment strategy:
    Align to transcriptome: no new transcript discovery
    Align to genome and exon-exon junction sequences: extremely large search space due to all possible exon combinations
    De novo assembly

    Cufflinks, Scripture

  • RNA-seq

    Microarray | RNA-seq
    Hybridization-based | Sequencing-based
    Can only detect transcripts with known genomic sequences | For both known and new transcripts
    Cannot be easily updated when new genome sequence information becomes available | May be updated when new genome sequence information becomes available
    Low signal-to-noise ratio due to cross-hybridization, etc. | No cross-hybridization issue, hence higher signal-to-noise ratio
    Relatively narrow dynamic range | Able to quantify a large dynamic range of expression levels
    Insignificant computational challenge | Substantial computational challenge
    Substantial data interpretation challenge | Intermediate data interpretation challenge