
Post on 17-Jan-2016








Computational Laboratory: aCGH Data Analysis

Feb. 4, 2011

Per Chia-Chin Wu

Today’s Topics

• Review aCGH and its data analysis

• Homework: aCGH data analysis using tools in Genboree and Ruby

Chromosomal Aberrations

REF: Albertson et al

Array CGH

Label patient DNA with Cy3

Label control DNA with Cy5

Hybridize DNA to genomic clone array

Analyze Cy3/Cy5 fluorescence ratio of patient to control (log of Cy3/Cy5)

Workflow of aCGH Analysis

Finished chips (scanner)

Raw image data (experiment info) (image processing software)

Probe level raw intensity data

Background adjustment, Normalization, transformation

Raw copy number (CN) data [log ratio of tumor/normal intensities]

Segmentation and boundary determination; estimation of CN

Characterizing individual genomic profiles

• Background Adjustment/Correction

Reduces unevenness of a single chip

[Figure: chip before adjustment and after adjustment]

Corrected Intensity (S’) = Observed Intensity (S) – Background Intensity (B)

Eliminates non-specific hybridization signal
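The subtraction above can be sketched in Ruby as follows. This is a minimal illustration, not the image processing software's actual method: the `floor` parameter and the intensity values are assumptions added for the example.

```ruby
# Background correction: S' = S - B, applied probe by probe.
# `floor` is an assumption of this sketch (not from the slides): it keeps
# corrected intensities positive so a later log2 transform stays defined.
def background_correct(observed, background, floor: 1.0)
  observed.zip(background).map { |s, b| [s - b, floor].max }
end

observed   = [520.0, 1310.0, 95.0]   # made-up probe intensities (S)
background = [100.0, 110.0, 120.0]   # made-up local background (B)
corrected  = background_correct(observed, background)
p corrected                          # => [420.0, 1200.0, 1.0]
```

The third probe falls below its background, so the sketch clamps it to the floor instead of returning a negative intensity.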


• Normalization

Reduces technical variation between chips

[Figure: chips before and after normalization]

S’ = (S – Mean of S) / (STD of S)

S’ ~ N(0, 1)
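A minimal Ruby sketch of this standardization for a single chip is shown below. The use of the sample standard deviation (dividing by n - 1) is an assumption, since the slide does not say which estimator is meant, and the input values are made up.

```ruby
# Standardize one chip's intensities: S' = (S - mean) / SD.
# Uses the sample standard deviation (n - 1); that choice is an assumption.
def standardize(values)
  raise ArgumentError, "need at least two values" if values.size < 2
  n    = values.size.to_f
  mean = values.sum / n
  sd   = Math.sqrt(values.sum { |v| (v - mean)**2 } / (n - 1))
  raise ArgumentError, "all values are identical" if sd.zero?
  values.map { |v| (v - mean) / sd }
end

chip = [2.0, 4.0, 6.0]   # made-up intensities
p standardize(chip)      # => [-1.0, 0.0, 1.0]
```

After this step every chip has mean 0 and standard deviation 1, which is what makes values comparable between chips.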


• Log Transformation

[Figure: before and after log transformation]

S: probe raw intensity; S’: log-transformed intensity, S’ = log2(S)

CN = S’_tumor – S’_normal = log2(S_tumor / S_normal)
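The raw copy number formula can be computed per probe as in this short Ruby sketch. The intensity values are made up, and the positivity check is an addition of the sketch.

```ruby
# Raw copy number for one probe: CN = log2(S_tumor) - log2(S_normal).
def log2_ratio(tumor, normal)
  raise ArgumentError, "intensities must be positive" unless tumor > 0 && normal > 0
  Math.log2(tumor) - Math.log2(normal)
end

tumor  = [800.0, 400.0, 200.0]   # made-up tumor intensities
normal = [400.0, 400.0, 400.0]   # made-up matched normal intensities
cn = tumor.zip(normal).map { |t, n| log2_ratio(t, n) }
cn.each { |v| puts format("%.2f", v) }
```

A doubling of signal gives a log ratio near +1, equal signal gives 0, and a halving gives a value near -1, so gains and losses are symmetric around zero.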








• Goal: To partition the clones into sets with the same copy number and to characterize the genomic segments.

Noise reduction; detection of Loss, Normal, Gain, Amplification; breakpoint analysis

• Biological model: genomic rearrangements lead to gains or losses of sizable contiguous parts of the genome. Aberrations that recur across tumors may indicate an oncogene or a tumor suppressor gene.
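Detection of Loss, Normal, Gain, and Amplification can be sketched as thresholding a segment's mean log ratio. The cutoffs below (-0.3, 0.3, 1.0) are illustrative assumptions only; they are not taken from the slides, and real analyses tune them per platform.

```ruby
# Call a copy-number state from a segment's mean log2 ratio.
# All three thresholds are illustrative assumptions.
def call_state(log_ratio, loss: -0.3, gain: 0.3, amplification: 1.0)
  return :loss          if log_ratio <= loss
  return :amplification if log_ratio >= amplification
  return :gain          if log_ratio >= gain
  :normal
end

segment_means = [-0.8, 0.05, 0.45, 1.6]   # made-up segment means
p segment_means.map { |m| call_state(m) }
# => [:loss, :normal, :gain, :amplification]
```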

Segmentation Methods

• AWS - Adaptive Weights Smoothing

• CBS - Circular Binary Segmentation

• HMM - Hidden Markov Model partitioning

• Many more

All existing methods amount to unsupervised, location-specific partitioning and operating on individual
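To make the idea of partitioning concrete, here is a deliberately simplified recursive binary segmentation in Ruby. It is not CBS, AWS, or an HMM: it is a plain, non-circular split on the best least-squares breakpoint, with no permutation test. The `min_size` and `min_diff` parameters and the data are assumptions of the sketch.

```ruby
def mean(values)
  values.sum / values.size
end

# Sum of squared deviations from the mean of a slice.
def sse(values)
  m = mean(values)
  values.sum { |v| (v - m)**2 }
end

# Recursively split log ratios (ordered by genomic position) into segments.
# Returns [start_index, end_index, segment_mean] triples.
# min_size: fewest probes allowed per segment (assumption).
# min_diff: smallest mean difference that justifies a split (assumption).
def segment(values, offset = 0, min_size: 3, min_diff: 0.3)
  n = values.size
  best = nil
  # Try every breakpoint that leaves min_size probes on both sides.
  (min_size..(n - min_size)).each do |k|
    cost = sse(values[0...k]) + sse(values[k..-1])
    best = [k, cost] if best.nil? || cost < best[1]
  end
  if best
    k = best[0]
    left  = values[0...k]
    right = values[k..-1]
    if (mean(left) - mean(right)).abs >= min_diff
      return segment(left, offset, min_size: min_size, min_diff: min_diff) +
             segment(right, offset + k, min_size: min_size, min_diff: min_diff)
    end
  end
  [[offset, offset + n - 1, mean(values)]]
end

# Made-up log ratios: five probes near 0 followed by five probes near 1.
ratios = [0.0, 0.1, -0.1, 0.0, 0.05, 1.0, 1.1, 0.9, 1.0, 1.05]
segment(ratios).each do |first, last, m|
  puts format("probes %d-%d  mean %.2f", first, last, m)
end
```

The segment means double as noise-reduced copy number estimates, and the segment boundaries are the candidate breakpoints.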


Homework: Analyze TCGA Data

The Cancer Genome Atlas Project (TCGA)

• Goal: find genomic alterations that cause cancer (mutations, CNA, methylation, …)

• Pilot project

1. brain (glioblastoma multiforme): 186 pairs of tumor and normal samples

2. lung (squamous)

3. ovarian (serous cystadenocarcinoma)

Flowchart of Data Analysis

Raw copy number (CN) data [log ratio of tumor/normal intensities]

Segmentation and boundary determination; estimation of CN

Characterizing individual genomic profiles


Identify Recurrent Genes

Ruby: Mapping Probes


LFF format
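The probe mapping step might look like the Ruby sketch below, which joins probe coordinates with log ratios and prints one tab-separated LFF record per probe. It assumes the first ten LFF columns are class, name, type, subtype, chromosome, start, stop, strand, phase, and score; check that order against the Genboree documentation before uploading. The class/type/subtype labels, probe IDs, and coordinates are all hypothetical.

```ruby
# Build one LFF record for a probe.
# Assumed column order: class, name, type, subtype, chromosome,
# start, stop, strand, phase, score. Label values are hypothetical.
def lff_line(probe_id, chrom, start, stop, score,
             klass: "aCGH", type: "TCGA", subtype: "log2ratio")
  [klass, probe_id, type, subtype, chrom, start, stop, "+", ".", score].join("\t")
end

# probe_positions: { probe_id => [chrom, start, stop] }
# log_ratios:      { probe_id => log2 ratio }
# Probes without a known position are skipped.
def probes_to_lff(probe_positions, log_ratios)
  log_ratios.map do |probe_id, ratio|
    chrom, start, stop = probe_positions[probe_id]
    next if chrom.nil?
    lff_line(probe_id, chrom, start, stop, ratio)
  end.compact
end

# Made-up probe annotation and measurements.
positions = {
  "probe_001" => ["chr7", 55_000_000, 55_000_060],
  "probe_002" => ["chr9", 21_900_000, 21_900_060]
}
ratios = { "probe_001" => 1.8, "probe_002" => -1.2, "probe_999" => 0.1 }

puts probes_to_lff(positions, ratios)
```

In a real script the two hashes would be filled by reading the probe annotation file and the data file line by line.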

Upload Data

Data Analysis: Segmentation

Data Analysis: Combine Tracks

Data Analysis: Annotation Selector

Data Analysis: Mapping Genes

Data Analysis: Recurrent Genes

Overview of Data Analysis

Raw copy number (CN) data [log ratio of tumor/normal intensities]

Data Preprocessing (Ruby) and uploading data to Genboree

Segmentation (Segmentation Tool)

Characterizing individual genomic profiles

Combining data

Annotation (Annotation Selector; Attribute Lifter)

Identify Recurrent Genes (Ruby)
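The final step, identifying recurrent genes, can be sketched as counting the distinct samples in which each gene is altered. The input layout (one sample/gene pair per row), the `min_samples` cutoff, and the example rows are assumptions; the table exported from Genboree may be laid out differently and would need its own parsing.

```ruby
require "set"

# rows: [sample_id, gene_name] pairs, one per altered gene per sample.
# Returns [gene, sample_count] pairs for genes altered in at least
# min_samples distinct samples, most recurrent first.
def recurrent_genes(rows, min_samples: 2)
  samples_by_gene = Hash.new { |hash, gene| hash[gene] = Set.new }
  rows.each { |sample, gene| samples_by_gene[gene] << sample }
  samples_by_gene
    .map { |gene, samples| [gene, samples.size] }
    .select { |_gene, count| count >= min_samples }
    .sort_by { |gene, count| [-count, gene] }
end

# Made-up rows for illustration; a duplicate row is counted only once.
rows = [
  ["s1", "EGFR"], ["s2", "EGFR"], ["s2", "EGFR"], ["s3", "EGFR"],
  ["s1", "CDKN2A"], ["s2", "CDKN2A"],
  ["s3", "PTEN"]
]

# Two-column output: gene name, number of samples.
recurrent_genes(rows).each { |gene, count| puts "#{gene}\t#{count}" }
```

Using a set per gene means a gene hit by several segments in the same sample still counts as one sample.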

You Need To Submit

1. Ruby script from step 1 that creates your LFF file

2. Ruby script from step 5 that parses your table

3. Two-column final output from step 5
