SW-ARRAY: a dynamic programming solution for the identification of copy-number changes in genomic DNA using array comparative genome hybridization data

Post on 18-Dec-2015



Motivation

• Chromosomal changes cause genetic diseases
– Aneusomies
  • Easy to detect
– Copy number changes of genes
  • Not so easy to detect

Array CGH

• Comparative Genome Hybridization (CGH) to DNA microarrays
• Method for detecting copy number changes
– Data analyzed using thresholds
– Not reliable for detecting single-copy gains or losses when using large insert clones as probes
– High rates of false positives and false negatives
– Inconsistent for probes of different chromosomal regions
• Cannot be used for clinical diagnostic applications!

Data Adjustment

• Normalization and Correction
– Reason: variations between probes
– Control vs. control data ratio
  • Find mean and SD
– Divide control vs. test ratios by that mean
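
The adjustment above can be sketched in a few lines. This is only an illustration under assumptions: the per-probe dictionary layout and the names `probe_stats` and `normalize` are hypothetical and do not come from the slides.

```python
from statistics import mean, stdev

def probe_stats(control_ratios):
    """Mean and SD of control vs. control ratios, per probe.

    Assumed layout: {probe name: [ratio from each control/control array]}.
    """
    return {probe: (mean(v), stdev(v)) for probe, v in control_ratios.items()}

def normalize(test_ratios, stats):
    """Divide each control vs. test ratio by that probe's control mean."""
    return {probe: r / stats[probe][0] for probe, r in test_ratios.items()}

# Hypothetical data: two probes, two control/control arrays each.
controls = {"probeA": [1.0, 1.0], "probeB": [0.5, 1.5]}
stats = probe_stats(controls)
adjusted = normalize({"probeA": 0.5, "probeB": 2.0}, stats)
```

The SD is computed because the slide lists it, although only the mean enters the division.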

Threshold method

• Compare each data point from the control vs. test experiment to threshold values
– Below 0.8 = deletion
– Above 1.2 = polysomy
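
A minimal sketch of the threshold rule, using the 0.8 and 1.2 cut-offs from the slide. The function name `classify` and the "normal" label for in-between ratios are assumptions made for illustration.

```python
def classify(ratio, low=0.8, high=1.2):
    """Label one normalized control vs. test ratio with the threshold rule."""
    if ratio < low:
        return "deletion"
    if ratio > high:
        return "polysomy"
    return "normal"  # assumed label for ratios between the two thresholds

calls = [classify(r) for r in (0.7, 1.0, 1.3)]
```

Each probe is called on its own, with no use of its neighbours along the chromosome.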

SW-ARRAY

• Smith-Waterman algorithm adapted for Array CGH

• New way to analyze Array CGH data

• Reason:
– Log ratio data is a contiguous one-dimensional series, where locations of high values may indicate polysomic regions and locations of low values may indicate deletions

SW-ARRAY

• Step 1:
– Remove outlying probes
  • Log intensity ratio more than 2.5 MAD away from the median of the other probes in the array
  • MAD = Mean Absolute Deviation
    – Robust measure of Standard Deviation

$$\mathrm{MAD} = \frac{1}{n} \sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$
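
As a rough sketch of Step 1, the filter below follows the slide's definition of MAD as the mean absolute deviation. Reading "other probes" as a leave-one-out comparison is an assumption, and the names `mean_abs_dev` and `remove_outliers` are hypothetical.

```python
from statistics import mean, median

def mean_abs_dev(xs):
    """Mean absolute deviation about the mean, as defined on the slide."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def remove_outliers(log_ratios, k=2.5):
    """Keep probes within k MAD of the median of the *other* probes.

    Assumption: each probe is compared with statistics computed from all
    remaining probes (leave-one-out). `log_ratios` must be a list.
    """
    kept = []
    for i, x in enumerate(log_ratios):
        others = log_ratios[:i] + log_ratios[i + 1:]
        if abs(x - median(others)) <= k * mean_abs_dev(others):
            kept.append(x)
    return kept

cleaned = remove_outliers([0.0, 0.1, -0.1, 0.05, -0.05, 3.0])
```

A median-based absolute deviation is a common, more outlier-resistant alternative and could be swapped in for `mean_abs_dev` without changing the rest of the sketch.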

SW-ARRAY

• Step 2:
– Subtract t0 from the log ratio data
– Ensures that the mean of the adjusted data is negative
  • t0 = median + 0.2 × MAD
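
A small sketch of the Step 2 shift, again assuming MAD means the mean absolute deviation as on the Step 1 slide. The function names are hypothetical and the sample values are invented for illustration.

```python
from statistics import mean, median

def mean_abs_dev(xs):
    """Mean absolute deviation about the mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def subtract_t0(log_ratios):
    """Return (t0, adjusted), where adjusted[i] = log_ratios[i] - t0."""
    t0 = median(log_ratios) + 0.2 * mean_abs_dev(log_ratios)
    return t0, [x - t0 for x in log_ratios]

t0, adjusted = subtract_t0([0.0, 0.1, -0.1, 0.2, -0.2])
```

For this symmetric sample the median is 0 and the MAD is 0.12, so t0 is 0.024 and every adjusted value drops by that amount, which makes the adjusted mean negative.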

SW-ARRAY

• Step 3:
– Search for high-scoring islands
• Definition
– Locally high-scoring segment: a positive-scoring segment whose score cannot be increased by shrinking or expanding the segment boundaries

SW-ARRAY

$$T(p, q) = \sum_{i=p}^{q} X(i)$$

T(p,q) = score of the segment from probe p to probe q
X(i) = score for the ith probe, ordered along the genome
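
To make the segment score concrete, here is a short sketch that evaluates T(p, q) with prefix sums. The function name `segment_score` and the 1-indexed, inclusive convention for p and q are assumptions.

```python
from itertools import accumulate

def segment_score(x, p, q):
    """T(p, q): sum of X(i) for i = p..q (1-indexed, inclusive)."""
    prefix = [0, *accumulate(x)]  # prefix[k] = X(1) + ... + X(k)
    return prefix[q] - prefix[p - 1]

scores = [-1, 2, 3, -4]
t_23 = segment_score(scores, 2, 3)  # 2 + 3
```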

SW-ARRAY

S(p) = score of the island ending at p
B(p) = beginning point of the island
S(0) = 0, p > 0
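
The recurrence itself did not survive in this transcript. The sketch below assumes the usual Smith-Waterman-style form, S(p) = max(S(p-1) + X(p), 0), with B(p) carried forward while the island stays positive. It is an illustration under that assumption, not the authors' exact formula.

```python
def island_scores(x):
    """Return (S, B) as 1-indexed lists with S[0] = 0.

    Assumed recurrence: S(p) = max(S(p-1) + X(p), 0).
    B[p] is the start of the island ending at p, or 0 when S[p] == 0.
    """
    n = len(x)
    S = [0] * (n + 1)
    B = [0] * (n + 1)
    for p in range(1, n + 1):
        if S[p - 1] > 0:           # extend the island that is already open
            cand, start = S[p - 1] + x[p - 1], B[p - 1]
        else:                      # otherwise try to open a new island at p
            cand, start = x[p - 1], p
        if cand > 0:
            S[p], B[p] = cand, start
    return S, B

S, B = island_scores([-1, 2, 3, -4, 1])
```

Here the largest S is reached at p = 3 with B(3) = 2, so the best island covers probes 2 to 3.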

SW-ARRAY

• Iterate through probe locations along the genome
• Search where scores > 0
– Find the max-scoring island
– Record its data
– Set the scores within the island to 0
– Find the next max-scoring island
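
The loop above might look like the following sketch. Resetting the probe scores inside a recorded island to 0 is one reading of "set island = 0", and the names `best_island` and `find_islands` are hypothetical.

```python
def best_island(x):
    """Return (score, start, end) of the max-scoring island, 1-indexed."""
    best = (0, 0, 0)
    score, start = 0, 0
    for p, v in enumerate(x, start=1):
        if score > 0:
            score += v             # extend the current island
        else:
            score, start = v, p    # start a new candidate island at p
        if score > best[0]:
            best = (score, start, p)
    return best

def find_islands(x):
    """Repeatedly record the best island, zero it out, and search again."""
    x = list(x)                    # work on a copy
    islands = []
    while True:
        score, start, end = best_island(x)
        if score <= 0:
            break
        islands.append((start, end, score))
        for i in range(start - 1, end):
            x[i] = 0               # assumed meaning of "set island = 0"
    return islands

islands = find_islands([-1, 2, 3, -4, 1, 1, -5])
```

Each pass removes at least one positive score, so the loop stops once no positive-scoring island remains.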

SW-ARRAY

• Statistical Significance
– 1000 runs with the log ratios permuted across probes
  • Find the highest-scoring island in each run and how frequently it reaches the observed island score
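
A hedged sketch of the permutation test: shuffle the adjusted log ratios, take the best island score of each shuffled series, and report how often it reaches the observed best score. The function names, the fixed seed, and the use of ">=" for the comparison are assumptions.

```python
import random

def max_island_score(x):
    """Best island score under S(p) = max(S(p-1) + X(p), 0)."""
    best = score = 0.0
    for v in x:
        score = max(score + v, 0.0)
        best = max(best, score)
    return best

def permutation_p_value(x, n_runs=1000, seed=0):
    """Fraction of permuted runs whose best island reaches the observed one."""
    rng = random.Random(seed)      # seeded so the sketch is reproducible
    observed = max_island_score(x)
    shuffled = list(x)
    hits = 0
    for _ in range(n_runs):
        rng.shuffle(shuffled)
        if max_island_score(shuffled) >= observed:
            hits += 1
    return hits / n_runs

# Hypothetical adjusted log ratios with one clustered run of high values.
data = [-1.0] * 20 + [2.0] * 5 + [-1.0] * 20
p_value = permutation_p_value(data, n_runs=200)
```

A small value suggests that the observed island is unlikely to arise when the same scores are placed in random order along the genome.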

Experiment

• Test group
– DNA from subjects with well-characterized monosomies
• Control groups
• Data analyzed using 2 methods
– Threshold
– SW-ARRAY

Experiment Results

• Threshold Method
– 78.1% correct identification of copy-number changes
• SW-ARRAY
– Identified 13/14 of the monosomic regions with high significance levels in the 14 blind tests

Ideal Conditions for SW-ARRAY

• Numerous probes border the region of copy number change
• Long sequences, for which edge effects are minimized

Output