Posted on 19-Dec-2015
Genomic Arrays: Tools for cancer gene discovery
Ian Roberts
MRC Cancer Cell Unit
Hutchison MRC Research Centre
ir210@cam.ac.uk
What’s a genomic array?
• A platform of regularly spaced genomic sequences, covering either:
  - all known genes, or
  - a subset of genes of interest
• A tool for querying the genome for damage:
  - genomic gains (oncogenes)
  - genomic losses (tumour suppressor genes)
• Applications:
  - research: disease gene discovery
  - clinical diagnostic tests
Comparative genomic hybridisation
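Tumour DNA (test) and normal DNA (reference) are hybridised together to the probes available on the array platform.
• GAIN: more test than reference DNA binds a probe (candidate oncogene)
• LOSS: reference DNA in excess of test DNA (candidate tumour suppressor)
• The vast majority of the genome is normal, with balanced test and reference signals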
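The following minimal Python sketch makes the gain/loss rule concrete by calling a single probe from its test and reference signals. The analysis described here was done in R; this is only an illustration. The cut-off `LOG2_THRESHOLD = 0.3` and the function name `call_probe` are hypothetical, and real thresholds depend on the platform and the analysis.

```python
import math

# Hypothetical cut-off on |log2(test / reference)|; real values are chosen per study.
LOG2_THRESHOLD = 0.3

def call_probe(test_signal: float, reference_signal: float,
               threshold: float = LOG2_THRESHOLD) -> str:
    """Classify one probe as 'gain', 'loss' or 'normal' from its two signals."""
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive to take a log ratio")
    log_ratio = math.log2(test_signal / reference_signal)
    if log_ratio > threshold:
        return "gain"    # more test (tumour) DNA than reference: candidate oncogene
    if log_ratio < -threshold:
        return "loss"    # reference in excess of test: candidate tumour suppressor
    return "normal"      # balanced signals, as for most of the genome

if __name__ == "__main__":
    for t, r in [(2000, 1000), (500, 1000), (1050, 1000)]:
        print(t, r, call_probe(t, r))   # gain, loss, normal
```

For reference, a single-copy gain in a diploid genome gives an expected ratio of 3/2 (log2 ≈ 0.58), and a single-copy loss gives 1/2 (log2 = -1). Normal cells mixed into a tumour sample shrink both values toward 0.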
New generation arrays produce large amounts of data
• Agilent 244K array: 243,504 defined spots
• Raw data: foreground and background signal intensities in two channels
• The median foreground ratio between the two channels is the key measurement
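The two-channel raw data can be reduced to one number per spot. This sketch makes three assumptions:

* Each field holds a spot's median intensity.
* Background is removed by plain subtraction. This is only one of several possible correction methods.
* The `Spot` field names and the `floor` value are hypothetical.

With 243,504 spots per array, a real pipeline would process whole columns at once rather than one spot at a time.

```python
import math
from dataclasses import dataclass

@dataclass
class Spot:
    """Raw two-channel readings for one spot (hypothetical field names)."""
    test_fg: float   # median foreground, test (tumour) channel
    test_bg: float   # median background, test channel
    ref_fg: float    # median foreground, reference (normal) channel
    ref_bg: float    # median background, reference channel

def log2_ratio(spot: Spot, floor: float = 1.0) -> float:
    """Subtract background in each channel, then return log2(test / reference).

    `floor` keeps corrected signals positive so the log is defined (an assumed choice).
    """
    test = max(spot.test_fg - spot.test_bg, floor)
    ref = max(spot.ref_fg - spot.ref_bg, floor)
    return math.log2(test / ref)

if __name__ == "__main__":
    print(log2_ratio(Spot(2100, 100, 1100, 100)))  # 1.0: test is twice reference
```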
Genomic array analysis strategy using R
1. Array data is processed by the snapCGH R package:
   • Correct array data for background noise and mean distribution
   • Order data by genomic location
   • Apply an aCGH segmentation algorithm
   • Draw some plots
2. Determine significant findings (in-house R functions):
   • Common and minimum genomic regions of gain and loss
   • Summarise output
R: www.cran.r-project.org
snapCGH: www.bioconductor.org
parrot (R on camgrid): http://www.bio.cam.ac.uk/local/condor-parrot.html
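Step 1 can be sketched in Python as follows. This is not snapCGH:

* Median centring stands in for the correction step.
* `toy_segments` only merges neighbouring probes that share the same threshold call. It is far cruder than DNAcopy, GLAD or BioHMM.

All names and the 0.3 threshold are hypothetical.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Probe:
    chrom: int         # chromosome number (X and Y omitted in this sketch)
    pos: int           # base-pair position
    log2_ratio: float  # log2(test / reference)

def median_centre(probes: list[Probe]) -> list[Probe]:
    """Shift all ratios so the array-wide median is 0 (most probes are normal)."""
    m = median(p.log2_ratio for p in probes)
    return [Probe(p.chrom, p.pos, p.log2_ratio - m) for p in probes]

def genome_order(probes: list[Probe]) -> list[Probe]:
    """Order probes by chromosome, then by position along it."""
    return sorted(probes, key=lambda p: (p.chrom, p.pos))

def toy_segments(probes: list[Probe], threshold: float = 0.3):
    """Merge neighbouring probes with the same call into (chrom, start, end, call).

    A toy stand-in for a real segmentation algorithm; assumes genome-ordered input.
    """
    def call(x: float) -> str:
        if x > threshold:
            return "gain"
        if x < -threshold:
            return "loss"
        return "normal"

    segments = []
    for p in probes:
        c = call(p.log2_ratio)
        if segments and segments[-1][0] == p.chrom and segments[-1][3] == c:
            chrom, start, _, _ = segments[-1]
            segments[-1] = (chrom, start, p.pos, c)      # extend the current run
        else:
            segments.append((p.chrom, p.pos, p.pos, c))  # start a new run
    return segments

if __name__ == "__main__":
    raw = [Probe(2, 100, 0.1), Probe(1, 300, 1.1), Probe(1, 100, 0.1),
           Probe(1, 200, 1.0), Probe(2, 200, -0.9)]
    for seg in toy_segments(genome_order(median_centre(raw))):
        print(seg)
```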
Distributed aCGH analysis
Workflow (example input: 3 chromosomes, 2 analysis methods):
• Preprocess data and input it to snapCGH (Condor Job 1, Condor Job 2)
• Generate genome-ordered data and the Condor DAGMan analysis batch files
• Perform aCGH analysis plus region detection, one run per chromosome per analysis method (Chr 1, Chr 2 and Chr 3, each with DNAcopy and GLAD). A DAGMan description file per method (e.g. DNAcopy) lists DAGMan jobs 1 … n, and each job runs:
  - segmentation step
  - score combining (clone call scoring 1 … n)
  - CRI/MRI detection (common and minimum regions of interest)
• Consolidate output
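The per-chromosome, per-method fan-out maps naturally onto a DAGMan description file. Such a file declares jobs with `JOB <name> <submit file>` and dependencies with `PARENT ... CHILD ...`. In this sketch, the job names and the `jobs/*.sub` submit files are hypothetical. A final consolidation job waits for every analysis run.

```python
from itertools import product

def dagman_description(chromosomes, methods, submit_dir="jobs"):
    """Return DAGMan text: one job per chromosome per method, then a consolidation job.

    Job names and submit-file paths are hypothetical placeholders.
    """
    lines, analysis_jobs = [], []
    for chrom, method in product(chromosomes, methods):
        name = f"{method}_chr{chrom}"
        analysis_jobs.append(name)
        lines.append(f"JOB {name} {submit_dir}/{name}.sub")
    lines.append(f"JOB consolidate {submit_dir}/consolidate.sub")
    # Consolidation runs only after all analysis jobs have finished.
    lines.append(f"PARENT {' '.join(analysis_jobs)} CHILD consolidate")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(dagman_description([1, 2, 3], ["DNAcopy", "GLAD"]), end="")
```

The slides show a separate description file per method (e.g. for DNAcopy). Calling the function with a single method produces that layout.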
Condor job scripting in BASH & R
• BASH function
  - Responsible for producing the required Condor files for discrete jobs
  - Default_submit takes 2 positional parameters: R script name ($1) and data files ($2)
  - Initiates aCGH analysis on the grid
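A Python analogue of `Default_submit` might assemble a submit description like the one below. The submit commands (`universe`, `executable`, `arguments`, `transfer_input_files` and so on) are standard HTCondor. The wrapper name `run_R.sh` and the `.out`/`.err`/`.log` naming are assumptions, not taken from the original BASH function.

```python
from pathlib import Path

def default_submit(r_script: str, data_files: list[str]) -> str:
    """Build an HTCondor submit description for one aCGH job.

    Mirrors a two-parameter BASH function ($1 = R script name, $2 = data files).
    'run_R.sh' and the output/log names are hypothetical.
    """
    inputs = [r_script, *data_files]
    # Transferred inputs land in the job's scratch directory, so refer to base names.
    arguments = " ".join(Path(f).name for f in inputs)
    stem = Path(r_script).stem
    return "\n".join([
        "universe = vanilla",
        "executable = run_R.sh",
        f"arguments = {arguments}",
        f"transfer_input_files = {','.join(inputs)}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"output = {stem}.out",
        f"error = {stem}.err",
        f"log = {stem}.log",
        "queue",
    ]) + "\n"

if __name__ == "__main__":
    print(default_submit("chr1_DNAcopy.R", ["chr1.txt"]), end="")
```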
• Condor DAGMan R function set
  - R-scripter: writes the appropriate R script for the current job
  - R-condor-submitter: writes the Condor job submission file
  - R-condor-executer: writes the Condor job executable file
  - R-job-descriptor: writes the Condor DAGMan description file
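The R-scripter and R-condor-executer roles amount to writing two small files per job. In this sketch the directory layout and file names are hypothetical, as is the comment that stands in for the actual analysis call. The R lines use only base R (`commandArgs`, `read.delim`), and the wrapper runs the script with `Rscript --vanilla`.

```python
import stat
import tempfile
from pathlib import Path

def write_job_files(job_dir: Path, chrom: int, method: str) -> tuple[Path, Path]:
    """Write one job's R script and its shell wrapper (hypothetical layout)."""
    job_dir.mkdir(parents=True, exist_ok=True)

    # R script: reads the data file passed as its first trailing argument.
    r_script = job_dir / f"{method}_chr{chrom}.R"
    r_script.write_text(
        "args <- commandArgs(trailingOnly = TRUE)\n"
        "dat <- read.delim(args[1])\n"
        f"# placeholder: run {method} segmentation on chromosome {chrom}\n"
    )

    # Executable wrapper: forwards its arguments (script, data files) to Rscript.
    executable = job_dir / "run_R.sh"
    executable.write_text('#!/bin/sh\nexec Rscript --vanilla "$@"\n')
    # Set the owner's execute bit so the wrapper can also be run directly.
    executable.chmod(executable.stat().st_mode | stat.S_IXUSR)
    return r_script, executable

if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    for path in write_job_files(out, 1, "GLAD"):
        print(path)
```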
End user abstraction – start_aCGH.sh
• aCGH analysis undertaken by a single shell command
• Manages array data input
• Collects user-specified parameters:
  - chromosome range
  - segmentation algorithms
  - significance thresholds
• Links to the Condor R job scripting
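A single command that gathers these parameters could be modelled with Python's `argparse`. The option names (`--chromosomes`, `--methods`, `--threshold`) and their defaults are hypothetical, since the interface of start_aCGH.sh is not shown. Chromosomes are parsed as integers only, so X and Y are left out of this sketch.

```python
import argparse

def parse_chromosome_range(text: str) -> list[int]:
    """Expand '1-3' to [1, 2, 3]; a single value such as '7' gives [7]."""
    if "-" in text:
        start, end = (int(x) for x in text.split("-", 1))
        if start > end:
            raise argparse.ArgumentTypeError(f"empty chromosome range: {text}")
        return list(range(start, end + 1))
    return [int(text)]

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical options; not the documented start_aCGH.sh interface.
    parser = argparse.ArgumentParser(description="Launch a distributed aCGH analysis")
    parser.add_argument("arrays", nargs="+", help="array data files")
    parser.add_argument("--chromosomes", type=parse_chromosome_range,
                        default=list(range(1, 23)), help="e.g. 1-22 or 7")
    parser.add_argument("--methods", nargs="+", default=["DNAcopy"],
                        help="segmentation algorithms, e.g. DNAcopy GLAD")
    parser.add_argument("--threshold", type=float, default=0.3,
                        help="significance threshold")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args(
        ["a1.txt", "--chromosomes", "1-3", "--methods", "DNAcopy", "GLAD"])
    print(args.chromosomes, args.methods, args.threshold)
```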
Summary findings (38 arrays)
• Rapid identification of regions of interest
• Easy comparison of aCGH analyses across different algorithms
[Figure: summary plots for BioHMM and DNAcopy; axes: sample percentage and region size]
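The sample-percentage axis can be computed directly from per-sample calls:

1. At each probe, find the percentage of arrays called as gained (or lost).
2. Keep runs of probes that reach a chosen frequency as common regions.

This sketch assumes every sample's calls share the same genome-ordered probe list. The input layout and the `min_percent` cut-off are hypothetical.

```python
def sample_percentage(calls_by_sample: dict[str, list[str]],
                      state: str = "gain") -> list[float]:
    """Percentage of samples whose call equals `state` at each probe."""
    samples = list(calls_by_sample.values())
    if not samples:
        raise ValueError("need at least one sample")
    n_probes = len(samples[0])
    if any(len(s) != n_probes for s in samples):
        raise ValueError("every sample needs one call per probe")
    return [100.0 * sum(s[i] == state for s in samples) / len(samples)
            for i in range(n_probes)]

def common_regions(percentages: list[float],
                   min_percent: float) -> list[tuple[int, int]]:
    """(first, last) probe-index runs where the percentage reaches min_percent."""
    regions, start = [], None
    for i, pct in enumerate(percentages):
        if pct >= min_percent and start is None:
            start = i                       # a qualifying run begins
        elif pct < min_percent and start is not None:
            regions.append((start, i - 1))  # the run ended at the previous probe
            start = None
    if start is not None:
        regions.append((start, len(percentages) - 1))
    return regions

if __name__ == "__main__":
    calls = {"s1": ["gain", "gain", "normal", "loss"],
             "s2": ["normal", "gain", "normal", "loss"],
             "s3": ["normal", "gain", "gain", "normal"],
             "s4": ["gain", "gain", "normal", "normal"]}
    pct = sample_percentage(calls)
    print(pct, common_regions(pct, 50))  # [50.0, 100.0, 25.0, 0.0] [(0, 1)]
```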
Real life application
Retrospective analysis confirms initial findings! (summary of 38 samples)
[Figure: OSMR region; axes: sample percentage and region size]
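One common way to define a minimal region around a locus such as a candidate gene is the intersection of all gained segments that cover it. The sketch below follows that definition, which may differ from the in-house R functions mentioned earlier. The coordinates in the example are hypothetical and are not real OSMR positions.

```python
def minimal_common_region(segments, locus):
    """Intersect all gained segments that cover `locus`.

    segments: (sample, start, end) gains on one chromosome, in base pairs.
    Returns (start, end, n_samples), or None if no segment covers the locus.
    """
    covering = [(start, end) for _, start, end in segments if start <= locus <= end]
    if not covering:
        return None
    # Every covering segment contains `locus`, so the intersection is non-empty.
    start = max(s for s, _ in covering)
    end = min(e for _, e in covering)
    return start, end, len(covering)

if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    gains = [("a", 100, 900), ("b", 300, 1200), ("c", 250, 700), ("d", 1500, 2000)]
    print(minimal_common_region(gains, 500))  # (300, 700, 3)
```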
Future development
• Tailor output for specific user requirements
• Produce an overall summary plot
• Apply the approach to expression arrays