structural variant detection in colorectal cancer€¦ · structural variant detection in...
TRANSCRIPT
Structural variant detection in colorectal cancer
1
1 Dept. of Pathology, VU University Medical Center, Amsterdam, The Netherlands 2 Dept. of Epidemiology & Biostatistics, VU University Medical Center, Amsterdam, The Netherlands 3 Dept. of Pathology, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands 4 Dept. of Medical Oncology, Academic Medical Center, University of Amsterdam, The Netherlands 5 Dept. of Computer Science, VU University, Amsterdam, The Netherlands
E van den Broek1, JC Haan1, MH Jansen1, B Carvalho1, MA van de Wiel2, ID Nagtegaal3, CJA Punt4, B Ylstra1, S Abeln5, GA Meijer1, RJA Fijneman1
Colorectal cancer (CRC)
• Colorectal cancer is a major health concern worldwide
• Second cause of cancer related death – The incidence worldwide is 1,200,000 – The incidence in the US is 144,000
• Mortality rates
2
Stage 1 < 10 % Stage 2 25 - 30 % Stage 3 45 – 50 % Stage 4 > 90 %
Diagnostic biomarkers
Predictive biomarkers
Prognostic biomarkers
Clinical needs:
1. Screening
2. Predict recurrence
3. Personalized therapy
CRC research Clinical needs for biomarker discovery
3
normal colon progressive adenoma adenoma localized CRC CRC liver
metastasis CRC lymph node
metastasis
activated Wnt signaling
genomic instability (~5% of adenomas)
~15% MIN+ ~85% CIN+
~3% MIN+ ~97% CIN+
Key molecular features
0
20
40
60
80
100
5-ye
ar s
urvi
val r
ate
(%)
Stage IV Stage I+II Stage III CRC stage:
Chromosomal Instability a hallmark of CRC SKY: numerical & structural aberrations
4 M Hermsen et al., Oncogene 2005
CAIRO & CAIRO2 studies
Phase III clinical trials In total 1575 patients were included CApecitabine, IRinotecan, Oxaliplatin in advanced colorectal cancer CAIRO: Koopman et al. Lancet 2007 CAIRO2: Tol et al. N Engl J Med 2009 DNA from 356 patients: primary tumor and matched normal
– Representative group – Isolated from FFPE
5
Array CGH: 356 CAIRO & CAIRO2 samples
6
Calling
Copy numbers
Segmentation 1
2
3
Numerical aberrations
Comparative Genomic Hybridization (CGH) Agilent, 180k array CGH
Segmentation - array CGH Profile of one tumor with 180k probes
7
Segmentation was performed using Circular Binary Segmentation algorithm (DNAcopy. Olshen et al. 2004)
Calling - array CGH Profile of one tumor with 180k probes
8
Calling was performed using CGHcall (CGHcall. vd Wiel et al. 2007)
Structural Variants (SV) in cancer
Hematological disorders • Philadelphia chromosome
– t(9;22) – Fusion gene: BCR-ABL – Drug: Imatinib / Gleevec
Epithelial cancers • TMPRSS2-ERG in prostate cancers • VTI1A-TCF7L2 is confirmed in 3% of 97 CRCs
– Bass et al., Nature Genetics 2011
9
TO IDENTIFY RECURRENT SOMATIC STRUCTURAL GENOMIC VARIANTS THAT CAUSE CRC
AIM
10
Array CGH: 356 CAIRO & CAIRO2 samples
Breakpoint (BP) detection Based on array CGH
11
Breakpoint detection
Candidate genes
Segmentation 1
2
3
Structural variants
BP detection in array CGH Profile of one tumor with 180k probes
12
Breakpoints are defined by the start position of the first
probe of each segment
Breakpoint annotation per gene
• Total number of genes with BPs: 5,737 genes • 482 candidate genes were identified with recurrent BP (FDR < 0.1)
13
Results based on array CGH BP detection
020
4060
80100
120
140
Am
ount
of a
ffect
ed s
ampl
es in
arr
ay C
GH
Candidate genes (top 15)
MACROD2
Overall survival: MACROD2 Recurrent BP (1) versus no-BP (0)
14
Log rank P= 0.08 0 500 1000 1500
0.0
0.2
0.4
0.6
0.8
1.0
MACROD2
Overall Survival (days)
Sur
viva
l pro
babi
lity
BP (samples)0 ( 207 )1 ( 144 )
• Total number of genes with BPs: 5,737 genes • 482 candidate genes were identified with recurrent BP (FDR < 0.1) Limitations breakpoint determination using array CGH: – Location BP is estimation (average probe distance is ~17 kb) – DNA structure is unknown – Balanced events will be missed
15
CGH
482 CANDIDATE GENES
Results based on array CGH BP detection
• 482 candidate genes were identified with recurrent BP (FDR < 0.1)
16
CGH
482 CANDIDATE GENES
Candidate validation is required
Validation array CGH BPs NGS data from TCGA
The Cancer Genome Atlas CRC samples (COAD & READ)
Whole Genome DNA Seq from paired tumor-normal samples
• 482 candidate genes were identified with recurrent BP (FDR < 0.1)
17
CGH
482 CANDIDATE GENES
Candidate validation is required
Validation array CGH BPs NGS data from TCGA
Structural Variant (SV) detection Candidate driven
Negative Control Genes
(no BP)
Computational methods Focus on candidate genes
Based on paired-end NGS data • Read-pair approach
– Discordance: location / bridge length / orientation reads
Discordant pairs (DP) types • Translocation > different chromosomes • Insertion > bridge length • Deletion > bridge length • Inversion > orientation • Eversion > orientation • Single mapped could indicate a breakpoint
18
ref
Computational methods Focus on candidate genes
Based on paired-end NGS data 1. Read-pair approach
Combined with: 2. Read-depth
3. Define breakpoint location
4. Determine tumor specific events
19
ref
!
1
2
3
Translocation IGV MACROD2 • Discordant pairs • Breakpoints
20
!
!
Fusion partner
Based on DP groups:
Approximately 5 fold higher number of translocation-DP groups for candidate genes compared to control genes
Distribution DP groups per type Preliminary results candidate genes in TCGA data
21
Deletion 50.8%
Eversion 8.4%
Inversion 6.7%
Insertion 26.4%
Translocation 7.7%
Translocation-DP groups per candidate gene in TCGA samples Putative translocations
22
Freq
uenc
y of
tran
sloc
atio
n-D
P gr
oups
(au)
Candidate genes
Freq
uenc
y of
tran
sloc
atio
n-D
P gr
oups
(au)
Candidate genes (top 20)
MACROD2
0% 10% 20% 30% 40%
Correlation per candidate gene - Frequency of samples with BP based on array CGH - Frequency of translocation-DP groups in TCGA data
23 Frequency of affected samples in array CGH analysis
Freq
uenc
y of
tran
sloc
atio
n-D
P gr
oups
(au)
MACROD2
Conclusions • 482 candidate genes with recurrent breakpoints were identified
in a large cohort of 356 CRC samples, based on array CGH analysis
• The TCGA provided an essential CRC reference dataset (COAD, READ) to validate Structural Variants in candidate genes with recurrent breakpoints
• Identification of BPs based on array CGH is correlated with SV detection in TCGA CRC NGS data
• Further studies will be performed to investigate clinical and functional significance of validated candidate genes
24