supplemental information genetic alterations activating kinase … · 2012. 8. 13. · supplemental...
Cancer Cell, Volume 22
Supplemental Information

Genetic Alterations Activating Kinase and Cytokine Receptor Signaling in High-Risk Acute Lymphoblastic Leukemia

Kathryn G. Roberts, Ryan D. Morin, Jinghui Zhang, Martin Hirst, Yongjun Zhao, Xiaoping Su, Shann-Ching Chen, Debbie Payne-Turner, Michelle L. Churchman, Richard C. Harvey, Xiang Chen, Corynn Kasap, Chunhua Yan, Jared Becksfort, Richard P. Finney, David T. Teachey, Shannon L. Maude, Kane Tse, Richard Moore, Steven Jones, Karen Mungall, Inanc Birol, Michael N. Edmonson, Ying Hu, Kenneth E. Buetow, I-Ming Chen, William L. Carroll, Lei Wei, Jing Ma, Maria Kleppe, Ross L. Levine, Guillermo Garcia-Manero, Eric Larsen, Neil P. Shah, Meenakshi Devidas, Gregory Reaman, Malcolm Smith, Steven W. Paugh, William E. Evans, Stephan A. Grupp, Sima Jeha, Ching-Hon Pui, Daniela S. Gerhard, James R. Downing, Cheryl L. Willman, Mignon Loh, Stephen P. Hunger, Marco A. Marra, and Charles G. Mullighan

Inventory of Supplemental Information
Supplemental Data
Table S1, related to Table 1
Table S2, related to Table 1
Table S3, related to Table 1
Table S4, related to Table 1
Table S5, related to Table 1
Table S6, related to Table 1
Table S7, related to Table 1
Table S8, related to Table 1
Table S9, related to Table 1
Figure S1, related to Figure 1
Figure S2, related to Figure 2
Figure S3, related to Figure 3
Figure S4, related to Figure 4
Figure S5, related to Figure 5
Figure S6, related to Figure 6
Figure S7, related to Figure 7
Supplemental Experimental Procedures
Supplemental References
Supplemental Data
Table S1, related to Table 1. List of probe sets that identify Ph-like cases by PAM. Provided as an Excel file.
Table S2, related to Table 1. limma gene expression signature of Ph+ and Ph-like B-ALL versus non Ph-like B-ALL samples in P9906. Provided as an Excel file.
Table S3, related to Table 1. Validated fusions detected by mRNA-seq
Columns: Sample ID; Support mate pairs; Gene 1 (Gene, Chrom, Location); Gene 2 (Gene, Chrom, Location); Comments; Rearrangement
PAKTAL 29 STRN3 14 exon 9 JAK2 9 exon 17 In-frame STRN3-JAK2*,#
PAKTAL 22 C12orf35 12 intron 4 AMN1 12 intron 6 Aligned to intron C12orf35-AMN1*
PAKTAL 5 FAM23A 10 exon 3 MRC1 10 exon 2 In-frame FAM23A-MRC1*
PAKKCA
66 6 5
EBF1 SEMA6A DOCK8 PAX5 ZCCHC7
5 5 9 9 9
exon 15 exon 1 exon 3-4 exon 6 exon 2
PDGFRB FEM1C CBWD2 ZCCHC7 PAX5
5 5 2 9 9
exon 11 exon 2 intron 10 exon 3 exon 7
In-frame No ORF Inversion Inversion, disrupted ORF Inversion, disrupted ORF
EBF1-PDGFRB*,# SEMA6A-FEM1C#
DOCK8-CBWD2* PAX5-ZCCHC7*** ZCCHC7-PAX5***
PAKVKK
15 6 6 17
NUP214 SEMA6A TPM4 TSHZ2
9 5 19 20
exon 34 exon 1 exon 2 intron 2
ABL1 FEM1C KLF2 SLC35A1
9 5 19 6
exon 3 exon 2 exon 3 intron 5
In-frame No ORF Inversion
NUP214-ABL1*,# SEMA6A-FEM1C# TSHZ2-SLC35A1* TPM4-KLF2#
PALIBN 53 IGH@ 14 EPOR 19 IGH@-EPOR*
PAKYEP 33 BCR 22 exon 1 JAK2 9 exon 15 In-frame BCR-JAK2*,#
PAMDRM
8 12 8 10
IGH@ OAZ1 TPM4 SEMA6A SLC2A5
14 19 19 1
exon 1 exon 2 exon 1 intron 1
CRLF2 KLF2 KLF2 FEM1C BTBD7
X/Y 19 19 5 14
exon 3 exon 3 exon 2 intron 10
No ORF No ORF Inversion
IGH@-CRLF2** OAZ1-KLF2# TPM4-KLF2# SEMA6A-FEM1C# SLC2A5-BTBD7*
PAKKXB IGH@ 14 CRLF2 X/Y IGH@-CRLF2**
PAKKXB 13 BTBD7 14 intron 10 SLC2A5 1 intron 1 Inversion BTBD7-SLC2A5*
PALETF None
PAKHZT IGH@ 14 CRLF2 X/Y IGH@-CRLF2**
PALJDL 5 ZNF292 6 exon 1 SYNCRIP 6 exon 2 No ORF ZNF292-SYNCRIP*
PANNGL 16 PAX5 9 exon 5 JAK2 9 exon 19 In-frame PAX5-JAK2*
PANSFD 28 ETV6 12 exon 5 ABL1 9 exon 2 In-frame ETV6-ABL1*
PANEHF 3
76 RCSD1 ABL1
1 9
exon 3 exon 3
ABL1 RCSD1
9 1
exon 4 exon 4
In-frame RCSD1-ABL1* ABL1-RCSD1*
SJBALL085 NUP214 9 exon 34 ABL1 9 exon 3 In-frame NUP214-ABL1*,#
SJBALL010 4 RANBP2 2 exon 18 ABL1 9 exon 2 In-frame RANBP2-ABL1*
*detected by deFuse; #detected by Mosaik; **previously identified (Harvey et al., 2010a); ***detected by Trans-ABySS; ORF, open reading frame. Cytokine receptor and kinase-activating fusions highlighted in bold.
Table S4, related to Table 1. Somatic single nucleotide variants (SNVs) and insertion/deletion mutations identified by mRNA-seq
Sample ID Gene Chr Position Sequence change Amino acid change
PAKHZT JMJD3 chr17 7691628 T>C S433P
PAKHZT RAB11B chr19 8373013 G>A A94T
PAKHZT JAK2* chr9 5079702 G>A R867Q
PAKHZT OFD1 chrX 13696806 T>C S531P;S853P;S993P
PAKVKK ILF3 chr19 10655385 G>A R642Q;R646Q
PAKVKK NSMAF chr8 59682919 C>T R241H
PAKVKK PAX5** chr9 37010775 C>G G24R
PAKVKK IKZF1** chr7 50435466 +CCTCCCC S402fs
PAKKXB TP53 chr17 7514718 T>A N345I
PAKKXB ATRX chrX 76805676 G>T S1256;S1286;S131;S1324;S1357
PAKKXB IKZF1* chr7 50411913 -G L117fs
PALIBN ZNF12 chr7 6697396 G>A H494Y;H529Y;H530Y;H568Y
PALIBN PHF8 chrX 54065429 C>G R94T;R130T
PALJDL VCL chr10 75544616 G>A A1003T;A1071T
PALJDL WDR45L chr17 78167098 C>T D283N;D341N
PALJDL MOV10 chr1 113042873 G>A R785H;R841H
PALJDL HECA chr6 139529155 C>T P105S
PALJDL IL7R chr5 40940849 -TTAATTACGC L242>FPGVC
PAMDRM LGALS8 chr1 234768983 G>A V105M;V106M
PAMDRM JAK2 chr9 5068357 +GGACCC GPinsI682
PAMDRM MKI67 chr10 129793424 -GAG S2223_P2224>S
PAMDRM HDAC7 chr12 46477155 +T L604
PALETF LYSMD4 chr15 98087051 C>A V231F;V232F
PALETF RAD51C chr17 54129161 C>A D171E
PALETF DMPK chr19 50975027 G>A P44L
PALETF ADNP chr20 48942253 G>A S802F
PALETF TSHZ2 chr20 51304885 C>T P494L
PALETF APC chr5 112204169 T>C L1660P
PALETF FLT3 chr13 27506253 +ATTTTGGAAAGTAACATAAGAGATCATATTCATATTGTCTGAAATCAATGTAGAAGTACTCCCAATTTT KWEFPRENEYFYVDFREYEYDLinsL604
SJBALL085 IKZF1 chr7 50411801 +GCTCAGG A79fs
SJBALL010 CREBBP chr16 3739628 Splice Exon 18
SJBALL010 IFITM3 chr11 310606 G>T P70T
SJBALL010 MPST chr22 35750701 G>A E167K
*, Identified previously (Mullighan et al., 2009c); **, Identified previously (Mullighan et al., 2009b). Multiple SNVs for each gene represent different isoforms.
Table S5, related to Table 1. Validated somatic insertions/deletions and SNVs for PALJDL detected by WGS
Gene Accession Chr Position* Class Sequence change Amino acid change Comments
IL7R NM_002185.2 chr5 35910329 insertion A>CCCGGGGGTCTGC L242>FPGVC Confirmed by mRNA-seq data
ZNF468 NM_199132 chr19 58036999 frameshift +GGG,-A A67fs Confirmed by mRNA-seq data
CGNL1 NM_032866 chr15 55608184 missense A>T A1027V Low coverage of locus in mRNA-seq
COL4A1 NM_001845 chr13 109628259 missense C>T V883I Low coverage of locus in mRNA-seq
HECA NM_016217 chr6 139529155 missense C>T P105S Confirmed by mRNA-seq data
MOV10 NM_020963 chr1 113042873 missense G>A R841H Confirmed by mRNA-seq data
OR8K1 NM_001002907 chr11 55870928 missense G>C A280P Low coverage of locus in mRNA-seq
PDZD2 NM_178140 chr5 32019253 missense G>A A238T Low coverage of locus in mRNA-seq
PPFIA2 NM_003625 chr12 80212905 missense C>T R922Q Low coverage of locus in mRNA-seq
ROBO1 NM_133631 chr3 78767718 missense C>A G1051C Low coverage of locus in mRNA-seq
VCL NM_014000 chr10 75544616 missense G>A A1071T Confirmed by mRNA-seq data
WDR45L NM_019613 chr17 78167098 missense C>T D341N Confirmed by mRNA-seq data
*aligned to human reference genome 18.
Table S6, related to Table 1. Validated somatic SNVs for PALETF detected by WGS
Gene mRNA_acc Chr Position* Class Sequence change Amino acid change Comments
ADNP NM_015339 chr20 48942253 missense G>A S802F Confirmed by mRNA-seq data
ALPK2 NM_052947 chr18 54335283 missense C>T G1926E Low coverage of locus in mRNA-seq
APC NM_001127511 chr5 112204169 missense T>C L1660P Confirmed by mRNA-seq data
ARSI NM_001012301 chr5 149657296 missense C>T V462M Low coverage of locus in mRNA-seq
ATP7B NM_000053 chr13 51437128 missense A>G S584P Low coverage of locus in mRNA-seq
DMPK NM_001081560 chr19 50975027 missense G>A P44L Confirmed by mRNA-seq data
IL7 NM_000880 chr8 79811310 missense C>T G123E Low coverage of locus in mRNA-seq
KRT1 NM_006121 chr12 51356339 missense C>T G488R Low coverage of locus in mRNA-seq
LYSMD4 NM_152449 chr15 98087051 missense C>A V232F Confirmed by mRNA-seq data
MAGI1 NM_015520 chr3 65322046 missense C>T D1196N Low coverage of locus in mRNA-seq
MYOF NM_013451 chr10 95147017 missense T>G K437N Low coverage of locus in mRNA-seq
MYOF NM_013451 chr10 95147027 missense A>T V434E Low coverage of locus in mRNA-seq
PAPPA2 NM_020318 chr1 174830357 missense G>A G332R Low coverage of locus in mRNA-seq
PDE4B NM_001037341 chr1 66156917 missense C>T S31F Low coverage of locus in mRNA-seq
RAD51C NM_058216 chr17 54129161 missense C>A D171E Confirmed by mRNA-seq data
RHAG NM_000324 chr6 49691341 missense C>T E199K Low coverage of locus in mRNA-seq
SPO11 NM_012444 chr20 55342171 missense C>T S81F; S119F Low coverage of locus in mRNA-seq
TSHZ2 NM_173485 chr20 51304885 missense C>T P494L Confirmed by mRNA-seq data
ZNF280A NM_080740 chr22 21199339 missense T>G T206P Low coverage of locus in mRNA-seq
*aligned to human reference genome 18.
Table S7, related to Table 1. Somatic deletions for PALJDL and PALETF detected by WGS
Columns: Sample ID; Gene 1 (Gene, Chrom, Position*); Gene 2 (Gene, Chrom, Position); Size (bp); Comments
PALJDL NCR** 12 110325422 SH2B3 12 110361365 35,943 Deletes exons 1-2 of SH2B3
PALJDL ELF1** 13 40448351 NCR 13 40489525 41,171 Deletes exon 1 of ELF1
PALJDL NCR** 12 63246783 RASSF3 12 63291396 44,613 Deletes exon 1 of RASSF3
PALJDL CDK6** 7 92281190 CDK6 7 92301324 20,134 Deletes exon 1 of CDK6
PALJDL ARRDC5 19 4836226 UHRF1 19 4873165 36,939 Deletes ARRDC5 and removes exon 1 of UHRF1
PALJDL CCNL1** 3 158360591 NCR 3 158375967 15,376 Deletes 5’ UTR of CCNL1
PALETF NCOA6 20 32853566 NCOA6 20 32876123 22,557 Deletes exon 1 of NCOA6
PALETF ARL8B** 3 5194887 EDEM1 3 5204877 9,990 Predicted fusion between ARL8B and EDEM1
PALETF CDH2** 15 91176886 NCR 15 91259725 82,839 Deletes exons 1-2 of CDH2
Deletions identified by copy number variant analysis using CONSERTING and CREST that were not detected by SNP array. These algorithms also identified all lesions determined by SNP array analysis.
NCR, non-coding region; UTR, untranslated region. *aligned to human reference genome 18; **validated by genomic PCR and Sanger sequencing.
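The sizes reported in Table S7 can be read as the distance between the two breakpoint coordinates. A minimal Python sketch follows, using two PALJDL rows copied from the table; the function name and record layout are illustrative assumptions and are not part of the original analysis.

```python
# Breakpoint coordinates (hg18) copied from two PALJDL rows of Table S7.
# The dictionary layout is a hypothetical convenience, not the authors' format.
deletions = [
    {"genes": "NCR/SH2B3", "chrom": "12", "start": 110325422, "end": 110361365},
    {"genes": "CDK6/CDK6", "chrom": "7", "start": 92281190, "end": 92301324},
]


def deletion_size(start: int, end: int) -> int:
    """Deletion size in bp, taken as the distance between the two breakpoints."""
    return abs(end - start)


sizes = {d["genes"]: deletion_size(d["start"], d["end"]) for d in deletions}
for genes, size in sizes.items():
    print(f"{genes}: {size:,} bp")
```

For these two rows the computed values agree with the sizes listed in the table (35,943 bp and 20,134 bp).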
Table S8, related to Table 1. Summary of Ph+ cases and Ph-like prediction by PAM for the COG cohorts P9906, AALL0232 and St Jude Children’s Research Hospital Total XV
Columns: P9906 N (%); AALL0232_1 N (%); AALL0232_2 N (%); Total XV N (%)
Total cases 207 283 325 342
Ph+ 0 14 (4.9) 21 (6.5) 7 (2.1)
Ph-like 43 (20.8) 40 (14.1) 42 (13) 33 (9.6)
AALL0232_1 and AALL0232_2 are consecutively enrolled groups from the COG high-risk AALL0232 trial.
Table S9, related to Table 1. Description of COG P9906 and AALL0232 cohorts. Provided as an Excel file. Tabulated information outlining PAM prediction and ROSE clustering determining Ph-like cases, B-cell pathway and kinase activating lesions, and kinase expression (PDGFRB, JAK2, ABL1 and CRLF2) by gene expression profiling using U133 Plus 2.0 array. Column definitions are listed below:
A: Cohort – P9906 or AALL0232
B: Sample ID – *, mRNA-seq index cases
C: TARGET ID – nomenclature is consistent with previous publications (Mullighan et al., 2009b; Mullighan et al., 2009c; Harvey et al., 2010a)
D: BCR-ABL1 status – Positive or negative
E: PAM prediction – Ph-like or non Ph-like
F: PAM coefficient – >0.5 = Ph-like. Samples are ranked in descending order for both cohorts
G: ROSE clustering classification
H: Group – TCF3-PBX1, ETV6-RUNX1, BCR-ABL1 positive, MLL rearranged, CRLF2 rearranged or unknown (other)
I: B-cell pathway lesions in IKZF1, PAX5, EBF1 or CDKN2A deletion
J: CRLF2 – Rearrangements in CRLF2 (IGH@-CRLF2 or P2RY8-CRLF2) detected previously for P9906 (Harvey et al., 2010a), or over-expression indicating a rearrangement in AALL0232
K: JAK mut – Mutation in JAK genes detected previously for P9906 (Mullighan et al., 2009c) and identified recently for AALL0232
L: Other kinase-activating lesion – Rearrangements and sequence mutations affecting kinase or cytokine signaling detected by mRNA-seq or whole genome sequencing analysis in the current study
M-R: Gene expression profiling of PDGFRB (M), JAK2 (N-P), ABL1 (Q) and CRLF2 (R)
Figure S1, related to Figure 1. Summary of COG ALL recruitment and frequency of Ph-like cases. Approximately two-thirds of COG cases are classified as standard risk, and one third are enrolled onto high-risk trials (P9906, AALL0232). Of the high-risk cases, 15-20% are classified Ph-like, determined by PAM. Within this Ph-like group, 50% harbor rearrangements of CRLF2, with 30% of these expressing concomitant JAK mutations, and the other 20% with unidentified lesions. From the current study, all 15 cases subjected to mRNA-seq harbored rearrangements or mutations affecting kinase and cytokine signaling. Recurrence testing of novel fusions in additional P9906 cases was prevented by limited availability of RNA. Consequently, the subsequent COG trial of high-risk B-ALL (AALL0232) was used, and the EBF1-PDGFRB fusion was detected in 3/40 (8%) of Ph-like cases. We also identified additional ABL1 and JAK2 rearrangements in this cohort using mRNA-seq. Furthermore, IL7R insertion/deletion mutations were detected in 5/42 (12%) of the P9906 Ph-like cases, but notably, not in AALL0232. No somatic SH2B3 sequence mutations were identified.
Figure S2, related to Figure 2. Additional fusion validation by RT-PCR. RT-PCR and sequencing validation of additional fusions identified by mRNA-seq analysis. Representative RT-PCR gel and sequencing for (A) SEMA6A-FEM1C (S/F) from case PAKKCA; (B) the in-frame fusion TPM4-KLF2 (T/K) from case PAKVKK; (C) OAZ1-KLF2 (O/K) from case PAMDRM; (D) the in-frame fusions FAM23A-MRC1 (F/M) and C12orf35-AMN1 (C/A) from case PAKTAL; (E) the interchromosomal translocation TSHZ2-SLC35A1 (T/S) from case PAKVKK; (F) ZNF292-SYNCRIP (Z/S) from case PALJDL; (G) the interchromosomal translocation DOCK8-CBWD2 (D/C) from case PAKKCA; and (H) the reciprocal inversion PAX5-ZCCHC7 (P/Z) and ZCCHC7-PAX5 (Z/P), which disrupts the open reading frame of PAX5, from case PAKKCA. NTC = non-template control. (I) Inferred log2 ratio copy number data from SNP array showing gain of one DNA copy between NUP214 and ABL1 in PAKVKK. Each vertical red line indicates log ratio copy number state for a single probe set on the 500K SNP array. (J) Sequencing validation for two additional NUP214-ABL1 cases from the P9906 cohort.
Figure S3, related to Figure 3. Paired-end mRNA-seq reads aligning to EBF1 and PDGFRB and FISH confirmation of EBF1-PDGFRB in case PAKKCA. (A) Paired-end mRNA-seq reads aligning to EBF1 and PDGFRB on chromosome 5q32 for case PAKKCA. The reads aligning to intron 15 of EBF1 and intron 10 of PDGFRB correspond to the genomic breakpoint, and the reads aligning to exon 15 of EBF1 and exon 11 of PDGFRB correspond to the in-frame fusion point. (B) The bacterial artificial chromosome (BAC) clone on chromosome 5 telomeric of EBF1 (RP11-583A20, red). (C) Two BAC clones flanking PDGFRB on chromosome 5; RP11-1079A8 (centromeric, green) and RP11-759G10 (telomeric, red). (D) PDGFRB break-apart assay using probes in B showing loss of the telomeric probe on one chromosome (arrow), due to the deletion between EBF1 and PDGFRB. (E) Colocalization assay showing the fusion signal (arrow) between the telomeric EBF1 clone (red) and centromeric PDGFRB clone (green). Normal signals are close together, but not fused. The fusion was detected in over 95% of cells analyzed, indicating that the EBF1-PDGFRB fusion is present in the predominant clone at diagnosis.
Figure S4, related to Figure 4. Multiple JAK2 breakpoints and genomic mapping of BCR-JAK2 in case PAKYEP. Soft-clipped reads showing two JAK2 breakpoints for the BCR-JAK2 (B/J) fusion in case PAKYEP. (A) Reads aligning to BCR are shown in black letters, with soft-clipped reads detected by CREST aligning to JAK2 (highlighted in blue) and represent either exon 15 (blue underline) or exon 17 (green underline) of JAK2. RT-PCR showing two bands corresponding to JAK2 exon 15 fusion (top band) or JAK2 exon 17 fusion (bottom band). Sequencing validation showing BCR fused to JAK2 exon 15 (left) and exon 17 (right). The red amino acid is changed from the wild-type sequence. (B) Bambino view of mRNA-seq split reads showing the genomic breakpoint of BCR at chr22:21905862 to JAK2 intron 14 (soft-clipped reads) and breakpoint of JAK2 at chr9:5066708 to BCR intron 1 (soft-clipped reads). (C) Genomic PCR and sequencing confirming the breakpoint between BCR and JAK2. Note microhomology of 2bp (CA) at the two breakpoints, which can be aligned to either BCR or JAK2.
Figure S5, related to Figure 5. Genomic PCR and Sanger sequencing validation of additional deletions in PALJDL. (A) Genomic PCR gel. (B) Deletion between non-coding region on chr12:63246783 and RASSF3 (chr12:63291396), which deletes exon 1 of RASSF3. (C) Deletion between ELF1 (chr13:40448351) and non-coding region (chr13:40489525) that deletes exon 1 of ELF1. (D) Deletion between chr7:92281190 and chr7:92301324, which deletes exon 1 of CDK6. Alignment based on human reference genome 18.
Figure S6, related to Figure 6. Phosphosignaling analysis in non-Ph-like ALL cases. Primary leukemic cells were thawed, treated with or without dasatinib (100nM) and ruxolitinib (1µM) for 1 hr and stained for pSTAT5 and pCRKL according to Experimental Procedures.
Figure S7, related to Figure 7. Modeling EBF1-PDGFRB in vitro. (A) Ba/F3 EBF1-PDGFRB cells are sensitive to dasatinib and dovitinib. No cytotoxic effects were observed with cells maintained in factor, indicating that imatinib specifically targets the activated PDGFRB and ABL1 kinases. Error bars represent mean ± SD of three independent experiments. (B) Imatinib inhibits phosphorylation of the EBF1-PDGFRB fusion protein. (C) pAKT and pERK1/2 are constitutively activated in EBF1-PDGFRB expressing cells, and signaling is inhibited by dasatinib (100 nM).
Supplemental Experimental Procedures
Patient samples and gene expression profiling
Ten Ph-like ALL cases from the COG P9906 high-risk B-ALL study (Bowman et al., 2011), three
cases enrolled on the high-risk COG AALL0232 study (ClinicalTrials.gov Identifier
NCT00075725) and two cases treated on the St Jude Children’s Research Hospital Total XV
(Pui et al., 2009) and Total XVI protocols (ClinicalTrials.gov Identifier NCT00137111 and
NCT00549848, respectively) were selected for mRNA-seq based on a similar gene expression
profile to Ph+ ALL, as determined by ROSE clustering (Harvey et al., 2010b) and PAM
(Tibshirani et al., 2002), and the availability of suitable genomic material. This selection included
a range of cases with variable Ph-like expression signature from strongest to weakest (see PAM
coefficient details in Table S9). Cases were initially chosen from P9906, as the description of
Ph-like ALL was first reported in this cohort (Mullighan et al., 2009b). Details of the P9906
cohort, and prior genomic analyses performed in this cohort, have been described previously
(Mullighan et al., 2009b; Mullighan et al., 2009c; Harvey et al., 2010a; Zhang et al., 2011).
All P9906 patients were classified as high-risk based on the presence of central nervous system
or testicular disease, MLL rearrangement, or based on age, sex, and leukocyte count at
diagnosis. BCR-ABL1 and hypodiploid ALL patients, in addition to those who experienced
primary induction failure were excluded. Cases with high hyperdiploid (as defined by trisomy of
chromosomes 4 and 10 on cytogenetic analysis) or ETV6-RUNX1 cases were excluded unless
central nervous system or testicular involvement was present at diagnosis. A total of 207
enrolled cases had suitable material for 500K SNP microarrays and U133 Plus 2.0 gene
expression microarrays (Affymetrix).
WGS of matched non tumor DNA obtained from remission bone marrow at day 29, or at a
subsequent remission timepoint after commencement of remission-induction therapy, was
performed for all cases. WGS of leukemic cell DNA was also performed for two cases that
lacked a chromosomal rearrangement identified by mRNA-seq and prior genomic analyses
(Harvey et al., 2010b). Due to limited availability of RNA material from the P9906 cohort,
recurrence testing of the ABL1, JAK2 and PDGFRB rearrangements was performed in a
separate cohort of B-ALL patients enrolled on the COG AALL0232 study.
All AALL0232 B-ALL patients were diagnosed with National Cancer Institute high-risk ALL
based on WBC count >50 x 10^9/L or age >10 years at presentation, prior steroid therapy, or the
presence of testicular disease. The average age at diagnosis was 10.0 ± 5.8 years. Twenty-eight
cases (9.9%) were hyperdiploid, 20 (7.1%) were ETV6-RUNX1-positive, 17 (6.0%) were
TCF3-PBX1-positive, 14 (4.9%) were BCR-ABL1-positive, 5 (1.7%) harbored MLL
rearrangements and 199 cases (70.3%) lacked a known chromosomal abnormality. All samples
were obtained with patient or parent/guardian provided informed consent under protocols
approved by the Institutional Review Board at each COG institution. The clinical study was
approved by the National Cancer Institute and appropriate Institutional Review Boards. A total of
231 BCR-ABL1-negative patients had available RNA for RT-PCR.
Recurrence screening for each fusion was also performed on 23 JAK2/MPL-negative MPN
samples from the Harvard myeloproliferative disorders study (age range 35-81), including 13
with polycythemia vera, 5 with essential thrombocythemia and 2 with myelofibrosis (Levine et al.,
2005). In addition, 25 CMML samples obtained from the MD Anderson Cancer Center (age
range 61-88), and 44 pediatric AML samples (16 cases with normal karyotypes, 18 with
miscellaneous or non-recurrent cytogenetic alterations, 5 inv(16) and 5 t(8;21)) from St Jude
Children’s Research Hospital (Radtke et al., 2009) were also included (age range 3-21).
Sequencing for IL7R and SH2B3 mutations was performed using whole genome amplified
leukemic DNA from the P9906 and AALL0232 high-risk ALL cohorts. All samples were obtained
with patient or parent/guardian provided informed consent under protocols approved by the
Institutional Review Board at each COG institution.
Gene expression profiling was performed using U133 Plus 2.0 arrays for P9906 (N=207), and
AALL0232 (N=608) (Affymetrix). Expression signals were normalized by MAS 5.0 algorithm.
Probe sets lacking present calls for every sample were excluded, and signal intensities with
values less than 2 were set to 2. Signals were then log2 transformed for subsequent analysis
(Mullighan et al., 2009b). To identify Ph+ and Ph-like cases, we trained PAM using the second
consecutively recruited subgroup of AALL0232 cases (N=325) to detect Ph+ and Ph-like cases
in the first subgroup of AALL0232 cases (N=283). The PAM predictor containing 257 probe sets
(Table S1) was obtained through cross validation analysis at a threshold of 2.2, and correctly
identified 13 of 14 Ph+ AALL0232_1 cases and classified 40/283 (15%) as Ph-like, determined
by a PAM coefficient greater than 0.5. The same training conditions were applied to the P9906
cohort, with 43/203 cases (21%) classified as Ph-like (Table S9). To identify differentially
expressed genes in Ph-like cases, limma (Linear Models for Microarray Analysis) (Smyth, 2004)
with estimation of fold-change and false discovery rate was also performed (Benjamini and
Hochberg, 1995) (Table S2).
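The signal pre-processing and the Ph-like call described above can be sketched in Python as follows. This is an illustration under stated assumptions only: the function names and toy data are hypothetical, the MAS 5.0 normalization and PAM training are not reproduced, and the filter is read here as "drop probe sets with no present call in any sample".

```python
import math

FLOOR = 2.0  # signal intensities below 2 are set to 2 before the log2 transform


def preprocess(signals, present_calls):
    """Filter probe sets and log2-transform the remaining signals.

    signals:       {probe_set: [normalized signal per sample]}
    present_calls: {probe_set: [True if called present in that sample]}
    Probe sets without a present call in any sample are dropped
    (an assumed reading of the exclusion rule in the text).
    """
    processed = {}
    for probe, values in signals.items():
        if not any(present_calls[probe]):
            continue
        processed[probe] = [math.log2(max(v, FLOOR)) for v in values]
    return processed


def is_ph_like(pam_coefficient, cutoff=0.5):
    """A case is called Ph-like when its PAM coefficient exceeds 0.5."""
    return pam_coefficient > cutoff


# Toy data with hypothetical probe-set names.
toy_signals = {"probe_a": [1.0, 8.0, 64.0], "probe_b": [0.5, 1.5, 1.0]}
toy_calls = {"probe_a": [False, True, True], "probe_b": [False, False, False]}
processed = preprocess(toy_signals, toy_calls)
print(processed)  # {'probe_a': [1.0, 3.0, 6.0]}
```

Flooring before the log transform keeps very low intensities from producing large negative values that would dominate downstream distance calculations.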
mRNA-seq library preparation and sequencing
mRNA-seq was performed as previously described (Morin et al., 2010) with modifications. Total
RNA was extracted from leukemic cells obtained from bone marrow aspirates or peripheral
blood using TRIzol (Life Technologies). Poly(A)+ RNA was enriched from 5-10 µg of DNase I-treated
total RNA using the MACS mRNA isolation kit (Miltenyi Biotec). Double-stranded cDNA
was synthesized from the purified poly(A)+ RNA using the Superscript Double-Stranded cDNA
Synthesis kit (Life Technologies) and random hexamer primers (Life Technologies) at a
concentration of 5 μM. The cDNA was fragmented by sonication and a paired-end sequencing
library prepared following the paired-end library preparation protocol (Illumina). For mRNA-seq
library sequencing, clusters were generated on the Illumina cluster station and paired-end
sequence reads were generated using v3-v5 sequencing reagents on the Illumina GAIIx and
HiSeq 2000 platforms following the manufacturer's instructions. Read length summary for each
case is provided below. Image analysis, base-calling and error calibration were performed
using v1.0, v1.3.2, v1.5.0 and v1.6.0 of Illumina's Genome analysis pipeline.
Read summary information for mRNA-seq data and matched constitutional DNA
Sample ID Sample type Number of lanes x read length (bp) Total reads
PAKTAL Tumor 1x36, 6x50 144257180
PAKTAL Normal 14x50 352797054
PAKKCA Tumor 4x76 168403164
PAKKCA Normal 4x76 198063899
PAKVKK Tumor 1x36, 6x50 144981646
PAKVKK Normal 14x50 352797054
PALIBN Tumor 4x76 166126640
PALIBN Normal 4x76 140487148
PAKYEP Tumor 6x76 228085558
PAKYEP Normal 4x76 145008762
PAMDRM Tumor 4x76 177659630
PAMDRM Normal 4x76 166810882
PAKKXB Tumor 5x76 233574978
PAKKXB Normal 4x76 199671920
PALETF Tumor 4x76 130810808
PALETF Normal 4x76 181507216
PAKHZT Tumor 1x36, 6x50 175124596
PAKHZT Normal 14x50 401590800
PALJDL Tumor 4x76 161808726
PALJDL Normal 4x76 171312172
PANNGL Tumor 1/3x100* 92019240
PANSFD Tumor 1/3x100* 85436840
PANEHF Tumor 1/3x100* 159676104
SJBALL085 Tumor 1x75 71651522
SJBALL010 Tumor 1x100 70141484
*, samples sequenced on the HiSeq 2000.
Whole genome shotgun library preparation and sequencing
Illumina paired-end whole genome shotgun libraries were prepared from 1 µg of genomic DNA
from COG P9906 cases PALJDL and PALETF (both tumor and matched remission DNA) as
described (Shah et al., 2009). The resulting libraries were sequenced on the Illumina GAIIx
platform using v5 paired-end 36-100bp sequence chemistry following the manufacturer's
instructions. For eight of the nine remaining cases, libraries were prepared from obtained
remission DNA only to aid in identifying inherited variants. Image analysis, base-calling and
error calibration were performed using v1.4.0, v1.5.0 and v1.8.0 of Illumina's Genome analysis
pipeline.
Detection of SNVs and fusion transcripts from mRNA-seq data
Detection of SNVs and fusion transcripts from mRNA-seq data was performed independently by
BC Genome Sciences Centre (BC) and St. Jude Children’s Research Hospital (SJCRH) using
different approaches as previously described. The results from the two institutes
were combined to generate the final list of candidate fusions for experimental
validation.
At BC, all reads were aligned to the human reference genome (hg18) or (for mRNA-seq) to a
genome file that was augmented with a set of all exon-exon junction sequences using BWA
version 0.5.4 (Li and Durbin, 2009). mRNA-seq libraries were aligned with an in-house modified
version of BWA that is aware of exon junction reads and considers them when determining
pairing distance in the “sampe” (read pairing) phase of alignment. Candidate SNVs were
identified in the aligned genomic sequence reads and the transcriptome (mRNA-seq) reads
using an approach similar to that previously described (Morin et al., 2010). One key
difference in our variant calling in this study was the application of a Bayesian SNV identification
algorithm ('SNVmix') (Goya et al., 2010). The deFuse software (version 0.2.0;
http://compbio.bccrc.ca/software/defuse/) (McPherson et al., 2011) was utilized for the
identification of putative fusion transcripts using the hg18 reference genome. Predicted fusion
sequences were subsequently aligned using BLAT and those with numerous high-confidence
alignments were removed. Predicted fusions were further filtered to remove those with less than
2 ‘split reads’ (those that cross the fusion point) and predicted fusions involving adjacent (or
nearby) gene pairs were also removed. The Trans-ABySS pipeline for detecting rearrangements
has been previously described (Robertson et al., 2010).
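The post-deFuse filtering steps described above can be sketched as a simple predicate. This is a hedged illustration, not the deFuse implementation: the record fields, gene names and the two cutoffs `max_blat_hits` and `min_genes_between` are hypothetical, because the text does not quantify "numerous" alignments or "nearby" gene pairs. Only the split-read minimum of 2 comes from the text.

```python
MIN_SPLIT_READS = 2  # from the text: predictions with fewer than 2 split reads are removed


def passes_filters(pred, max_blat_hits=3, min_genes_between=1):
    """Return True if a predicted fusion survives the three filters.

    max_blat_hits and min_genes_between are assumed cutoffs chosen
    only for this example.
    """
    if pred["blat_hits"] > max_blat_hits:
        return False  # fusion sequence aligns to many places in the genome
    if pred["split_reads"] < MIN_SPLIT_READS:
        return False  # too few reads cross the fusion point
    gap = pred["genes_between"]  # None for inter-chromosomal events
    if gap is not None and gap < min_genes_between:
        return False  # adjacent gene pair, likely read-through
    return True


# Hypothetical predictions; gene names are placeholders.
predictions = [
    {"name": "GENE_A-GENE_B", "split_reads": 5, "blat_hits": 1, "genes_between": None},
    {"name": "GENE_C-GENE_D", "split_reads": 1, "blat_hits": 1, "genes_between": 40},
    {"name": "GENE_E-GENE_F", "split_reads": 9, "blat_hits": 1, "genes_between": 0},
    {"name": "GENE_G-GENE_H", "split_reads": 4, "blat_hits": 12, "genes_between": 7},
]
kept = [p["name"] for p in predictions if passes_filters(p)]
print(kept)  # ['GENE_A-GENE_B']
```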
At SJCRH, Illumina paired-end reads in FASTQ format were aligned against NCBI build 36 of the
human reference genome using Mosaik 0.9 (Marth, 2010). The Mosaik alignment parameters
were: (a) hash size of 15; (b) maximum fraction of the read length allowed to be
errors: -mmp 0.05; (c) use of the aligned read length instead of the original read length when
counting errors; (d) minimum fraction of the read length that must be aligned: -minp 0.5; (e)
alignment candidate threshold of 35 bp; (f) maximum number of hash positions of 100; (g)
alignment mode: unique.
Inter- and intra-chromosomal structural variation (SV) detection was carried out using Spanner
(Durbin et al., 2010), which is a classification scheme for structural variants based on distinct
patterns of paired-end read genome coverage. For intra-chromosomal SVs, calls involving
adjacent genes were first filtered out to exclude potential read-through events. The candidate
list was then sorted in descending order by the number of genes between the fusion breakpoints
and higher confidence was given to fusions with larger number of genes separating the
breakpoints. Each SV candidate was also checked to see whether the orientations of the supporting
paired-end reads and the orientations of the genes involved were consistent with generating a
potential fusion product.
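The prioritization of intra-chromosomal candidates described above amounts to a filter followed by a descending sort. A minimal Python sketch, assuming a hypothetical record layout with a `genes_between` count; the orientation check is not modeled here, and this is not Spanner's own code.

```python
def rank_candidates(candidates):
    """Drop adjacent-gene calls, then sort by genes between breakpoints (descending)."""
    non_adjacent = [c for c in candidates if c["genes_between"] > 0]
    return sorted(non_adjacent, key=lambda c: c["genes_between"], reverse=True)


# Hypothetical candidates; gene names are placeholders.
candidates = [
    {"pair": "GENE_A-GENE_B", "genes_between": 3},
    {"pair": "GENE_C-GENE_D", "genes_between": 0},   # adjacent genes, possible read-through
    {"pair": "GENE_E-GENE_F", "genes_between": 25},
]
ranked = [c["pair"] for c in rank_candidates(candidates)]
print(ranked)  # ['GENE_E-GENE_F', 'GENE_A-GENE_B']
```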
Validation of candidate somatic mutations and fusion transcripts
Validation was attempted for each of the candidate point mutations and fusion transcripts
identified in the P9906 and Total XV and XVI cases. For SNV this was accomplished by
designing primers to amplify a 200 to 300 bp region around the targeted variant with one primer
within reach of a single read (<=75 bases). Polymerase chain reactions were set up in 96-well
plates and comprised of 0.5 μM forward primer, 0.5 μM reverse primer, 1-3 ng of gDNA
template, 5X Phusion HF Buffer, 0.2 μM dNTPs, 3% DMSO, and 0.4 units of Phusion DNA
polymerase (New England Biolabs). Reaction plates were cycled on a MJR Peltier
Thermocycler (model PTC-225) with cycling conditions of a denaturation step at 98 °C for 30
sec, followed by 35 cycles of [98°C for 10 sec, 69°C for 15 sec, 72°C for 15 sec] and a final
extension step at 72°C for 10 min. PCR reactions were visualized by SybrGreen (Life
Technologies) in 1.2% agarose (SeaKem LE) gels run for 90min at 170V to assess PCR
success. The resulting amplicons were pooled by patient and template, one for tumor and one
for normal DNA, with equal volumes from each PCR reaction and an indexed Illumina paired-
end sequencing library was constructed from each pool as described (Wiegand et al., 2010).
The resulting library was sequenced using v5 paired-end sequencing reagents on the Illumina
GAIIx platform following the manufacturer's instructions. Between the paired 75 base reads a
third 7 base pair read was performed using the following custom sequencing primer to
sequence the hexamer barcode [5’-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG].
Image analysis, base-calling and error calibration were performed using v1.8.0/ RTA 1.8.70.0 of
Illumina's Genome analysis pipeline. Reads were aligned using BWA, de-multiplexed using their
hexamer sequence, and variants were visually confirmed for validity and somatic status
(absence from constitutional DNA) in the Integrative Genomics Viewer (Robinson et al., 2011).
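De-multiplexing pooled amplicon reads by their hexamer barcode can be sketched as below. The barcode sequences, pool names and read tuples are hypothetical, and taking the first six bases of the 7-base index read as the barcode is an assumption made only for this example.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_pool, barcode_len=6):
    """Group read sequences by the pool their index read maps to.

    reads:           iterable of (index_read, sequence) tuples
    barcode_to_pool: {hexamer barcode: pool name}
    Reads whose barcode is not recognized go to 'unassigned'.
    """
    pools = defaultdict(list)
    for index_read, sequence in reads:
        barcode = index_read[:barcode_len]
        pools[barcode_to_pool.get(barcode, "unassigned")].append(sequence)
    return dict(pools)


# Hypothetical barcodes and reads.
barcode_to_pool = {"ACGTAC": "tumor", "TGCATG": "normal"}
reads = [("ACGTACA", "GATTACA"), ("TGCATGA", "CCGGTTA"), ("NNNNNNN", "TTTTAAA")]
pools = demultiplex(reads, barcode_to_pool)
print(pools)
```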
A total of 1257 novel SNVs (candidate mutations) in 88 genes were identified by mRNA-seq
(ranging from 33-270 SNVs per case). PCR was attempted for 1163 of these events that were
amenable to PCR and direct amplicon sequencing. After sequencing and alignment, 1048 had
sufficient sequence coverage from both tumor and normal DNA. Of these, 631 variants were
found to be present in the germline, 394 were deemed to be false positives, and the
remaining 23 events were somatic mutations. Mutations were annotated on genes using the
Ensembl transcripts (version 54) and those predicted to cause nonsynonymous or nonsense
changes are reported in Table S4.
From the P9906 and Total XV and XVI cases profiled by mRNA-seq, 425 distinct putative
insertion/deletion mutations were identified, with a range of 12-100 per case (mean of 38.6 per
case). Of these target loci, a successful PCR was achieved in both tumor and normal DNA in
365 cases. Deep resequencing of these amplicons revealed that 188 of these were present in
the germline, 167 were false calls and 10 were somatic events (Table S4).
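The validation tallies reported above can be checked with a short Python sketch. The counts are taken from the text; the function name and the dictionary keys are illustrative only.

```python
def validation_summary(germline, false_calls, somatic):
    """Return the total and the fraction of each validation outcome."""
    total = germline + false_calls + somatic
    return {
        "total": total,
        "germline": germline / total,
        "false_calls": false_calls / total,
        "somatic": somatic / total,
    }


# SNV candidates with sufficient coverage in tumor and normal DNA.
snv = validation_summary(germline=631, false_calls=394, somatic=23)
# Indel candidates with successful PCR in tumor and normal DNA.
indel = validation_summary(germline=188, false_calls=167, somatic=10)

for label, summary in (("SNV", snv), ("indel", indel)):
    print(label, summary["total"], f"{summary['somatic']:.1%} somatic")
```

The three outcome counts sum to the stated totals (1048 SNVs and 365 indels), and in both classes only a few percent of the candidates called from mRNA-seq were confirmed as somatic.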
Validation of candidate fusion transcripts
To validate fusion transcripts, primers were designed to span 200-300 bp on either side of the
fusion point and PCR reactions were set up with 0.4 μl cDNA template, 0.2 μM forward primer,
0.2 μM reverse primer, 5X Phusion HF Buffer, 0.2 mM dNTPs and 0.4 units of Phusion DNA
polymerase (New England Biolabs). Reactions were performed on an Eppendorf thermocycler
with cycling conditions consisting of a denaturation step at 98 °C for 1 min, followed by 33
cycles of [98°C for 10 sec, 66°C for 15 sec, 72°C for 15 sec] and a final extension step at 72°C
for 10 min. PCR products were visualized with GelRed (Biotium, Inc.) on a 1.5% agarose gel run
at 110 V for 1 hour and purified using the Wizard PCR purification Kit (Promega), and fusion
transcripts were confirmed by direct Sanger sequencing.
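As an illustration, the cycling conditions above can be written down as a small Python structure and the nominal hold time summed. The class and constant names are hypothetical, and ramp times between temperatures are ignored, so the real run time on the thermocycler would be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # hold temperature in degrees Celsius
    seconds: int  # hold time in seconds


# Values taken from the cycling conditions described in the text.
INITIAL_DENATURATION = Step(98, 60)
CYCLE = (Step(98, 10), Step(66, 15), Step(72, 15))
N_CYCLES = 33
FINAL_EXTENSION = Step(72, 600)


def total_hold_seconds():
    """Sum the programmed hold times, ignoring temperature ramps."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return (
        INITIAL_DENATURATION.seconds
        + N_CYCLES * per_cycle
        + FINAL_EXTENSION.seconds
    )


print(total_hold_seconds() / 60, "minutes of programmed hold time")
```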
Detection and validation of SNV and insertion/deletion sequence mutations from WGS data
Putative sequence variants including SNVs and indels were initially detected by running the
variation detection module of Bambino (Edmonson et al., 2011) as previously described (Zhang
et al., 2012). For confirmation of SNVs as somatic, matched tumor and normal DNA were
subjected to PCR and Sanger sequencing. For PALJDL, a total of 19 high quality SNVs were
identified using Bambino; 4 of these were located in untranslated regions (UTRs) and 3
were silent. The remaining 12 missense mutations were validated by PCR and Sanger
sequencing, with 1 false positive and 2 sequencing failures (Table S5). For PALETF, 54
high quality SNVs were detected, with 3 in non-coding regions, 6 silent mutations, 24 UTR
variants and 20 missense mutations, of which 19 were determined as somatic (Table S6).
Sequence analysis of IL7R and SH2B3
PCR of IL7R exon 6 was carried out on whole genome amplified leukemic DNA from the P9906
(N=188) and AALL0232 cohorts (N=248). Exon 2 of SH2B3, a known hotspot for mutations
and deletions, was sequenced (Pardanani et al., 2010). Sequencing and subsequent validation were
performed by Beckman Coulter Genomics (Danvers, MA), and sequencing analysis was
conducted as previously reported (Mullighan et al., 2009b). Base calls and quality scores were
determined using the program PHRED (Ewing et al., 1998). Sequence variations including
substitutions and insertion/deletions (indel) were analyzed using the SNPdetector (Zhang et al.,
2005) and the IndelDetector (Zhang et al., 2007) software. A usable read was required to have
at least one 30-bp window in which 90% of the bases had a PHRED quality score of at least 30.
Poor quality reads were filtered prior to variation detection. The minimum
secondary-to-primary peak ratio for substitution and indel detection was set to 20% and 10%,
respectively. All sequence variations were annotated using a previously developed variation
annotation pipeline.
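The read-quality criterion and the peak-ratio thresholds described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the SNPdetector or IndelDetector implementation: "90% of the bases" is read as "at least 90%", the thresholds are treated as inclusive, and all function names are hypothetical.

```python
def is_usable_read(quals, window=30, min_q=30, min_frac=0.9):
    """True if any `window`-base stretch has >= min_frac bases with Q >= min_q."""
    if len(quals) < window:
        return False
    flags = [q >= min_q for q in quals]
    needed = min_frac * window
    # Count high-quality bases in the first window, then slide one base at a time.
    count = sum(flags[:window])
    if count >= needed:
        return True
    for i in range(window, len(flags)):
        count += flags[i] - flags[i - window]
        if count >= needed:
            return True
    return False


# Minimum secondary-to-primary peak ratios stated in the text.
MIN_PEAK_RATIO = {"substitution": 0.20, "indel": 0.10}


def passes_peak_ratio(primary, secondary, kind):
    """True if the secondary peak is large enough relative to the primary peak."""
    return secondary / primary >= MIN_PEAK_RATIO[kind]
```

For example, a read with 27 bases of quality 40 inside some 30-base window passes the filter, whereas a read with at most 26 such bases in every window does not.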
Identification of structural variations and copy number variations from WGS data
SVs including inter-chromosomal translocations, intra-chromosomal translocations, inversions,
deletions, and insertions were analyzed by CREST (Wang et al., 2011). CNVs were identified by
evaluating the number of sequence reads aligned at each base using the novel algorithm
CONSERTING (COpy Number SEgmentation by Regression Tree In Next-Gen sequencing) as
previously described (Zhang et al., 2012). From this analysis, we identified six somatic deletions
in PALJDL involving SH2B3, RASSF3, ELF1, CDK6, UHRF1 and CCNL1 (Table S7); five of
these were validated by PCR and Sanger sequencing (Figure S5). In addition, the ARL8B-EDEM1
fusion and deletions within NCOA6 and CDH2 in PALETF (Table S7) were also
validated (data not shown). All validation and sequencing primers are listed in the tables below.
List of primers for fusion validation and genomic mapping.
Fusion | Forward primer ID | Forward sequence 5' to 3' | Reverse primer ID | Reverse sequence 5' to 3'
BCR-JAK2, Fusion by RT-PCR | BCRe1F1 | gtgccataagcggcaccggcact | JAK2e17R2 | tctcctccactgcagatttcccaca
BCR-JAK2, Genomic mapping | BCRi1F2 | tgcttccccacaccgacttcctaaa | JAK2i14R2 | cagaaatgttttgcgctgctttatc
EBF1-PDGFRB, Fusion by RT-PCR | EBF1e14F2 | cacgagcatgaacggatacggctct | PDGFRBe13R2 | tttcatcgtggcctgagaatggctc
EBF1-PDGFRB, Full-length fusion | EBF1e1_FL | gggggaggagattttccacaagaaaagg | PDGFRBe23_FL | gggccagccccctacaggaagctat
EBF1-PDGFRB, Genomic mapping | EBF1i15F1 | aacgggaacagcctgcaaggtaagc | PDGFRBi10R1 | gtactaggaggtccaggcagggctga
EBF1-PDGFRB, Genomic mapping | | | PDGFRBi10R2 | tccttcccctttatctgccacccttc
EBF1-PDGFRB, Genomic mapping | | | PDGFRBi10R3 | aagtggggctcacactggcccttac
EBF1-PDGFRB, Genomic mapping | | | PDGFRBi10R4 | gaggggcttaccttctgccaaagca
NUP214-ABL1 | NUPe20F1 | cagtggccttggaggaaaacccagt | ABL1e3R2 | tgtagttgcttgggacccagccttg
STRN3-JAK2 | STRN3e9F1 | atgatgagctgccccacatcccttc | JAK2e17R3 | cggcacatctccacactcccaaaat
IGH@-EPOR | IGHJ5F1 | tttcctgacctccaaaatgcctcca | EPOR_40bpinsR3 | gggggtggctacgattatagggtca
IGH@-EPOR | EPORe8F1 | tctggtgctggacaaatggttgctg | IGHV4R1 | gcaccaactacaacccctccctcaa
RANBP2-ABL1 | RANBP2e16F2 | tggttctttgcgaaatgcagattca | ABL1e3R1 | gccatttttggtttgggcttcacac
PAX5-JAK2 | PAX5e4F1 | accaaccagtcccagcttccagtca | JAK2e19R3 | cggcacatctccacactcccaaaat
ETV6-ABL1 | ETV6e5F3 | cggcactccgtggatttcaaacagt | ABL1e3R2 | tgtagttgcttgggacccagccttg
RCSD1-ABL1 | RCSD1e3F3 | cagccagtaaaccaacccgaaggaa | ABL1e4R3 | gcttgttgcgctttggggctgga
SEMA6A-FEM1C | SEMA6Ae1F2 | cccttctccgctcgtcattggagat | FEM1Ce2R2 | ttgccatcccgagctgcgttaaata
OAZ1-KLF2 | OAZ1e1F2 | acgcagcggaggttttcctggttt | KLF2e3R2 | gatcgcacagatggcactggaatg
TPM4-KLF2 | TPM4e1F2 | ctcaactccctggaggcggtgaaac | KLF2e3R1 | gtggcccgtgtgctttcggtagtg
FAM23A-MRC1 | FAM23Ae2F1 | accacactgccctgcctcacctttt | MRC1e2R1 | tgtttttgatggcactcccaggcata
C12orf35-AMN1 | C12orf35i4F2 | ccttgtgggaatgtcagaaaagtgtca | AMN1i6R2 | ctgacagccaccacgtccaaccaat
ZNF292-SYNCRIP | ZNF292e1F2 | gacggagcggggtgtgaagatgg | SYNCRIPv1e4R2 | tggtcccttgtttttctctctgcctgt
BTBD7-SLC2A5 | BTBD7i10F2 | tcgaatgctcctcaagcaggtcactc | SLC2A5i1R2 | tgggcatgatgtgcacctgtaatcc
DOCK8-CBWD2 | DOCK8e3F2 | accctgtggagccagtggactttga | CBWD2i10R2 | ctgcgcttcccaagtgaggcaatg
TSHZ2-SLC35A | TSHZ2i2F1 | cacatcggcctcccaaaatgctaga | SLC35A1i5R1 | tgccagtatgtcatttgctttggtca
PAX5-ZCCHC7 | PAX5e6F1 | ccggaagcagatgcggggagact | ZCCHC7e5R2 | gggggctggacaggaatacaggaga
ZCCHC7-PAX5 | ZCCHC7e2F1 | tgcccatggtctttcttcttctcttca | PAX5e7R1 | gcctgtcacaatggggtaggactgc
 | Actin | agtgtgacgtggacatccgcaaagac | Actin | gcttgctgatccacatctgctggaag
List of primers for SNV and SV validation
Primer Name | Gene | Forward sequence 5’ to 3’ | Reverse sequence 5’ to 3’
Single nucleotide variants
HS0825_17_7691628 | JMJD3 | CGCTGACCATTACCAAACTCCC | CAGGCCGGACCCTTCAAC
HS0825_19_8373013 | RAB11B | TACCGTGGTGCAGTGGGC | CTTCCAGACACCTCTCTGATGCC
HS0825_9_5079702 | JAK2 | TTAGGGTAATTTTGGGAGTGTGGAG | TTTAAAATAGGTTTCAATGGGCAGC
HS0825_X_13696806 | OFD1 | AAAAGATGGTCCAAGAAGGCTCC | TTACAATTTATATCAGGATCAATTTCACAAGTC
HS0894_19_10655385 | ILF3 | CATGCACAACGAAGTGCCC | AAAGACAGGGTTACCGGGGTC
HS0894_8_59682919 | NSMAF | AGGCCGTGCCTCCTTTTGTAG | GAATGGGAAACCAGATATCACCTTG
HS0894_9_37010775 | PAX5 | CTTTATTTGAAAGATCAAGGGAAGCC | TTTCAGGACATGGAGGAGTGAATC
HS1533_17_7514718 | TP53 | CCTGGGCATCCTTGAGTTCC | AGTCAGCTGTATAGGTACTTGAAGTGCAG
HS1533_X_76805676 | ATRX | GTCTGTATCTTGGCTTCTTAGATTCTTCAG | TGTCAAGTCTGATGTGTGAGATTACCTG
HS1535_7_6697396 | ZNF12 | ATTCATAGGGCTTCTCTCCTGAATG | CTCCCAGTTGTCATACCTCACTATCC
HS1535_X_54065429 | PHF8 | CAGGGTTCCTCACCTGTCAAAAG | GATTGGAAGAGAAGGATCTGCTGAG
HS1536_10_75544616 | VCL | ACCATAAGCACCCAGCTCAAAATC | ACACAGACTTTCCCCTCCTGAGAC
HS1536_17_78167098 | WDR45L | GAGCTGGTCACAGCCACTGAG | AGTGCATCCGAGATGTCTACGC
HS1536_1_113042873 | MOV10 | CCTCCTTCCAGGTGGAGAAAATC | AGCTCTATTCAAGGCAGCAGGG
HS1536_6_139529155 | HECA | TGCCACTCCCCTGATCTGC | TAGCCCTTCTTTGTCCACATGTTC
HS1576_1_234768983 | LGALS8 | ACACGCCTTTCAAAAGAGAAAAGTC | CACATTTAGGTGCCTTCCTGGTC
HS1537_15_98087051 | LYSMD4 | TTAAAGTAGACCAAATAAAAGACAGGCAAG | TTTAAGGGGATTGACCAGGATATTG
HS1537_17_54129161 | RAD51C | GAGGGAAGTTTTATGGTTGATAGAGTGG | AGGCTGTGGCATTTCTCATTTTG
HS1537_19_50975027 | DMPK | AAGAACCGAGGGTCACCAGAAAG | GAGCCCTCTGGGCCAATG
HS1537_20_48942253 | ADNP | TCACGGACACACTTCTTCCTTTTG | CTTTGAAGAGAAGCCTGAAGAGCC
HS1537_20_51304885 | TSHZ2 | CGAGGACTATGAAGATCCTCTACAAAAAC | GGGATCCCATAGGCAAAGGC
HS1537_5_112204169 | APC | CACCTATAAACTTTTCCACAGCTACATCTC | TGCAAGAATATCACCTTCCTCTGC
HS1536_15_55608184 | CGNL1 | CTGTAGCTGGAAAACCTCCTCTTTTTAC | TCAATCAAGAAATAACCTCACTACACCC
HS1536_13_109628259 | COL4A1 | CAGCCTTCTGCTTGATGTTCCTAAC | TAAAGGAGATAAAGGGGCTCAAGGAC
HS1536_139529155 | HECA_6 | AGAAATCGGTATGAGACCGTCTATGTTC | GAGCCAGACTTCTTCTTTTTCTCGTC
HS1536_16_30435856 | ITGAL | TCTAGTGGAGATGCAGACATCCAAG | ATTCATCCATCCTTACTCTCCATACACC
HS1536_1_113042873 | MOV10 | GATGGTACCTCCTTAATATCCAGACCAC | CTTAAGTTGGAATCTCCAGGAGTGAGAG
HS1536_11_55870928 | OR8K1 | ATGTCCATACTCTGTTCTGACACAAATG | GTCCAGAAAAAGAGGAAAATCTAAGCAG
HS1536_5_32019253 | PDZD2 | CTTTATCTACCTGATCATGCTGCGTC | GTTTGTCATTGCTTGTGGTCAGAG
HS1536_12_80212905 | PPFIA2 | ATTTTGCTCTGTAGACTTAGACACACGG | AGCATGTCTCAATTTCAAATCTTGTTTC
HS1536_3_78767718 | ROBO1 | ACTCTTTGACACTGGAAATTTTGAAACC | AATTTAAAAGGAATAACCACGCAGAATG
HS1536_10_75544616 | VCL | TTTCTGGAGAAATGGATTGTACTGACC | GTATCTATCCAACACAGACTTTCCCCTC
HS1536_17_78167098 | WDR45L | CTGGTCACAGCCACTGAGTTACCTAC | ATGCTCACACCAAGACAGTCAGG
HS1536_19_58036999 | ZNF468 | TAAGAGTGAGCTGCAATTAAAGGATTTG | CTCACGTTTGGGAAGTTGAAAATAAG
HS1537_20_48942253 | ADNP | TTCTTCCAAATTTTCAAAACTGTCTGAG | GTTAGATGATGATAGTGATTCACCCAGC
HS1537_18_54335283 | ALPK2 | GTCTGATTTTCATCTCGTCTTATCATGG | TTAGCTATGCCTCTTTTGAAGGGTTAAG
HS1537_5_112204169 | APC | GAATGTATTATTTCTGCCATGCCAAC | TAATGCATTCTGCAAGAATATCACCTTC
HS1537_5_149657296 | ARSI | TTGAGTTTACGGAAAAAGGATCGAAG | GGCTAGATGGCTACGACGTGTG
HS1537_13_51437128 | ATP7B | CATGGGAAAAGTTGAAGAATTTTGG | TGTTTTGGGGAAGAGCTACTCTTTAGTC
HS1537_19_50975027 | DMPK | TAGGTTCTAAGGCTCGGTCATTCATC | TTATTTCCTTCTCCCCTTGTTCTTTAGG
HS1537_8_79811310 | IL7 | ATCATGCTTTTGTATTTGCCCTAACAG | TTATGGATTCTGGGTATTCAGAAGACTG
HS1537_12_51356339 | KRT1 | TAAAGAATAATTTGCTCCACCTCAAAGC | AGGGTGTCATATTCTTTTTCAGATCTCC
HS1537_15_98087051 | LYSMD4 | ATCAACAGCCAATTGCTAAAGCTG | ACAAAGAACTGAAACCCCTTCTGAG
HS1537_3_65322046 | MAGI1 | AGAGGCTGGTAAGACACTGGTGAATAG | CCTTTCTCATGTGATTTGGCATATATGTAG
HS1537_10_95147017 | MYOF | GTTCCCCTTCTTTTGTTTATGAGCAG | TACACTTTTGCTTTGGTGATTTTACAGG
HS1537_1_174830357 | PAPPA2 | AGAAATCTGTGTCGCTTACTTTATTCCC | GAAATAGTGCCCATCCTCAGAGC
HS1537_1_66156917 | PDE4B | AAAAGCTGAGAATCAGCCTCTTCTATTG | TCAATCATCTTAGGTATTTCACAGAAAACTC
HS1537_17_54129161 | RAD51C | TCATGAAGAGTTAGACATTTCTGTTGCC | TGTGGCATTTCTCATTTTGTAACAGTAG
HS1537_6_49691341 | RHAG | AGAGGAGAATTGAGCCAGCATAAAGTAG | AAAGTCACATTGCAAGGAAGTAACTGAC
HS1537_20_55342171 | SPO11 | ACATTTTTCAAGGTATATGGTTCAGTGC | TTGAAGAAAGCTTTAACTGGATGAAAAC
HS1537_20_51304885 | TSHZ2 | ACATGATGGTCACAGGTCACTTTCTC | ATAGAAACCAGTGGCATCACTTTCC
HS1537_18_71042423 | ZADH2 | GGAGGAGAAGGAGAGACCTATTCATTAAC | GGTTTATCTCTGGCTACCAAACTCCTAC
HS1537_22_21199339 | ZNF280A | GCTAAGTAACACGATGGGATTTTCTTTC | ATAAAATGAGCTCACCACAAGTTGTTTC
SJBALL010_16_3739806 | CREBBP | GGGGCCATCATGTCTTTTTGTTTGAA | ACACCAGAAATTCCACTTACGGCAACA
Structural variations
HS1536_5_35910359 | IL7R | TGCAAAGCACCCTGAGACCCTACCC | CCGTGATCCCACACAATCACCCTCT
HS1536_12_110325422 | SH2B3 | CCGGGAGCTACGGATCAGTCATGG | GAGACCAGCCTGACCAACATGGAGA
HS1536_13_40448351 | ELF1 | TGTGGTCAAGACAGCTGTGGGCAAG | CAAGAGGCACACAAATAAGCGGCATAA
HS1536_12_632246783 | RASSF3 | CCTGGATGCCTGAAAAGGTCCTGAA | CTAGCACCGGGCTCCTCAGTCCAG
HS1536_7_92281190 | CDK6 | CCCCCTTAAAAACGGCTAAGCAGCA | GCAGATGTCTGACCACCCCTTCTCG
Recurrence screening
SH2B3F1 | | CGGTGTGTAATGGGGCCTACACCTG | GGCGAAAGTGCTGGAAGGAGCAG
SH2B3F2 | | GGCCCTGCTCCTTCCAGCACTTTC | ACCAGCTGGAAAGCCATCACACCTC
Fluorescence in-situ hybridization
Fluorescence in-situ hybridization (FISH) analyses to confirm the PDGFRB rearrangement and
EBF1-PDGFRB fusion, and to detect the rearrangement between the IGH@ and EPOR loci, were
performed on cells stored in Carnoy fixative as previously described (Mullighan et al., 2009a;
Harvey et al., 2010a). For PDGFRB rearrangement, a break-apart mixture containing two
bacterial artificial chromosome (BAC) clones consisting of RP11-1079A8 (centromeric to
PDGFRB) labeled with a green probe (Alexa Fluor 488), and a telomeric clone (RP11-759G10)
labeled with a red probe (Alexa Fluor 568) was used to show loss of the telomeric portion of
PDGFRB. An additional mixture was used to show fusion of EBF1 to PDGFRB, and consisted of
clone RP11-583A20 (telomeric to EBF1) labeled with Alexa Fluor 568, and clone RP11-1079A8
labeled with Alexa Fluor 488. Disruption of the IGH@ locus was determined using the LSI IGH
Dual Color Break-Apart Rearrangement Probe (Abbott Molecular), which spans 900 kb of the
IGHV regions and 250 kb of the 3’ region. For EPOR, a probe mixture was used, consisting of 2
BAC clones RP11-1114G9 (centromeric) and RP11-478I13 (telomeric). BAC clones, fluorescent
labels, and nick-translation materials were obtained through Invitrogen. Total BAC DNA was
isolated using a plasmid midi-prep kit (Invitrogen). Slide hybridization and washes were
performed using standard FISH protocols. Slides were counterstained with 4′,6-diamidino-2-phenylindole
and analyzed with a Zeiss Axioskop microscope (Zeiss, Germany) equipped with
the appropriate filter combination and a CCD camera, and coupled to the CytoVision image
analysis system. A total of 25 to 200 interphase cells were scored for each probe.
Supplemental References
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and
powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57, 289-
300.
Durbin, R. M., Abecasis, G. R., Altshuler, D. L., Auton, A., Brooks, L. D., Gibbs, R. A., Hurles, M.
E., and McVean, G. A. (2010). A map of human genome variation from population-scale
sequencing. Nature 467, 1061-1073.
Ewing, B., Hillier, L., Wendl, M. C., and Green, P. (1998). Base-calling of automated sequencer
traces using phred. I. Accuracy assessment. Genome Res 8, 175-185.
Goya, R., Sun, M. G., Morin, R. D., Leung, G., Ha, G., Wiegand, K. C., Senz, J., Crisan, A.,
Marra, M. A., Hirst, M., et al. (2010). SNVMix: predicting single nucleotide variants from next-
generation sequencing of tumors. Bioinformatics 26, 730-736.
Levine, R. L., Wadleigh, M., Cools, J., Ebert, B. L., Wernig, G., Huntly, B. J., Boggon, T. J.,
Wlodarska, I., Clark, J. J., Moore, S., et al. (2005). Activating mutation in the tyrosine kinase
JAK2 in polycythemia vera, essential thrombocythemia, and myeloid metaplasia with
myelofibrosis. Cancer Cell 7, 387-397.
Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics 25, 1754-1760.
Radtke, I., Mullighan, C. G., Ishii, M., Su, X., Cheng, J., Ma, J., Ganti, R., Cai, Z., Goorha, S.,
Pounds, S. B., et al. (2009). Genomic analysis reveals few genetic alterations in pediatric acute
myeloid leukemia. Proceedings of the National Academy of Sciences of the United States of
America 106, 12944-12949.
Robinson, J. T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E. S., Getz, G., and
Mesirov, J. P. (2011). Integrative genomics viewer. Nat Biotechnol 29, 24-26.
Wiegand, K. C., Shah, S. P., Al-Agha, O. M., Zhao, Y., Tse, K., Zeng, T., Senz, J., McConechy,
M. K., Anglesio, M. S., Kalloger, S. E., et al. (2010). ARID1A mutations in endometriosis-
associated ovarian carcinomas. N Engl J Med 363, 1532-1543.
Zhang, J., Wheeler, D. A., Yakub, I., Wei, S., Sood, R., Rowe, W., Liu, P. P., Gibbs, R. A., and
Buetow, K. H. (2005). SNPdetector: a software tool for sensitive and accurate SNP detection.
PLoS Comput Biol 1, e53.
Zhang, J., Finney, R. P., Rowe, W., Edmonson, M., Yang, S. H., Dracheva, T., Jen, J.,
Struewing, J. P., and Buetow, K. H. (2007). Systematic analysis of genetic alterations in tumors
using Cancer Genome WorkBench (CGWB). Genome Res 17, 1111-1117.