
Cancer Cell, Volume 22 Supplemental Information

Genetic Alterations Activating Kinase and Cytokine Receptor Signaling in High-Risk Acute Lymphoblastic Leukemia

Kathryn G. Roberts, Ryan D. Morin, Jinghui Zhang, Martin Hirst, Yongjun Zhao, Xiaoping Su, Shann-Ching Chen, Debbie Payne-Turner, Michelle L. Churchman, Richard C. Harvey, Xiang Chen, Corynn Kasap, Chunhua Yan, Jared Becksfort, Richard P. Finney, David T. Teachey, Shannon L. Maude, Kane Tse, Richard Moore, Steven Jones, Karen Mungall, Inanc Birol, Michael N. Edmonson, Ying Hu, Kenneth E. Buetow, I-Ming Chen, William L. Carroll, Lei Wei, Jing Ma, Maria Kleppe, Ross L. Levine, Guillermo Garcia-Manero, Eric Larsen, Neil P. Shah, Meenakshi Devidas, Gregory Reaman, Malcolm Smith, Steven W. Paugh, William E. Evans, Stephan A. Grupp, Sima Jeha, Ching-Hon Pui, Daniela S. Gerhard, James R. Downing, Cheryl L. Willman, Mignon Loh, Stephen P. Hunger, Marco A. Marra, and Charles G. Mullighan

Inventory of Supplemental Information

Supplemental Data
Table S1, related to Table 1
Table S2, related to Table 1
Table S3, related to Table 1
Table S4, related to Table 1
Table S5, related to Table 1
Table S6, related to Table 1
Table S7, related to Table 1
Table S8, related to Table 1
Table S9, related to Table 1
Figure S1, related to Figure 1
Figure S2, related to Figure 2
Figure S3, related to Figure 3
Figure S4, related to Figure 4
Figure S5, related to Figure 5
Figure S6, related to Figure 6
Figure S7, related to Figure 7
Supplemental Experimental Procedures
Supplemental References

Supplemental Data

Table S1, related to Table 1. List of probe sets that identify Ph-like cases by PAM. Provided as an Excel file.

Table S2, related to Table 1. limma gene expression signature of Ph+ and Ph-like B-ALL versus non Ph-like B-ALL samples in P9906. Provided as an Excel file.

Table S3, related to Table 1. Validated fusions detected by mRNA-seq

Sample ID | Supporting mate pairs | Gene 1 | Gene 1 chrom | Gene 1 location | Gene 2 | Gene 2 chrom | Gene 2 location | Comments | Rearrangement
PAKTAL | 29; 22; 5 | STRN3; C12orf35; FAM23A | 14; 12; 10 | exon 9; intron 4; exon 3 | JAK2; AMN1; MRC1 | 9; 12; 10 | exon 17; intron 6; exon 2 | In-frame; Aligned to intron; In-frame | STRN3-JAK2*,#; C12orf35-AMN1*; FAM23A-MRC1*
PAKKCA | 66; 6; 5 | EBF1; SEMA6A; DOCK8; PAX5; ZCCHC7 | 5; 5; 9; 9; 9 | exon 15; exon 1; exon 3-4; exon 6; exon 2 | PDGFRB; FEM1C; CBWD2; ZCCHC7; PAX5 | 5; 5; 2; 9; 9 | exon 11; exon 2; intron 10; exon 3; exon 7 | In-frame; No ORF; Inversion; Inversion, disrupted ORF; Inversion, disrupted ORF | EBF1-PDGFRB*,#; SEMA6A-FEM1C#; DOCK8-CBWD2*; PAX5-ZCCHC7***; ZCCHC7-PAX5***
PAKVKK | 15; 6; 6; 17 | NUP214; SEMA6A; TPM4; TSHZ2 | 9; 5; 19; 20 | exon 34; exon 1; exon 2; intron 2 | ABL1; FEM1C; KLF2; SLC35A1 | 9; 5; 19; 6 | | In-frame; No ORF; Inversion | NUP214-ABL1*,#; SEMA6A-FEM1C#; TSHZ2-SLC35A1*; TPM4-KLF2#
PALIBN | 53 | IGH@ | 14 | | EPOR | 19 | | | IGH@-EPOR*
PAKYEP | 33 | BCR | 22 | exon 1 | JAK2 | 9 | exon 15 | In-frame | BCR-JAK2*,#
PAMDRM | 8; 12; 8; 10 | IGH@; OAZ1; TPM4; SEMA6A; SLC2A5 | 14; 19; 19; 1 | | CRLF2; KLF2; KLF2; FEM1C; BTBD7 | X/Y; 19; 19; 5; 14 | | No ORF; No ORF; Inversion | IGH@-CRLF2**; OAZ1-KLF2#; TPM4-KLF2#; SEMA6A-FEM1C#; SLC2A5-BTBD7*
PAKKXB | 13 | IGH@; BTBD7 | 14; 14 | intron 10 | CRLF2; SLC2A5 | X/Y; 1 | intron 1 | Inversion | IGH@-CRLF2**; BTBD7-SLC2A5*
PALETF | None
PAKHZT | | IGH@ | 14 | | CRLF2 | X/Y | | | IGH@-CRLF2**
PALJDL | 5 | ZNF292 | 6 | exon 1 | SYNCRIP | 6 | exon 2 | No ORF | ZNF292-SYNCRIP*
PANNGL | 16 | PAX5 | 9 | exon 5 | JAK2 | 9 | exon 19 | In-frame | PAX5-JAK2*
PANSFD | 28 | ETV6 | 12 | exon 5 | ABL1 | 9 | exon 2 | In-frame | ETV6-ABL1*
PANEHF | 3; 76 | RCSD1; ABL1 | 1; 9 | exon 3; exon 3 | ABL1; RCSD1 | 9; 1 | exon 4; exon 4 | In-frame | RCSD1-ABL1*; ABL1-RCSD1*
SJBALL085 | | NUP214 | 9 | exon 34 | ABL1 | 9 | exon 3 | In-frame | NUP214-ABL1*,#
SJBALL010 | 4 | RANBP2 | 2 | exon 18 | ABL1 | 9 | exon 2 | In-frame | RANBP2-ABL1*

*detected by deFuse; #detected by Mosaik; **previously identified (Harvey et al., 2010a); ***detected by Trans-ABySS; ORF, open reading frame. Cytokine receptor and kinase-activating fusions highlighted in bold.

Table S4, related to Table 1. Somatic single nucleotide variants (SNVs) and insertion/deletion mutations identified by mRNA-seq

Sample ID | Gene | Chr | Position | Sequence change | Amino acid change
PAKHZT | JMJD3 | chr17 | 7691628 | T>C | S433P
PAKHZT | RAB11B | chr19 | 8373013 | G>A | A94T
PAKHZT | JAK2* | chr9 | 5079702 | G>A | R867Q
PAKHZT | OFD1 | chrX | 13696806 | T>C | S531P;S853P;S993P
PAKVKK | ILF3 | chr19 | 10655385 | G>A | R642Q;R646Q
PAKVKK | NSMAF | chr8 | 59682919 | C>T | R241H
PAKVKK | PAX5** | chr9 | 37010775 | C>G | G24R
PAKVKK | IKZF1** | chr7 | 50435466 | +CCTCCCC | S402fs
PAKKXB | TP53 | chr17 | 7514718 | T>A | N345I
PAKKXB | ATRX | chrX | 76805676 | G>T | S1256;S1286;S131;S1324;S1357
PAKKXB | IKZF1* | chr7 | 50411913 | -G | L117fs
PALIBN | ZNF12 | chr7 | 6697396 | G>A | H494Y;H529Y;H530Y;H568Y
PALIBN | PHF8 | chrX | 54065429 | C>G | R94T;R130T
PALJDL | VCL | chr10 | 75544616 | G>A | A1003T;A1071T
PALJDL | WDR45L | chr17 | 78167098 | C>T | D283N;D341N
PALJDL | MOV10 | chr1 | 113042873 | G>A | R785H;R841H
PALJDL | HECA | chr6 | 139529155 | C>T | P105S
PALJDL | IL7R | chr5 | 40940849 | -TTAATTACGC | L242>FPGVC
PAMDRM | LGALS8 | chr1 | 234768983 | G>A | V105M;V106M
PAMDRM | JAK2 | chr9 | 5068357 | +GGACCC | GPinsI682
PAMDRM | MKI67 | chr10 | 129793424 | -GAG | S2223_P2224>S
PAMDRM | HDAC7 | chr12 | 46477155 | +T | L604
PALETF | LYSMD4 | chr15 | 98087051 | C>A | V231F;V232F
PALETF | RAD51C | chr17 | 54129161 | C>A | D171E
PALETF | DMPK | chr19 | 50975027 | G>A | P44L
PALETF | ADNP | chr20 | 48942253 | G>A | S802F
PALETF | TSHZ2 | chr20 | 51304885 | C>T | P494L
PALETF | APC | chr5 | 112204169 | T>C | L1660P
PALETF | FLT3 | chr13 | 27506253 | +ATTTTGGAAAGTAACATAAGAGATCATATTCATATTGTCTGAAATCAATGTAGAAGTACTCCCAATTTT | KWEFPRENEYFYVDFREYEYDLinsL604
SJBALL085 | IKZF1 | chr7 | 50411801 | +GCTCAGG | A79fs
SJBALL010 | CREBBP | chr16 | 3739628 | Splice | Exon 18
SJBALL010 | IFITM3 | chr11 | 310606 | G>T | P70T
SJBALL010 | MPST | chr22 | 35750701 | G>A | E167K

*, Identified previously (Mullighan et al., 2009c); **, Identified previously (Mullighan et al., 2009b). Multiple SNVs for each gene represent different isoforms.

Table S5, related to Table 1. Validated somatic insertions/deletions and SNVs for PALJDL detected by WGS

Gene | Accession | Chr | Position* | Class | Sequence change | Amino acid change | Comments
IL7R | NM_002185.2 | chr5 | 35910329 | insertion | A>CCCGGGGGTCTGC | L242>FPGVC | Confirmed by mRNA-seq data
ZNF468 | NM_199132 | chr19 | 58036999 | frameshift | +GGG,-A | A67fs | Confirmed by mRNA-seq data
CGNL1 | NM_032866 | chr15 | 55608184 | missense | A>T | A1027V | Low coverage of locus in mRNA-seq
COL4A1 | NM_001845 | chr13 | 109628259 | missense | C>T | V883I | Low coverage of locus in mRNA-seq
HECA | NM_016217 | chr6 | 139529155 | missense | C>T | P105S | Confirmed by mRNA-seq data
MOV10 | NM_020963 | chr1 | 113042873 | missense | G>A | R841H | Confirmed by mRNA-seq data
OR8K1 | NM_001002907 | chr11 | 55870928 | missense | G>C | A280P | Low coverage of locus in mRNA-seq
PDZD2 | NM_178140 | chr5 | 32019253 | missense | G>A | A238T | Low coverage of locus in mRNA-seq
PPFIA2 | NM_003625 | chr12 | 80212905 | missense | C>T | R922Q | Low coverage of locus in mRNA-seq
ROBO1 | NM_133631 | chr3 | 78767718 | missense | C>A | G1051C | Low coverage of locus in mRNA-seq
VCL | NM_014000 | chr10 | 75544616 | missense | G>A | A1071T | Confirmed by mRNA-seq data
WDR45L | NM_019613 | chr17 | 78167098 | missense | C>T | D341N | Confirmed by mRNA-seq data

*aligned to human reference genome 18.

Table S6, related to Table 1. Validated somatic SNVs for PALETF detected by WGS

Gene | mRNA accession | Chr | Position* | Class | Sequence change | Amino acid change | Comments
ADNP | NM_015339 | chr20 | 48942253 | missense | G>A | S802F | Confirmed by mRNA-seq data
ALPK2 | NM_052947 | chr18 | 54335283 | missense | C>T | G1926E | Low coverage of locus in mRNA-seq
APC | NM_001127511 | chr5 | 112204169 | missense | T>C | L1660P | Confirmed by mRNA-seq data
ARSI | NM_001012301 | chr5 | 149657296 | missense | C>T | V462M | Low coverage of locus in mRNA-seq
ATP7B | NM_000053 | chr13 | 51437128 | missense | A>G | S584P | Low coverage of locus in mRNA-seq
DMPK | NM_001081560 | chr19 | 50975027 | missense | G>A | P44L | Confirmed by mRNA-seq data
IL7 | NM_000880 | chr8 | 79811310 | missense | C>T | G123E | Low coverage of locus in mRNA-seq
KRT1 | NM_006121 | chr12 | 51356339 | missense | C>T | G488R | Low coverage of locus in mRNA-seq
LYSMD4 | NM_152449 | chr15 | 98087051 | missense | C>A | V232F | Confirmed by mRNA-seq data
MAGI1 | NM_015520 | chr3 | 65322046 | missense | C>T | D1196N | Low coverage of locus in mRNA-seq
MYOF | NM_013451 | chr10 | 95147017 | missense | T>G | K437N | Low coverage of locus in mRNA-seq
MYOF | NM_013451 | chr10 | 95147027 | missense | A>T | V434E | Low coverage of locus in mRNA-seq
PAPPA2 | NM_020318 | chr1 | 174830357 | missense | G>A | G332R | Low coverage of locus in mRNA-seq
PDE4B | NM_001037341 | chr1 | 66156917 | missense | C>T | S31F | Low coverage of locus in mRNA-seq
RAD51C | NM_058216 | chr17 | 54129161 | missense | C>A | D171E | Confirmed by mRNA-seq data
RHAG | NM_000324 | chr6 | 49691341 | missense | C>T | E199K | Low coverage of locus in mRNA-seq
SPO11 | NM_012444 | chr20 | 55342171 | missense | C>T | S81F; S119F | Low coverage of locus in mRNA-seq
TSHZ2 | NM_173485 | chr20 | 51304885 | missense | C>T | P494L | Confirmed by mRNA-seq data
ZNF280A | NM_080740 | chr22 | 21199339 | missense | T>G | T206P | Low coverage of locus in mRNA-seq

*aligned to human reference genome 18.

Table S7, related to Table 1. Somatic deletions for PALJDL and PALETF detected by WGS

Sample ID | Gene 1 | Gene 1 chrom | Gene 1 position* | Gene 2 | Gene 2 chrom | Gene 2 position | Size (bp) | Comments
PALJDL | NCR** | 12 | 110325422 | SH2B3 | 12 | 110361365 | 35,943 | Deletes exons 1-2 of SH2B3
PALJDL | ELF1** | 13 | 40448351 | NCR | 13 | 40489525 | 41,171 | Deletes exon 1 of ELF1
PALJDL | NCR** | 12 | 63246783 | RASSF3 | 12 | 63291396 | 44,613 | Deletes exon 1 of RASSF3
PALJDL | CDK6** | 7 | 92281190 | CDK6 | 7 | 92301324 | 20,134 | Deletes exon 1 of CDK6
PALJDL | ARRDC5 | 19 | 4836226 | UHRF1 | 19 | 4873165 | 36,939 | Deletes ARRDC5 and removes exon 1 of UHRF1
PALJDL | CCNL1** | 3 | 158360591 | NCR | 3 | 158375967 | 15,376 | Deletes 5’ UTR of CCNL1
PALETF | NCOA6 | 20 | 32853566 | NCOA6 | 20 | 32876123 | 22,557 | Deletes exon 1 of NCOA6
PALETF | ARL8B** | 3 | 5194887 | EDEM1 | 3 | 5204877 | 9,990 | Predicted fusion between ARL8B and EDEM1
PALETF | CDH2** | 15 | 91176886 | NCR | 15 | 91259725 | 82,839 | Deletes exons 1-2 of CDH2

Deletions identified by copy number variant analysis using CONSERTING and CREST that were not detected by SNP array. These algorithms also identified all lesions determined by SNP array analysis. NCR, non-coding region; UTR, untranslated region. *aligned to human reference genome 18; **validated by genomic PCR and Sanger sequencing.
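As a rough illustration of how the Size column relates to the breakpoint coordinates, the sketch below assumes the size is simply the difference between the two hg18 positions on the same chromosome. The function name `deletion_size` is hypothetical, and the table's own convention for counting breakpoint bases is not stated, so individual rows may differ by a few bases.

```python
def deletion_size(start, end):
    """Return the distance in bp between two breakpoint coordinates on one chromosome."""
    if end < start:
        raise ValueError("end coordinate must not precede start coordinate")
    return end - start


# Breakpoints taken from the SH2B3 and RASSF3 rows of Table S7 (hg18 coordinates).
sh2b3_size = deletion_size(110325422, 110361365)
rassf3_size = deletion_size(63246783, 63291396)

print(f"SH2B3 deletion: {sh2b3_size:,} bp")
print(f"RASSF3 deletion: {rassf3_size:,} bp")
```

For these two rows the simple difference reproduces the tabulated sizes (35,943 and 44,613 bp).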

Table S8, related to Table 1. Summary of Ph+ cases and Ph-like prediction by PAM for the COG cohorts P9906, AALL0232 and St Jude Children’s Research Hospital Total XV

| P9906 N (%) | AALL0232_1 N (%) | AALL0232_2 N (%) | Total XV N (%)
Total cases | 207 | 283 | 325 | 342
Ph+ | 0 | 14 (4.9) | 21 (6.5) | 7 (2.1)
Ph-like | 43 (20.8) | 40 (14.1) | 42 (13) | 33 (9.6)

AALL0232_1 and AALL0232_2 are consecutively enrolled groups from the COG high-risk AALL0232 trial.

Table S9, related to Table 1. Description of COG P9906 and AALL0232 cohorts. Provided as an Excel file.

Tabulated information outlining PAM prediction and ROSE clustering determining Ph-like cases, B-cell pathway and kinase activating lesions, and kinase expression (PDGFRB, JAK2, ABL1 and CRLF2) by gene expression profiling using U133 Plus 2.0 array. Column definitions are listed below:

A: Cohort – P9906 or AALL0232
B: Sample ID – *, mRNA-seq index cases
C: TARGET ID – nomenclature is consistent with previous publications (Mullighan et al., 2009b; Mullighan et al., 2009c; Harvey et al., 2010a)
D: BCR-ABL1 status – Positive or negative
E: PAM prediction – Ph-like or non Ph-like
F: PAM coefficient – > 0.5 = Ph-like. Samples are ranked in descending order for both cohorts
G: ROSE clustering classification
H: Group – TCF3-PBX1, ETV6-RUNX1, BCR-ABL1 positive, MLL rearranged, CRLF2 rearranged or unknown (other)
I: B-cell pathway lesions in IKZF1, PAX5, EBF1 or CDKN2A deletion
J: CRLF2 – Rearrangements in CRLF2 (IGH@-CRLF2 or P2RY8-CRLF2) detected previously for P9906 (Harvey et al., 2010a), or over-expression indicating a rearrangement in AALL0232
K: JAK mut – Mutation in JAK genes detected previously for P9906 (Mullighan et al., 2009c) and identified recently for AALL0232
L: Other kinase-activating lesion – Rearrangements and sequence mutations affecting kinase or cytokine signaling detected by mRNA-seq or whole genome sequencing analysis in the current study
M-R: Gene expression profiling of PDGFRB (M), JAK2 (N-P), ABL1 (Q) and CRLF2 (R)


Figure S1, related to Figure 1. Summary of COG ALL recruitment and frequency of Ph-like cases. Approximately two-thirds of COG cases are classified as standard risk, and one-third are enrolled onto high-risk trials (P9906, AALL0232). Of the high-risk cases, 15-20% are classified Ph-like, determined by PAM. Within this Ph-like group, 50% harbor rearrangements of CRLF2, with 30% of these expressing concomitant JAK mutations, and the other 20% with unidentified lesions. From the current study, all 15 cases subjected to mRNA-seq harbored rearrangements or mutations affecting kinase and cytokine signaling. Recurrence testing of novel fusions in additional P9906 cases was prevented by limited availability of RNA. Consequently, the subsequent COG trial of high-risk B-ALL (AALL0232) was used, and the EBF1-PDGFRB fusion was detected in 3/40 (8%) of Ph-like cases. We also identified additional ABL1 and JAK2 rearrangements in this cohort using mRNA-seq. Furthermore, IL7R insertion/deletion mutations were detected in 5/42 (12%) of the P9906 Ph-like cases, but notably, not in AALL0232. No somatic SH2B3 sequence mutations were identified.


Figure S2, related to Figure 2. Additional fusion validation by RT-PCR. RT-PCR and sequencing validation of additional fusions identified by mRNA-seq analysis. Representative RT-PCR gel and sequencing for (A) SEMA6A-FEM1C (S/F) from case PAKKCA; (B) the in-frame fusion TPM4-KLF2 (T/K) from case PAKVKK; (C) OAZ1-KLF2 (O/K) from case PAMDRM; (D) the in-frame fusions FAM23A-MRC1 (F/M) and C12orf35-AMN1 (C/A) from case PAKTAL; (E) the interchromosomal translocation TSHZ2-SLC35A1 (T/S) from case PAKVKK; (F) ZNF292-SYNCRIP (Z/S) from case PALJDL; (G) the interchromosomal translocation DOCK8-CBWD2 (D/C) from case PAKKCA; and (H) the reciprocal inversion PAX5-ZCCHC7 (P/Z) and ZCCHC7-PAX5 (Z/P), which disrupts the open reading frame of PAX5, from case PAKKCA. NTC = non-template control. (I) Inferred log2 ratio copy number data from SNP array showing gain of one DNA copy between NUP214 and ABL1 in PAKVKK. Each vertical red line indicates log ratio copy number state for a single probe set on the 500K SNP array. (J) Sequencing validation for two additional NUP214-ABL1 cases from the P9906 cohort.

Figure S3, related to Figure 3. Paired-end mRNA-seq reads aligning to EBF1 and PDGFRB and FISH confirmation of EBF1-PDGFRB in case PAKKCA. (A) Paired-end mRNA-seq reads aligning to EBF1 and PDGFRB on chromosome 5q32 for case PAKKCA. The reads aligning to intron 15 of EBF1 and intron 10 of PDGFRB correspond to the genomic breakpoint, and the reads aligning to exon 15 of EBF1 and exon 11 of PDGFRB correspond to the in-frame fusion point. (B) The bacterial artificial chromosome (BAC) clone on chromosome 5 telomeric of EBF1 (RP11-583A20, red). (C) Two BAC clones flanking PDGFRB on chromosome 5: RP11-1079A8 (centromeric, green) and RP11-759G10 (telomeric, red). (D) PDGFRB break-apart assay using probes in B showing loss of the telomeric probe on one chromosome (arrow), due to the deletion between EBF1 and PDGFRB. (E) Colocalization assay showing the fusion signal (arrow) between the telomeric EBF1 clone (red) and centromeric PDGFRB clone (green). Normal signals are close together, but not fused. The fusion was detected in over 95% of cells analyzed, indicating that the EBF1-PDGFRB fusion is present in the predominant clone at diagnosis.


Figure S4, related to Figure 4. Multiple JAK2 breakpoints and genomic mapping of BCR-JAK2 in case PAKYEP. Soft-clipped reads showing two JAK2 breakpoints for the BCR-JAK2 (B/J) fusion in case PAKYEP. (A) Reads aligning to BCR are shown in black letters, with soft-clipped reads detected by CREST aligning to JAK2 (highlighted in blue) and representing either exon 15 (blue underline) or exon 17 (green underline) of JAK2. RT-PCR showing two bands corresponding to the JAK2 exon 15 fusion (top band) or the JAK2 exon 17 fusion (bottom band). Sequencing validation showing BCR fused to JAK2 exon 15 (left) and exon 17 (right). The red amino acid is changed from the wild-type sequence. (B) Bambino view of mRNA-seq split reads showing the genomic breakpoint of BCR at chr22:21905862 to JAK2 intron 14 (soft-clipped reads) and breakpoint of JAK2 at chr9:5066708 to BCR intron 1 (soft-clipped reads). (C) Genomic PCR and sequencing confirming the breakpoint between BCR and JAK2. Note microhomology of 2 bp (CA) at the two breakpoints, which can be aligned to either BCR or JAK2.


Figure S5, related to Figure 5. Genomic PCR and Sanger sequencing validation of additional deletions in PALJDL. (A) Genomic PCR gel. (B) Deletion between non-coding region on chr12:63246783 and RASSF3 (chr12:63291396), which deletes exon 1 of RASSF3. (C) Deletion between ELF1 (chr13:40448351) and non-coding region (chr13:40489525) that deletes exon 1 of ELF1. (D) Deletion between chr7:92281190 and chr7:92301324, which deletes exon 1 of CDK6. Alignment based on human reference genome 18.


Figure S6, related to Figure 6. Phosphosignaling analysis in non-Ph-like ALL cases. Primary leukemic cells were thawed, treated with or without dasatinib (100nM) and ruxolitinib (1µM) for 1 hr and stained for pSTAT5 and pCRKL according to Experimental Procedures.


Figure S7, related to Figure 7. Modeling EBF1-PDGFRB in vitro. (A) Ba/F3 EBF1-PDGFRB cells are sensitive to dasatinib and dovitinib. No cytotoxic effects were observed with cells maintained in factor, indicating that imatinib specifically targets the activated PDGFRB and ABL1 kinases. Error bars represent mean ± SD of three independent experiments. (B) Imatinib inhibits phosphorylation of the EBF1-PDGFRB fusion protein. (C) pAKT and pERK1/2 are constitutively activated in EBF1-PDGFRB expressing cells, and signaling is inhibited by dasatinib (100 nM).


Supplemental Experimental Procedures

Patient samples and gene expression profiling

Ten Ph-like ALL cases from the COG P9906 high-risk B-ALL study (Bowman et al., 2011), three

cases enrolled on the high-risk COG AALL0232 study (ClinicalTrials.gov Identifier

NCT00075725) and two cases treated on the St Jude Children’s Research Hospital Total XV

(Pui et al., 2009) and Total XVI protocols (ClinicalTrials.gov Identifier NCT00137111 and

NCT00549848, respectively) were selected for mRNA-seq based on a similar gene expression

profile to Ph+ ALL, as determined by ROSE clustering (Harvey et al., 2010b) and PAM

(Tibshirani et al., 2002), and the availability of suitable genomic material. This selection included

a range of cases with variable Ph-like expression signature from strongest to weakest (see PAM

coefficient details in Table S9). Cases were initially chosen from P9906, as the description of

Ph-like ALL was first reported in this cohort (Mullighan et al., 2009b). Details of the P9906

cohort, and prior genomic analyses performed in this cohort, have been described previously

(Mullighan et al., 2009b; Mullighan et al., 2009c; Harvey et al., 2010a; Zhang et al., 2011).

All P9906 patients were classified as high-risk based on the presence of central nervous system

or testicular disease, MLL rearrangement, or based on age, sex, and leukocyte count at

diagnosis. BCR-ABL1 and hypodiploid ALL patients, in addition to those who experienced

primary induction failure, were excluded. High hyperdiploid cases (as defined by trisomy of

chromosomes 4 and 10 on cytogenetic analysis) and ETV6-RUNX1 cases were excluded unless

central nervous system or testicular involvement was present at diagnosis. A total of 207

enrolled cases had suitable material for 500K SNP microarrays and U133 Plus 2.0 gene

expression microarrays (Affymetrix).

WGS of matched non tumor DNA obtained from remission bone marrow at day 29, or at a

subsequent remission timepoint after commencement of remission-induction therapy, was

performed for all cases. WGS of leukemic cell DNA was also performed for two cases that


lacked a chromosomal rearrangement identified by mRNA-seq and prior genomic analyses

(Harvey et al., 2010b). Due to limited availability of RNA material from the P9906 cohort,

recurrence testing of the ABL1, JAK2 and PDGFRB rearrangements was performed in a

separate cohort of B-ALL patients enrolled on the COG AALL0232 study.

All AALL0232 B-ALL patients were diagnosed with National Cancer Institute high-risk ALL

based on WBC count >50 × 10^9/L or age >10 years at presentation, prior steroid therapy, or the

presence of testicular disease. The average age at diagnosis was 10.0 ± 5.8 years. Twenty-eight

cases (9.9%) were hyperdiploid, 20 (7.1%) were ETV6-RUNX1-positive, 17 (6.0%) were

TCF3-PBX1-positive, 14 (4.9%) were BCR-ABL1-positive, 5 (1.7%) harbored MLL

rearrangements and 199 cases (70.3%) lacked a known chromosomal abnormality. All samples

were obtained with patient or parent/guardian provided informed consent under protocols

approved by the Institutional Review Board at each COG institution. The clinical study was

approved by the National Cancer Institute and appropriate Institutional Review Boards. A total of

231 BCR-ABL1-negative patients had available RNA for RT-PCR.

Recurrence screening for each fusion was also performed on 23 JAK2/MPL-negative MPN

samples from the Harvard myeloproliferative disorders study (age range 35-81), including 13

with polycythemia vera, 5 with essential thrombocythemia and 2 with myelofibrosis (Levine et al.,

2005). In addition, 25 CMML samples obtained from the MD Anderson Cancer Center (age

range 61-88), and 44 pediatric AML samples (16 cases with normal karyotypes, 18 with

miscellaneous or non-recurrent cytogenetic alterations, 5 inv(16) and 5 t(8;21)) from St Jude

Children’s Research Hospital (Radtke et al., 2009) were also included (age range 3-21).

Sequencing for IL7R and SH2B3 mutations was performed using whole genome amplified

leukemic DNA from the P9906 and AALL0232 high-risk ALL cohorts. All samples were obtained


with patient or parent/guardian provided informed consent under protocols approved by the

Institutional Review Board at each COG institution.

Gene expression profiling was performed using U133 Plus 2.0 arrays for P9906 (N=207), and

AALL0232 (N=608) (Affymetrix). Expression signals were normalized by MAS 5.0 algorithm.

Probe sets lacking present calls for every sample were excluded, and signal intensities with

values less than 2 were set to 2. Signals were then log2 transformed for subsequent analysis

(Mullighan et al., 2009b). To identify Ph+ and Ph-like cases, we trained PAM using the second

consecutively recruited subgroup of AALL0232 cases (N=325) to detect Ph+ and Ph-like cases

in the first subgroup of AALL0232 cases (N=283). The PAM predictor containing 257 probe sets

(Table S1) was obtained through cross validation analysis at a threshold of 2.2, and correctly

identified 13 of 14 Ph+ AALL0232_1 cases and classified 40/283 (15%) as Ph-like, determined

by a PAM coefficient greater than 0.5. The same training conditions were applied to the P9906

cohort, with 43/203 cases (21%) classified as Ph-like (Table S9). To identify differentially

expressed genes in Ph-like cases, limma (Linear Models for Microarray Analysis) (Smyth, 2004)

with estimation of fold-change and false discovery rate was also performed (Benjamini and

Hochberg, 1995) (Table S2).
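The signal preprocessing and the Ph-like call described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names and data layout are hypothetical, and it assumes that "lacking present calls for every sample" means a probe set with no present call in any sample. PAM training itself is not reproduced; only the flooring, log2 transform and the 0.5 coefficient cutoff are shown.

```python
import math


def preprocess_signals(signals, present_calls, floor=2.0):
    """Floor and log2-transform MAS 5.0 signals.

    signals: dict mapping probe set ID -> list of signal intensities (one per sample)
    present_calls: dict mapping probe set ID -> list of booleans (True = present call)
    Probe sets with no present call in any sample are dropped (assumed reading).
    """
    processed = {}
    for probe_set, values in signals.items():
        if not any(present_calls[probe_set]):
            continue
        # Intensities below the floor are set to the floor before the log2 transform.
        processed[probe_set] = [math.log2(max(value, floor)) for value in values]
    return processed


def classify_by_pam_coefficient(coefficient, cutoff=0.5):
    """Label a case Ph-like when its PAM coefficient exceeds the cutoff."""
    return "Ph-like" if coefficient > cutoff else "non Ph-like"


# Hypothetical probe set IDs and values, for illustration only.
example = preprocess_signals(
    {"probe_1": [1.0, 8.0], "probe_2": [4.0, 4.0]},
    {"probe_1": [True, False], "probe_2": [False, False]},
)
```

In this toy input, `probe_2` is dropped because it has no present call, and the value 1.0 for `probe_1` is raised to 2 before transformation.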

mRNA-seq library preparation and sequencing

mRNA-seq was performed as previously described (Morin et al., 2010) with modifications. Total

RNA was extracted from leukemic cells obtained from bone marrow aspirates or peripheral

blood using TRIzol (Life Technologies). Poly(A)+ RNA was enriched from 5-10 µg of DNase I-treated

total RNA using the MACS mRNA isolation kit (Miltenyi Biotec). Double-stranded cDNA

was synthesized from the purified poly(A)+ RNA using the Superscript Double-Stranded cDNA

Synthesis kit (Life Technologies) and random hexamer primers (Life Technologies) at a

concentration of 5 μM. The cDNA was fragmented by sonication and a paired-end sequencing

library prepared following the paired-end library preparation protocol (Illumina). For mRNA-seq


library sequencing, clusters were generated on the Illumina cluster station and paired-end

sequence reads were generated using v3-v5 sequencing reagents on the Illumina GAIIx and

HiSeq 2000 platforms following the manufacturer's instructions. A read length summary for each

case is provided below. Image analysis, base-calling and error calibration were performed

using v1.0, v1.3.2, v1.5.0 and v1.6.0 of Illumina's Genome analysis pipeline.

Read summary information for mRNA-seq data and matched constitutional DNA

Sample ID | Sample type | Number of lanes x read length (bp) | Total reads
PAKTAL | Tumor | 1x36, 6x50 | 144257180
PAKTAL | Normal | 14x50 | 352797054
PAKKCA | Tumor | 4x76 | 168403164
PAKKCA | Normal | 4x76 | 198063899
PAKVKK | Tumor | 1x36, 6x50 | 144981646
PAKVKK | Normal | 14x50 | 352797054
PALIBN | Tumor | 4x76 | 166126640
PALIBN | Normal | 4x76 | 140487148
PAKYEP | Tumor | 6x76 | 228085558
PAKYEP | Normal | 4x76 | 145008762
PAMDRM | Tumor | 4x76 | 177659630
PAMDRM | Normal | 4x76 | 166810882
PAKKXB | Tumor | 5x76 | 233574978
PAKKXB | Normal | 4x76 | 199671920
PALETF | Tumor | 4x76 | 130810808
PALETF | Normal | 4x76 | 181507216
PAKHZT | Tumor | 1x36, 6x50 | 175124596
PAKHZT | Normal | 14x50 | 401590800
PALJDL | Tumor | 4x76 | 161808726
PALJDL | Normal | 4x76 | 171312172
PANNGL | Tumor | 1/3x100* | 92019240
PANSFD | Tumor | 1/3x100* | 85436840
PANEHF | Tumor | 1/3x100* | 159676104
SJBALL085 | Tumor | 1x75 | 71651522
SJBALL010 | Tumor | 1x100 | 70141484

*, samples sequenced on the HiSeq 2000.


Whole genome shotgun library preparation and sequencing

Illumina paired-end whole genome shotgun libraries were prepared from 1 µg of genomic DNA

from COG P9906 cases PALJDL and PALETF (both tumor and matched remission DNA) as

described (Shah et al., 2009). The resulting libraries were sequenced on the Illumina GAIIx

platform using v5 paired-end 36-100 bp sequence chemistry following the manufacturer's

instructions. For eight of the nine remaining cases, libraries were prepared from

remission DNA only to aid in identifying inherited variants. Image analysis, base-calling and

error calibration were performed using v1.4.0, v1.5.0 and v1.8.0 of Illumina's Genome analysis

pipeline.

Detection of SNVs and fusion transcripts from mRNA-seq data

Detection of SNVs and fusion transcripts from mRNA-seq data was performed independently by

BC Genome Sciences Centre (BC) and St. Jude Children’s Research Hospital (SJCRH) using

different approaches as previously described. The results from the two institutes

were combined to generate the final list of candidate fusions for experimental

validation.

At BC, all reads were aligned to the human reference genome (hg18) or (for mRNA-seq) to a

genome file that was augmented with a set of all exon-exon junction sequences using BWA

version 0.5.4 (Li and Durbin, 2009). mRNA-seq libraries were aligned with an in-house modified

version of BWA that is aware of exon junction reads and considers them when determining

pairing distance in the “sampe” (read pairing) phase of alignment. Candidate SNVs were

identified in the aligned genomic sequence reads and the transcriptome (mRNA-seq) reads

using an approach similar to that previously described (Morin et al., 2010). One key

difference in our variant calling in this study was the application of a Bayesian SNV identification

algorithm ('SNVmix') (Goya et al., 2010). The deFuse software (version 0.2.0;

http://compbio.bccrc.ca/software/defuse/) (McPherson et al., 2011) was utilized for the


identification of putative fusion transcripts using the hg18 reference genome. Predicted fusion

sequences were subsequently aligned using BLAT and those with numerous high-confidence

alignments were removed. Predicted fusions were further filtered to remove those with fewer than

2 ‘split reads’ (those that cross the fusion point) and predicted fusions involving adjacent (or

nearby) gene pairs were also removed. The Trans-ABySS pipeline for detecting rearrangements

has been previously described (Robertson et al., 2010).
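The post-deFuse filtering steps described above can be illustrated with a short sketch. It is not the deFuse software or its output format: the `FusionCall` record, its field names and the `max_alignments` cutoff are hypothetical stand-ins, since the text does not state how many BLAT alignments count as "numerous". Only the minimum of 2 split reads comes from the text.

```python
from dataclasses import dataclass


@dataclass
class FusionCall:
    """Hypothetical summary of one predicted fusion (not a deFuse record)."""
    name: str
    split_reads: int
    genes_adjacent: bool
    high_confidence_alignments: int


def filter_fusion_calls(calls, min_split_reads=2, max_alignments=1):
    """Keep predictions that pass the three filters described in the text."""
    kept = []
    for call in calls:
        # Sequences with many high-confidence BLAT alignments are removed
        # (the cutoff of 1 is an assumption).
        if call.high_confidence_alignments > max_alignments:
            continue
        # Require at least `min_split_reads` reads crossing the fusion point.
        if call.split_reads < min_split_reads:
            continue
        # Adjacent or nearby gene pairs are removed.
        if call.genes_adjacent:
            continue
        kept.append(call)
    return kept


example_calls = [
    FusionCall("GENE_A-GENE_B", split_reads=5, genes_adjacent=False, high_confidence_alignments=1),
    FusionCall("GENE_C-GENE_D", split_reads=1, genes_adjacent=False, high_confidence_alignments=1),
    FusionCall("GENE_E-GENE_F", split_reads=8, genes_adjacent=True, high_confidence_alignments=1),
    FusionCall("GENE_G-GENE_H", split_reads=6, genes_adjacent=False, high_confidence_alignments=12),
]
kept_names = [call.name for call in filter_fusion_calls(example_calls)]
```

With these placeholder calls, only `GENE_A-GENE_B` survives all three filters.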

At SJCRH, Illumina paired-end reads in FASTQ format were aligned against NCBI build 36 of the

human reference genome using Mosaik 0.9 (Marth, 2010). The Mosaik alignment parameters

were: (a) hash size of 15; (b) maximum percentage of the read length allowed to be

errors: -mmp 0.05; (c) use of the aligned read length instead of the original read length when

counting errors; (d) minimum percentage of the read length that should be aligned: -minp 0.5; (e)

alignment candidate threshold of 35 bp; (f) maximum number of hash positions of 100; (g)

alignment mode: unique.

Inter- and intra-chromosomal structural variation (SV) detection was carried out using Spanner

(Durbin et al., 2010), which is a classification scheme for structural variants based on distinct

patterns of paired-end read genome coverage. For intra-chromosomal SVs, calls involving

adjacent genes were first filtered out to exclude potential read-through events. The candidate

list was then sorted in descending order by the number of genes between the fusion breakpoints

and higher confidence was given to fusions with larger number of genes separating the

breakpoints. Each SV candidate was also checked to see if the orientations of the supporting

paired-end reads and the orientations of the genes involved were consistent with generating a

potential fusion product.
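The prioritization of intra-chromosomal candidates described above can be sketched as a filter followed by a sort. This is an illustration only, not Spanner's interface: the dictionary keys are hypothetical, and dropping candidates whose orientations are inconsistent is an assumption (the text states only that orientation was checked).

```python
def rank_intrachromosomal_candidates(candidates, require_consistent=True):
    """Rank intra-chromosomal SV candidates by the number of intervening genes.

    Each candidate is a dict with the hypothetical keys:
      "name": label for the candidate fusion
      "genes_between": number of genes between the two breakpoints
      "orientation_consistent": True if read and gene orientations could
          generate a fusion product
    """
    # Adjacent gene pairs (no intervening gene) are treated as possible read-through events.
    ranked = [c for c in candidates if c["genes_between"] > 0]
    if require_consistent:
        ranked = [c for c in ranked if c["orientation_consistent"]]
    # More intervening genes means higher confidence, so sort in descending order.
    return sorted(ranked, key=lambda c: c["genes_between"], reverse=True)


example_candidates = [
    {"name": "cand_1", "genes_between": 0, "orientation_consistent": True},
    {"name": "cand_2", "genes_between": 3, "orientation_consistent": True},
    {"name": "cand_3", "genes_between": 40, "orientation_consistent": True},
    {"name": "cand_4", "genes_between": 12, "orientation_consistent": False},
]
ranked_names = [c["name"] for c in rank_intrachromosomal_candidates(example_candidates)]
```

Passing `require_consistent=False` keeps the orientation check as an annotation rather than a filter, which may be closer to a manual review workflow.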


Validation of candidate somatic mutations and fusion transcripts

Validation was attempted for each of the candidate point mutations and fusion transcripts

identified in the P9906 and Total XV and XVI cases. For SNVs, this was accomplished by

designing primers to amplify a 200 to 300 bp region around the targeted variant with one primer

within reach of a single read (<=75 bases). Polymerase chain reactions were set up in 96-well

plates and comprised 0.5 μM forward primer, 0.5 μM reverse primer, 1-3 ng of gDNA

template, 5X Phusion HF Buffer, 0.2 μM dNTPs, 3% DMSO, and 0.4 units of Phusion DNA

polymerase (New England Biolabs). Reaction plates were cycled on a MJR Peltier

Thermocycler (model PTC-225) with cycling conditions of a denaturation step at 98 °C for 30

sec, followed by 35 cycles of [98°C for 10 sec, 69°C for 15 sec, 72°C for 15 sec] and a final

extension step at 72°C for 10 min. PCR reactions were visualized by SybrGreen (Life

Technologies) in 1.2% agarose (SeaKem LE) gels run for 90min at 170V to assess PCR

success. The resulting amplicons were pooled by patient and template, one for tumor and one

for normal DNA, with equal volumes from each PCR reaction and an indexed Illumina paired-

end sequencing library was constructed from each pool as described (Wiegand et al., 2010).

The resulting library was sequenced using v5 paired-end sequencing reagents on the Illumina

GAIIx platform following the manufacturer's instructions. Between the paired 75 base reads a

third 7 base pair read was performed using the following custom sequencing primer to

sequence the hexamer barcode [5’-GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCG].

Image analysis, base-calling and error calibration were performed using v1.8.0/ RTA 1.8.70.0 of

Illumina's Genome analysis pipeline. Reads were aligned using BWA, de-multiplexed using their

hexamer sequence, and variants were visually confirmed for validity and somatic status

(absence from constitutional DNA) in the Integrative Genomics Viewer (Robinson et al., 2011).
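De-multiplexing pooled amplicon reads by their hexamer barcode can be sketched as below. This is a simplified illustration rather than the procedure actually used: the barcode sequences and pool names are invented, exact matching is assumed, and it assumes the barcode occupies the first six bases of the 7-base index read.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_pool, barcode_length=6):
    """Group reads by the pool their index read maps to.

    reads: iterable of (sequence, index_read) tuples
    barcode_to_pool: dict mapping hexamer barcode -> pool name
    Reads whose barcode is not recognized are collected under "unassigned".
    """
    pools = defaultdict(list)
    for sequence, index_read in reads:
        barcode = index_read[:barcode_length]
        pool = barcode_to_pool.get(barcode, "unassigned")
        pools[pool].append(sequence)
    return dict(pools)


# Hypothetical barcodes and reads, for illustration only.
example_pools = demultiplex(
    [("ACGTACGT", "AAGGTTA"), ("TTTTCCCC", "CCTTAAG"), ("GGGGAAAA", "NNNNNNN")],
    {"AAGGTT": "tumor_pool", "CCTTAA": "normal_pool"},
)
```

A production workflow would typically also tolerate a sequencing error in the barcode; that is omitted here to keep the example short.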

A total of 1257 novel SNVs (candidate mutations) in 88 genes were identified by mRNA-seq

(ranging from 33-270 SNVs per case). PCR was attempted for 1163 of these events that were


amenable to PCR and direct amplicon sequencing. After sequencing and alignment, 1048 had

sufficient sequence coverage from both tumor and normal DNA. Of these, 631 variants were

found to be present in the germline and 394 were deemed to be false positives and the

remaining 23 events were somatic mutations. Mutations were annotated on genes using the

Ensembl transcripts (version 54) and those predicted to cause nonsynonymous or nonsense

changes are reported in Table S4.

From the P9906 and Total XV and XVI cases profiled by mRNA-seq, 425 distinct putative

insertion/deletion mutations were identified, with a range of 12-100 per case (mean of 38.6 per

case). PCR succeeded in both tumor and normal DNA for 365 of these target loci.

Deep resequencing of these amplicons revealed that 188 were present in

the germline, 167 were false calls and 10 were somatic events (Table S4).
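
The validation counts reported above for SNVs and indels can be tallied as a quick consistency check. The sketch below is illustrative arithmetic only; the function and variable names are hypothetical, and the inputs are the counts stated in the text.

```python
def validation_summary(germline, false_positive, somatic):
    """Summarize validation outcomes for candidates with usable coverage."""
    total = germline + false_positive + somatic
    return {
        "total": total,
        "germline_fraction": germline / total,
        "false_positive_fraction": false_positive / total,
        "somatic_fraction": somatic / total,
    }


# Counts taken from the text: SNVs (1048 with sufficient coverage)
# and indels (365 loci amplified in both tumor and normal DNA).
SNV_SUMMARY = validation_summary(germline=631, false_positive=394, somatic=23)
INDEL_SUMMARY = validation_summary(germline=188, false_positive=167, somatic=10)

for label, summary in (("SNV", SNV_SUMMARY), ("indel", INDEL_SUMMARY)):
    print(f"{label}: n={summary['total']}, somatic={summary['somatic_fraction']:.1%}")
```

The three categories sum to the stated totals (1048 and 365), and somatic events make up roughly 2.2% of the evaluable SNV candidates and 2.7% of the evaluable indel candidates.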

Validation of candidate fusion transcripts

To validate fusion transcripts, primers were designed to span 200-300 bp on either side of the

fusion point and PCR reactions were set up with 0.4 μl cDNA template, 0.2 μM forward primer,

0.2 μM reverse primer, 5X Phusion HF Buffer, 0.2 mM dNTPs and 0.4 units of Phusion DNA

polymerase (New England Biolabs). Reactions were performed on an Eppendorf thermocycler

with cycling conditions consisting of a denaturation step at 98 °C for 1 min, followed by 33

cycles of [98°C for 10 sec, 66°C for 15 sec, 72°C for 15 sec] and a final extension step at 72°C

for 10 min. PCR products were visualized with GelRed (Biotium, Inc.) on a 1.5% agarose gel run

at 110V for 1 hour, purified using the Wizard PCR purification Kit (Promega) and fusion

transcripts were confirmed by direct Sanger sequencing.
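
For record keeping, the two cycling programs given in this section can be written down as data. The sketch below is one possible representation, not an instrument export: the class names are hypothetical, and only programmed hold times are summed, since ramping between temperatures depends on the instrument.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # programmed hold time


@dataclass(frozen=True)
class Program:
    initial: Step
    cycles: int
    cycle_steps: Tuple[Step, ...]
    final: Step

    def hold_seconds(self):
        """Total programmed hold time, ignoring ramping between steps."""
        per_cycle = sum(step.seconds for step in self.cycle_steps)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


# Fusion transcript validation: 98 C for 1 min, 33 cycles, 72 C for 10 min.
FUSION_VALIDATION = Program(
    initial=Step(98, 60),
    cycles=33,
    cycle_steps=(Step(98, 10), Step(66, 15), Step(72, 15)),
    final=Step(72, 600),
)

# Amplicon validation of SNVs and indels: 98 C for 30 sec, 35 cycles, 72 C for 10 min.
AMPLICON_VALIDATION = Program(
    initial=Step(98, 30),
    cycles=35,
    cycle_steps=(Step(98, 10), Step(69, 15), Step(72, 15)),
    final=Step(72, 600),
)

print(FUSION_VALIDATION.hold_seconds(), AMPLICON_VALIDATION.hold_seconds())
```

Under these assumptions the programmed hold times sum to 1980 s (33 min) for the fusion program and 2030 s for the amplicon program; the two differ in annealing temperature (66°C versus 69°C), cycle number and initial denaturation time.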

Detection and validation of SNV and insertion/deletion sequence mutations from WGS data

Putative sequence variants including SNVs and indels were initially detected by running the

variation detection module of Bambino (Edmonson et al., 2011) as previously described (Zhang

et al., 2012). To confirm SNVs as somatic, matched tumor and normal DNA were

subjected to PCR and Sanger sequencing. For PALJDL, a total of 19 high-quality SNVs were

identified using Bambino; 4 of these were located in untranslated regions (UTRs) and 3

were silent. The remaining 12 missense mutations were tested by PCR and Sanger

sequencing, which yielded 1 false positive and 2 sequencing failures (Table S5). For PALETF, 54

high-quality SNVs were detected, with 3 in non-coding regions, 6 silent mutations, 24 UTR

variants and 20 missense mutations, of which 19 were confirmed as somatic (Table S6).

Sequence analysis of IL7R and SH2B3

PCR of IL7R exon 6 was carried out on whole genome amplified leukemic DNA from the P9906

(N=188) and AALL0232 (N=248) cohorts. Exon 2 of SH2B3, a known hotspot for mutations

and deletions, was sequenced (Pardanani et al., 2010). Sequencing and subsequent validation were

performed by Beckman Coulter Genomics (Danvers, MA), and sequencing analysis was

conducted as previously reported (Mullighan et al., 2009b). Base calls and quality scores were

determined using the program PHRED (Ewing et al., 1998). Sequence variations including

substitutions and insertion/deletions (indel) were analyzed using the SNPdetector (Zhang et al.,

2005) and the IndelDetector (Zhang et al., 2007) software. A usable read was required to have

at least one 30-bp window in which at least 90% of the bases had a PHRED quality score of at least 30.

Poor-quality reads were filtered out prior to variation detection. The minimum

secondary-to-primary peak ratio for substitution and indel detection was set to 20% and 10%,

respectively. All sequence variations were annotated using a previously developed variation

annotation pipeline.
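
The read-quality and peak-ratio criteria above can be sketched as follows. This is a minimal illustration under stated assumptions and not the SNPdetector or IndelDetector implementation: the function names are hypothetical, "90% of the bases" is read as "at least 90%", and the peak-ratio threshold is treated as inclusive.

```python
def is_usable_read(quals, window=30, min_q=30, min_pct=90):
    """True if some `window`-base stretch has >= min_pct% bases with quality >= min_q."""
    if len(quals) < window:
        return False
    good = [q >= min_q for q in quals]
    count = sum(good[:window])  # good bases in the first window
    # Integer comparison avoids floating-point rounding at the threshold.
    if count * 100 >= min_pct * window:
        return True
    for i in range(window, len(good)):
        count += good[i] - good[i - window]  # slide the window by one base
        if count * 100 >= min_pct * window:
            return True
    return False


# Minimum secondary-to-primary peak ratios, in percent, as stated in the text.
MIN_PEAK_RATIO_PCT = {"substitution": 20, "indel": 10}


def passes_peak_ratio(primary_height, secondary_height, kind):
    """True if the secondary peak reaches the minimum ratio for this variant kind."""
    return secondary_height * 100 >= MIN_PEAK_RATIO_PCT[kind] * primary_height


# Hypothetical quality strings for illustration.
READ_OK = [20] * 10 + [35] * 27 + [10] * 3 + [20] * 10   # one window with 27/30 good bases
READ_BAD = [35] * 26 + [10] * 30                         # never more than 26/30 good bases
```

A sliding count keeps the check linear in read length, which matters little for individual Sanger traces but keeps the filter simple to apply across many amplicons.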

Identification of structural variations and copy number variations from WGS data

SVs including inter-chromosomal translocations, intra-chromosomal translocations, inversions,

deletions, and insertions were analyzed by CREST (Wang et al., 2011). CNVs were identified by

evaluating the number of sequence reads aligned at each base using the novel algorithm

CONSERTING (COpy Number SEgmentation by Regression Tree In Next-Gen sequencing) as

previously described (Zhang et al., 2012). From this analysis, we identified six somatic deletions

in PALJDL involving SH2B3, RASSF3, ELF1, CDK6, UHRF1 and CCNL1 (Table S7); five of

these were validated by PCR and Sanger sequencing (Figure S5). In addition, the ARL8B-

EDEM1 fusion, and deletions within NCOA6 and CDH2 in PALETF (Table S7) were also

validated (data not shown). All validation and sequencing primers are listed in the tables below.

List of primers for fusion validation and genomic mapping.

Fusion (assay) | Forward primer ID | Forward sequence 5' to 3' | Reverse primer ID | Reverse sequence 5' to 3'
BCR-JAK2 (fusion by RT-PCR) | BCRe1F1 | gtgccataagcggcaccggcact | JAK2e17R2 | tctcctccactgcagatttcccaca
BCR-JAK2 (genomic mapping) | BCRi1F2 | tgcttccccacaccgacttcctaaa | JAK2i14R2 | cagaaatgttttgcgctgctttatc
EBF1-PDGFRB (fusion by RT-PCR) | EBF1e14F2 | cacgagcatgaacggatacggctct | PDGFRBe13R2 | tttcatcgtggcctgagaatggctc
EBF1-PDGFRB (full-length fusion) | EBF1e1_FL | gggggaggagattttccacaagaaaagg | PDGFRBe23_FL | gggccagccccctacaggaagctat
EBF1-PDGFRB (genomic mapping) | EBF1i15F1 | aacgggaacagcctgcaaggtaagc | PDGFRBi10R1 | gtactaggaggtccaggcagggctga
EBF1-PDGFRB (genomic mapping) |  |  | PDGFRBi10R2 | tccttcccctttatctgccacccttc
EBF1-PDGFRB (genomic mapping) |  |  | PDGFRBi10R3 | aagtggggctcacactggcccttac
EBF1-PDGFRB (genomic mapping) |  |  | PDGFRBi10R4 | gaggggcttaccttctgccaaagca
NUP214-ABL1 | NUPe20F1 | cagtggccttggaggaaaacccagt | ABL1e3R2 | tgtagttgcttgggacccagccttg
STRN3-JAK2 | STRN3e9F1 | atgatgagctgccccacatcccttc | JAK2e17R3 | cggcacatctccacactcccaaaat
IGH@-EPOR | IGHJ5F1 | tttcctgacctccaaaatgcctcca | EPOR_40bpinsR3 | gggggtggctacgattatagggtca
IGH@-EPOR | EPORe8F1 | tctggtgctggacaaatggttgctg | IGHV4R1 | gcaccaactacaacccctccctcaa
RANBP2-ABL1 | RANBP2e16F2 | tggttctttgcgaaatgcagattca | ABL1e3R1 | gccatttttggtttgggcttcacac
PAX5-JAK2 | PAX5e4F1 | accaaccagtcccagcttccagtca | JAK2e19R3 | cggcacatctccacactcccaaaat
ETV6-ABL1 | ETV6e5F3 | cggcactccgtggatttcaaacagt | ABL1e3R2 | tgtagttgcttgggacccagccttg
RCSD1-ABL1 | RCSD1e3F3 | cagccagtaaaccaacccgaaggaa | ABL1e4R3 | gcttgttgcgctttggggctgga
SEMA6A-FEM1C | SEMA6Ae1F2 | cccttctccgctcgtcattggagat | FEM1Ce2R2 | ttgccatcccgagctgcgttaaata
OAZ1-KLF2 | OAZ1e1F2 | acgcagcggaggttttcctggttt | KLF2e3R2 | gatcgcacagatggcactggaatg
TPM4-KLF2 | TPM4e1F2 | ctcaactccctggaggcggtgaaac | KLF2e3R1 | gtggcccgtgtgctttcggtagtg
FAM23A-MRC1 | FAM23Ae2F1 | accacactgccctgcctcacctttt | MRC1e2R1 | tgtttttgatggcactcccaggcata
C12orf35-AMN1 | C12orf35i4F2 | ccttgtgggaatgtcagaaaagtgtca | AMN1i6R2 | ctgacagccaccacgtccaaccaat
ZNF292-SYNCRIP | ZNF292e1F2 | gacggagcggggtgtgaagatgg | SYNCRIPv1e4R2 | tggtcccttgtttttctctctgcctgt
BTBD7-SLC2A5 | BTBD7i10F2 | tcgaatgctcctcaagcaggtcactc | SLC2A5i1R2 | tgggcatgatgtgcacctgtaatcc
DOCK8-CBWD2 | DOCK8e3F2 | accctgtggagccagtggactttga | CBWD2i10R2 | ctgcgcttcccaagtgaggcaatg
TSHZ2-SLC35A | TSHZ2i2F1 | cacatcggcctcccaaaatgctaga | SLC35A1i5R1 | tgccagtatgtcatttgctttggtca
PAX5-ZCCHC7 | PAX5e6F1 | ccggaagcagatgcggggagact | ZCCHC7e5R2 | gggggctggacaggaatacaggaga
ZCCHC7-PAX5 | ZCCHC7e2F1 | tgcccatggtctttcttcttctcttca | PAX5e7R1 | gcctgtcacaatggggtaggactgc
 | Actin | agtgtgacgtggacatccgcaaagac | Actin | gcttgctgatccacatctgctggaag

List of primers for SNV and SV validation

Primer name | Gene | Forward sequence 5’ to 3’ | Reverse sequence 5’ to 3’

Single nucleotide variants
HS0825_17_7691628 | JMJD3 | CGCTGACCATTACCAAACTCCC | CAGGCCGGACCCTTCAAC
HS0825_19_8373013 | RAB11B | TACCGTGGTGCAGTGGGC | CTTCCAGACACCTCTCTGATGCC
HS0825_9_5079702 | JAK2 | TTAGGGTAATTTTGGGAGTGTGGAG | TTTAAAATAGGTTTCAATGGGCAGC
HS0825_X_13696806 | OFD1 | AAAAGATGGTCCAAGAAGGCTCC | TTACAATTTATATCAGGATCAATTTCACAAGTC
HS0894_19_10655385 | ILF3 | CATGCACAACGAAGTGCCC | AAAGACAGGGTTACCGGGGTC
HS0894_8_59682919 | NSMAF | AGGCCGTGCCTCCTTTTGTAG | GAATGGGAAACCAGATATCACCTTG
HS0894_9_37010775 | PAX5 | CTTTATTTGAAAGATCAAGGGAAGCC | TTTCAGGACATGGAGGAGTGAATC
HS1533_17_7514718 | TP53 | CCTGGGCATCCTTGAGTTCC | AGTCAGCTGTATAGGTACTTGAAGTGCAG
HS1533_X_76805676 | ATRX | GTCTGTATCTTGGCTTCTTAGATTCTTCAG | TGTCAAGTCTGATGTGTGAGATTACCTG
HS1535_7_6697396 | ZNF12 | ATTCATAGGGCTTCTCTCCTGAATG | CTCCCAGTTGTCATACCTCACTATCC
HS1535_X_54065429 | PHF8 | CAGGGTTCCTCACCTGTCAAAAG | GATTGGAAGAGAAGGATCTGCTGAG
HS1536_10_75544616 | VCL | ACCATAAGCACCCAGCTCAAAATC | ACACAGACTTTCCCCTCCTGAGAC
HS1536_17_78167098 | WDR45L | GAGCTGGTCACAGCCACTGAG | AGTGCATCCGAGATGTCTACGC
HS1536_1_113042873 | MOV10 | CCTCCTTCCAGGTGGAGAAAATC | AGCTCTATTCAAGGCAGCAGGG
HS1536_6_139529155 | HECA | TGCCACTCCCCTGATCTGC | TAGCCCTTCTTTGTCCACATGTTC
HS1576_1_234768983 | LGALS8 | ACACGCCTTTCAAAAGAGAAAAGTC | CACATTTAGGTGCCTTCCTGGTC
HS1537_15_98087051 | LYSMD4 | TTAAAGTAGACCAAATAAAAGACAGGCAAG | TTTAAGGGGATTGACCAGGATATTG
HS1537_17_54129161 | RAD51C | GAGGGAAGTTTTATGGTTGATAGAGTGG | AGGCTGTGGCATTTCTCATTTTG
HS1537_19_50975027 | DMPK | AAGAACCGAGGGTCACCAGAAAG | GAGCCCTCTGGGCCAATG
HS1537_20_48942253 | ADNP | TCACGGACACACTTCTTCCTTTTG | CTTTGAAGAGAAGCCTGAAGAGCC
HS1537_20_51304885 | TSHZ2 | CGAGGACTATGAAGATCCTCTACAAAAAC | GGGATCCCATAGGCAAAGGC
HS1537_5_112204169 | APC | CACCTATAAACTTTTCCACAGCTACATCTC | TGCAAGAATATCACCTTCCTCTGC
HS1536_15_55608184 | CGNL1 | CTGTAGCTGGAAAACCTCCTCTTTTTAC | TCAATCAAGAAATAACCTCACTACACCC
HS1536_13_109628259 | COL4A1 | CAGCCTTCTGCTTGATGTTCCTAAC | TAAAGGAGATAAAGGGGCTCAAGGAC
HS1536_139529155 | HECA_6 | AGAAATCGGTATGAGACCGTCTATGTTC | GAGCCAGACTTCTTCTTTTTCTCGTC
HS1536_16_30435856 | ITGAL | TCTAGTGGAGATGCAGACATCCAAG | ATTCATCCATCCTTACTCTCCATACACC
HS1536_1_113042873 | MOV10 | GATGGTACCTCCTTAATATCCAGACCAC | CTTAAGTTGGAATCTCCAGGAGTGAGAG
HS1536_11_55870928 | OR8K1 | ATGTCCATACTCTGTTCTGACACAAATG | GTCCAGAAAAAGAGGAAAATCTAAGCAG
HS1536_5_32019253 | PDZD2 | CTTTATCTACCTGATCATGCTGCGTC | GTTTGTCATTGCTTGTGGTCAGAG
HS1536_12_80212905 | PPFIA2 | ATTTTGCTCTGTAGACTTAGACACACGG | AGCATGTCTCAATTTCAAATCTTGTTTC
HS1536_3_78767718 | ROBO1 | ACTCTTTGACACTGGAAATTTTGAAACC | AATTTAAAAGGAATAACCACGCAGAATG
HS1536_10_75544616 | VCL | TTTCTGGAGAAATGGATTGTACTGACC | GTATCTATCCAACACAGACTTTCCCCTC
HS1536_17_78167098 | WDR45L | CTGGTCACAGCCACTGAGTTACCTAC | ATGCTCACACCAAGACAGTCAGG
HS1536_19_58036999 | ZNF468 | TAAGAGTGAGCTGCAATTAAAGGATTTG | CTCACGTTTGGGAAGTTGAAAATAAG
HS1537_20_48942253 | ADNP | TTCTTCCAAATTTTCAAAACTGTCTGAG | GTTAGATGATGATAGTGATTCACCCAGC
HS1537_18_54335283 | ALPK2 | GTCTGATTTTCATCTCGTCTTATCATGG | TTAGCTATGCCTCTTTTGAAGGGTTAAG
HS1537_5_112204169 | APC | GAATGTATTATTTCTGCCATGCCAAC | TAATGCATTCTGCAAGAATATCACCTTC
HS1537_5_149657296 | ARSI | TTGAGTTTACGGAAAAAGGATCGAAG | GGCTAGATGGCTACGACGTGTG
HS1537_13_51437128 | ATP7B | CATGGGAAAAGTTGAAGAATTTTGG | TGTTTTGGGGAAGAGCTACTCTTTAGTC
HS1537_19_50975027 | DMPK | TAGGTTCTAAGGCTCGGTCATTCATC | TTATTTCCTTCTCCCCTTGTTCTTTAGG
HS1537_8_79811310 | IL7 | ATCATGCTTTTGTATTTGCCCTAACAG | TTATGGATTCTGGGTATTCAGAAGACTG
HS1537_12_51356339 | KRT1 | TAAAGAATAATTTGCTCCACCTCAAAGC | AGGGTGTCATATTCTTTTTCAGATCTCC
HS1537_15_98087051 | LYSMD4 | ATCAACAGCCAATTGCTAAAGCTG | ACAAAGAACTGAAACCCCTTCTGAG
HS1537_3_65322046 | MAGI1 | AGAGGCTGGTAAGACACTGGTGAATAG | CCTTTCTCATGTGATTTGGCATATATGTAG
HS1537_10_95147017 | MYOF | GTTCCCCTTCTTTTGTTTATGAGCAG | TACACTTTTGCTTTGGTGATTTTACAGG
HS1537_1_174830357 | PAPPA2 | AGAAATCTGTGTCGCTTACTTTATTCCC | GAAATAGTGCCCATCCTCAGAGC
HS1537_1_66156917 | PDE4B | AAAAGCTGAGAATCAGCCTCTTCTATTG | TCAATCATCTTAGGTATTTCACAGAAAACTC
HS1537_17_54129161 | RAD51C | TCATGAAGAGTTAGACATTTCTGTTGCC | TGTGGCATTTCTCATTTTGTAACAGTAG
HS1537_6_49691341 | RHAG | AGAGGAGAATTGAGCCAGCATAAAGTAG | AAAGTCACATTGCAAGGAAGTAACTGAC
HS1537_20_55342171 | SPO11 | ACATTTTTCAAGGTATATGGTTCAGTGC | TTGAAGAAAGCTTTAACTGGATGAAAAC
HS1537_20_51304885 | TSHZ2 | ACATGATGGTCACAGGTCACTTTCTC | ATAGAAACCAGTGGCATCACTTTCC
HS1537_18_71042423 | ZADH2 | GGAGGAGAAGGAGAGACCTATTCATTAAC | GGTTTATCTCTGGCTACCAAACTCCTAC
HS1537_22_21199339 | ZNF280A | GCTAAGTAACACGATGGGATTTTCTTTC | ATAAAATGAGCTCACCACAAGTTGTTTC
SJBALL010_16_3739806 | CREBBP | GGGGCCATCATGTCTTTTTGTTTGAA | ACACCAGAAATTCCACTTACGGCAACA

Structural variations
HS1536_5_35910359 | IL7R | TGCAAAGCACCCTGAGACCCTACCC | CCGTGATCCCACACAATCACCCTCT
HS1536_12_110325422 | SH2B3 | CCGGGAGCTACGGATCAGTCATGG | GAGACCAGCCTGACCAACATGGAGA
HS1536_13_40448351 | ELF1 | TGTGGTCAAGACAGCTGTGGGCAAG | CAAGAGGCACACAAATAAGCGGCATAA
HS1536_12_632246783 | RASSF3 | CCTGGATGCCTGAAAAGGTCCTGAA | CTAGCACCGGGCTCCTCAGTCCAG
HS1536_7_92281190 | CDK6 | CCCCCTTAAAAACGGCTAAGCAGCA | GCAGATGTCTGACCACCCCTTCTCG

Recurrence screening
SH2B3F1 |  | CGGTGTGTAATGGGGCCTACACCTG | GGCGAAAGTGCTGGAAGGAGCAG
SH2B3F2 |  | GGCCCTGCTCCTTCCAGCACTTTC | ACCAGCTGGAAAGCCATCACACCTC
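
Primer entries such as those listed above lend themselves to simple programmatic checks before ordering or re-use. The following is a minimal sketch and not part of the original methods: the function names are hypothetical, and the example uses the BCRe1F1 sequence from the fusion primer list.

```python
# Translation table for complementing DNA while preserving letter case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_count(seq):
    """Number of G or C bases in the sequence, ignoring case."""
    upper = seq.upper()
    return upper.count("G") + upper.count("C")


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return gc_count(seq) / len(seq)


# BCRe1F1, forward primer for the BCR-JAK2 RT-PCR assay (from the list above).
BCRE1F1 = "gtgccataagcggcaccggcact"

print(len(BCRE1F1), round(gc_fraction(BCRE1F1), 2), reverse_complement(BCRE1F1))
```

Comparing the reverse complement of a reverse primer with the reference strand is a quick way to confirm that a primer pair flanks the intended breakpoint or variant position.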

Fluorescence in-situ hybridization

Fluorescence in-situ hybridization (FISH) analyses to confirm the PDGFRB rearrangement and

EBF1-PDGFRB fusion, and to detect the rearrangement between the IGH@ and EPOR loci, were

performed on cells stored in Carnoy fixative as previously described (Mullighan et al., 2009a;

Harvey et al., 2010a). For PDGFRB rearrangement, a break-apart mixture containing two

bacterial artificial chromosome (BAC) clones consisting of RP11-1079A8 (centromeric to

PDGFRB) labeled with a green probe (Alexa Fluor 488), and a telomeric clone (RP11-759G10)

labeled with a red probe (Alexa Fluor 568) was used to show loss of the telomeric portion of

PDGFRB. An additional mixture was used to show fusion of EBF1 to PDGFRB, and consisted of

clone RP11-583A20 (telomeric to EBF1) labeled with Alexa Fluor 568, and clone RP11-1079A8

labeled with Alexa Fluor 488. Disruption of the IGH@ locus was determined using the LSI IGH

Dual Color Break-Apart Rearrangement Probe (Abbott Molecular), which spans 900kb of the

IGHV regions and 250kb of the 3’ region. For EPOR, a probe mixture was used, consisting of two

BAC clones, RP11-1114G9 (centromeric) and RP11-478I13 (telomeric). BAC clones, fluorescent

labels, and nick-translation materials were obtained through Invitrogen. Total BAC DNA was

isolated using a plasmid midi-prep kit (Invitrogen). Slide hybridization and washes were

performed using standard FISH protocols. Slides were counterstained with 4',6-diamidino-2-phenylindole

(DAPI) and analyzed with a Zeiss Axioskop microscope (Zeiss, Germany) equipped with

the appropriate filter combination and a CCD camera, and coupled to the CytoVision image

analysis system. A total of 25 to 200 interphase cells were scored for each probe.

Supplemental References

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and

powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57, 289-

300.

Durbin, R. M., Abecasis, G. R., Altshuler, D. L., Auton, A., Brooks, L. D., Gibbs, R. A., Hurles, M.

E., and McVean, G. A. (2010). A map of human genome variation from population-scale

sequencing. Nature 467, 1061-1073.

Ewing, B., Hillier, L., Wendl, M. C., and Green, P. (1998). Base-calling of automated sequencer

traces using phred. I. Accuracy assessment. Genome Res 8, 175-185.

Goya, R., Sun, M. G., Morin, R. D., Leung, G., Ha, G., Wiegand, K. C., Senz, J., Crisan, A.,

Marra, M. A., Hirst, M., et al. (2010). SNVMix: predicting single nucleotide variants from next-

generation sequencing of tumors. Bioinformatics 26, 730-736.

Levine, R. L., Wadleigh, M., Cools, J., Ebert, B. L., Wernig, G., Huntly, B. J., Boggon, T. J.,

Wlodarska, I., Clark, J. J., Moore, S., et al. (2005). Activating mutation in the tyrosine kinase

JAK2 in polycythemia vera, essential thrombocythemia, and myeloid metaplasia with

myelofibrosis. Cancer Cell 7, 387-397.

Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler

transform. Bioinformatics 25, 1754-1760.

Radtke, I., Mullighan, C. G., Ishii, M., Su, X., Cheng, J., Ma, J., Ganti, R., Cai, Z., Goorha, S.,

Pounds, S. B., et al. (2009). Genomic analysis reveals few genetic alterations in pediatric acute

myeloid leukemia. Proceedings of the National Academy of Sciences of the United States of

America 106, 12944-12949.

Robinson, J. T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E. S., Getz, G., and

Mesirov, J. P. (2011). Integrative genomics viewer. Nat Biotechnol 29, 24-26.

Wiegand, K. C., Shah, S. P., Al-Agha, O. M., Zhao, Y., Tse, K., Zeng, T., Senz, J., McConechy,

M. K., Anglesio, M. S., Kalloger, S. E., et al. (2010). ARID1A mutations in endometriosis-

associated ovarian carcinomas. N Engl J Med 363, 1532-1543.

Zhang, J., Wheeler, D. A., Yakub, I., Wei, S., Sood, R., Rowe, W., Liu, P. P., Gibbs, R. A., and

Buetow, K. H. (2005). SNPdetector: a software tool for sensitive and accurate SNP detection.

PLoS Comput Biol 1, e53.

Zhang, J., Finney, R. P., Rowe, W., Edmonson, M., Yang, S. H., Dracheva, T., Jen, J.,

Struewing, J. P., and Buetow, K. H. (2007). Systematic analysis of genetic alterations in tumors

using Cancer Genome WorkBench (CGWB). Genome Res 17, 1111-1117.
