ars.els-cdn.com€¦  · web viewnegative controls included a ntc and -rt sample. ... baits...

Supplementary Materials and Methods

Patient accrual

Chemotherapy-pretreated and chemotherapy-naïve patients with castration-resistant prostate

cancer (CRPC) were recruited through a non-interventional clinical study between October

2013 and June 2015. All patients had histologically

confirmed prostate adenocarcinoma, castrate levels (<50 ng/dl) of serum testosterone and

bicalutamide withdrawal for at least 6 weeks, in accordance with EAU guidelines [1]. Due to

shifts in reimbursement criteria and treatment management the studied cohort is

heterogeneous, reflecting current clinical practice. Measures of response and progression,

based on serum PSA levels or radiographic imaging, are defined according to Prostate

Cancer Clinical Trials Working Group 3 criteria and Response Evaluation Criteria in

Solid Tumours version 1.1, respectively [2,3]. Evaluation of clinical criteria is

determined by clinical judgement of the treating physician. The clinical protocol was

reviewed and approved by the institutional review and ethics board of GZA

Sint-Augustinus. All patients provided written informed consent.

Blood collection and processing

Blood collection and processing for circulating tumor cell (CTC) enumeration (within 72

hours) or CTC enrichment for transcriptional analysis (within 8 hours) were performed

using the FDA-cleared CellSearch CTC and Profile technology, respectively, as

described previously [4,5]. Profile-enriched CTC fractions were lysed with 250 μL

RNeasy RLT+ buffer (Qiagen BV, The Netherlands) and stored at -80°C until RNA

isolation. Leftover EDTA blood was subjected to red blood cell lysis, to generate white

blood cell pellets for germline DNA isolation. For genomic analysis of circulating cell-

free DNA (cfDNA), 5 mL of citrate-collected blood was centrifuged within 1 hour,

resulting in 1-1.5 mL of plasma, which was stored at -80°C until cfDNA isolation.

RNA isolation, cDNA synthesis and AR-V preamplification from enriched CTC

fractions

RNA was isolated from CTC lysates with the AllPrep DNA/RNA Micro Kit (Qiagen).

Complementary DNA (cDNA) synthesis and AR-V pre-amplification were performed on

25% of the isolated RNA, using the RevertAid™ H Minus First Strand cDNA and

TaqMan™ PreAmp amplification kits, respectively, in a GeneAmp® PCR System 9700

(Life Technologies). Pre-amplified cDNA was diluted 15-fold with 1× TE buffer.

Full-length AR and AR splice variant (ARV) Multiplex Amplification of Specific

Targets for Resequencing and qRT-PCR validation

Pre-amplified and diluted cDNA was subjected to a 20-cycle multiplex PCR for

amplification, sequence adaptor ligation and indexing of full-length AR and AR splice

variant (ARV) transcripts using Multiplex Amplification of Specific Targets for

Resequencing (MASTR) technology (Multiplicom NV, Belgium). Amplicon libraries

were verified by fragment analysis using Genescan (Applied Biosystems). Pooled

libraries were sequenced on a MiSeq (Illumina) with v2 chemistry (2×251 cycles). qRT-

PCR was used to validate the AR and ARV transcript counts obtained by targeted RNA

sequencing (see Supplementary Table 6). Additionally, 3 housekeeping genes (GUSB, HMBS and

HPRT1) were used to control for sample loading and RNA integrity [5,6]. Epithelial

(EPCAM, CK19) and white blood cell (PTPRC) markers were used to control for

presence of epithelial and WBC content. PCR reactions (40 cycles) were performed using

TaqMan™ Gene Expression Assays with Universal PCR Master Mix No AmpErase

UNG on a 7900HT Fast Real-time PCR System (all Applied Biosystems). A calibrator

(positive control) sample was used in each run to assess inter-run variability. Negative

controls included a no-template control (NTC) and a no-reverse-transcriptase (-RT) sample.

Only samples with a Cq value of <35 for each of the 3 reference genes were considered of

sufficient quality and quantity.

Isolation of cfDNA, library prep and sequencing strategy

cfDNA isolation was performed using the QiaVac system (Qiagen). Purified cfDNA was

subjected to fragment analysis. Germline DNA from WBCs was isolated using the

QIAamp DNA Blood Mini Kit. Library preparation was performed using the ThruPLEX DNA-seq

kit (Rubicon Genomics). Between 0.1 and 50 ng of cfDNA (median 7.4 ng) and 50 ng of

germline DNA were used to create the libraries. Low-pass whole genome sequencing was

performed on each cfDNA library to enable identification of copy-number variants

(CNVs). The SeqCap EZ system (Roche Nimblegen) was applied for targeted

sequencing. The targeted regions were designed to capture unique regions in the human

genome commonly mutated in prostate cancer, identified through extensive literature

review. Baits towards 112 genes were included in the target regions capturing all coding

exons or mutational hotspots recently reported for prostate and breast cancer (to be

applied in a parallel project) [7]. The entire AR gene, including introns, was also included

in the design. The final size of the design, excluding non-unique, repetitive regions, was

295 kb (Supplementary Table 3). Targeted sequencing was applied to both cfDNA and

germline DNA. Sequencing was performed on the Illumina 2500 instrument in rapid

mode generating 1x50 bases for low-pass whole genome sequencing and 2x100 bases for

targeted sequencing.

Sequence analysis

For ARV-Seq

Demultiplexed fastq files were trimmed using Skewer software version 0.1.117 [8] with a

sequence length cutoff of 80 nucleotides. Filtered reads were aligned with the Burrows-

Wheeler Aligner (BWA-MEM) using an indexed ARV reference transcript FASTA,

encompassing exon and cryptic exon (CE) signature sequences. SAMtools was used to

infer the read counts per ARV. Raw read counts were subsequently normalized and

expressed as the number of ARV transcripts per 1000 reads.
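
The normalization step can be illustrated with a short Python sketch. It assumes that the denominator is the total number of mapped reads in the library; the function name and the example counts are hypothetical and are not taken from the study.

```python
def normalize_per_thousand(raw_counts, total_reads):
    """Convert raw read counts per transcript to counts per 1000 reads.

    raw_counts:  dict mapping transcript name -> raw read count
    total_reads: total mapped reads in the library (assumed denominator)
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {name: 1000.0 * count / total_reads
            for name, count in raw_counts.items()}


# Hypothetical counts, for illustration only.
example = normalize_per_thousand({"AR-FL": 4000, "AR-V7": 250}, total_reads=50000)
```

Expressing counts per 1000 reads makes libraries of different depth comparable without changing the relative abundance of the variants within a sample.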

For cfDNA-Seq

Adaptors were trimmed using Skewer version 0.1.117 [8] and subsequent alignment to

GRCh37 was performed using BWA-mem version 0.7.7 [9]. Quality control metrics were

assessed using Picard version 1.128 (http://broadinstitute.github.io/picard). Reads with

poor mapping quality (<60) were filtered from the low-pass whole genome sequencing

data before identification of CNVs with the R package QDNAseq [10]. Intra-AR

copy-number alterations were also called by applying the CNVkit algorithm to the

targeted sequencing data [11]. Variant calling was performed using VarDict version 1.4.6

[12]. To obtain a set of low-frequency variants (≥1 %) before filtering, a pooled control

of cfDNA from eight healthy individuals below 40 years of age with an average coverage

of 1278X was used as a comparator. Subsequently, germline variants, identified from

targeted sequencing of germline DNA using FreeBayes version 1.0.1 [13], were filtered

out generating a list of potentially somatic variants. The normal DNA of sample 3843-P-

2013537 failed library prep. However, only low-fraction somatic variants were found for

this sample, therefore an alternative approach was applied for determining the circulating

tumor DNA (ctDNA) fraction (described below). Variants were annotated using SnpEff

version 3.3 [14]. The following filters were applied to minimize false positives:

1) Potential germline variant positions, missed by FreeBayes or due to low germline

sequence coverage, were removed (≥ 0.1% frequency in the ExAC database [15])

2) Recurring hot-spot variants not previously reported [7] were removed (allelic

fraction > 0.05, recurring in ≥ 10 samples)

3) Recurring variants due to variable error rate in the human genome [16] were

removed (allelic fraction < 0.05, recurring in ≥ 2 samples)

4) Variants with allelic fraction ≥ 3% or with ≥ 10 read support were kept
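
As an illustration only, the four filters listed above could be expressed as a single predicate. The dictionary keys, the `known_hotspots` set and the order of evaluation are assumptions made for this sketch; in particular, applying the low-fraction recurrence filter to every variant regardless of hot-spot status is an assumption, not something the text states.

```python
def keep_variant(v, known_hotspots=frozenset()):
    """Return True if a candidate variant passes the four listed filters.

    `v` is a dict with hypothetical keys:
      key        identifier of the variant, e.g. (gene, protein change)
      exac_freq  population frequency in ExAC, as a fraction (0..1)
      af         allelic fraction in the cfDNA sample (0..1)
      n_samples  number of cohort samples in which the variant is seen
      alt_reads  number of reads supporting the variant
    """
    # Filter 1: likely germline, at or above 0.1% frequency in ExAC.
    if v["exac_freq"] >= 0.001:
        return False
    # Filter 2: recurrent high-fraction variant that is not a reported hot-spot.
    if v["af"] > 0.05 and v["n_samples"] >= 10 and v["key"] not in known_hotspots:
        return False
    # Filter 3: recurrent low-fraction variant, likely position-specific error.
    if v["af"] < 0.05 and v["n_samples"] >= 2:
        return False
    # Filter 4: minimum support, allelic fraction >= 3% or >= 10 supporting reads.
    return v["af"] >= 0.03 or v["alt_reads"] >= 10
```

A caller would apply `keep_variant` to each annotated candidate and keep those for which it returns `True`.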

An exception was made for AR, where recently reported hot-spot variants [17] were

retained if supported by ≥2 reads. Only consequence types with potential to affect

protein function were kept to identify putative mutational driving events in each plasma

sample and to calculate ctDNA fraction (e.g. non-synonymous coding, frameshift, stop

gained). Plasma ctDNA fraction was assessed by combining CNVs and mutations,

excluding variation in AR. The ctDNA fraction was estimated for each mutation with 1)

no overlapping deletion: ctDNA fraction = Allelic fraction_mutation × 2; 2) an overlapping

deletion: ctDNA fraction = Allelic fraction_mutation. As the majority of tumors harbored low-

frequency mutations, the ctDNA fraction was calculated as the average of all mutations

with allelic fraction ≥ max(allelic fraction) / 2 for each cfDNA sample. Two samples

(4069-P-2014174 and 3843-P-2013537) contained only variants with low allelic fraction

(<0.04) despite the presence of CNVs. Therefore, a heterozygous deletion of chr8p

(assuming ploidy 2) for 4069-P-2014174 and a large deletion on chrXq for 3843-P-

2013537 were used to estimate tumor fraction as described previously [18]. The number

of AR copies was calculated by adjusting the relative copy number for ctDNA fraction

[18]. AR was classified as highly amplified (>10 copies), moderately amplified (2 – 10

copies) and neutral (1 copy).
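
The mutation-based ctDNA estimate described above can be sketched as follows. The sketch assumes that the reported average is taken over the per-mutation ctDNA estimates (not over the raw allelic fractions) of the mutations passing the max/2 threshold; the function name and input layout are hypothetical. The CNV-based estimate used for the two low-fraction samples is not covered.

```python
def ctdna_fraction(mutations):
    """Estimate the ctDNA fraction of one cfDNA sample.

    mutations: list of (allelic_fraction, has_overlapping_deletion) tuples.
    """
    if not mutations:
        raise ValueError("at least one mutation is required")
    max_af = max(af for af, _ in mutations)
    # Keep only mutations whose allelic fraction is at least half the maximum.
    selected = [(af, deleted) for af, deleted in mutations if af >= max_af / 2]
    # Per-mutation estimate: AF itself if a deletion overlaps, otherwise AF x 2.
    estimates = [af if deleted else 2 * af for af, deleted in selected]
    return sum(estimates) / len(estimates)


# Hypothetical sample: two high-fraction mutations and one subclonal mutation.
example_fraction = ctdna_fraction([(0.20, False), (0.30, True), (0.05, False)])
```

In this hypothetical sample the third mutation falls below half of the maximum allelic fraction and is ignored, so the estimate is the mean of 0.40 and 0.30.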

Identification of intra-AR structural variation

To identify structural variants supported by discordant and split read evidence, an in-

house structural variant calling algorithm, "svcaller", was implemented. The algorithm

implemented in svcaller is similar to the algorithm underpinning Delly [19], and consists

of read filtering, read clustering, event calling, and event filtering steps. Several

refinements were made to this algorithm, to enable filtering of specific types of false

positive event, and to facilitate visualization of the read and soft-clipping evidence

supporting each putative event. Delly was also run with default parameters on bam files

corresponding to all samples, to examine concordance with svcaller. Delly identified 87%

(62/71) of all structural variant events that svcaller identified within AR. This indicates

that the events identified by svcaller are highly reproducible with an independent algorithm.

AR tandem duplication events are shown for 4213-P-2015142 (Supplementary Fig. 14A)

and AR inversion events are shown for 3542-P-2014235 (Supplementary Fig. 14B),

providing examples of structural variant calls that are identified by svcaller and also

supported by Delly.

Svcaller first filters reads to restrict to discordant read-pairs that support the specified

event type (deletion, inversion, tandem duplication, translocation). For deletion events, a

given read pair is only retained if the reads face each other on the same chromosome, and

are separated by > 1000bp. For inversion events, a read pair is only retained if the reads

both point in the same direction and are on the same chromosome. For tandem

duplication events, a read pair is only retained if the reads face away from each other on

the same chromosome. For translocation events, a read pair is only retained if the reads

are on distinct chromosomes.
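
The four event-type filters can be summarised in a small classifier. This is a sketch under stated assumptions, not the svcaller code: the function signature is hypothetical, and the separation between reads is measured between alignment start positions.

```python
def classify_read_pair(chrom1, pos1, forward1, chrom2, pos2, forward2,
                       min_deletion_span=1000):
    """Return the event type a read pair could support, or None.

    posN is the alignment start of read N; forwardN is True when read N
    aligns to the forward strand.
    """
    if chrom1 != chrom2:
        return "translocation"
    # Order the two reads by position along the chromosome.
    (left_pos, left_fwd), (right_pos, right_fwd) = sorted(
        [(pos1, forward1), (pos2, forward2)])
    if left_fwd == right_fwd:
        return "inversion"              # both reads point in the same direction
    if left_fwd and not right_fwd:
        # Reads face each other; only a large separation suggests a deletion.
        if right_pos - left_pos > min_deletion_span:
            return "deletion"
        return None                     # looks like a normal, concordant pair
    return "tandem_duplication"         # reads face away from each other
```

Sorting the two reads by position first means the result does not depend on which mate is passed as read 1.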

After an event-type specific filter has been applied to the input bam file, reads are further

filtered to only retain those read pairs that are part of a larger cluster of read-pairs

supporting a putative event. Only non-duplicate reads are considered for this purpose, and

secondary alignments are excluded. Each read-pair (A_1, A_2) is then only retained if

there exists at least one additional read-pair (B_1, B_2) in which either B_1 or B_2 is

located within 1000bp of A_1, and the other read from (B_1/B_2) is located within

1000bp of A_2, with distance measured from the start alignment position of each read.
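
The neighbour requirement on read pairs can be sketched with a simple quadratic scan. This is an illustration only: it assumes all pairs have already passed the same event-type filter and lie on the same chromosome(s), treats "within 1000bp" as a distance of at most 1000, and uses a hypothetical function name.

```python
def filter_supported_pairs(pairs, max_dist=1000):
    """Keep read pairs that have at least one neighbouring pair at both ends.

    pairs: list of (start_1, start_2) alignment start positions, one tuple
           per read pair (A_1, A_2).
    """
    kept = []
    for i, (a1, a2) in enumerate(pairs):
        for j, (b1, b2) in enumerate(pairs):
            if i == j:
                continue
            # B_1 near A_1 and B_2 near A_2 ...
            direct = abs(b1 - a1) <= max_dist and abs(b2 - a2) <= max_dist
            # ... or B_2 near A_1 and B_1 near A_2.
            swapped = abs(b2 - a1) <= max_dist and abs(b1 - a2) <= max_dist
            if direct or swapped:
                kept.append((a1, a2))
                break
    return kept
```

The quadratic scan is adequate for a sketch; an implementation working on whole bam files would more likely sort the reads or use an interval index.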

Events are then called on the resulting bam file, which only contains read pairs filtered

according to event type and read pair clustering. Clusters are first detected by scanning

over each strand and agglomerating overlapping reads.
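
A minimal sketch of agglomerating overlapping reads on one strand is given here. It assumes each read is a closed (start, end) interval and that reads sharing at least one position overlap; the function name is hypothetical.

```python
def cluster_overlapping(reads):
    """Merge overlapping (start, end) read intervals into clusters.

    Returns a list of (cluster_start, cluster_end, n_reads) tuples.
    """
    clusters = []
    for start, end in sorted(reads):
        if clusters and start <= clusters[-1][1]:
            # The read overlaps the current cluster: extend it.
            clusters[-1][1] = max(clusters[-1][1], end)
            clusters[-1][2] += 1
        else:
            # No overlap: open a new cluster.
            clusters.append([start, end, 1])
    return [tuple(c) for c in clusters]
```

The function would be called once per strand, since the text describes scanning each strand separately.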

Clusters are then paired with one-another: For each cluster C_1, all contributing reads are

retrieved. For each of these reads r_1, the corresponding read-pair r_2 is retrieved, and

C_1 is paired with the cluster that r_2 was previously assigned to, C_2.

Putative events are then identified from these cluster pairs: For each cluster, the set of all

paired clusters is retrieved. For each cluster pair (C_1, C_2), read pairs that match the

cluster pairing are uniquely assigned to a new event. The reads contributing to C_1 are

then used to define event terminus T_1, and reads contributing to C_2 are used to define

event terminus T_2.

Split read evidence is then examined and marked at each of these events: For each event

comprising termini T_1 and T_2, reads from T_1 are retrieved, and all reads with soft-

clipped sequences are obtained from the bwa output fields. The number of read positions

with no call (phred score == 2) is then calculated. If this comprises more than 20% of the

total read sequence length for T_1, then no split-read evidence is stored for the soft-

clipped reads from T_1. Otherwise, each soft-clipped sequence is aligned with the Smith-

Waterman algorithm (match score = 1, mismatch penalty = -5) against the genomic

sequence corresponding to the region spanned by T_2, extending by 100bp either side. If

the largest contiguous section of the resulting alignment is longer than 15bp, then this

alignment region is recorded as split read evidence supporting the event. The union of

these split-read support regions is then recorded. The process is repeated swapping T_1

and T_2. An event is declared to have split-read evidence support if either of the event

termini is supported by soft-clipped alignments from the other terminus.
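
The alignment step can be sketched with a plain Smith-Waterman implementation. The match score and mismatch penalty follow the text; the gap penalty of -5 is an assumption, since none is stated, and "largest contiguous section" is interpreted here as the longest gap-free run in the best local alignment. The no-call check on the reads is omitted, and all function names are hypothetical.

```python
def smith_waterman_longest_block(query, target, match=1, mismatch=-5, gap=-5):
    """Local alignment of query against target.

    Returns (best_score, longest_gap_free_run) for the best local alignment.
    """
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            s = max(0,
                    score[i - 1][j - 1] + sub,   # align the two bases
                    score[i - 1][j] + gap,       # gap in target
                    score[i][j - 1] + gap)       # gap in query
            score[i][j] = s
            if s > best:
                best, best_cell = s, (i, j)
    # Trace back from the best cell, tracking the longest run of aligned bases.
    i, j = best_cell
    longest = run = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if query[i - 1] == target[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            run += 1
            longest = max(longest, run)
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            run = 0
            i -= 1
        else:
            run = 0
            j -= 1
    return best, longest


def has_split_read_support(clipped_seq, target_region, min_block=15):
    """True if the soft-clipped sequence aligns with a block longer than min_block."""
    _, longest = smith_waterman_longest_block(clipped_seq, target_region)
    return longest > min_block
```

With a mismatch penalty five times the match score, the best local alignment of a short soft-clipped sequence is in practice close to an exact match, which is why a 15 bp threshold is already fairly specific.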

Filters are then applied on the resulting putative events. Events are only retained if there

are >= 3 reads supporting each event terminus, and if each event terminus has at least one

read with mapping quality >= 19. Events are then filtered to only retain those with at least

one terminus supported by split-read evidence derived from the other terminus.
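
The final event filters translate into a short predicate. This is a sketch with hypothetical argument names; it takes the mapping qualities of the reads at each terminus and two flags indicating whether split-read support was recorded for that terminus.

```python
def passes_event_filters(t1_mapqs, t2_mapqs, split_support_t1, split_support_t2,
                         min_reads=3, min_mapq=19):
    """Apply the read-count, mapping-quality and split-read filters to one event.

    t1_mapqs / t2_mapqs: mapping qualities of the reads at termini T_1 and T_2.
    split_support_tN:    True if terminus N has split-read evidence derived
                         from the other terminus.
    """
    for mapqs in (t1_mapqs, t2_mapqs):
        # Each terminus needs >= 3 reads, at least one with MAPQ >= 19.
        if len(mapqs) < min_reads or max(mapqs) < min_mapq:
            return False
    # At least one terminus must carry split-read support.
    return split_support_t1 or split_support_t2
```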

Output gtf and filtered bam files are then generated to facilitate downstream visualization

of the called events and the read-pairs supporting them. IGV [20] was used for

downstream analysis performed here.

The implementation of the algorithm is available at https://github.com/tomwhi/svcaller.

Clinical outcome and statistical analysis

The prevalence of AR CNVs, structural variants and splice variants was determined. For the

abiraterone- and enzalutamide-treated patients, the proportion of patients with a PSA

response of ≥30% or ≥50% from baseline PSA levels was determined. Progression-free

survival (PFS) was defined as the time from start of therapy to no longer clinically

benefitting (NLCB) or to PSA progression (PCWG2) [2]. In

abiraterone and enzalutamide-treated patients, the clinical outcomes were related to ARV

status (positive vs negative) using a Fisher’s exact test for categorical data (e.g. PSA

response) and Kaplan-Meier analysis for PFS data. Survival differences were determined

using the log-rank test. Univariate Cox regression analysis was performed to assess the

effect of ARV presence on PFS. All tests were performed in R, with a two-sided p-

value <0.05 considered statistically significant.
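
As a small worked illustration of the PSA response proportions, the following Python sketch computes the fraction of patients reaching a ≥30% or ≥50% decline. It assumes that response is measured as the relative decline from baseline to the lowest on-treatment PSA value; the function name and the example values are hypothetical and unrelated to the study data.

```python
def psa_response_rates(patients, thresholds=(0.30, 0.50)):
    """Proportion of patients whose PSA declined by at least each threshold.

    patients: list of (baseline_psa, lowest_on_treatment_psa) tuples.
    """
    if not patients:
        raise ValueError("at least one patient is required")
    rates = {}
    for t in thresholds:
        responders = sum(1 for base, nadir in patients
                         if (base - nadir) / base >= t)
        rates[t] = responders / len(patients)
    return rates


# Hypothetical PSA values (ng/ml), for illustration only.
example_rates = psa_response_rates([(100, 40), (100, 65), (100, 90), (50, 60)])
```

In this hypothetical cohort two of four patients reach a ≥30% decline and one of four reaches a ≥50% decline.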

References

[1] Heidenreich A, Bastian PJ, Bellmunt J, Bolla M, Joniau S, van der Kwast T, et al. EAU Guidelines on Prostate Cancer. Part II: Treatment of Advanced, Relapsing, and Castration-Resistant Prostate Cancer. European Urology 2014;65:467–79. doi:10.1016/j.eururo.2013.11.002.

[2] Scher HI, Morris MJ, Stadler WM, Higano C, Basch E, Fizazi K, et al. Trial Design and Objectives for Castration-Resistant Prostate Cancer: Updated Recommendations From the Prostate Cancer Clinical Trials Working Group 3. Journal of Clinical Oncology 2016;34:1402–18. doi:10.1200/JCO.2015.64.2702.

[3] Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, et al. New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1). European Journal of Cancer 2009;45:228–47. doi:10.1016/j.ejca.2008.10.026.

[4] Peeters DJE, De Laere B, Van den Eynden GG, Van Laere SJ, Rothé F, Ignatiadis M, et al. Semiautomated isolation and molecular characterisation of single or highly purified tumour cells from CellSearch enriched blood samples using dielectrophoretic cell sorting. British Journal of Cancer 2013;108:1358–67. doi:10.1038/bjc.2013.92.

[5] Sieuwerts AM, Kraan J, Bolt-de Vries J, van der Spoel P, Mostert B, Martens JWM, et al. Molecular characterization of circulating tumor cells in large quantities of contaminating leukocytes by a multiplex real-time PCR. Breast Cancer Res Treat 2008;118:455–68. doi:10.1007/s10549-008-0290-0.

[6] Onstenk W, Sieuwerts AM, Kraan J, Van M, Nieuweboer AJM, Mathijssen RHJ, et al. Efficacy of Cabazitaxel in Castration-resistant Prostate Cancer Is Independent of the Presence of AR-V7 in Circulating Tumor Cells. European Urology 2015:1–7. doi:10.1016/j.eururo.2015.07.007.

[7] Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, et al.

Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nature Biotechnology 2015;34:155–63. doi:10.1038/nbt.3391.

[8] Jiang H, Lei R, Ding S-W, Zhu S. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics 2014;15:182. doi:10.1186/1471-2105-15-182.

[9] Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754–60. doi:10.1093/bioinformatics/btp324.

[10] Scheinin I, Sie D, Bengtsson H, van de Wiel MA, Olshen AB, van Thuijl HF, et al. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res 2014;24:2022–32. doi:10.1101/gr.175141.114.

[11] Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol 2016;12:e1004873–18. doi:10.1371/journal.pcbi.1004873.

[12] Lai Z, Markovets A, Ahdesmaki M, Chapman B, Hofmann O, McEwen R, et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res 2016;44:e108. doi:10.1093/nar/gkw227.

[13] Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv 2012;q-bio.GN.

[14] Cingolani P, Platts A, Wang LL, Coon M, Nguyen T. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; …. Fly 2012;6:80–92. doi:10.4161/fly.19695.

[15] Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016;536:285–91. doi:10.1038/nature19057.

[16] Newman AM, Bratman SV, To J, Wynne JF, Eclov NCW, Modlin LA, et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nature Medicine 2014;20:548–54. doi:10.1038/nm.3519.

[17] Lallous N, Volik SV, Awrey S, Leblanc E, Tse R, Murillo J, et al. Functional analysis of androgen receptor mutations that confer anti-androgen resistance identified in circulating cell-free DNA from prostate cancer patients. Genome Biol 2016:1–15. doi:10.1186/s13059-015-0864-1.

[18] Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, et al. Absolute quantification of somatic DNA alterations in human cancer. Nature Biotechnology 2012;30:413–21. doi:10.1038/nbt.2203.

[19] Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, Korbel JO. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 2012;28:i333–9. doi:10.1093/bioinformatics/bts378.

[20] Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nature Biotechnology 2011;29:24–6.

doi:10.1038/nbt.1754.

Supplementary Fig. 1 – Copy-number status of AR. Low-pass whole genome

sequencing was performed to infer copy-number status of AR. Top panel: 3542-P-

2014235 carried a complex amplification on chromosome X affecting AR and

neighbouring regions. Middle panel: 3949-P-2014061 harboured a focal amplification

event affecting one region, including AR. Bottom panel: No amplification could be

detected on chromosome X for 4120-P-2015352. Y-axis: log2 copy-number ratio; X-axis:

position on chromosome X. Green points: Binned regions on chromosome X. Horizontal

lines: Segmentation of binned regions used to infer copy-number alterations. Dashed

vertical lines: location of AR. Colours: Grey, copy number neutral; dark red, moderately

amplified; bright red, high level amplification; dark blue, deletion.

Supplementary Fig. 2 – AR structure and baited regions. The structure of AR,

displaying the non-repetitive regions possible to profile using in-solution targeted

capture. Bait regions denote regions captured and subsequently profiled. CE – cryptic

exon.

Supplementary Fig. 3 – Circulating tumour DNA mutational landscape. The

mutations and small indels detected directly from cell-free DNA. X-axis: gene names

sorted according to number of detected mutations among all samples. Y-axis: Cell-free

DNA samples profiled. Type of mutation is coloured according to the right legend. Only

mutations with potentially protein altering function are displayed. Brackets mark cell-free

DNA samples originating from the same patient, sampled at different time points. Note:

AR Mutations within and outside hotspots are reported here.

Supplementary Fig. 4 – Schematic display of structural variant detection. Top panel:

The regions A, B and C are directly adjacent in the reference genome. Region B is

deleted in the tumour genome and sequence data is generated by sequencing the DNA of

the tumour. Paired end sequencing is applied which generates sequencing data 100 bp

from each end of each DNA fragment, directed inward (arrows). The dashed line denotes

unknown sequence from each sequenced DNA fragment. Subsequently, mapping is

performed to the reference genome to determine the location of each read from each read-

pair. As region B exists in the reference genome, read-pairs supporting the deletion will:

1) harbour an unexpectedly large distance between the reads of a pair, visualized through angled

dashed lines; 2) partially map to one end of the structural event, visualized by shaded

arrows. Bottom panel: As top panel but displaying an inversion. The reads of read-pairs

supporting the event now point in the same direction with unexpectedly large distance.

Supplementary Fig. 5 – Copy-number alterations for 4120 and 3843. A) The targeted

sequencing data was applied to infer intra-AR deletions for 4120-P-2015352. The AR

exons are displayed to visualize the region of AR affected by the deletion. CE – cryptic

exon. B) Low-pass whole genome sequencing was performed to infer copy-number

alterations on the X chromosome for 3843-P-2013537. Vertical solid lines mark the start

and stop of AR. The arrows denote the 5’ and the 3’ end of the tandem duplication. Y-

axis: log2 copy-number ratio. X-axis: position on chromosome X. Green points: Binned

regions on chromosome X. Horizontal lines: Segmentation of binned regions used to

infer copy-number alterations. Colours: Grey, copy number neutral; dark red, moderately

amplified; bright red, high level amplification; dark blue, deletion.

Supplementary Fig. 6 – Development and validation of a targeted RNA-Seq assay for ARV

expression analysis. A) ARV qRT-PCR and targeted RNA-Seq assay design, with primer and

probes against unique exon-cryptic exon-specific junctions. B) Representative result from ARV

sequencing in 22Rv1 demonstrating coupled reads (see red tracking) between exon 3 and

sequences within the CE region in intron 3. CE – cryptic exon. C) RNA-seq validation by qRT-

PCR for full-length and splice variants in enriched CTC fractions. r denotes Pearson's correlation

coefficient.

Supplementary Fig. 7 – Multi-level AR profiling in patients harbouring structural variants.

CTC panel: the number of CTCs is expressed per 7.5mL of blood. * denotes aborted samples.

CNV panel: AR copy number stratified according to amplification status. Intra-AR panel:

structural variants across the AR gene. Complex rearrangements denote multiple overlapping

variant types within the particular region. Bottom two AR-V panels provide the qualitative and

quantitative overview of ARV expression. Brackets mark samples coming from the same patient.

Supplementary Fig. 8 – Changes in AR splice and structural variants in patients with pre-

and post-abiraterone samples. A) Patient 3542 had 129 circulating tumour cells (CTCs) at

baseline, which were positive for AR splice variants (ARV). Throughout the course of therapy

CTCs and number of ARV transcripts increased. AR gene amplification and complex structural

variants were only inferable in plasma at progressive disease. B) Patient 3885 demonstrated an

increase in CTC number and ARV expression during treatment. AR was highly amplified in both

samples, with more complex intragenic rearrangements at progressive disease. C) Patient 4070

demonstrated a decrease in number of CTCs, with absence of ARV expression. AR remained

moderately amplified, without any structural variants detected pre- and post-treatment. D)

Patient 4174 demonstrated low level ARV expression at the start of therapy, which was

undetected at progressive disease. The AR copy number status shifted from a high-level to

moderately amplified state, with an increase in the number of tandem-duplicated structural

events downstream of exon 1. For each patient, the first and last PSA measurement represents

baseline and progression levels, respectively.

Supplementary Fig. 9 – AR splice variant abundance across ARV-expressing patients.

Boxplot analysis of the number of detected ARV transcripts per 1000 sequenced reads.

Supplementary Fig. 10 – Proposed model of intra-AR structural variation. Due to

the evolutionary pressure of endocrine treatment, pre-existing or spontaneously emerging

clones, expressing non-canonical versions of the androgen receptor will outgrow the

competition. The expressed, non-canonical transcripts are generated as a consequence of

intra-AR variation. In the presence of AR amplifications, full-length, truncated and non-

functional versions of the androgen receptor may exist within the same cell. As the

sequencing data is mapped to the reference genome, multiple variants will be visualized

over AR, as detected in our data and displayed in Fig. 2. In the absence of AR

amplifications, independent clones may harbour different versions of AR. Of note, both

scenarios may occur simultaneously in the same patient.

Supplementary Fig. 11 – Bait design in the context of structural variation. Baits are

designed to capture DNA from non-repetitive unique regions of the genome. If the DNA

of interest matches the reference genome (left panel) or harbours mutations (middle panel)

it will be captured with high efficiency. DNA fragments supporting structural variation are

likely to be captured with low efficiency if the DNA fragment aligns poorly to the bait as

a consequence of the structural variant.

Supplementary Fig. 12 – Theoretical calculation of sequence coverage required to

detect intra-AR structural variation. Calculations were performed in R, to determine

the AR X-fold coverage required to detect structural variants with varying prevalence, in

terms of the fraction of all reads supporting the event. First, a linear model was fit over

the observed samples, with average coverage as the dependent variable and number of

read fragments as the predictor variable. For each observed structural variant event

identified by svcaller, the fraction of total reads supporting the event was computed. This

value _p_ was then used to compute the minimum number of reads _N_ required to

detect such an event with 95% sensitivity. Here, event detection was defined by

observing a number of reads _n_ supporting the event >= 3. _n_ was assumed to follow a

binomial distribution with parameters _p_ and _N_. The linear model relating number of

read fragments to average AR coverage was then used to convert _N_ to a coverage

value. The resulting minimum required coverage and _p_ values were then plotted.
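
The sensitivity calculation can be reproduced in a few lines. The original calculations were performed in R; the Python sketch below is an independent illustration with hypothetical function names. It finds the smallest _N_ for which a Binomial(_N_, _p_) count reaches at least 3 supporting reads with probability 0.95, and it does not include the linear model that converts _N_ to coverage, since that model depends on the observed samples.

```python
from math import comb


def detection_probability(n_reads, p, min_support=3):
    """P(X >= min_support) for X ~ Binomial(n_reads, p)."""
    miss = sum(comb(n_reads, k) * p ** k * (1 - p) ** (n_reads - k)
               for k in range(min_support))
    return 1.0 - miss


def min_reads_for_sensitivity(p, sensitivity=0.95, min_support=3):
    """Smallest N with P(at least min_support supporting reads) >= sensitivity."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    n = min_support
    # Linear scan; adequate for a sketch, since N grows roughly as 1/p.
    while detection_probability(n, p, min_support) < sensitivity:
        n += 1
    return n
```

For example, an event supported by half of all reads (_p_ = 0.5) needs 11 reads under this definition, because the probability of seeing at least 3 supporting reads is about 0.945 with 10 reads and about 0.967 with 11.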

Supplementary Fig. 13 – Coverage and sensitivity to detect intra-AR variation. Top-

panel: Horizontal bars denote sequence coverage for each profiled cfDNA library. Cyan

coloured points mark the coverage needed to detect each structural variant with 95%

sensitivity. Each cyan point is shown for the sample in which the variant was detected.

Bottom panel: The bottom panel displays the cumulative fraction of all detected structural

variants in relation to coverage needed for 95% sensitivity. The coloured inset bars

denote the range of coverage for intra-AR positive (green) and negative (red) samples.

Supplementary Fig. 14 – Comparison of svcaller and Delly output. A) Svcaller

identified three tandem duplication events for 4213-P-2015142 in which at least one

event terminus occurred within the AR region, with all three events also detected by

Delly. Tandem duplication event calls generated by svcaller are denoted by light blue

dashed lines, with soft-clipped sequence support denoted with solid vertical lines at the

event termini. Tandem duplication event calls generated by Delly are denoted by dark

blue rectangles. AR and other gene annotations are displayed in light grey at the bottom

of the panel whilst the chromosome X ideogram and corresponding selected genomic

region is indicated at the top of the panel. Individual read-pairs retained by the tandem

duplication event filter are shown, connected by a grey line and displayed in either

orange- (forward orientation) or blue (reverse orientation) colour. B) Svcaller identified

seven inversion events for 3542-P-2014235 in which at least one event terminus occurred

within the AR region, with all seven events also detected by Delly. Colours as for A),

although read pairs displayed are in this instance those retained by the inversion event

filter.

Supplementary Table 1 – Patient and Sample Characteristics

Supplementary Table 2 – Basic sequencing metrics, AR copy-number status and

circulating tumour DNA fraction for each cell-free DNA sample.

Supplementary Table 3 – HG19 bait coordinates for targeted sequencing.

Supplementary Table 4 – Somatic mutations detected in the circulating tumour DNA by

targeted sequencing.

Supplementary Table 5 – AR and ARV RNA sequencing library sizes (expressed as

number of BWA-mem mapped reads)

Supplementary Table 6 – Primer and hydrolysis probe sequences for targeted AR and

ARV sequencing and qRT-PCR.
