Supplementary Materials and Methods
Patient accrual
Via a non-interventional clinical study, chemotherapy-pretreated and chemotherapy-naïve patients with
castration-resistant prostate cancer (CRPC) were recruited between October 2013 and June 2015. All patients had histologically
confirmed prostate adenocarcinoma, castrate levels (<50 ng/dl) of serum testosterone and
bicalutamide withdrawal for at least 6 weeks, in accordance with EAU guidelines [1]. Due to
shifts in reimbursement criteria and treatment management the studied cohort is
heterogeneous, reflecting current clinical practice. Measures of response and progression,
based on serum PSA levels or radiographic imaging, are defined according to Prostate
Cancer Clinical Trials Working Group 3 criteria and Response Evaluation Criteria in
Solid Tumours version 1.1, respectively [2,3]. Clinical criteria were evaluated
according to the clinical judgement of the treating physician. The clinical protocol was
reviewed and ethical approval was granted by the institutional review and ethics board
of GZA Sint-Augustinus. All patients provided written informed consent.
Blood collection and processing
Blood collection and processing for circulating tumor cell (CTC) enumeration (within 72
hours) or CTC enrichment for transcriptional analysis (within 8 hours) were performed
using the FDA-cleared CellSearch CTC and Profile technology, respectively, as
described previously [4,5]. Profile-enriched CTC fractions were lysed with 250μL
RNeasy RLT+ buffer (Qiagen BV, The Netherlands) and stored at -80°C until RNA
isolation. Leftover EDTA blood was subjected to red blood cell lysis, to generate white
blood cell pellets for germline DNA isolation. For genomic analysis of circulating cell-
free DNA (cfDNA), 5 mL of citrate-collected blood was centrifuged within 1 hour,
resulting in 1-1.5 mL of plasma, which was stored at -80°C until cfDNA isolation.
RNA isolation, cDNA synthesis and AR-V preamplification from enriched CTC
fractions
RNA was isolated from CTC lysates with the AllPrep DNA/RNA Micro Kit (Qiagen).
Complementary DNA (cDNA) synthesis and AR-V pre-amplification were performed on
25% of the isolated RNA, using the RevertAid™ H Minus First Strand cDNA and
TaqMan™ PreAmp amplification kit, respectively, in a GeneAmp® PCR System 9700
(Life Technologies). Pre-amplified cDNA was diluted 15x with 1x TE buffer.
Full-length AR and AR splice variant (ARV) Multiplex Amplification of Specific
Targets for Resequencing and qRT-PCR validation
Pre-amplified and diluted cDNA was subjected to a 20-cycle multiplex PCR for
amplification, sequence adaptor ligation and indexing of full-length AR and AR splice
variant (ARV) transcripts using Multiplex Amplification of Specific Targets for
Resequencing (MASTR) technology (Multiplicom NV, Belgium). Amplicon libraries
were verified by fragment analysis using Genescan (Applied Biosystems). Pooled
libraries were sequenced on a MiSeq (Illumina) with v2 chemistry (2x251 cycles).
qRT-PCR was used to validate the AR and ARV transcript counts obtained by targeted RNA sequencing
(see Supplementary Table 6). Additionally, 3 housekeeping genes (GUSB, HMBS and
HPRT1) were used to control for sample loading and RNA integrity [5,6]. Epithelial
(EPCAM, CK19) and white blood cell (PTPRC) markers were used to control for
presence of epithelial and WBC content. PCR reactions (40 cycles) were performed using
TaqMan™ Gene Expression Assays with Universal PCR Master Mix No AmpErase
UNG on a 7900HT Fast Real-time PCR System (all Applied Biosystems). A calibrator
(positive control) sample was used in each run to assess inter-run variability. Negative
controls included a no-template control (NTC) and a no-reverse-transcriptase (-RT) sample. Only samples with a Cq value of <35 for each
of the 3 reference genes were considered of sufficient quality and quantity.
Isolation of cfDNA, library prep and sequencing strategy
cfDNA isolation was performed using the QiaVac system (Qiagen). Purified cfDNA was
subjected to fragment analysis. Germline DNA from WBCs was isolated using the
QIAamp DNA Blood Mini Kit. Library prep was performed using the ThruPLEX DNA-seq
kit (Rubicon Genomics). Libraries were created from 0.1–50 ng of cfDNA (median 7.4 ng) and 50 ng of
germline DNA. Low-pass whole genome sequencing was
performed on each cfDNA library to enable identification of copy-number variants
(CNVs). The SeqCap EZ system (Roche Nimblegen) was applied for targeted
sequencing. The targeted regions were designed to capture unique regions in the human
genome commonly mutated in prostate cancer, identified through extensive literature
review. Baits targeting 112 genes were included in the target regions, capturing all coding
exons or mutational hotspots recently reported for prostate and breast cancer (to be
applied in a parallel project) [7]. The entire AR gene, including introns, was also included
in the design. The final size of the design, excluding non-unique, repetitive regions, was
295 kb (Supplementary Table 3). Targeted sequencing was applied to both cfDNA and
germline DNA. Sequencing was performed on the Illumina 2500 instrument in rapid
mode generating 1x50 bases for low-pass whole genome sequencing and 2x100 bases for
targeted sequencing.
Sequence analysis
For ARV-Seq
Demultiplexed fastq files were trimmed using Skewer software version 0.1.117 [8] with a
sequence length cutoff of 80 nucleotides. Filtered reads were aligned with the
Burrows-Wheeler Aligner (BWA-MEM) against an indexed ARV reference transcript FASTA
encompassing exon and cryptic exon (CE) signature sequences. SAMtools was used to
infer the read counts per ARV. Raw read counts were subsequently normalized and
expressed as the number of ARV transcripts/1000 reads.
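The normalization step can be illustrated with a minimal Python sketch. It assumes that the denominator is the total number of mapped reads in the library (the text does not state the denominator explicitly), and the function name and transcript labels are hypothetical.

```python
def normalize_per_thousand(raw_counts):
    """Scale raw per-transcript read counts to transcripts per 1000 reads.

    raw_counts maps a transcript name to its raw read count.
    Assumption: the denominator is the sum of all counts in the library.
    """
    total = sum(raw_counts.values())
    if total == 0:
        # An empty library yields zero for every transcript.
        return {name: 0.0 for name in raw_counts}
    return {name: 1000.0 * count / total for name, count in raw_counts.items()}


# Hypothetical library with 900 full-length AR reads and 100 AR-V7 reads.
example = normalize_per_thousand({"AR-FL": 900, "AR-V7": 100})
```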
For cfDNA-Seq
Adaptors were trimmed using Skewer version 0.1.117 [8] and subsequent alignment to
GRCh37 was performed using BWA-mem version 0.7.7 [9]. Quality control metrics were
assessed using Picard version 1.128 (http://broadinstitute.github.io/picard). Reads with
poor mapping quality (<60) were filtered from the low-pass whole genome sequencing
data before CNVs were identified with the R package QDNAseq [10]. Intra-AR
copy-number alterations were also called by applying the CNVkit algorithm to the
targeted sequencing data [11]. Variant calling was performed using VarDict version 1.4.6
[12]. To obtain a set of low-frequency variants (≥1 %) before filtering, a pooled control
of cfDNA from eight healthy individuals below 40 years of age with an average coverage
of 1278X was used as a comparator. Subsequently, germline variants, identified from
targeted sequencing of germline DNA using FreeBayes version 1.0.1 [13], were filtered
out generating a list of potentially somatic variants. The normal DNA of sample 3843-P-
2013537 failed library prep. However, only low-fraction somatic variants were found for
this sample, therefore an alternative approach was applied for determining the circulating
tumor DNA (ctDNA) fraction (described below). Variants were annotated using SnpEff
version 3.3 [14]. The following filters were applied to minimize false positives:
1) Potential germline variant positions missed by FreeBayes, or missed due to low germline
sequence coverage, were removed (≥ 0.1% frequency in the ExAC database [15])
2) Recurring hot-spot variants not previously reported [7] were removed (allelic
fraction > 0.05, recurring in ≥ 10 samples)
3) Recurring variants due to variable error rate in the human genome [16] were
removed (allelic fraction < 0.05, recurring in ≥ 2 samples)
4) Variants with allelic fraction ≥ 3% or with ≥ 10 read support were kept
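The four filters listed above can be sketched as a single predicate. This is an illustrative reading of the list, not the pipeline that was actually run: the `Variant` record and its field names are hypothetical, the filters are assumed to be applied in the listed order, and the AR hot-spot exception is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    allelic_fraction: float  # fraction of reads supporting the variant
    alt_reads: int           # number of reads supporting the variant
    exac_frequency: float    # population frequency in ExAC (0.0 if absent)
    n_samples: int           # number of cohort samples carrying the variant
    known_hotspot: bool      # True if previously reported as a hot-spot


def keep_variant(v):
    """Return True if the variant passes all four filters."""
    if v.exac_frequency >= 0.001:  # filter 1: >= 0.1% in ExAC
        return False
    if v.allelic_fraction > 0.05 and v.n_samples >= 10 and not v.known_hotspot:
        return False               # filter 2: unreported recurring hot-spot
    if v.allelic_fraction < 0.05 and v.n_samples >= 2:
        return False               # filter 3: recurring low-fraction variant
    # filter 4: require allelic fraction >= 3% or >= 10 supporting reads
    return v.allelic_fraction >= 0.03 or v.alt_reads >= 10
```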
An exception was made for AR, where recently reported hot-spot variants [17] were
retained if supported by ≥2 reads. Only consequence types with the potential to affect
protein function were kept to identify putative mutational driver events in each plasma
sample and to calculate the ctDNA fraction (e.g. non-synonymous coding, frameshift, stop
gained). Plasma ctDNA fraction was assessed by combining CNVs and mutations,
excluding variation in AR. The ctDNA fraction was estimated for each mutation with 1)
no overlapping deletion: ctDNA fraction = allelic fraction_mutation x 2; 2) an overlapping
deletion: ctDNA fraction = allelic fraction_mutation. As the majority of tumors harbored
low-frequency mutations, the ctDNA fraction was calculated as the average of all mutations
with allelic fraction ≥ max(allelic fraction) / 2 for each cfDNA sample. Two samples
(4069-P-2014174 and 3843-P-2013537) contained only variants with low allelic fraction
(<0.04) despite presence of CNVs. Therefore, a heterozygous deletion of chr8p
(assuming ploidy 2) for 4069-P-2014174 and a large deletion on chrXq for
3843-P-2013537 were used to estimate tumor fraction as described previously [18]. The number
of AR copies was calculated by adjusting the relative copy number for ctDNA fraction
[18]. AR was classified as highly amplified (>10 copies), moderately amplified (2 – 10
copies) and neutral (1 copy).
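The mutation-based ctDNA fraction estimate described above can be sketched as follows. This is one reading of the text: each mutation's estimate is its allelic fraction, doubled when no deletion overlaps it, and the sample estimate is the mean over mutations whose allelic fraction is at least half the sample maximum. The function name and input layout are hypothetical.

```python
def ctdna_fraction(mutations):
    """Estimate the ctDNA fraction of one cfDNA sample.

    mutations is a list of (allelic_fraction, overlaps_deletion) tuples,
    already restricted to non-AR, potentially protein-altering variants.
    Returns None when no mutation is available.
    """
    if not mutations:
        return None
    max_af = max(af for af, _ in mutations)
    # Keep only mutations with allelic fraction >= max / 2.
    selected = [(af, deleted) for af, deleted in mutations if af >= max_af / 2]
    # Per-mutation estimate: AF if a deletion overlaps, otherwise AF x 2.
    estimates = [af if deleted else 2 * af for af, deleted in selected]
    return sum(estimates) / len(estimates)


# Hypothetical sample: the 0.05 mutation falls below max / 2 and is ignored.
estimate = ctdna_fraction([(0.20, False), (0.15, False), (0.05, False)])
```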
Identification of intra-AR structural variation
To identify structural variants supported by discordant and split read evidence, an in-
house structural variant calling algorithm, "svcaller" was implemented. The algorithm
implemented in svcaller is similar to the algorithm underpinning Delly [19], and consists
of read filtering, read clustering, event calling, and event filtering steps. Several
refinements were made to this algorithm, to enable filtering of specific types of false
positive event, and to facilitate visualization of the read and soft-clipping evidence
supporting each putative event. Delly was also run with default parameters on bam files
corresponding to all samples, to examine concordance with svcaller. Delly identified 87%
(62/71) of all structural variant events that svcaller identified within AR. This indicates
that the events identified by svcaller are highly reproducible with a different algorithm.
AR tandem duplication events are shown for 4213-P-2015142 (Supplementary Fig. 14A)
and AR inversion events are shown for 3542-P-2014235 (Supplementary Fig. 14B),
providing examples of structural variant calls that are identified by svcaller and also
supported by Delly.
Svcaller first filters reads to restrict to discordant read-pairs that support the specified
event type (deletion, inversion, tandem duplication, translocation). For deletion events, a
given read pair is only retained if the reads face each other on the same chromosome, and
are separated by > 1000bp. For inversion events, a read pair is only retained if the reads
both point in the same direction and are on the same chromosome. For tandem
duplication events, a read pair is only retained if the reads face away from each other on
the same chromosome. For translocation events, a read pair is only retained if the reads
are on distinct chromosomes.
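The event-type read-pair filter can be sketched as below. This is not the svcaller source code: the `Read` record is hypothetical, and the separation for deletions is assumed to be measured between alignment start positions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    chrom: str
    start: int     # leftmost alignment position
    reverse: bool  # True if aligned to the reverse strand


def supports(event_type, r1, r2, min_deletion_distance=1000):
    """Return True if the read pair is discordant in the way event_type expects."""
    if event_type == "translocation":
        return r1.chrom != r2.chrom
    if r1.chrom != r2.chrom:
        return False  # all other event types require the same chromosome
    left, right = sorted((r1, r2), key=lambda r: r.start)
    if event_type == "inversion":
        # Both reads point in the same direction.
        return left.reverse == right.reverse
    if event_type == "deletion":
        # Reads face each other and are separated by > 1000 bp.
        facing = (not left.reverse) and right.reverse
        return facing and (right.start - left.start) > min_deletion_distance
    if event_type == "tandem_duplication":
        # Reads face away from each other.
        return left.reverse and not right.reverse
    raise ValueError(f"unknown event type: {event_type}")
```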
After an event-type specific filter has been applied to the input bam file, reads are further
filtered to only retain those read pairs that are part of a larger cluster of read-pairs
supporting a putative event. Only non-duplicate reads are considered for this purpose, and
secondary alignments are excluded. Each read-pair (A_1, A_2) is then only retained if
there exists at least one additional read-pair (B_1, B_2) in which either B_1 or B_2 is
located within 1000bp of A_1, and the other read from (B_1/B_2) is located within
1000bp of A_2, with distance measured from the start alignment position of each read.
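A minimal sketch of this clustering filter is shown below, assuming each read is reduced to a hypothetical (chromosome, start position) tuple. The quadratic scan is chosen for clarity only; a real implementation would likely use sorted positions or an index.

```python
def near(read_a, read_b, max_dist=1000):
    """True if both reads lie on the same chromosome with starts within max_dist."""
    return read_a[0] == read_b[0] and abs(read_a[1] - read_b[1]) <= max_dist


def filter_clustered_pairs(pairs, max_dist=1000):
    """Keep each pair (A_1, A_2) that has at least one neighbouring pair (B_1, B_2).

    A neighbour has one read near A_1 and its other read near A_2,
    in either order.
    """
    kept = []
    for i, (a1, a2) in enumerate(pairs):
        for j, (b1, b2) in enumerate(pairs):
            if i == j:
                continue  # a pair cannot support itself
            same_order = near(b1, a1, max_dist) and near(b2, a2, max_dist)
            swapped = near(b2, a1, max_dist) and near(b1, a2, max_dist)
            if same_order or swapped:
                kept.append((a1, a2))
                break
    return kept
```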
Events are then called on the resulting bam file, which only contains read pairs filtered
according to event type and read pair clustering. Clusters are first detected by scanning
over each strand and agglomerating overlapping reads.
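Agglomerating overlapping reads on each strand can be sketched as a sweep over sorted intervals. The read representation (start, end, strand) with half-open coordinates on a single chromosome is an assumption made for illustration.

```python
def cluster_reads(reads):
    """Group overlapping reads into clusters, separately for each strand.

    reads is a list of (start, end, strand) tuples on one chromosome,
    with half-open coordinates [start, end) and strand "+" or "-".
    Returns a list of clusters, each a list of reads.
    """
    clusters = []
    for strand in ("+", "-"):
        current, current_end = [], None
        for read in sorted(r for r in reads if r[2] == strand):
            start, end, _ = read
            if current and start < current_end:
                # Read overlaps the growing cluster: extend it.
                current.append(read)
                current_end = max(current_end, end)
            else:
                # Gap reached: close the previous cluster and start a new one.
                if current:
                    clusters.append(current)
                current, current_end = [read], end
        if current:
            clusters.append(current)
    return clusters
```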
Clusters are then paired with one another: For each cluster C_1, all contributing reads are
retrieved. For each of these reads r_1, the corresponding read-pair r_2 is retrieved, and
C_1 is paired with the cluster that r_2 was previously assigned to, C_2.
Putative events are then identified from these cluster pairs: For each cluster, the set of all
paired clusters is retrieved. For each cluster pair (C_1, C_2), read pairs that match the
cluster pairing are uniquely assigned to a new event. The reads contributing to C_1 are
then used to define event terminus T_1, and reads contributing to C_2 are used to define
event terminus T_2.
Split read evidence is then examined and marked at each of these events: For each event
comprising termini T_1 and T_2, reads from T_1 are retrieved, and all reads with soft-
clipped sequences are obtained from the bwa output fields. The number of read positions
with no call (phred score == 2) is then calculated. If this comprises more than 20% of the
total read sequence length for T_1, then no split-read evidence is stored for the soft-
clipped reads from T_1. Otherwise, each soft-clipped sequence is aligned with the Smith-
Waterman algorithm (match score = 1, mis-match penalty = -5) against the genomic
sequence corresponding to the region spanned by T_2, extending by 100bp either side. If
the largest contiguous section of the resulting alignment is longer than 15bp, then this
alignment region is recorded as split read evidence supporting the event. The union of
these split-read support regions is then recorded. The process is repeated swapping T_1
and T_2. An event is declared to have split-read evidence support if either of the event
termini are supported by soft-clipped alignments from the other terminus.
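The soft-clip alignment step can be sketched with a small Smith-Waterman implementation using the stated match score (1) and mismatch penalty (-5). Two points are assumptions rather than details from the text: the gap penalty (set to -5 here) and the reading of "largest contiguous section" as the longest run of consecutive matching bases in the best local alignment. The no-call check on phred scores is not modelled.

```python
def smith_waterman(query, target, match=1, mismatch=-5, gap=-5):
    """Return (best score, operation string) of the best local alignment.

    Operations: M = match, X = mismatch, I = gap in target, D = gap in query.
    """
    rows, cols = len(query) + 1, len(target) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            score[i][j] = max(0, score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the local alignment starts.
    i, j = best_pos
    ops = []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if query[i - 1] == target[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            ops.append("M" if sub == match else "X")
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            ops.append("I")
            i -= 1
        else:
            ops.append("D")
            j -= 1
    return best, "".join(reversed(ops))


def longest_match_run(ops):
    """Length of the longest run of consecutive matches in an operation string."""
    longest = run = 0
    for op in ops:
        run = run + 1 if op == "M" else 0
        longest = max(longest, run)
    return longest


def has_split_read_support(clipped_seq, region_seq, min_length=15):
    """True if the soft-clipped sequence aligns to the region over > min_length bp."""
    _, ops = smith_waterman(clipped_seq, region_seq)
    return longest_match_run(ops) > min_length
```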
Filters are then applied on the resulting putative events. Events are only retained if there
are >= 3 reads supporting each event terminus, and if each event terminus has at least one
read with mapping quality >= 19. Events are then filtered to only retain those with at least
one terminus supported by split-read evidence derived from the other terminus.
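These event filters can be expressed as a short predicate. The `Terminus` record and its fields are hypothetical stand-ins for whatever svcaller stores per event terminus.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Terminus:
    mapqs: List[int]          # mapping quality of each supporting read
    split_read_support: bool  # soft-clipped reads from the other terminus align here


def keep_event(t1, t2, min_reads=3, min_mapq=19):
    """Apply the read-count, mapping-quality and split-read filters to one event."""
    for terminus in (t1, t2):
        if len(terminus.mapqs) < min_reads:
            return False  # fewer than 3 reads support this terminus
        if max(terminus.mapqs) < min_mapq:
            return False  # no read reaches mapping quality >= 19
    # At least one terminus must carry split-read evidence.
    return t1.split_read_support or t2.split_read_support
```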
Output gtf and filtered bam files are then generated to facilitate downstream visualization
of the called events and the read-pairs supporting them. IGV [20] was used for
downstream analysis performed here.
The implementation of the algorithm is available at https://github.com/tomwhi/svcaller.
Clinical outcome and statistical analysis
The prevalence of AR CNVs, structural variants and splice variants was determined. For the
abiraterone- and enzalutamide-treated patients, the proportion of patients with a PSA
response of ≥30% or ≥50% from baseline PSA levels was determined. Progression-free
survival (PFS), defined as the time between start of therapy and no longer clinically
benefitting (NLCB) or PSA progression (PCWG2), was determined [2]. In
abiraterone and enzalutamide-treated patients, the clinical outcomes were related to ARV
status (positive vs negative) using a Fisher’s exact test for categorical data (e.g. PSA
response) and Kaplan-Meier analysis for PFS data. Survival differences were determined
using the log-rank test. Univariate Cox regression analysis was performed to assess the
effect of ARV presence on PFS. All tests were performed in R, and a two-sided
p-value <0.05 was considered statistically significant.
References
[1] Heidenreich A, Bastian PJ, Bellmunt J, Bolla M, Joniau S, van der Kwast T, et al. EAU Guidelines on Prostate Cancer. Part II: Treatment of Advanced, Relapsing, and Castration-Resistant Prostate Cancer. European Urology 2014;65:467–79. doi:10.1016/j.eururo.2013.11.002.
[2] Scher HI, Morris MJ, Stadler WM, Higano C, Basch E, Fizazi K, et al. Trial Design and Objectives for Castration-Resistant Prostate Cancer: Updated Recommendations From the Prostate Cancer Clinical Trials Working Group 3. Journal of Clinical Oncology 2016;34:1402–18. doi:10.1200/JCO.2015.64.2702.
[3] Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, et al. New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1). European Journal of Cancer 2009;45:228–47. doi:10.1016/j.ejca.2008.10.026.
[4] Peeters DJE, De Laere B, Van den Eynden GG, Van Laere SJ, Rothé F, Ignatiadis M, et al. Semiautomated isolation and molecular characterisation of single or highly purified tumour cells from CellSearch enriched blood samples using dielectrophoretic cell sorting. British Journal of Cancer 2013;108:1358–67. doi:10.1038/bjc.2013.92.
[5] Sieuwerts AM, Kraan J, Bolt-de Vries J, van der Spoel P, Mostert B, Martens JWM, et al. Molecular characterization of circulating tumor cells in large quantities of contaminating leukocytes by a multiplex real-time PCR. Breast Cancer Res Treat 2008;118:455–68. doi:10.1007/s10549-008-0290-0.
[6] Onstenk W, Sieuwerts AM, Kraan J, Van M, Nieuweboer AJM, Mathijssen RHJ, et al. Efficacy of Cabazitaxel in Castration-resistant Prostate Cancer Is Independent of the Presence of AR-V7 in Circulating Tumor Cells. European Urology 2015:1–7. doi:10.1016/j.eururo.2015.07.007.
[7] Chang MT, Asthana S, Gao SP, Lee BH, Chapman JS, Kandoth C, et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nature Biotechnology 2015;34:155–63. doi:10.1038/nbt.3391.
[8] Jiang H, Lei R, Ding S-W, Zhu S. Skewer: a fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics 2014;15:182. doi:10.1186/1471-2105-15-182.
[9] Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754–60. doi:10.1093/bioinformatics/btp324.
[10] Scheinin I, Sie D, Bengtsson H, van de Wiel MA, Olshen AB, van Thuijl HF, et al. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res 2014;24:2022–32. doi:10.1101/gr.175141.114.
[11] Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol 2016;12:e1004873–18. doi:10.1371/journal.pcbi.1004873.
[12] Lai Z, Markovets A, Ahdesmaki M, Chapman B, Hofmann O, McEwen R, et al. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res 2016;44:e108. doi:10.1093/nar/gkw227.
[13] Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv 2012;q-bio.GN.
[14] Cingolani P, Platts A, Wang LL, Coon M, Nguyen T. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; …. Fly 2012;6:80–92. doi:10.4161/fly.19695.
[15] Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016;536:285–91. doi:10.1038/nature19057.
[16] Newman AM, Bratman SV, To J, Wynne JF, Eclov NCW, Modlin LA, et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nature Medicine 2014;20:548–54. doi:10.1038/nm.3519.
[17] Lallous N, Volik SV, Awrey S, Leblanc E, Tse R, Murillo J, et al. Functional analysis of androgen receptor mutations that confer anti-androgen resistance identified in circulating cell-free DNA from prostate cancer patients. Genome Biol 2016:1–15. doi:10.1186/s13059-015-0864-1.
[18] Carter SL, Cibulskis K, Helman E, McKenna A, Shen H, Zack T, et al. Absolute quantification of somatic DNA alterations in human cancer. Nature Biotechnology 2012;30:413–21. doi:10.1038/nbt.2203.
[19] Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, Korbel JO. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 2012;28:i333–9. doi:10.1093/bioinformatics/bts378.
[20] Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nature Biotechnology 2011;29:24–6. doi:10.1038/nbt.1754.
Supplementary Fig. 1 – Copy-number status of AR. Low-pass whole genome
sequencing was performed to infer copy-number status of AR. Top panel: 3542-P-
2014235 carried a complex amplification on chromosome X affecting AR and
neighbouring regions. Middle panel: 3949-P-2014061 harboured a focal amplification
event affecting one region, including AR. Bottom panel: No amplification could be
detected on chromosome X for 4120-P-2015352. Y-axis: log2 copy-number ratio; X-axis:
position on chromosome X. Green points: Binned regions on chromosome X. Horizontal
lines: Segmentation of binned regions used to infer copy-number alterations. Dashed
vertical lines: location of AR. Colours: Grey, copy number neutral; dark red, moderately
amplified; bright red, high level amplification; dark blue, deletion.
Supplementary Fig. 2 – AR structure and baited regions. The structure of AR,
displaying the non-repetitive regions possible to profile using in-solution targeted
capture. Bait regions denote regions captured and subsequently profiled. CE – cryptic
exon.
Supplementary Fig. 3 – Circulating tumour DNA mutational landscape. The
mutations and small indels detected directly from cell-free DNA. X-axis: gene names
sorted according to number of detected mutations among all samples. Y-axis: Cell-free
DNA samples profiled. Type of mutation is coloured according to the right legend. Only
mutations with potentially protein altering function are displayed. Brackets mark cell-free
DNA samples originating from the same patient, sampled at different time points. Note:
AR mutations within and outside hotspots are reported here.
Supplementary Fig. 4 – Schematic display of structural variant detection. Top panel:
The regions A, B and C are directly adjacent in the reference genome. Region B is
deleted in the tumour genome and sequence data is generated by sequencing the DNA of
the tumour. Paired end sequencing is applied which generates sequencing data 100 bp
from each end of each DNA fragment, directed inward (arrows). The dashed line denotes
unknown sequence from each sequenced DNA fragment. Subsequently, mapping is
performed to the reference genome to determine the location of each read from each read-
pair. As region B exists in the reference genome, read-pairs supporting the deletion will:
1) show an unexpectedly large distance between the two reads, visualized through angled
dashed lines; 2) partially map to one end of the structural event, visualized by shaded
arrows. Bottom panel: As top panel but displaying an inversion. The reads of read-pairs
supporting the event now point in the same direction with unexpectedly large distance.
Supplementary Fig. 5 – Copy-number alterations for 4120 and 3843. A) The targeted
sequencing data was applied to infer intra-AR deletions for 4120-P-2015352. The AR
exons are displayed to visualize the region of AR affected by the deletion. CE – cryptic
exon. B) Low-pass whole genome sequencing was performed to infer copy-number
alterations on the X chromosome for 3843-P-2013537. Vertical solid lines mark the start
and stop of AR. The arrows denote the 5’ and the 3’ end of the tandem duplication.
Y-axis: log2 copy-number ratio. X-axis: position on chromosome X. Green points: Binned
regions on chromosome X. Horizontal lines: Segmentation of binned regions used to
infer copy-number alterations. Colours: Grey, copy number neutral; dark red, moderately
amplified; bright red, high level amplification; dark blue, deletion.
Supplementary Fig. 6 – Development and validation of a targeted RNA-Seq assay for ARV
expression analysis. A) ARV qRT-PCR and targeted RNA-Seq assay design, with primer and
probes against unique exon-cryptic exon-specific junctions. B) Representative result from ARV
sequencing in 22Rv1 demonstrating coupled reads (see red tracking) between exon 3 and
sequences within the CE region in intron 3. CE – cryptic exon. C) RNA-seq validation by qRT-
PCR for full-length and splice variants in enriched CTC fractions. r denotes Pearson's correlation
coefficient.
Supplementary Fig. 7 – Multi-level AR profiling in patients harbouring structural variants.
CTC panel: the number of CTCs is expressed per 7.5mL of blood. * denotes aborted samples.
CNV panel: AR copy number stratified according to amplification status. Intra-AR panel:
structural variants across the AR gene. Complex rearrangements denote multiple overlapping
variant types within the particular region. Bottom two AR-V panels provide the qualitative and
quantitative overview of ARV expression. Brackets mark samples coming from the same patient.
Supplementary Fig. 8 – Changes in AR splice and structural variants in patients with pre-
and post-abiraterone samples. A) Patient 3542 had 129 circulating tumour cells (CTCs) at
baseline, which were positive for AR splice variants (ARV). Throughout the course of therapy
CTCs and number of ARV transcripts increased. AR gene amplification and complex structural
variants were only inferable in plasma at progressive disease. B) Patient 3885 demonstrated an
increase in CTC number and ARV expression during treatment. AR was highly amplified in both
samples, with more complex intragenic rearrangements at progressive disease. C) Patient 4070
demonstrated a decrease in number of CTCs, with absence of ARV expression. AR remained
moderately amplified, without any structural variants detected pre- and post-treatment. D)
Patient 4174 demonstrated low level ARV expression at the start of therapy, which was
undetected at progressive disease. The AR copy number status shifted from a high-level to
moderately amplified state, with an increase in the number of tandem-duplicated structural
events downstream of exon 1. For each patient, the first and last PSA measurement represents
baseline and progression levels, respectively.
Supplementary Fig. 9 – AR splice variant abundance across ARV-expressing patients.
Boxplot analysis of the number of detected ARV transcripts per 1000 sequenced reads.
Supplementary Fig. 10 – Proposed model of intra-AR structural variation. Due to
the evolutionary pressure of endocrine treatment, pre-existing or spontaneously emerging
clones, expressing non-canonical versions of the androgen receptor will outgrow the
competition. The expressed, non-canonical transcripts are generated as a consequence of
intra-AR variation. In the presence of AR amplifications, full-length, truncated and non-
functional versions of the androgen receptor may exist within the same cell. As the
sequencing data is mapped to the reference genome, multiple variants will be visualized
over AR, as detected in our data and displayed in Fig. 2. In the absence of AR
amplifications, independent clones may harbour different versions of AR. Of note, both
scenarios may occur simultaneously in the same patient.
Supplementary Fig. 11 – Bait design in the context of structural variation. Baits are
designed to capture DNA from non-repetitive unique regions of the genome. If the DNA
of interest matches the reference genome (left panel) or harbours mutations (middle panel),
it will be captured with high efficiency. DNA fragments supporting structural variation are
likely to be captured with low efficiency if the DNA fragment aligns poorly to the bait as
a consequence of the structural variant.
Supplementary Fig. 12 – Theoretical calculation of sequence coverage required to
detect intra-AR structural variation. Calculations were performed in R, to determine
the AR X-fold coverage required to detect structural variants with varying prevalence, in
terms of the fraction of all reads supporting the event. First, a linear model was fit over
the observed samples, with average coverage as the dependent variable and number of
read fragments as the predictor variable. For each observed structural variant event
identified by svcaller, the fraction of total reads supporting the event was computed. This
value _p_ was then used to compute the minimum number of reads _N_ required to
detect such an event with 95% sensitivity. Here, event detection was defined by
observing a number of reads _n_ supporting the event >= 3. _n_ was assumed to follow a
binomial distribution with parameters _p_ and _N_. The linear model relating number of
read fragments to average AR coverage was then used to convert _N_ to a coverage
value. The resulting minimum required coverage and _p_ values were then plotted.
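The binomial step of this calculation can be sketched as follows. The original calculations were performed in R; Python is used here purely for illustration, the function names are hypothetical, and the conversion from _N_ to coverage via the fitted linear model is not included.

```python
from math import comb


def detection_probability(n_reads, p, min_support=3):
    """P(n >= min_support) when n ~ Binomial(n_reads, p)."""
    miss = sum(comb(n_reads, k) * p**k * (1 - p) ** (n_reads - k)
               for k in range(min_support))
    return 1.0 - miss


def min_reads_for_sensitivity(p, sensitivity=0.95, min_support=3):
    """Smallest N for which an event with read fraction p is detected
    (>= min_support supporting reads) with at least the given sensitivity."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Detection probability increases with N, so double N to find an upper
    # bound, then binary-search for the smallest sufficient value.
    lo = hi = min_support
    while detection_probability(hi, p, min_support) < sensitivity:
        lo, hi = hi, hi * 2
    while lo < hi:
        mid = (lo + hi) // 2
        if detection_probability(mid, p, min_support) >= sensitivity:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For example, an event supported by half of all reads (_p_ = 0.5) requires _N_ = 11 reads under these definitions, since P(_n_ >= 3) is about 0.945 for _N_ = 10 and about 0.967 for _N_ = 11.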
Supplementary Fig. 13 – Coverage and sensitivity to detect intra-AR variation. Top
panel: Horizontal bars denote sequence coverage for each profiled cfDNA library. Cyan
coloured points mark the coverage needed to detect each structural variant with 95%
sensitivity. Each cyan point is shown for the sample in which the variant was detected.
Bottom panel: The cumulative fraction of all detected structural
variants in relation to coverage needed for 95% sensitivity. The coloured inset bars
denote the range of coverage for intra-AR positive (green) and negative (red) samples.
Supplementary Fig. 14 – Comparison of svcaller and Delly output. A) Svcaller
identified three tandem duplication events for 4213-P-2015142 in which at least one
event terminus occurred within the AR region, with all three events also detected by
Delly. Tandem duplication event calls generated by svcaller are denoted by light blue
dashed lines, with soft-clipped sequence support denoted with solid vertical lines at the
event termini. Tandem duplication event calls generated by Delly are denoted by dark
blue rectangles. AR and other gene annotations are displayed in light grey at the bottom
of the panel whilst the chromosome X ideogram and corresponding selected genomic
region is indicated at the top of the panel. Individual read-pairs retained by the tandem
duplication event filter are shown, connected by a grey line and displayed in either
orange- (forward orientation) or blue (reverse orientation) colour. B) Svcaller identified
seven inversion events for 3542-P-2014235 in which at least one event terminus occurred
within the AR region, with all seven events also detected by Delly. Colours as for A),
although read pairs displayed are in this instance those retained by the inversion event
filter.
Supplementary Table 1 – Patient and Sample Characteristics
Supplementary Table 2 – Basic sequencing metrics, AR copy-number status and
circulating tumour DNA fraction for each cell-free DNA sample.
Supplementary Table 3 – HG19 bait coordinates for targeted sequencing.
Supplementary Table 4 – Somatic mutations detected in the circulating tumour DNA by
targeted sequencing.
Supplementary Table 5 – AR and ARV RNA sequencing library sizes (expressed as
number of BWA-mem mapped reads)
Supplementary Table 6 – Primer and hydrolysis probe sequences for targeted AR and
ARV sequencing and qRT-PCR.