high-throughput sequencing in acute lymphoblastic leukemia

16
HAL Id: hal-01404817 https://hal.archives-ouvertes.fr/hal-01404817 Submitted on 3 Feb 2018 HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés. High-throughput sequencing in acute lymphoblastic leukemia: Follow-up of minimal residual disease and emergence of new clones Mikael Salson, Mathieu Giraud, Aurélie Caillault, Nathalie Grardel, Nicolas Duployez, Yann Ferret, Marc Duez, Ryan Herbert, Tatiana Rocher, Shéhérazade Sebda, et al. To cite this version: Mikael Salson, Mathieu Giraud, Aurélie Caillault, Nathalie Grardel, Nicolas Duployez, et al.. High-throughput sequencing in acute lymphoblastic leukemia: Follow-up of minimal resid- ual disease and emergence of new clones. Leukemia Research, Elsevier, 2017, 53, pp.1-7. 10.1016/j.leukres.2016.11.009. hal-01404817

Upload: others

Post on 25-Mar-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

HAL Id: hal-01404817https://hal.archives-ouvertes.fr/hal-01404817

Submitted on 3 Feb 2018

HAL is a multi-disciplinary open accessarchive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come fromteaching and research institutions in France orabroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, estdestinée au dépôt et à la diffusion de documentsscientifiques de niveau recherche, publiés ou non,émanant des établissements d’enseignement et derecherche français ou étrangers, des laboratoirespublics ou privés.

High-throughput sequencing in acute lymphoblasticleukemia: Follow-up of minimal residual disease and

emergence of new clonesMikael Salson, Mathieu Giraud, Aurélie Caillault, Nathalie Grardel, Nicolas

Duployez, Yann Ferret, Marc Duez, Ryan Herbert, Tatiana Rocher,Shéhérazade Sebda, et al.

To cite this version:Mikael Salson, Mathieu Giraud, Aurélie Caillault, Nathalie Grardel, Nicolas Duployez, etal.. High-throughput sequencing in acute lymphoblastic leukemia: Follow-up of minimal resid-ual disease and emergence of new clones. Leukemia Research, Elsevier, 2017, 53, pp.1-7.�10.1016/j.leukres.2016.11.009�. �hal-01404817�

High-throughput sequencing in acute lymphoblastic leukemia:Follow-up of minimal residual disease

and emergence of new clones

Mikael Salson1?, Mathieu Giraud1?, Aurelie Caillault2?, Nathalie Grardel2?, Nicolas Duployez2,Yann Ferret2, Marc Duez3, Ryan Herbert1, Tatiana Rocher1, Sheherazade Sebda4,

Sabine Quief4, Celine Villenet4, Martin Figeac4,5, Claude Preudhomme2

Final accepted manuscript, as published in Leukemia Research 53 (2017) 1–7http://dx.doi.org/10.1016/j.leukres.2016.11.009

Highlights

• Repertoire Sequencing (RepSeq) gives a deepinsight of leukemic lymphoid populations.

• High-throughput sequencing identifies newemerging clones at the time of relapse.

• Clones of low abundance at diagnosis may be-come relapsing clones.

• The Vidjil software enables both automatedand interactive analysis of RepSeq data.

1 CRIStAL UMR 9189 CNRS, Universite de Lille and InriaLille, Lille, France; 2 CHU Lille, Department of Hematology,Universite de Lille, France; 3 School of Social and CommunityMedicine, University of Bristol, Bristol, UK; 4 Univ. Lille, plate-forme de genomique fonctionnelle et structurale, F-59000 Lille,France; 5 CHU Lille, Cellule bioinformatique, plateau communde sequencage, F-59000 Lille, France? These authors contributed equally.

Co-corresponding authors: [email protected] (M. Sal-son and M. Giraud), [email protected] (A.Caillault), [email protected] (N. Grardel),[email protected] (C. Preudhomme).

Conflict of interest: None to report.

Cet article est diffuse librement selon les dispositions de l’articleL533-4 du Code de la Recherche cree par la loi pour une RepubliqueNumerique du 7 octobre 2016. This publication is freely dis-tributed according to the French law.

Abstract

Minimal residual disease (MRD) is known to be an in-dependent prognostic factor in patients with acute lym-phoblastic leukemia (ALL). High-throughput sequenc-ing (HTS) is currently used in routine practice for thediagnosis and follow-up of patients with hematologicalneoplasms. In this retrospective study, we examined therole of immunoglobulin/T-cell receptor-based MRD inpatients with ALL by HTS analysis of immunoglobu-lin H and/or T-cell receptor gamma chain loci in bonemarrow samples from 11 patients with ALL, at diagno-sis and during follow-up. We assessed the clinical feasi-bility of using combined HTS and bioinformatics analy-sis with interactive visualization using Vidjil software.We discuss the advantages and drawbacks of HTS formonitoring MRD. HTS gives a more complete insightof the leukemic population than conventional real-timequantitative PCR (qPCR), and allows identification ofnew emerging clones at each time point of the monitor-ing. Thus, HTS monitoring of Ig/TCR based MRD is ex-pected to improve the management of patients with ALL.

Keywords: Acute Lymphoblastic Leukemia,Minimal Residual Disease, Follow-up, RepertoireSequencing, Bioinformatics

1

1 Introduction

1.1 Assessment of Minimal Residual Disease in ALL

Minimal residual disease (MRD) monitoring hasproven to be one of the strongest independent prog-nostic factors in patients with acute lymphoblas-tic leukemia (ALL) [1, 2]. Sequential monitoringof MRD using sensitive and specific methods, suchas quantitative real-time polymerase chain reaction(qPCR) or flow cytometry, has improved the assess-ment of treatment response [3, 4, 5] and is currentlyused for therapeutic stratification and the early de-tection of relapses [6, 7]. These methods allow thedetection of a single leukemic cell among manynormal cells.

MRD monitoring based on immunoglobulin (Ig)and T-cell receptor (TCR) gene recombinations hasbeen standardized, and has become the gold stan-dard method in routine practice [3, 4]. Ig and TCRrecombinations occur in the early stages of B-celland T-cell development. As a consequence, eachlymphoid cell contains unique V(D)J recombina-tions resulting from random coupling between oneof many possible V, (D) and J genes (combinatorialdiversity), as well as imprecise joining of gene seg-ments and the addition of nucleotides to the DNAsequence at splice sites (junctional diversity) [8].Identical recombinations thus reflect the clonal na-ture of a population, rather than being derived fromindependent cells.

Monitoring of Ig/TCR-based MRD in ALL con-sists of step-by-step analysis of V(D)J DNA re-combinations in lymphoblasts, and their subse-quent detection, with very high sensitivity, dur-ing follow-up. Leukemic clonal recombinationscan be amplified by PCR and examined by capil-lary electrophoresis, and can then be isolated bypolyacrylamide gel electrophoresis and finally se-quenced by direct Sanger sequencing. Overall,clonal V(D)J recombinations occur in > 95% of ALLcases. Allele-specific oligonucleotides (ASOs) aredesigned to quantify MRD in follow-up samplesby qPCR. However, this approach is time consum-ing in routine practice, and monitoring is thereforelimited to the major leukemic clones (usually 1–3markers per patient). This strategy can therefore

yield false-negative results.

Failures of this MRD-monitoring strategy can oc-cur as a result of absence of the initial marker, prob-lems with the design of the customized primer sets(for ASO-PCR), clonal evolution during the courseof the disease, or the emergence of a new clone atthe time of relapse [9].

1.2 Next-generation MRD Monitoring by High-Throughput Sequencing

Repertoire sequencing (RepSeq) involves detailedsequencing of a lymphoid population, focusingon the V(D)J recombinations [10]. Several re-cent studies focused on MRD quantification byhigh-throughput sequencing (HTS) in patients withacute and chronic lymphoid disorders [11, 12].Overall, HTS revealed higher sensitivity and preci-sion in bone marrow and peripheral blood samplescompared with multi-parameter flow cytometry orASO-PCR. Notably, low MRD positivity by HTSwas found in samples that were MRD-negative bystandard methods [12]. In patients with ALL, MRDmonitoring by HTS could predict the risk of relapsein both childhood [12, 13] and adult ALL [14], andboth pre- and post-allogeneic stem cell transplanta-tion [15, 16]. One of the benefits of the high sensitiv-ity of HTS was the possibility of detecting MRD inperipheral blood, instead of bone marrow [15]. Theuse of HTS for MRD quantification may also allowthe molecular heterogeneity of the lymphoid reper-toire to be investigated. Interestingly, Kotrova et al.showed that patients with lower immunoglobulinheavy chain (IgH) diversity at day 78 had signifi-cantly lower relapse-free survival rates [13].

1.3 Challenges

HTS is a valuable technique at the time of ALLdiagnosis, and allows the sequencing of clonal re-combinations of multiple Ig/TCR genes by pool-ing several PCR systems in one experiment. It istherefore possible to identify leukemia-specific se-quences from one or more clones in one or morePCR systems in a single step, without the need forSanger sequencing [17].

HTS is an accurate, reliable, fast and relativelyaffordable technology, applicable to many routine

2

practices. It is expected to improve MRD monitor-ing and enable the detection of multiple clones andsubclones simultaneously, faster and with highersensitivity than standard methods based on V(D)Jsequencing. The use of HTS is also expected to re-duce the failure rate of MRD quantification associ-ated with clonal evolution or the emergence of newclones [18, 19, 20].

ASO-PCR is a standardized technique that usu-ally focuses on 1–3 clones, while HTS is an emerg-ing field, able to evaluate millions of sequencesin a single experiment. The study by van Don-gen et al. [1] recently listed the key challenges forhigh-throughput MRD technologies as broad avail-ability, easy implementation, applicability in mostpatients, adequate sensitivity, fast, affordable, andstandardized, with quality-assurance programs. Itis essential to establish standardized protocols tolimit variability among laboratories. These proto-cols should include the use of primer kits designedfor multi-system PCR.

The quantity of data generated also raises severalchallenges. From a practical point of view, labo-ratories must increase their capacities to store andprocess the terabytes of data generated each year.Finally, it is also necessary to develop software forprocessing the data and synthetizing the results inan easily understandable way for its transfer intoclinical practice.

1.4 Repertoire sequencing (RepSeq) analysis software

The high-throughput analysis of B or T cell recep-tor DNA sequences cannot be achieved using regu-lar mapping software. This is because of the speci-ficity of the problem, with a read usually containinga recombination between three genes with dozensof potential insertions and deletions [21]. These re-combinations, non-templated insertions, deletions,and somatic hypermutations need to be integratedinto the design of the algorithm.

RepSeq software usually performs optimizedcomparisons of HTS reads against a germlinedatabase. The international ImMunoGeneTics in-formation system (IMGT R©, http://imgt.org/)has developed several tools for the in-depth anal-ysis of V(D)J recombinations [22, 23, 24, 25]. New

software to deal with HTS data (i.e., millions ofsequences) has recently been developed, including[26], IgBlast [27], Decombinator [28], miTCR [29],Vidjil [30], TCRKlass [31], miXCR [32], and IM-SEQ [33]. Several of these software tools clusterHTS reads into clones allowing relative quantifica-tion. However few of them offer interactive anal-ysis features. The Vidjil open-source platform en-ables an autonomous usage in an immunology orhematology lab, from raw sequence files to analy-sis, annotation and storage [34].

2 Material and Methods

We conducted HTS-based MRD monitoring of IGHand/or T-cell receptor gamma chain (TRG) loci inpatients with ALL. Data were analyzed using theVidjil web application [30], which is directly usableby hematologists without the need for any ded-icated bioinformatics support. This approach al-lowed us to identify new emerging clones at thetime of relapse that were different from the mainclones highlighted at diagnosis. We also assessedthe reproducibility of our approach and comparedour results with conventional ASO-PCR.

2.1 Patient selection

The clonality of 11 pediatric patients (8 B-ALL /3 T-ALL, 5 female / 6 male, 3-19 years) was stud-ied retrospectively in 43 bone-marrow samples (11diagnosis and 32 follow-up time points, see Sup-plementary Material, Table 1). Approval for thisstudy was obtained from the Institutional ReviewBoard of CHRU of Lille (CSTMT093) and informedconsent was obtained from the patients in accor-dance with the Declaration of Helsinki. Written in-formed consent was obtained from each patient ortheir parent or legal guardian.

2.2 DNA extraction, library preparation and HTS

We sequenced TRG (VGf1-10) and/or IGH by HTSin 11 patients with ALL at diagnosis and at sev-eral follow-up points. For every sample, 5 × 106

cells were extracted from bone marrow or blood us-ing a QIAamp R© DNA Mini Kit (Qiagen). DNA

3

was measured using a Nanodrop system R©, and500 ng were amplified by PCR with non-fluorescentBIOMED-2 TRG and/or IGH primers (Supplemen-tary Material, Table 2). A total of 58 (11 diagnosisand 44 follow-up) samples were analyzed.

The samples were initially purified withMinElute R© PCR Purification Kit (Qiagen). Li-braries were prepared using an Ion XpressTM

Plus gDNA and Amplicon Library Kit (Life Tech-nologies). Each library was barcoded using theIon XpressTM Barcode Adapters 1–96 Kit. Theconcentration of each barcoded library was con-trolled using an Agilent 2100 Bioanalyzer (AgilentTechnologies), and the libraries were then pooledand their concentrations checked again with theAgilent 2100 Bioanalyzer.

The libraries were amplified by emulsion PCRusing an Ion OneTouchTM 2 (Life Technologies)with an Ion PGMTM Template OT2 400 Kit (LifeTechnologies). After amplification, the librarieswere enriched using an Ion OneTouchTM ES (LifeTechnologies). The PCR products were verified us-ing an Ion SphereTM Quality Control Kit With aQubit R© 2.0 Fluorometer (Invitrogen). The librarieswere sequenced using an Ion Personal Genome Ma-chine (Ion PGMTM) with an Ion PGMTM Sequencing400 Kit and Ion 318TM Chip Kit (Life Technologies).

2.3 Bioinformatics analysis and visualization

The raw Ion Torrent flow was transformed to de-multiplexed FastQ sequences using the TorrentServer from Life Technologies, allowing 0 error inbarcode splitting.

The reads were processed using Vidjil software(www.vidjil.org, version 2015.01), with parametersto search for multiple loci and incomplete recombi-nations and to avoid spurious hits (-i -e 1 -z100 -r 1 -3 -d). Vidjil gathers the reads intoclones on the basis of their V(D)J recombinations.Two reads belong to the same clone if they sharethe same “window” overlapping the actual CDR3.This 50 bp window is large enough to be highly spe-cific. Window detection is based on a fast bioinfor-matics method using k-mers indexing of germlinegenes [30].

We examined the lymphocyte populations of

the 11 patients using the Vidjil web application,tracking the clones over time. Interactive visu-alization of these populations can be accessed athttp://www.vidjil.org/data/#2016-lr. Vidjil out-puts provided a representative sequence for eachof the most-represented clones. All fragments ofthe representative sequence are guaranteed to beshared by at least 50% of the reads assigned to theclone. V(D)J recombination analysis (determinationof V(D)J germline genes and N region) was onlyperformed at the end of the process.

Technical errors in PCR and sequencing, as wellas somatic mutations, mean that several clones withslight variations may be detected. A manual re-view was performed in the web application, and,when the pattern clearly indicated a sequencing er-ror, clones were merged.

3 Results

Patient follow-up requires the quantification ofclones even at very low concentrations. The detec-tion level depends directly on the depth of the se-quencing (ie. the number of reads obtained). Onaverage we obtained 1.2 million reads per sample(median: 0.98 million) of which 74% could be an-alyzed by Vidjil (median: 77%). Figure 1 showsthe evolution of the main clones for those patientsanalyzed by the Vidjil web application. We dis-cuss the potential clinical applications of this mul-ticlonal analysis by detailing four patient cases.

Patient 010 (TRG). This patient with T-ALL hadone relapse at D413 (after diagnosis) and under-went allogeneic transplantation at D486. The twolast time points (D445 and D475) were after twoNECTAR (nelarabine, etoposide and cyclophos-phamide) cycles, before allogeneic transplantation.The main clone was undetectable by conventionaltechnique initially (D81) and post-NECTAR, butwas detected by HTS even at a very low level (4and 11 reads). The patient remained relapse-free 2years after transplantation.

Patient 013 (IGH). This patient with pre-B (BIII)ALL was treated in the standard-risk (AR1) groupof the EORTC-58081 protocol. The sequence of themain clone of IGH was not identified by conven-tional technique, while HTS analysis showed a con-

4

Diag +60 +120 +246 +274

Diag +48 +90 +263 +308

Diag +47 +615 +711 +837+794+403 +741 Diag +47 +615 +711 +837+794+403 +741

Diag +413+81 +445 +475

Diag +48 +90 +263 +308

0.0001%

0.01%

0.001%

0.0001%

100%

10%

1%

0.1%

0.01%

0.001%

100%

10%

1%

0.1%

0.01%

0.001%

0.0001%

100%

10%

1%

0.1%

0.0001%

100%

10%

0.1%

0.01%

0.001%

0.0001%

100%

10%

1%

0.1%

0.01%

0.001%

1%

100%

10%

1%

0.1%

0.01%

0.001%

0.0001%

010 (TRG)

063 (TRG)

064 (TRG)

013 (IGH)

063 (IGH)

064 (IGH)

V5 0/5/3 J1

V8 4/6/12 J1

V8 29/2/15 JP2

V3-64 0/17/8 D6-19 1/15/8 J6

V3-9 7/3/17 D3-10 6/0/17 J6

V3-11 1/7/0 D6-13 15/0/17 J6

V3-13 1/55/16 J6

V6-1 27/6/3 D4-11 3/3/4 J4V3 0/6/16 JP1

V2 0/2/0 JP1

V10 4/0/8 JP2V2 0/14/2 J1

V2 0/7/0 JP1

V10 3/0/15 JP1

V3-74 1/0/10 D1-26 0/0/2 J6

Figure 1: Top: Follow-up of patient 010 for TRG (left), and of patient 013 for IGH (right). Mid-dle and bottom: Follow-up of patients 063 and 064, both for TRG (left) and IGH (right). The x-axisshows the days post-diagnosis, and the y-axis the concentration of the clones. All the curves were ob-tained after manual inspection and additional clustering of the clones. Red lines are the clones fol-lowed by ASO-PCR, and the background clones are hidden. The complete interactive curves for the11 patients, and the nucleic sequences of the clones and their V(D)J designations, can be accessed athttp://www.vidjil.org/data/#2016-lr.

5

tinuous decrease in the concentration of the mainclone. However, the patient failed to respond toEORTC-58081 induction, and was switched to thehigh-risk (VHR) therapy arm, and allogeneic trans-plantation was planned for D315. The two mainclones at diagnosis both showed decreasing pro-files, but at different rates (see Figure 1 Patient 013).

Patient 063 (TRG+IGH). This patient withBII ALL received a stem cell transplantation atday D183. The clone identified at diagnosisand followed by ASO-PCR, TRGV10*01 -4/0/-8TRGJP2*01, was undetectable after transplantation.However, the patient relapsed with another clonethat was not identified at diagnosis by conventionaltechnique. Interestingly the relapse clone (D308) ofTRG, TRGV2*01 -0/7/-0 TRGJP1*01, was alreadyobservable by HTS at day D90, and was the mainclone at D263 (1.3% of the reads). This TRG clonewas present at diagnosis, but at a concentration twoorders of magnitude below that of the main clone.However other clones maintained a steady concen-tration throughout, while this clone increased inconcentration. There was no similar IGH clone: themain IGH clones at D308 were not detected at di-agnosis, even with HTS. The raw sequence files forthis patient can be accessed in the SupplementaryMaterial.

Patient 064 (TRG+IGH). This patient with BIIALL and complex karyotypic abnormalities hadtwo relapses at days D615 and D837, respectively.The main TRG (TRGV3*01 -0/6/-16 TRGJP1*01)and IGH clones detected (IGHV6-1*01 -27/19/-4IGHJ4*02) showed parallel evolution. These se-quences therefore probably came from the sameclonal population. At the second relapse, at D837,another TRG clone accounted for about 20% of thereads (TRGV2*02 -0/2/-0 TRGJP1*01). This clonewas present at diagnosis and showed noticeablegrowth even at D403. No IGH clone showed similarevolution. Unfortunately, this patient died 4 yearsafter the diagnosis.

The seven other patients show a more simpleclonal evolution. Their cases are described in Sup-plementary Material.

La-2 La-3 La-4 La-5La+0_BC0107100%

10%

1%

0.1%

0.01%

0.001%

0.0001%IGHV4-39*01 -1/50/-11 IGHJ4*02

W-01

TATCAGCAGCCTAAAGGCTGAGGACACCCGACAGGGTATGGACGTCTGGGGCCAAGGGAC

AGAGATGGGGTTATTACGATATTTTGACTGGTTATTATATTTAGCCGGTCATGGTGTACT

IGHV4-59*08 -1/6/-13 IGHJ1*01

Lam (12) (IGH)

Figure 2: Dilution scale, for diagnosis of patient012 (IGH). Horizontal lines are clones from thespiked-in controls.

NEG POS 10−5 10−4 10−3 10−2 10−1NEG

POS

10−5

10−4

10−3

10−2

10−1

qPCR

HT

S

MRDscale

Figure 3: Comparison between clone ratios de-tected by HTS analyzed by Vidjil and qPCR ratios(circles, 32 samples) and theoretical ratios for thescale (boxes, 4 samples, see Figure 2). HTS ratiosare of numbers of reads, without additional cluster-ing to account for potential sequencing errors. TheNEG-NEG point is large because four samples werenegative in both qPCR and HTS.

6

3.1 Quantification scale

To assess the quality of the quantification, it isessential to understand the limits of the technol-ogy. A diagnostic sample was diluted with onepooled peripheral blood lymphocyte sample fromfive healthy donors, to yield concentrations of 1%,0.1% 0.01% and 0.001% of the initial concentrationat diagnosis, respectively. Two sequences for quan-tification control were also added to each samplefor quality assessment. We added two previouslycategorized diagnosis samples to each sample be-fore PCR, to generate two consistent markers alongthe scale. Consistent marker samples were addedto represent 1% and 0.1% of the total DNA, respec-tively.

As expected, the concentration of the main clonedecreased by about one tenth for each dilution (Fig-ures 2 and 3). At a dilution of 10−2, the main clonehad a concentration of 0.05 compared with that atdiagnosis, at 10−3 it was 0.005, at 10−4 it was 0.001,and at 10−5 it was 0.00002. The exact resolution de-pended on the depth of the sequencing. This depthshould be set depending on the application (fewreads are required to identify major clones at diag-nosis, while more are needed to determine the con-centration of clones of interest during follow-up).

3.2 Reproducibility

Comparison between HTS and qPCR. HTS andqPCR do not measure the same parameters. ForHTS, the ratios reported by Vidjil take the numberof analyzed reads as a reference, which ideally cor-responds to the number of rearranged T or B cells inthe studied system. In contrast, qPCR measures theproportion of total cells, which also includes othermononucleic cells.

We compared the results for qPCR and HTS in32 samples (Figure 3). The results for 22 sampleswere within one log of each other by both tech-niques. HTS detected reads in five samples wherethe standard technique detected nothing, includ-ing two samples from patient 010 discussed above.All these clones were detected below the 10−4 ra-tio. The inclusion of a standard of known concen-tration could be used to calibrate these different ra-tios. Care should be taken to avoid contamination

between samples, e.g., by handling and sequencingdiagnosis and follow-up samples from the same pa-tient separately, as in routine practice.

Reproducibilities of PCR and HTS. We assessedthe reproducibility of PCR and HTS by duplicatingthe initial PCR step (PCR A and PCR B). We alsotook a sample from PCR A and labelled it with twodifferent barcodes. This allowed us to assess whichstep was more reproducible. The results for patient014 are shown in Figure 4. The reproducibility wasgenerally good, except at very low concentrations(< 10−4). The spread of distribution at lower con-centrations comes in part from PCR and HTS bi-ases, but also from biological sampling.

However, in PCR duplicates, clones at a concen-tration greater than 10−2 were detected in one sit-uation, but not at all in another. We examined thesequences in those cases in detail (Figure 2), andcompared them to the closest clone. The resultssuggested that the misidentification may have beenlargely caused by errors in the sequencing or PCRprocess (see Supplementary Material).

4 Discusssion

Faham et al [12] have shown what high-throughputsequencing can bring to monitor the leukemicclones along the time by trying to reproduce themeasurements that were made using qPCR. Ac-cordingly to this article showing that both tech-niques are well correlated, we were only interestedto study the contribution of HTS to patient moni-toring and multi-clonality evolution. Indeed, HTSalso offers a new opportunity: looking at the evolu-tion of all the clones. We have shown that severaldistinct clones can have high concentrations at di-agnosis. With HTS it is now possible to follow theirconcentration. More importantly, as relapse can oc-cur with clones of low abundance at diagnosis, HTScould help identifying such relapse ahead.

Another gret interest in HTS has been empha-sized by Kotrova et al [13]. They have identified apossible new stratification method relying on clonaldiversity, a measure that can only be obtained usingHTS.

7

The purpose of this study is to show what high-throughput sequencing could bring compared toASO-PCR. Other studies have shown what ASO-PCR brings [35, 36]. Comparison of HTS with othermethods should be addressed by future studies.

4.1 Management of sequencing errors.

We have shown that some, albeit rare, cases of se-quencing/PCR errors emphasize the need for high-fidelity PCR enzymes, which would reduce therisk of misleading errors in the quantification ofMRD [37]. Moreover, users should indeed be awareof the type of errors that the sequencer may pro-duce [38].

Sequences harboring PCR or sequencing errorscan be clustered manually or automatically usinginteractive software such as the Vidjil web applica-tion [17]. However in some cases, it is difficult todetermine if differences are the result of sequencingor PCR errors, recombinations, or further somaticmutations. For example, see Supplementary Mate-rial.

4.2 Conclusion

HTS provides new opportunities, both for MRDmonitoring and for furthering our knowledge oflymphoid pathologies. Coupled with interactivebioinformatics analysis, HTS gives a more completeinsight into the blastic population at diagnosis, andallows observation of the evolution of this popula-tion, as well as emerging new ones. The develop-ment of standardized protocols with multi-systemPCR and calibration standards could help in devel-oping these techniques.

References

[1] Jacques JM van Dongen, Vincent HJ van derVelden, Monika Bruggemann, and Alberto Or-fao. Minimal residual disease (MRD) diagnos-tics in acute lymphoblastic leukemia (ALL):need for sensitive, fast and standardized tech-nologies. Blood, 125(26):3996–4009, 2015.

[2] Monika Bruggemann, Thorsten Raff, andMichael Kneba. Has MRD monitoring super-seded other prognostic factors in adult ALL?Blood, 120(23):4470–4481, 2012.

[3] J. J. M. van Dongen, T. Szczepanski, and H. J.Adriaansen. Immunobiology of leukemia. InHenderson ES, Lister TA, and Greaves M, ed-itors, Leukemia, pages 85–130. WB Saunders,Philadelphia, 7th edition, 2002.

[4] J. J. M. van Dongen, A. W. Langerak,M. Bruggemann, P. A. S. Evans, M. Hummel,F. L. Lavender, E. Delabesse, F. Davi, E. Schu-uring, R. Garcıa-Sanz, J. H. J. M. van Krieken,J. Droese, D. Gonzalez, C. Bastard, H. E.White, M. Spaargaren, M. Gonzalez, A. Par-reira, J. L. Smith, G. J. Morgan, M. Kneba,and E. A. Macintyre. Design and standard-ization of PCR primers and protocols for de-tection of clonal immunoglobulin and T-cellreceptor gene recombinations in suspect lym-phoproliferations: report of the BIOMED-2concerted action BMH4-CT98-3936. Leukemia,17(12):2257–317, 2003.

[5] T. Kalina, J. Flores-Montero, V. H. J. van derVelden, M. Martin-Ayuso, S. Bottcher, M. Rit-gen, J. Almeida, L. Lhermitte, V. Asnafi,A. Mendonca, R. de Tute, M. Cullen, L. Sedek,M. B. Vidriales, J. J. Perez, J. G. te Marvelde,E. Mejstrikova, O. Hrusak, T. Szczepanski,J. J. M. van Dongen, and A. Orfao. Euroflowstandardization of flow cytometer instrumentsettings and immunophenotyping protocols.Leukemia, 26(9):1986–2010, 2012.

[6] Helene Cave, Jutte van der Werff Ten Bosch,Stefan Suciu, Christine Guidal, Christine Wa-terkeyn, Jacques Otten, Marleen Bakkus, KrisThielemans, Bernard Grandchamp, EtienneVilmer, Brigitte Nelken, Martine Fournier,Patrick Boutard, Emmanuel Lebrun, FrancoiseMechinaud, Richard Garand, Alain Robert,Nicole Dastugue, Emmanuel Plouvier, Eve-lyne Racadot, Alice Ferster, Jan Gyselinck,Odile Fenneteau, Michel Duval, Gabriel Solbu,and Anne-Marie Manel. Clinical significanceof minimal residual disease in childhood acute

8

lymphoblastic leukemia. New England Journalof Medicine, 339(9):591–598, 1998.

[7] Virginie Gandemer, Cecile Pochon, Em-manuel Oger, Jean-Hugues H Dalle, GerardMichel, Claudine Schmitt, Eva Berranger,Claire Galambrun, Helene Cave, Jean-MichelCayuela, Nathalie Grardel, Elizabeth Mac-intyre, Genevieve Margueritte, FrancoiseMechinaud, Pierre Rorhlich, Patrick Lutz,Francois Demeocq, Pascale Schneider, Do-minique Plantaz, Marilyne Poiree, and PierreBordigoni. Clinical value of pre-transplantminimal residual disease in childhood lym-phoblastic leukaemia: the results of the Frenchminimal residual disease-guided protocol.British Journal of Haematology, 165(3):392–401,2014.

[8] Susumu Tonegawa. Somatic generation of an-tibody diversity. Nature, 302(5909):575–581,1983.

[9] Gunter Kerst, Hermann Kreyenberg, Car-men Roth, Catrin Well, Klaus Dietz, ElaineCoustan-Smith, Dario Campana, Ewa Kosciel-niak, Charlotte Niemeyer, Paul G. Schlegel,Ingo Muller, Dietrich Niethammer, and PeterBader. Concurrent detection of minimal resid-ual disease (MRD) in childhood acute lym-phoblastic leukaemia by flow cytometry andreal-time PCR. British Journal of Haematology,128(6):774–782, 2005.

[10] William H. Robinson. Sequencing the func-tional antibody repertoire – diagnostic andtherapeutic discovery. Nature Reviews Rheuma-tology, 11(3):171–182, 2015.

[11] Aaron C. Logan, Hong Gao, Chunlin Wang,Bita Sahaf, Carol D. Jones, Eleanor L Marshall,Ismael Buno, Randall Armstrong, Andrew Z.Fire, Kenneth I Weinberg, Michael Mindrinos,James L. Zehnder, Scott D. Boyd, WenzhongXiao, Ronald W. Davis, and David B. Miklos.High-throughput VDJ sequencing for quantifi-cation of minimal residual disease in chroniclymphocytic leukemia and immune reconsti-tution assessment. Proceedings of the National

Academy of Sciences of the United States of Amer-ica, 108(52):21194–21199, 2011.

[12] Malek Faham, Jianbiao Zheng, Martin Moor-head, Victoria E H Carlton, Patricia Stow,Elaine Coustan-Smith, Ching-Hon Pui, andDario Campana. Deep-sequencing approachfor minimal residual disease detection in acutelymphoblastic leukemia. Blood, 120(26):5173–5180, 2012.

[13] Michaela Kotrova, Katerina Muzikova, Es-ter Mejstrikova, Michaela Novakova, Vio-leta Bakardjieva-Mihaylova, Karel Fiser, JanStuchly, Mathieu Giraud, Mikael Salson,Christiane Pott, Monika Bruggemann, MarcFullgrabe, Jan Stary, Jan Trka, and EvaFronkova. The predictive strength of next gen-eration sequencing MRD detection for relapsecompared with current methods in childhoodALL. Blood, 126(8):1045–1047, 2015.

[14] Olga Sala Torra, Megan Othus, David WWilliamson, Brent L. Wood, Ilan Kirsch, Har-lan Robins, Margaret O’Donnell, Stephen J.Forman, Frederick R. Appelbaum, and Jer-ald P Radich. Minimal Residual Disease detec-tion by Next Generation Sequencing in adultB-cell Acute Lymphoblastic Leukemia (ALL)patients treated on SWOG trial S0333. Blood,124(21):2399–2399, 2014.

[15] Aaron C. Logan, Nikita Vashi, Malek Fa-ham, Victoria Carlton, Katherine Kong, Is-mael Buno, Jianbiao Zheng, Martin Moor-head, Mark Klinger, Bing Zhang, AmnaWaqar, James L. Zehnder, and David B. Mik-los. Immunoglobulin and T cell receptor genehigh-throughput sequencing quantifies mini-mal residual disease in acute lymphoblasticleukemia and predicts post-transplantation re-lapse and survival. Biology of Blood and MarrowTransplantation, 20(9):1307–1313, 2014.

[16] Michael A. Pulsipher, Chris Carlson, BryanLangholz, Donna A. Wall, Kirk R. Schultz,Nancy Bunin, Ilan Kirsch, Julie M. Gastier-Foster, Michael Borowitz, Cindy Desmarais,David Williamson, Michael Kalos, and

9

Stephan A. Grupp. IgH-V(D)J NGS-MRD measurement pre- and early post-allotransplant defines very low- and veryhigh-risk ALL patients. Blood, 125(22):3501–3508, 2015.

[17] Yann Ferret, Aurelie Caillault, SheherazadeSebda, Marc Duez, Nathalie Grardel, Nico-las Duployez, Celine Villenet, Martin Figeac,Claude Preudhomme, Mikael Salson, andMathieu Giraud. Multi-loci diagnosis ofacute lymphoblastic leukemia with high-throughput sequencing and bioinformaticsanalysis. British Journal of Haematology,173(3):413–420, 2016.

[18] A.W. Langerak, M. Bruggemann, N. Darzen-tas, M. Catherwood, G. Cazzaniga, F. Davi,J.J.M van Dongen, P.A.S. Evans, R. Gar-cia Sanz, V. Giudicelli, D. Gonzalez, P.J.T.A.Groenen, M. Hummel, M.-P. Lefranc, E.A.Macintyre, C. Pott, and K. Stamatopoulos.Technical develoments: general principles andavailable technologies. In Second ESLHO Sym-posium, pages 11–19, 2013.

[19] Harlan Robins. Immunosequencing: applica-tions of immune repertoire deep sequencing.Current Opinion in Immunology, 25, 2013.

[20] George Georgiou, Gregory C Ippolito, JohnBeausang, Christian E Busse, Hedda Warde-mann, and Stephen R Quake. The promiseand challenge of high-throughput sequencingof the antibody repertoire. Nature Biotechnol-ogy, 32:158–168, 2014.

[21] Jennifer Benichou, Rotem Ben-Hamo, YoramLouzoun, and Sol Efroni. Rep-Seq: uncov-ering the immunological repertoire throughnext-generation sequencing. Immunology,135(3):183–91, 2012.

[22] Mehdi Yousfi Monod, Veronique Giudicelli,Denys Chaume, and Marie-Paule Lefranc.IMGT/JunctionAnalysis: the first tool for theanalysis of the immunoglobulin and T cell re-ceptor complex V-J and V-D-J JUNCTIONs.Bioinformatics, 20(S1):i379–85, 2004.

[23] Xavier Brochet, Marie-Paule Lefranc, andVeronique Giudicelli. IMGT/V-QUEST: thehighly customized and integrated system forIG and TR standardized V-J and V-D-J se-quence analysis. Nucleic Acids Research,36(S2):W503–W508, 2008.

[24] Marie-Paule Lefranc. IMGT, the InternationalImMunoGeneTics Information System. ColdSpring Harbor Protocols, 2011(6):pdb.top115,2011.

[25] Eltaf Alamyar, Veronique Giudicelli, ShuoLi, Patrice Duroux, and Marie-Paule Lefranc.IMGT/HighV-QUEST: the IMGT R© web por-tal for immunoglobulin (IG) or antibody andt cell receptor (TR) analysis from NGS highthroughput and deep sequencing. ImmunomeResearch, 8(1), 2012.

[26] Ramy Arnaout, William Lee, Patrick Cahill,Tracey Honan, Todd Sparrow, MichaelWeiand, Chad Nusbaum, Klaus Rajewsky,and Sergei B. Koralov. High-resolution de-scription of antibody heavy-chain repertoiresin humans. PLoS ONE, 6(8):e22365, 2011.

[27] Jian Ye, Ning Ma, Thomas L. Madden, andJames M. Ostell. IgBLAST: an immunoglob-ulin variable domain sequence analysis tool.Nucleic Acids Research, 41:W34–W40, 2013.

[28] Niclas Thomas, James Heather, Wilfred Ndi-fon, John Shawe-Taylor, and Benjamin Chain.Decombinator: a tool for fast, efficient gene as-signment in T-cell receptor sequences using afinite state machine. Bioinformatics, 29(5):542–550, 2013.

[29] Dmitriy A Bolotin, Mikhail Shugay, Ilgar ZMamedov, Maria A Turchaninova EkaterinaV Putintseva, Ivan V Zvyagin, Olga V Bri-tanova, and Dmitriy M Chudakov. MiTCR:software for T-cell receptor sequencing dataanalysis. Nature Methods, 10:813–814, 2013.

[30] Mathieu Giraud, Mikael Salson, Marc Duez,Celine Villenet, Sabine Quief, Aurelie Cail-lault, Nathalie Grardel, Christophe Roumier,Claude Preudhomme, and Martin Figeac. Fast

10

multiclonal clusterization of V(D)J recombina-tions from high-throughput sequencing. BMCGenomics, 15(1):409, 2014.

[31] Xi Yang, Di Liu, Na Lv, Fangqing Zhao, FeiLiu, Jing Zou, Yan Chen, Xue Xiao, Jun Wu,Peipei Liu, Jing Gao, Yongfei Hu, Yi Shi, JunLiu, Ruifen Zhang, Chen Chen, Juncai Ma,George F Gao, and Baoli Zhu. TCRklass: Anew k-string-based algorithm for human andmouse TCR repertoire characterization. Jour-nal of Immunology, 194(1), 2014.

[32] Dmitriy A. Bolotin, Stanislav Poslavsky, IgorMitrophanov, Mikhail Shugay, Ilgar Z. Mame-dov, Ekaterina V. Putintseva, and Dmitriy M.Chudakov. MiXCR: software for compre-hensive adaptive immunity profiling. NatureMethods, 12(5):380–381, 2015.

[33] Leon Kuchenbecker, Mikalai Nienen, JochenHecht, Avidan U. Neumann, Nina Babel, KnutReinert, and Peter N. Robinson. IMSEQ– a fast and error aware approach to im-munogenetic sequence analysis. Bioinformat-ics, 31(18):btv309, 2015.

[34] Marc Duez, Mathieu Giraud, Ryan Herbert,Tatiana Rocher, Mikael Salson, and FlorianThonier. Vidjil: A web platform for analysis ofhigh-throughput repertoire sequencing. PLOSOne, 11(11):e0166126, 2016.

[35] Kheira Beldjord, Sylvie Chevret, Vahid As-nafi, Francoise Huguet, Marie-Laure Boul-land, Thibaut Leguay, Xavier Thomas, Jean-Michel Cayuela, Nathalie Grardel, Yves Cha-landon, et al. Oncogenetics and minimal resid-ual disease are independent outcome predic-tors in adult patients with acute lymphoblasticleukemia. Blood, 123(24):3739–3749, 2014.

[36] R Garand, K Beldjord, H Cave, C Fossat,I Arnoux, V Asnafi, Y Bertrand, ML Boul-land, C Brouzes, E Clappier, et al. Flowcytometry and IG/TCR quantitative PCR forminimal residual disease quantitation in acutelymphoblastic leukemia: a French multicenterprospective study on behalf of the FRALLE,EORTC and GRAALL. Leukemia, 27(2):370–376, 2013.

[37] Michael A. Quail, Thomas D. Otto, Yong Gu,Simon R. Harris, Thomas F. Skelly, Jacque-line A. McQuillan, Harold P. Swerdlow, andSamuel O. Oyola. Optimal enzymes for am-plifying sequencing libraries. Nature Methods,9(1):10–11, January 2012.

[38] Michael G. Ross, Carsten Russ, MauraCostello, Andrew Hollinger, Niall J. Lennon,Ryan Hegarty, Chad Nusbaum, and David B.Jaffe. Characterizing and measuring bias in se-quence data. Genome Biology, 14(5):R51, 2013.

Acknowledgements

This work was supported by SIRIC ONCOLille[grant number INCa-DGOS-Inserm 6041]. Wethank the anonymous reviewers for their construc-tive comments on this manuscript.

Author contribution

MS, MG, AC and NG were equal contributors. AC,CP and NG selected the patients. MF, SS, SQ, CVand AC designed the sequencing protocol. MD,MG, MS, RH and TR conceived the Vidjil software.MS, MG, AC, ND, NG, MF, SS and YF analyzed anddiscussed the data. MS, MG, ND and YF draftedthe paper. All authors corrected the paper and ap-proved the final manuscript.

11

10−6 10−5 10−4 10−3 10−2 10−1 100

10−6

10−5

10−4

10−3

10−2

10−1

100

0

0

78%

80%

18%

20 %

Comparing PCR A and PCR B

1 10 100 1,000 10,000

10−6 10−5 10−4 10−3 10−2 10−1 100

10−6

10−5

10−4

10−3

10−2

10−1

100

0

0

87%

91%

9%

8%

Comparing two barcodes in PCR A

1 10 100 1,000 10,000

10−6 10−5 10−4 10−3 10−2 10−1 100

10−6

10−5

10−4

10−3

10−2

10−1

100

0

0

27%

36%

60%

65 %

Comparing PCR A and PCR B

1 10 100 1,000 10,000

10−6 10−5 10−4 10−3 10−2 10−1 100

10−6

10−5

10−4

10−3

10−2

10−1

100

0

0

83%

87%

12%

12%

Comparing two barcodes in PCR A

1 10 100 1,000 10,000

10−6 10−5 10−4 10−3 10−2 10−1 100

10−6

10−5

10−4

10−3

10−2

10−1

100

0

0

44%

48%

50%

50 %

Comparing PCR A and PCR B

1 10 100 1,000 10,000

10−6 10−5 10−4 10−3 10−2 10−1 100

10−6

10−5

10−4

10−3

10−2

10−1

100

0

0

86%

90%

10%

8%

Comparing two barcodes in PCR A

1 10 100 1,000 10,000

Figure 4: Reproducibility of PCR (left column) and sequencing (right column) in patient 014 at diagnosis(top), and at first (middle) and second follow-up points (bottom). Each point represents a set of clonesidentified by bioinformatics processing, before manual clustering. Ratios are numbers of reads for bothexperiments. The size of the point indicates the number of clones sharing the same value on both axes.

Supplementary Material

Raw data on the patient 063

The raw sequence files (.fastq.gz) for the patient063 (TRG+IGH, diagnosis + follow-ups at daysD48, D90, D263 and D308) can be accessed athttp://www.vidjil.org/data/#2016-lr.

Results on the seven other patients

The interactive visualization of the clonalpopulations of the seven patients not de-scribed in the main text can be accessed athttp://www.vidjil.org/data/#2016-lr. In thefollowing we denote by “main clone” the cloneidentified at diagnosis by conventional techniqueand then followed by ASO-PCR.

Patient 009 (TRG). The major clone is stable onthe two follow-up points, above 1%.

Patient 011 (TRG). Two clones have similar evo-lution. Their relative concentration do not evolvemuch from first follow-up (below 10−3).

Patient 012 (IGH). The main clone is below 10−4

in HTS (neg in qPCR) for the two follow-ups but itis still present.

Patient 014 (IGH). The main clone is above 10−4

at first follow-up (but positive below 10−4 withqPCR).

Patient 099 (TRG). Two different clones repre-sent about 90% of the reads at diagnosis. They bothdecrease below 1% at D107.

Patient 100 (TRG+IGH). The major clone in IGHdecreases below 10−3 at D42, and another majorclone at diagnosis (25%) completely disappears atD42. On TRG, all major clones identified at diagno-sis disappear at the follow-up.

Patient 101 (IGH). The two major clones repre-sent about 80% of the reads. Their concentrationboth drop below 10−4 at D44.

Note that the concentrations were not evaluatedby qPCR for the three last patients as the Sangersequencing failed in those cases.

Pitfalls in differentiating between technical bias and bio-logical mutations

The four clones shown in Supplementary Mate-rial, Fig 3, all supported by more than 30,000reads, only differed at two sites with homopoly-meric mutations G / GGGG. However, it was notclear if these represented sequencing errors or bi-ologically distinct clones. Given that Ion Tor-rent sequencers are prone to homopolymeric er-rors, and that the four curves showed parallel evo-lution (Supplementary Material, Fig 1), suggestedthat these were probably sequencing errors in spe-cific sequences, which are a relatively common er-ror with sequencers. However the follow-up pointswere sequenced in two independent runs severalmonths apart. Moreover if these were sequencingerrors, we would also expect to observe stretches oftwo and three Gs. It should also be noted that delet-ing three Gs, as observed in the sequences, did notinduce a frame shift. For MRD follow-up, it maybe interesting to follow the evolution of both theindividual clones and the four sequences as a fam-ily. Based on the reproducibilities of PCR and HTS,we therefore assume that these differences werecaused by PCR errors. However, it is not possi-ble to provide a definite answer without replicatingthe PCR. Further sequencing of the sample usinga high-fidelity enzyme confirmed that these wereartefacts. The Vidjil web application allows follow-up and sample replicates to be analyzed. If thefour sequences were clustered manually, the Vid-jil web application would allow both evolutions tobe plotted. It is important to note that, in practice,clustering such sequences in a single clone does notchange the result by more than one log.

13

Patient System Samples009 TRG D0, D115, D154010 TRG D0, D81, D413, D445, D475011 TRG D0, D61, D92, D119012 IGH D0, D37, D100013 IGH D0, D60, D120, D246, D274014 IGH D0, D38, D93063 TRG IGH D0, D48, D90, D263, D308064 TRG∗ IGH D0, D47, D403, D615, D711,

D741, D794, D837099 TRG D0, D107100 TRG IGH D0, D42101 IGH D0, D44scale IGH D0 + 4 dilutions

Table 1: All patients were sequenced at diagnosis(D0) and at some follow-up time points. The daysindicated for follow-up samples are post diagnosis.∗Note that TRG in patient 064 was sequenced at fivepoints (D0, D47, D615, D711, D837) with the previ-ous protocol, reported in [30].

2010-06-04 +47 +615 +711 +837+794+403 +741100%

10%

1%

0.1%

0.01%

0.001%

0.0001%

IGHV6-1*01 -27/6/-3 IGHD4-11*01 -3/3/-4 IGHJ4*02

IGHV6-1*01 -27/3/-3 IGHD4-11*01 -3/3/-4 IGHJ4*02

IGHV6-1*01 -11/19/-0 IGHD1-26*01 -0/3/-6 IGHJ6*03

IGHV6-1*01 -27/3/-3 IGHD4-11*01 -3/3/-4 IGHJ4*02Leu (64) (multi)

Figure 1: Evolution of the four first IGHV6-1*01/IGHD4-11*01/IGHJ4*02 clones (raw data, be-fore further clustering, sequences on Figure 3) forpatient 064. Compare to Figure 1, bottom right.

14

...VVVVVVNNNNJJJJJJ...clone 10776 5.2% TATCAGCAGCCTAAAGGCTGAGGACACCCGACAGGGTATGGACGTCTGGGGCCAAGGGACpcr_bias1 4102 2.0% TATCAGCAGCCTAAAGGCTGAGGACACCCGACAGGGTATGGACGTCTG---CCAAGGGACCCTpcr_bias2 3116 1.5% ATATCAGCAGCCTAAAGGCTGAGGACACCCGACAGG-TATGGACGTCTG---CCAAGGGACCCT

Figure 2: Main biological clone (IGHV7-40*03 -22/ACAG/-17 IGHJ6*02) and two PCR artefacts seen inonly one PCR replicate (PCR A) in patient 014. No read with the sequence for pcr bias1 and pcr bias2was detected in PCR B.

>IGHV6-1*01 ...GTGACTCCCGAGGACACGG-CTgtgtattactgtgcaagaga>IGHD4-11*01 tgaCTACAGTAACtac>IGHJ4*02 actaCTTTGACTAC-TGGGGCCAGGGAA...

VVVVVVVVVVVVVVVVVVVV==DDDDDDDDNNNJJJJJJJJJJJJJJJJJJJJJJJJclone-001 191798 13.9% GTGACTCCCGAGGAGTGGGGCTACAGTAACCCCCTTTGACTAC-TGGGGCCAGGGAAclone-003 150615 10.9% GTGACTCCCGAGGAGTG---CTACAGTAACCCCCTTTGACTAC-TG---CCAGGGAAclone-004 137576 9.98% GTGACTCCCGAGGAGTGGGGCTACAGTAACCCCCTTTGACTAC-TG---CCAGGGAAclone-006 33350 2.42% GTGACTCCCGAGGAGTG---CTACAGTAACCCCCTTTGACTAC-TGGGGCCAGGGAA

Figure 3: Representative sequences of the largest clones identified as IGHV6-1*01/IGHD4-11*01/IGHJ4*02 in patient 064, aligned with IMGT/GENE-DB relevant germline genes. For each clone,the number of reads and the proportion among all analyzed reads are given. Together, these clonesinclude 571,221 reads, that is 47.6% of reads analyzed as IGH (and 36.7% of all analyzed reads). Twopositions, marked by =, can be assigned to either the V or D segment. The first four clones show all fourpossible combinations of the mutation G / GGGG at two different positions: one at the end of the Vsegment, and another in the middle of the J segment. Further sequencing after PCR using a high-fidelityenzyme showed that clone-001 was the only biological clone, and the others were the result of PCRartefacts.

TRG 1-10 Vgf1 + 5’ GGA AGG CCC CAC AGC RTC TT 3’

(BIOMED-2) Vg10 + 5’ AGC ATG GGT AAG ACA AGC AA 3’

J1J2 − 5’ GTG TTG TTC CAC TGC CAA AGA G 3’

JP1/2 − 5’ TTA CCA GGC GAA GTT ACT ATG AGC 3’

TRG 9 Vg9 + 5’ CGG CAC TGT CAG AAA GGA ATC 3’

(BIOMED-2) J1J2 − 5’ GTG TTG TTC CAC TGC CAA AGA G 3’

JP1/2 − 5’ TTA CCA GGC GAA GTT ACT ATG AGC 3’

IgH FR1 + 5’ AGG TGC AGC TGS WGS AGT CDG G 3’

JHDN − 5’ ACC TGA GGA GAC GGT GAC CAG GGT 3’

Table 2: Forward (+) and reverse (−) primers used for TRG and IGH PCR.

15