

Second generation noninvasive fetal genome analysis reveals de novo mutations, single-base parental inheritance, and preferred DNA ends

K. C. Allen Chan a,b,1, Peiyong Jiang a,b,1, Kun Sun a,b,1, Yvonne K. Y. Cheng c, Yu K. Tong a,b, Suk Hang Cheng a,b, Ada I. C. Wong a,b, Irena Hudecova a,b, Tak Y. Leung c, Rossa W. K. Chiu a,b,2, and Yuk Ming Dennis Lo a,b,2

a Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong; b Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong; and c Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong

Contributed by Yuk Ming Dennis Lo, October 4, 2016 (sent for review August 24, 2016; reviewed by Mary E. Norton and Erik A. Sistermans)

Plasma DNA obtained from a pregnant woman was sequenced to a depth of 270× haploid genome coverage. Comparing the maternal plasma DNA sequencing data with the parental genomic DNA data and using a series of bioinformatics filters, fetal de novo mutations were detected at a sensitivity of 85% and a positive predictive value of 74%. These results represent a 169-fold improvement in the positive predictive value over previous attempts. Improvements in the interpretation of the sequence information of every base position in the genome allowed us to interrogate the maternal inheritance of the fetus for 618,271 of 656,676 (94.2%) heterozygous SNPs within the maternal genome. The fetal genotype at each of these sites was deduced individually, unlike previously, where the inheritance was determined for a collection of sites within a haplotype. These results represent a 90-fold enhancement in the resolution in determining the fetus's maternal inheritance. Selected genomic locations were more likely to be found at the ends of plasma DNA molecules. We found that a subset of such preferred ends exhibited selectivity for fetal- or maternal-derived DNA in maternal plasma. The ratio of the number of maternal plasma DNA molecules with fetal preferred ends to those with maternal preferred ends showed a correlation with the fetal DNA fraction. Finally, this second generation approach for noninvasive fetal whole-genome analysis was validated in a pregnancy diagnosed with cardiofaciocutaneous syndrome with maternal plasma DNA sequenced to 195× coverage. The causative de novo BRAF mutation was successfully detected through the maternal plasma DNA analysis.

noninvasive prenatal testing | massively parallel sequencing | DNA fragmentation patterns

The discovery of cell-free fetal DNA in maternal plasma has enabled the development of noninvasive prenatal testing (NIPT) (1). Over the last few years, NIPT has been implemented globally for the noninvasive prenatal investigation of fetal chromosomal aneuploidies (2–5). With higher depth of sequencing and improved bioinformatics analyses, NIPT has now been extended to the detection of a variety of subchromosomal aberrations (6, 7). Further expanding the applications of NIPT, we showed in 2010 that it was possible to deduce the fetal genome by deep sequencing of maternal plasma (8). Work by Fan et al. (9) and Kitzman et al. (10) confirmed these results. In these previous efforts, the depths of maternal plasma DNA sequencing ranged from 52.7× to 78× haploid human genome coverages (8–10).

There are a number of limitations in these previous studies. For example, Kitzman et al. (10) explored the possibility of detecting fetal de novo mutations on a genome-wide level from the maternal plasma DNA sequencing data. In one variation of bioinformatics analysis, they found 2.5 × 10^7 candidate fetal de novo mutation sites in the plasma DNA sequencing data. Only 39 of these were true fetal de novo mutations. Because the studied fetus had a total of 44 de novo mutations, the positive predictive value (PPV) was 0.000156%, and the sensitivity was 88.6%. With additional refinement in bioinformatics analysis, Kitzman et al. (10) improved the PPV to 0.438%, although the sensitivity was reduced to 38.6%. These data, thus, indicate the enormous challenge of detecting fetal de novo mutations on a genome-wide scale using NIPT. In particular, dramatic improvement in the PPV would be needed for such an approach to be clinically practical.

A second area that needs improvement concerns the detection of sequences that the fetus has inherited from its mother. Previous efforts in elucidating the maternal inheritance of the fetus on a genome-wide scale have generally used a haplotype-based strategy, which has been referred to as the relative haplotype dosage (RHDO) approach (8–10). Hence, for a pregnant woman who has two haplotypes in a particular chromosomal region, she would pass one of these onto her fetus. Because her plasma contains a

Significance

We explored the limit of noninvasive prenatal testing by performing genome-wide sequencing of maternal plasma DNA at 195× and 270× haploid genome coverages. Combined with the use of a series of bioinformatics filters, fetal de novo mutations could be detected with a positive predictive value that was two orders of magnitude higher than previously reported. A de novo BRAF mutation was noninvasively detected in a case with cardiofaciocutaneous syndrome. The maternal inheritance of the fetus could be ascertained on a genome-wide level without the use of maternal haplotypes, hence greatly increasing the resolution of such analysis. Finally, we showed that certain genomic locations were overrepresented at the ends of plasma DNA fragments with fetal or maternal selectivity.

Author contributions: K.C.A.C., R.W.K.C., and Y.M.D.L. designed research; K.C.A.C., P.J., K.S., Y.K.Y.C., Y.K.T., S.H.C., A.I.C.W., I.H., and T.Y.L. performed research; K.C.A.C., P.J., K.S., R.W.K.C., and Y.M.D.L. analyzed data; Y.K.Y.C. and T.Y.L. recruited subjects and analyzed clinical data; and K.C.A.C., P.J., R.W.K.C., and Y.M.D.L. wrote the paper.

Reviewers: M.E.N., University of California, San Francisco; and E.A.S., VU University Medical Center.

Conflict of interest statement: R.W.K.C. and Y.M.D.L. received research support from Sequenom, Inc. R.W.K.C. and Y.M.D.L. were consultants to Sequenom, Inc. K.C.A.C., R.W.K.C., and Y.M.D.L. hold equities in Sequenom, Inc. K.C.A.C., R.W.K.C., and Y.M.D.L. are founders of Xcelom and Cirina. K.C.A.C., P.J., and R.W.K.C. are consultants to Xcelom. P.J. is a consultant to Cirina. K.C.A.C., P.J., R.W.K.C., and Y.M.D.L. have filed patent applications (PCT/CN2016/073753 and PCT/CN2016/091531) based on the data generated from this work, which have been licensed to Cirina.

Freely available online through the PNAS open access option.

Data deposition: The sequence data for the subjects studied in this work who had consented to data archiving have been deposited in the European Genome-Phenome Archive (EGA), https://www.ebi.ac.uk/ega/, hosted by the European Bioinformatics Institute (EBI; accession no. EGAS00001001882).

See Commentary on page 14173.

1 K.C.A.C., P.J., and K.S. contributed equally to this work.

2 To whom correspondence may be addressed. Email: [email protected] or loym@cuhk.edu.hk.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1615800113/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1615800113 PNAS | Published online October 31, 2016 | E8159–E8168


mixture of her own DNA and that from the fetus, there would be a slight overrepresentation of the haplotype shared by both the pregnant mother and her fetus. This haplotype-based approach has imposed two limitations on NIPT. First, it requires the elucidation of the maternal haplotype using a direct haplotyping approach (11–13), via pedigree analysis (8, 14), or via founder haplotype analysis in selected populations (15). Second, this haplotype-based approach has limited the resolution at which the maternal inheritance of the fetus can be determined. In this regard, the mean length of the haplotype blocks that had been used in previous efforts ranged from 300 kb to over 1 Mb (8–10).

Recently, there has been considerable interest in the fragmentation patterns of plasma DNA (16–18). Studies showed that plasma DNA fragmentation sites are located in clusters across the genome that bore relationships with positions of nucleosome arrays and open chromatin domains. Based on these data, researchers have concluded that the fragmentation process of plasma DNA is nonrandom. One interpretation of such observations is that plasma DNA fragments are cleaved at the accessible parts of the genome. We have gone further in this study to explore whether the actual ending sites of plasma DNA were nonrandom down to a single-base level. In other words, are plasma DNA molecules repeatedly cut at a selected set of genome coordinates? Previous studies could not address this question, because the sequencing depths of individual samples were not high enough, and PCR duplicates were discarded before additional data analysis. We hypothesized that some of these "duplicates" were not created by PCR. Instead, they were plasma DNA molecules that had originated from different cells and were cleaved at the same genome coordinates. To investigate this hypothesis, we processed some of the studied plasma samples using a PCR-free library preparation protocol.

In this study, we aimed to develop a second generation approach to noninvasively decipher the fetal genome from maternal plasma using a sequencing depth that had not been achieved in previous studies (Fig. 1). We hypothesized that the use of ultradeep genome-wide plasma DNA sequencing and multiparametric bioinformatics analysis that takes into account the sequence, concentration, and size of the circulating DNA fragments would allow one to significantly improve the detection of fetal de novo mutations, interrogate fetal inheritance at single-base resolution, and study the end characteristics of plasma DNA in a molecule-by-molecule manner. We further illustrated how such a detailed level of fetal genomic information would be of clinical utility.

Results

Clinical Samples and DNA Sequencing. Pregnant women were recruited from the Department of Obstetrics and Gynecology, Prince of Wales Hospital with informed consent. Hypotheses were first explored and methodologies were developed based on the analysis of a plasma sample from a pregnant woman at 38 wk of gestation. These "second generation" fetal genome methodologies were then applied to the analysis of plasma collected from a second trimester pregnancy affected by cardiofaciocutaneous syndrome. Plasma samples from 26 first trimester pregnant women were analyzed to further investigate the phenomenon of preferred ending sites among plasma DNA molecules.

Key Maternal Plasma DNA Sequencing Metrics. Maternal plasma DNA of the third trimester case was sequenced using the Illumina TruSeq PCR-free library preparation protocol to a depth of 270× coverage of a haploid human genome. The maternal blood cells, paternal blood cells, and umbilical cord blood cells were sequenced to 40×, 45×, and 50× haploid human genome coverages, respectively, using the same sequencing protocol. We analyzed SNPs where both parents were homozygous but for different alleles and identified 239,816 such SNPs. For these SNPs, the fetus would be an obligatory heterozygote for the paternal and maternal alleles. The paternal alleles were detectable in the maternal plasma for all 239,816 SNPs. The fetal DNA fraction (F) in maternal plasma was deduced as 31.3% from the aggregated numbers of paternally inherited fetal alleles

[Fig. 1 graphic not reproduced; the content recoverable from it is listed below.]
Noninvasive fetal whole genome analysis: first generation (a) vs. second generation.
Sequencing depth: first generation, 53× to 78×; second generation, 195× to 270×.
De novo mutation: first generation, sensitivity 38.6% and PPV 0.438%; second generation, sensitivity 81% to 85% and PPV 62% to 74%.
Maternal inheritance: first generation, haplotype-based by RHDO*, with a resolution of 300 kb to over 1 Mb; second generation, SNP-based by GRAD*, with single-nucleotide resolution.
Plasma DNA fragmentation pattern: first generation, (1) nucleosomal pattern and (2) size difference between maternal- and fetal-derived fragments (plot of Frequency (%) against Size (bp), with 143 and 166 marked); second generation, plasma DNA preferred end sites at genomic coordinates (fetal preferred end site and maternal preferred end site, shown on fetal-derived and maternal-derived DNA fragments).

Fig. 1. Summary of the key differences between the first and second generation approaches for noninvasive fetal whole-genome analysis. (a) The performance characteristics quoted for first generation fetal whole-genome analysis were from refs. 8–10. *RHDO represents relative haplotype dosage analysis, and GRAD represents genome-wide relative allelic dosage analysis.


(p) and the maternal alleles (m) at these SNPs using the formula F = 2p/(p + m).

We next measured the difference in the sizes of fetally and maternally derived DNA in the maternal plasma at individual SNPs. We compared the difference in size distribution of fragments carrying the fetal-specific allele and the allele shared by the fetus and the mother at a particular SNP (Fig. S1). Such size information would be useful for determining the likelihood of any variant allele detected in maternal plasma being a fetal de novo mutation. The median difference between fragments carrying the fetal-specific alleles and the shared alleles was 22 bp (Fig. S2A). At 90% of these SNP loci, the difference was larger than 10 bp (Fig. S2A).
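As a worked illustration of the fetal DNA fraction formula F = 2p/(p + m) quoted earlier in this subsection, the following Python sketch computes F from aggregated allele counts. The factor of 2 reflects that the fetus is heterozygous at these SNPs, so only half of the fetal DNA carries the paternal allele. The function name and the example counts are hypothetical; the counts were chosen only so that the result matches the reported 31.3% and are not the study's read counts.

```python
def fetal_fraction(paternal_reads: int, maternal_reads: int) -> float:
    """Fetal DNA fraction F = 2p / (p + m).

    paternal_reads (p): reads carrying the paternally inherited fetal allele
    maternal_reads (m): reads carrying the maternal allele
    Both counts are aggregated over SNPs where the parents are homozygous
    for different alleles, so the fetus is an obligate heterozygote.
    """
    total = paternal_reads + maternal_reads
    if paternal_reads < 0 or maternal_reads < 0 or total == 0:
        raise ValueError("read counts must be non-negative and not both zero")
    return 2 * paternal_reads / total


# Hypothetical aggregated counts, not taken from the study.
example = fetal_fraction(paternal_reads=313, maternal_reads=1687)
print(f"F = {example:.1%}")  # prints "F = 31.3%"
```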

Detection of Fetal de Novo Mutations from Maternal Plasma. Fifty-six single-nucleotide variants were found to be present in the umbilical cord blood cells and absent in the parents. These 56 single-nucleotide variants were thus deemed de novo mutations possessed by the fetus. Deep genome-wide sequencing is needed for the screening of fetal de novo mutations from maternal plasma. At this level of sequencing, a vast number of sequencing errors would be generated and hinder the sensitive and specific detection of true fetal de novo mutations. To tackle the challenge, we designed a bioinformatics protocol that used a series of steps to filter out the false-positive mutations (Fig. 2). The bioinformatics protocol made use of steps that identified high-quality base sequences in combination with steps that took into account the known biological characteristics of cell-free fetal DNA.

First, we identified genomic locations where the maternal plasma DNA sequencing showed at least one sequence variant, whereas the parental DNA analysis showed that both parents shared the same sequence. Second, we applied a dynamic cutoff algorithm to determine if the variant detected in the maternal plasma was more likely to have originated from the fetus or from sequencing errors. In this algorithm, a variant would need to be observed in a threshold number of sequenced reads so as to qualify as a candidate de novo mutation (Table S1). This threshold was determined from a binomial probability function based on the total number of times that a particular nucleotide was actually sequenced and the sequencing error rate, so that the theoretical probability of a putative de novo mutation being a sequencing error would be less than 1 in 3 × 10^9. Using this dynamic cutoff algorithm, 60,111 candidate de novo mutations were identified, and these candidate de novo mutations included 54 of 56 (i.e., 96%) de novo mutations present in the cord blood (Fig. 2).

Then, we realigned the sequenced reads covering the 60,111 candidate de novo mutation sites to the reference human genome using the BOWTIE2 software (19). The initial alignment software, Short Oligonucleotide Alignment Program 2 (SOAP2) (20), and the realignment software, BOWTIE2, were based on two different matching algorithms, namely the Burrows–Wheeler transform algorithm and Smith–Waterman-like algorithm, respectively (19, 20). A variant would be removed from being considered as a candidate de novo mutation if the two alignment algorithms did not map the variant to the same genomic location. This realignment procedure significantly reduced the number of false-positive results caused by alignment errors. After this realignment process, the number of candidate de novo mutations was reduced to 148 and contained 52 (93%) of the de novo mutations detected in the cord blood (Fig. 2). This result is equivalent to a PPV of 35%.

To further reduce the number of false positives, we applied a filter based on the fractional concentration of the variants. Because the fetal DNA fraction in the sample was 31.3%, 95% of the variants would be expected to be present at a level of ≥10% of the sequenced reads covering a particular nucleotide based on the Poisson distribution. Hence, we filtered out the candidate mutations that were present at a fraction of <10%. Eighty-five candidate de novo mutations were present at ≥10% of the sequenced reads covering the nucleotide and included 51 (91%) of the mutations detected in the cord blood (Fig. 2). This result represents a PPV of 60%.

Next, we included a filter based on the size of the plasma DNA molecules carrying the candidate de novo mutations. As illustrated in Fig. S2A, the difference between the mean fragment size for fragments carrying the fetal-specific alleles and the shared alleles was larger than 10 bp for 90% of loci at which the fetus was heterozygous and the mother was homozygous. Thus, we applied a size filter to the candidate de novo mutations, requiring the mean size of the plasma DNA fragments carrying the mutants to be at least 10 bp shorter than that of the fragments carrying the parental alleles. Sixty-five candidates passed this size-filtering criterion, and these included 48 (85%) of the mutations detected in the cord blood (Fig. 2). This result represents a PPV of 74%.
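The dynamic cutoff step described above can be sketched as a binomial tail calculation: for a given read depth and per-base error rate, find the smallest number of variant-supporting reads whose probability of arising from sequencing errors alone falls below 1 in 3 × 10^9. This is a minimal sketch under stated assumptions rather than the authors' implementation. The function names are hypothetical, the error rate of 0.003 is an illustrative placeholder (the actual thresholds are in Table S1, which is not reproduced here), and all errors are treated as producing the same variant base.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return total


def min_variant_reads(depth: int, error_rate: float,
                      max_error_prob: float = 1 / 3e9):
    """Smallest variant read count unlikely to be explained by errors alone.

    Returns None when even `depth` variant reads cannot meet the bound,
    which happens at very low depth.
    """
    for k in range(1, depth + 1):
        if binomial_upper_tail(k, depth, error_rate) < max_error_prob:
            return k
    return None


ASSUMED_ERROR_RATE = 0.003  # illustrative assumption, not a value from the paper
for depth in (100, 270):
    print(depth, min_variant_reads(depth, ASSUMED_ERROR_RATE))
```

Because the threshold is recomputed from the depth at each position, it rises with local coverage, which is what makes the cutoff "dynamic".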

Paternal Inheritance Analysis. For paternal inheritance analysis, we focused on SNPs that were heterozygous in the father and homozygous in the mother. In principle, the presence or absence of the paternal-specific allele in the maternal plasma would indicate the inheritance of the paternal-specific allele or the shared parental allele by the fetus, respectively. To enhance the accuracy of deducing the paternal inheritance, we used a binomial mixture model, which took into account the fetal DNA concentration, the sequencing error rate, and the number of times that the paternal-specific allele was observed in the maternal plasma (details are in Materials and Methods) (21). For 667,586 SNPs for which the mother was homozygous and the father was heterozygous, the overall accuracy was 99.99% for the deduction of the paternal inheritance.
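The details of the binomial mixture model are given in Materials and Methods and are not reproduced in this section. As a rough illustration of the idea only, the sketch below compares two binomial hypotheses for the number of reads showing the paternal-specific allele: inherited (expected fraction F/2) versus not inherited (reads arise only from sequencing errors). This simplified two-hypothesis comparison, the function names, the equal prior, and the error rate of 0.003 are all assumptions made for illustration and should not be read as the authors' model.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability of observing k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def paternal_allele_inherited(allele_reads: int, depth: int,
                              fetal_fraction: float, error_rate: float):
    """Return (call, posterior probability that the allele was inherited).

    Assumes an equal prior for the two outcomes, because a heterozygous
    father transmits either allele with probability 1/2.
    """
    ll_inherited = log_binom_pmf(allele_reads, depth, fetal_fraction / 2)
    ll_not_inherited = log_binom_pmf(allele_reads, depth, error_rate)
    diff = ll_not_inherited - ll_inherited
    # Guard against overflow in exp() when the evidence is overwhelming.
    posterior = 0.0 if diff > 700 else 1.0 / (1.0 + math.exp(diff))
    return posterior > 0.5, posterior


# Hypothetical read counts at 270x depth with F = 31.3%.
print(paternal_allele_inherited(40, 270, 0.313, 0.003))
print(paternal_allele_inherited(0, 270, 0.313, 0.003))
```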

Genome-Wide Relative Allele Dosage Analysis. Previous efforts in prenatally elucidating the maternal inheritance of the fetus on a genome-wide level had all been based on the RHDO approach

[Fig. 2 flowchart not reproduced; the steps recoverable from it are listed below with the number of candidate de novo mutations remaining after each step.]
1. Dynamic cutoff (variant frequency higher than sequencing error rate): 60,111 candidates. Detection rate: 54/56 (96%); PPV: 0.089%.
2. Realignment (realign all reads covering putative mutation sites): 148 candidates. Detection rate: 52/56 (93%); PPV: 35%.
3. Fractional concentration (variant frequency compatible with fetal DNA fraction): 85 candidates. Detection rate: 51/56 (91%); PPV: 60%.
4. Plasma DNA fragment size (mean size of fragments carrying the variant 10 bp shorter than the mean size of fragments carrying the parental alleles): 65 candidates. Detection rate: 48/56 (85%); PPV: 74%.

Fig. 2. Detection of the fetal de novo mutations through the analysis of the sequencing data of maternal plasma DNA. The number in bold in each step represents the number of candidate de novo mutations identified after the particular process.
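The detection rates and PPVs quoted in Fig. 2 follow directly from the candidate counts at each filtering step, and the short Python sketch below recomputes them. The counts are those reported in the text; the variable and function names are hypothetical. Small differences from the quoted percentages (for example, 48/56 evaluates to 85.7% and 54/60,111 to 0.090%) appear to reflect rounding conventions.

```python
TOTAL_DE_NOVO = 56             # de novo mutations confirmed in cord blood
EARLIER_REFINED_PPV = 0.00438  # 0.438%, the earlier PPV quoted for ref. 10

# (step name, candidates remaining, true de novo mutations retained)
FILTER_STEPS = [
    ("Dynamic cutoff", 60_111, 54),
    ("Realignment", 148, 52),
    ("Fractional concentration", 85, 51),
    ("Plasma DNA fragment size", 65, 48),
]


def step_metrics(candidates: int, true_retained: int,
                 total_true: int = TOTAL_DE_NOVO):
    """Return (detection rate, positive predictive value) for one step."""
    detection_rate = true_retained / total_true
    ppv = true_retained / candidates
    return detection_rate, ppv


for name, candidates, retained in FILTER_STEPS:
    rate, ppv = step_metrics(candidates, retained)
    print(f"{name:<26} candidates={candidates:>6}  "
          f"detection={rate:6.1%}  PPV={ppv:8.3%}")

final_ppv = step_metrics(65, 48)[1]
print(f"Fold improvement over 0.438%: {final_ppv / EARLIER_REFINED_PPV:.0f}")
```

The last line reproduces the 169-fold improvement in PPV stated in the abstract.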


(8–10). Enabled by the high-quality genome-wide base information, we showed that the maternal inheritance at the mother's heterozygous SNPs could be resolved without the use of the haplotype information of the mother. The dosage of the two maternal alleles in maternal plasma would be compared to determine, statistically, whether the two alleles are present at the same or different concentrations. For SNP loci at which one allele was present at significantly higher concentration and at which the two alleles were present at statistically nondifferentiable concentrations, the fetus would be deduced to be homozygous and heterozygous, respectively. We denote this method as genome-wide relative allelic dosage (GRAD) analysis. This method is a genome-wide and sequencing-based version of our previously described relative mutation dosage method (22). There were a total of 656,676 heterozygous SNPs in the mother's genome. Using GRAD analysis, the maternal inheritance of the fetus was resolved with statistically significant results in 618,271 SNPs (i.e., 94.2%). The allelic counts were insufficient to give statistically significant results in 38,405 SNPs (i.e., 5.8%). The fetus was deduced to be homozygous at 335,988 (54%) SNPs and heterozygous at 282,283 (46%) SNPs. Among 618,271 SNPs with statistically significant results, the deduction of maternal inheritance was correct for 610,084 (98.7%) SNPs compared with the genotypes from the analysis of the cord blood cells.

The resolution of GRAD analysis would be affected by the sequencing depth and the fetal DNA fraction in the maternal plasma sample. We performed computer simulation analyses to determine the number of SNPs required for achieving one classification of maternal inheritance with an overall accuracy of over 95% (Fig. S3). With a sequencing depth of 250×, we would be able to achieve close to single-nucleotide resolution for the deduction of maternal inheritance when the fetal DNA fraction was above 20%. When the fetal DNA fraction was 10% and the sequencing depth was 300×, maternal inheritance could be deduced in 1 of every 5 SNPs with an accuracy of over 95%, but the resolution of maternal inheritance would drop to 1 classification per 30 SNPs with a sequencing depth of 150×.
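The allele-balance comparison behind GRAD can be illustrated with a small likelihood sketch. At a SNP where the mother is heterozygous (A/B) and the fetal DNA fraction is F, the expected fraction of allele A in plasma is 0.5 + F/2 if the fetus is homozygous AA, 0.5 if the fetus is heterozygous, and 0.5 - F/2 if the fetus is homozygous BB. This section does not specify the statistical test used, so the log-likelihood-ratio rule, its threshold, and the function name below are assumptions made for illustration; the sketch also ignores sequencing errors and the paternal genotype.

```python
import math


def classify_maternal_het_snp(count_a: int, count_b: int,
                              fetal_fraction: float,
                              min_log_lr: float = math.log(1000)) -> str:
    """Classify the fetal genotype at a SNP where the mother is A/B.

    Returns "unclassified" when the best hypothesis does not beat the
    runner-up by at least `min_log_lr` (an arbitrary illustrative bound).
    """
    expected_a = {
        "fetus homozygous AA": 0.5 + fetal_fraction / 2,
        "fetus heterozygous AB": 0.5,
        "fetus homozygous BB": 0.5 - fetal_fraction / 2,
    }
    # The binomial coefficient is identical across hypotheses, so it is omitted.
    log_lik = {
        label: count_a * math.log(p) + count_b * math.log(1 - p)
        for label, p in expected_a.items()
    }
    ranked = sorted(log_lik, key=log_lik.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if log_lik[best] - log_lik[runner_up] < min_log_lr:
        return "unclassified"
    return best


# Hypothetical allele counts at roughly 270x depth.
print(classify_maternal_het_snp(180, 90, fetal_fraction=0.313))
print(classify_maternal_het_snp(135, 135, fetal_fraction=0.313))
print(classify_maternal_het_snp(135, 135, fetal_fraction=0.10))
```

Under these assumptions the third call is left unclassified: at a 10% fetal fraction the expected allelic shift is only 5 percentage points, so a single SNP at this depth carries too little evidence. This is consistent with the text's observation that several SNPs are needed per classification when the fetal DNA fraction is low.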

Preferred End Sequences in Maternal Plasma DNA. Facilitated by the high-sequencing depth of a non-PCR–amplified library of the maternal plasma DNA sample, we investigated if certain base positions in the genome would be preferentially represented at the ends of plasma DNA fragments. In this regard, we showed that plasma DNA fragments carrying the most prevalent 0.5% of plasma DNA ends represented 3.5% of all DNA fragments. Across the genome, 25% of the fragments had at least one identical fragment that shared the same ending sites at both ends. If the cleavage or breakage of plasma DNA was completely random, only 1.45% of the DNA molecules would be expected to have at least one counterpart sharing common ends (details are in Materials and Methods). Had the sequencing been performed using a PCR-amplified library, then 14% of the fragments would have been wrongly assumed to be PCR duplicates and would have been filtered off.

We further investigated if there would be differences in the preferred ending sites for maternal and fetal DNA in the same cell-free DNA sample. To show this phenomenon, informative SNP loci where the mother was homozygous (genotype denoted as AA) and the fetus was heterozygous (genotype denoted as AB) were identified. In this illustrative example, the B allele would be fetal-specific, and the A allele would be shared by the mother and the fetus. The fetal-specific and shared reads covering one such informative SNP are shown in Fig. 3. For comparison, the sequencing results of a DNA sample obtained from blood cells and artificially fragmented using sonication are also shown. Clusters of preferred ending positions could be observed among the plasma DNA data. For the plot of the probability of a genomic location being an end of DNA fragments (also referred to as the fragment end probability), three peaks were observed for each of two groups of fragments carrying the fetal-specific allele and the allele shared by the mother. These peaks represent the hotspots for the end positions of fetal- and maternal-derived DNA in maternal plasma, respectively. In contrast, the fragmentation pattern of the sonicated DNA seems to be random without such clusters, and the fragment end probability is similar across the region (Fig. 3).

We further studied the coordinates that had an increased probability of being an ending position for plasma DNA fragments. We focused our search based on fragments covering the informative SNPs, so that the fragments carrying fetal-specific alleles and alleles shared by the mother and the fetus could be evaluated separately. We determined if certain locations within the human genome had a significantly increased probability of being an ending position of plasma DNA fragments using a Poisson probability function (Materials and Methods). A P value of <0.01 had been chosen to indicate statistical significance. Statistically significant ending positions were determined for DNA fragments carrying the shared allele and the fetal-specific allele independently (Fig. 4A).

We identified a total of 8,242 (Set A) and 23,857 (Set B) nucleotide positions with a significantly increased chance of being an end for plasma DNA fragments carrying fetal-specific alleles and shared alleles, respectively, covering 10,233 SNPs; 8,909 of the nucleotide positions were observed to be overrepresented among the plasma DNA molecules with the fetal-specific allele (Set A) as well as molecules with the shared allele (Set B). We called this overlapping set of ends Set C (Fig. 5A). There were 48,707, 53,901, and 65,205 plasma DNA fragments carrying fetal-specific alleles ending on Set A, Set B, and Set C positions, respectively. In other words, multiple fetal-specific DNA molecules showed identical ending positions. A median of 6 (range = 6–41) plasma DNA fragments carrying fetal-specific alleles terminated at a Set A ending position. Based on a sequencing depth of 270×, fetal DNA fraction of 31.3%, and the size distribution of fetal DNA fragments, it was expected that only 0.29 fragment would end at each site if the fragmentation of plasma DNA had been random. Thus, the probability of ending at these sites was 20 times higher than expected. There were 54,541, 376,343, and 182,791 plasma DNA fragments carrying shared alleles ending at Set A, Set B, and Set C positions, respectively. The genomic coordinates of Set A, Set B, and Set C positions are shown in Dataset S1.

Using the same principle, we analyzed the ending positions for maternal-specific plasma DNA fragments. SNPs that were heterozygous in the mother (genotype AB) and homozygous in the fetus (genotype AA) were noted, and plasma DNA molecules that carried any one of the maternal-specific alleles (B allele) were deemed to be maternally derived. Among the maternal-specific plasma DNA molecules, 7,527 (Set X) and 18,829 (Set Y) nucleotide positions showed increased occurrence as an ending position for plasma DNA fragments carrying maternal-specific alleles and shared alleles, respectively, covering 9,489 SNPs (Fig. 4B); 10,534 positions were observed to be overrepresented among the plasma DNA molecules with the maternal-specific allele (Set X) as well as molecules with the shared allele (Set Y). We called this overlapping set of ends Set Z (Fig. 5B). There were 69,136, 82,413, and 121,607 plasma DNA fragments carrying maternal-specific alleles ending at Set X, Set Y, and Set Z positions, respectively. A median of 10 (range = 9–54) plasma DNA fragments carrying maternal-specific alleles terminated at a Set X base position. If the fragmentation of plasma DNA had been random, it was expected that only 0.56 fragment would terminate at each site. Thus, the probability for ending at these sites was 18 times higher than expected. There were 46,554, 245,037, and 181,709 plasma DNA fragments carrying shared alleles ending on Set X, Set Y, and Set Z positions, respectively. The genomic coordinates of the Set X, Set Y, and Set Z positions of this case are shown in Dataset S1.
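The Poisson test and the fold-enrichment figures quoted in this section can be sketched numerically. The example below is only an illustration: it assumes that the Poisson mean equals the expected number of fragment ends per site under random fragmentation (0.29 for fetal-specific and 0.56 for maternal-specific fragments), which may differ from the exact parameterization described in Materials and Methods. The function names are hypothetical.

```python
import math

def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    p_below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - p_below)

def fold_enrichment(observed, expected):
    """Ratio of observed to expected fragment ends at a site."""
    return observed / expected

# Median observed ends per preferred site versus the random expectation quoted in the text.
fetal_p = poisson_upper_tail(6, 0.29)       # chance of >= 6 ends at a site if ends were random
fetal_fold = fold_enrichment(6, 0.29)       # about 20-fold, as stated for Set A
maternal_fold = fold_enrichment(10, 0.56)   # about 18-fold, as stated for Set X

print(f"P(X >= 6 | mean 0.29) = {fetal_p:.2e}")
print(f"fetal fold = {fetal_fold:.1f}, maternal fold = {maternal_fold:.1f}")
```

A site would be flagged as a preferred end when the upper-tail probability falls below the 0.01 threshold used in the text.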

[Figure 3. Panels: maternal plasma DNA (fragments carrying the shared allele and the fetal-specific allele) and sonicated maternal blood cell DNA. x axis: relative genomic coordinates (bases), −300 to 300 around the informative SNP; y axis: fraction of fragments ending on the nucleotide position (%).]

Fig. 3. Left shows an illustrative example of the nonrandom fragmentation patterns of plasma DNA carrying a fetal-specific allele and an allele shared by the mother and the fetus. In Upper, each horizontal line represents one sequenced DNA fragment. The ends of the DNA fragments represent the ending position of the sequenced read. The fragments are sorted according to the coordinate of the left outermost nucleotide (genomic coordinate with the lower numerical value). In Lower, the percentage of fragments ending on a particular position is shown. Upper Right shows the span of all of the sequenced DNA fragments aligned to the same SNP obtained from the sonicated blood cell DNA of the same pregnant woman. Lower Right shows the percentage of fragments ending on a particular position. The x axes show the relative genomic coordinates. The SNP of interest is located in the middle (marked by a dotted line).

[Figure 4. Panels: (A) region on chr22 with sequencing reads carrying the shared allele and the fetal-specific allele; (B) region on chr1 with sequencing reads carrying the shared allele and the maternal-specific allele. x axis: genomic coordinates (bases); y axis: proportion of plasma DNA ending at a genomic coordinate (%). Legend: significant ending positions specific for fragments carrying shared alleles, specific for fragments carrying fetal-specific or maternal-specific alleles, and common to both.]

Fig. 4. Plot of probability of a genomic coordinate being an ending position of maternal plasma DNA fragments across a region with an informative SNP at which (A) the mother was homozygous and the fetus was heterozygous and (B) the mother was heterozygous and the fetus was homozygous. (A) Results for nucleotide positions with a significantly increased probability of being an end of plasma DNA fragments carrying a shared allele and a fetal-specific allele are shown in red and blue, respectively. (B) Results for nucleotide positions with a significantly increased probability of being an end of plasma DNA fragments carrying a shared allele and a maternal-specific allele are shown in red and blue, respectively. The x axes of the graphs show the relative genomic coordinates. The position of the SNP of interest is marked by a dotted line.

Using Preferred End Positions to Deduce Fetal DNA Fractions in Maternal Plasma in First Trimester Pregnancies. After having identified sets of maternal plasma DNA preferred end positions from one third trimester case, we explored if such ends would be detectable in samples of other pregnancies and whether the relative abundance of plasma DNA ending at these sets of nucleotide positions would reflect the fetal DNA fraction. We sequenced the plasma DNA of 26 first trimester (10–13 wk) pregnancies, each involving a male fetus. Sequencing libraries were prepared from these maternal plasma samples using the KAPA Library Preparation Kits (Kapa Biosystems), which included a PCR amplification step to enrich the adaptor-ligated molecules. The median number of mapped reads per sample was 16 million (range = 12–22 million). The proportion of sequenced reads aligning to chromosome Y was used to calculate the actual fetal DNA fraction in each plasma sample. A positive correlation could be observed between the relative abundance [denoted as the fetal/maternal (F/M) ratio] of plasma DNA with recurrent fetal (Set A) and maternal (Set X) ends and the fetal DNA fraction (R = 0.66, P < 0.001, Pearson correlation) (Fig. 6). A median of 248 Set A ends was observed among 26 samples. A median of 286 Set X ends was observed among those samples.
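The F/M ratio and its correlation with the fetal DNA fraction can be sketched as follows. This is a minimal illustration that assumes the F/M ratio is the count of fragments ending on Set A positions divided by the count ending on Set X positions in the same sample. The per-sample values are hypothetical placeholders, not data from the study, and the function names are illustrative.

```python
import math

def fm_ratio(set_a_fragments, set_x_fragments):
    """Relative abundance of fragments ending on fetal-preferred vs. maternal-preferred sites."""
    return set_a_fragments / set_x_fragments

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical samples: (Set A end count, Set X end count, chrY-based fetal fraction in %).
samples = [(210, 290, 6.0), (240, 285, 9.5), (250, 280, 12.0), (275, 282, 15.5), (300, 279, 20.0)]
ratios = [fm_ratio(a, x) for a, x, _ in samples]
fractions = [f for _, _, f in samples]
print(f"Pearson r = {pearson_r(ratios, fractions):.2f}")
```

In practice, the chromosome Y-based fetal fraction serves as the reference against which the F/M ratio is compared, as in Fig. 6.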

Size Distribution of Plasma DNA Fragments Terminating on the Fetal-Specific End Positions. Because fetal DNA in maternal plasma is shorter than the maternal-derived DNA in maternal plasma, we compared the size distributions of plasma DNA ending on the Set A and Set X positions. For the deeply sequenced third trimester case, the size distribution for fragments ending at Set A positions was shorter than those ending at Set X positions (Fig. 7 A and B). Then, we analyzed the pooled sequenced reads from 26 first trimester plasma samples used for fetal DNA fraction analysis. A shorter size distribution was observed for fragments ending at fetal-preferred Set A positions compared with those ending at maternal-preferred Set X positions (Fig. 8 A and B). These results were consistent with the fact that fetal DNA was shorter than maternal-derived DNA and further support the hypothesis that the end positions derived from one pregnant woman can potentially be generalized to other pregnant cases.

To investigate the specificity of these positions, we analyzed the size distribution of DNA fragments ending on the Set A or Set X positions when the genomic coordinates of these positions were shifted by one to five nucleotides. To quantify the size difference, cumulative frequencies for fragments ending at the two sets of positions were plotted against size. The difference in the two cumulative frequency curves (denoted as ΔS) would reflect the magnitude of the size difference (Figs. 7B and 8B). For the third trimester sample, the difference in size distribution diminished when the number of shifted nucleotides was increased and no longer observable when the coordinates were shifted by four nucleotides (Fig. 7 C and D). For the pooled sequenced reads from 26 first trimester cases, a reduction in the difference in size distribution with nucleotide shift was similarly observed (Fig. 8 C and D).
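The ΔS statistic described above can be computed directly from two lists of fragment sizes. The sketch below assumes ΔS at each size is the cumulative frequency (in percent) of fragments ending on fetal-preferred positions minus that of fragments ending on maternal-preferred positions; the sign convention, the function names, and the toy sizes are assumptions for illustration.

```python
def cumulative_frequency(sizes, max_size):
    """Percentage of fragments with length <= s, for each s in 0..max_size."""
    counts = [0] * (max_size + 1)
    for size in sizes:
        if 0 <= size <= max_size:
            counts[size] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no fragments within the size range")
    cumulative, running = [], 0
    for count in counts:
        running += count
        cumulative.append(100.0 * running / total)
    return cumulative

def delta_s(fetal_sizes, maternal_sizes, max_size=250):
    """Difference between the two cumulative frequency curves at each size (percentage points)."""
    fetal = cumulative_frequency(fetal_sizes, max_size)
    maternal = cumulative_frequency(maternal_sizes, max_size)
    return [f - m for f, m in zip(fetal, maternal)]

# Toy fragment sizes (bp) for fragments ending on Set A and Set X positions.
set_a_sizes = [143, 143, 150, 166]
set_x_sizes = [150, 166, 166, 170]
ds = delta_s(set_a_sizes, set_x_sizes, max_size=200)
print(max(ds), ds.index(max(ds)))
```

A larger peak ΔS indicates a stronger shift toward shorter fragments in the fetal-preferred set; repeating the calculation after shifting the end coordinates by a few nucleotides gives curves of the kind shown in Figs. 7 C and D and 8 C and D.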

[Figure 5. Venn diagrams of preferred ending positions. (A) SNPs homozygous in the mother and heterozygous in the fetus: Set A (fetal-specific alleles), 8,242; Set C (overlap), 8,909; Set B (shared alleles), 23,857. (B) SNPs heterozygous in the mother and homozygous in the fetus: Set X (maternal-specific alleles), 7,527; Set Z (overlap), 10,534; Set Y (shared alleles), 18,829.]

Fig. 5. Analysis of ending positions for plasma DNA fragments across (A) SNPs that were homozygous in the mother and heterozygous in the fetus and (B) SNPs that were homozygous in the fetus and heterozygous in the mother. (A) Set A included preferred ending positions for fragments carrying fetal-specific alleles. Set B included preferred ending positions for fragments carrying shared alleles. Set C included preferred ending positions for both types of plasma DNA fragments. (B) Set X included preferred ending positions for fragments carrying maternal-specific alleles. Set Y included preferred ending positions for fragments carrying shared alleles. Set Z included preferred ending positions for both types of plasma DNA fragments.

[Figure 6. Scatter plot. x axis: fetal DNA fraction (%), 0 to 25; y axis: ratio (F/M), 0.7 to 1.1. Pearson’s r = 0.66, P value < 0.001.]

Fig. 6. Correlation between the relative abundance (ratio F/M) of plasma DNA molecules with preferred fetal (Set A) and maternal (Set X) ends, and fetal DNA fraction.

Fetal Genome Analysis of a Second Trimester Pregnancy Affected by Cardiofaciocutaneous Syndrome. To validate the above-mentioned methodologies, we sequenced a maternal plasma sample collected at 18 wk of gestation to 195× coverage. This case presented with increased nuchal translucency in the first trimester. A chorionic villus sample was obtained at 11 wk of gestation, which showed a normal karyotype. At 14 wk of gestation, a cystic hygroma and mild club foot deformity were detected on ultrasound scan, and hence, microarray analysis was performed on the chorionic villus sample to detect gene mutations that were associated with Noonan syndrome and related conditions. The microarray analysis detected a mutation (c.770A>G) on the BRAF (B-Raf proto-oncogene, serine/threonine kinase) gene that resulted in cardiofaciocutaneous syndrome. Because the mutation was absent in the mother’s and father’s genomes, it was deemed a de novo mutation. The fetus was aborted, and placental tissues were collected after the procedure.

DNA from the maternal buffy coat, paternal buffy coat, and placental tissues collected after termination was sequenced to 40×, 60×, and 60× human genome coverages, respectively. The fetal DNA fraction in the maternal plasma was 24%. A difference of at least 8 bp was observed between the size of fragments carrying fetal-specific alleles and alleles shared by the fetus and the mother at over 90% of informative SNPs (Fig. S2B). Because the size distribution of plasma DNA fragments carrying the alleles shared by the fetus and the mother is dependent on the fractional fetal DNA concentration in plasma, the size difference required for filtering de novo mutations would need to be determined on a case by case basis. In this case, a size difference of at least 8 bp between the variant and the parental alleles was used as the criterion for qualifying the variant as a candidate de novo mutation; 75 candidate de novo mutations were identified in the plasma, of which 47 were confirmed to be present in the placenta. Because a total of 58 de novo mutations were detected in the placenta, the plasma analysis had a detection rate of 81% and a PPV of 62% (Fig. S4). The de novo BRAF mutation was among the 47 variants detected by the plasma DNA analysis. Using GRAD analysis, the maternal inheritance of the fetus was deduced in 528,008 (68% of 775,456 SNPs where the mother was heterozygous) SNPs with an accuracy of 96.8%. Regarding preferred end sites, plasma DNA fragments carrying the most prevalent 0.5% of plasma DNA ends represented 2.4% of all DNA fragments. Strikingly, 59% of the most prevalent 0.5% plasma DNA end sites overlapped with those of the third trimester case. Based on the analysis of informative SNPs, 10,401 fetal-preferred (Set A) (Fig. S5A) and 6,562 maternal-preferred (Set X) (Fig. S5B) end sites were identified. The preferred end sites for fragments carrying alleles shared between the mother and the fetus were also identified (Set B, Set C, Set Y, and Set Z) (Fig. S5). The genomic coordinates of each set of preferred positions of this case are shown in Dataset S2. In 26 first trimester maternal plasma samples, a positive correlation could be observed between the relative abundance (i.e., the F/M ratio) of the plasma DNA molecules with fetal-preferred (Set A) and maternal-preferred (Set X) ends and the fetal DNA fraction (R = 0.66, P < 0.001, Pearson correlation) (Fig. S6).
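The detection rate and PPV reported for the de novo mutation analysis of this case follow from the three counts given above (75 candidates in plasma, 47 confirmed in the placenta, 58 de novo mutations in the placenta). The short sketch below reproduces that arithmetic; the function and parameter names are illustrative only.

```python
def detection_metrics(candidates, confirmed, true_total):
    """Return (detection rate, positive predictive value) from variant counts.

    candidates: variants called from plasma
    confirmed:  plasma calls that were also present in the placenta
    true_total: de novo mutations present in the placenta
    """
    detection_rate = confirmed / true_total
    ppv = confirmed / candidates
    return detection_rate, ppv

rate, ppv = detection_metrics(candidates=75, confirmed=47, true_total=58)
print(f"detection rate = {rate:.3f}, PPV = {ppv:.3f}")
```

The values come out at roughly 0.81 and 0.63, in line with the detection rate of 81% and the PPV of about 62% quoted in the text.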

[Figure 7. Panels A to D for the third trimester case. (A) Frequency (%) vs. size (bp); (B) cumulative frequency (%) and ΔS vs. size (bp); (C and D) ΔS (%) vs. size (bp). Legend: fetal-preferred, maternal-preferred.]

Fig. 7. (A) Plasma DNA size distributions for fragments ending on fetal-preferred ending positions (Set A; blue) and fragments ending on maternal-preferred ending positions (Set X; red). A shorter size distribution was observed for fragments ending on Set A positions compared with those ending on Set X positions. (B) Cumulative plot for the size distributions for the two sets of fragments. The difference in the cumulative frequencies of the two sets of fragments (ΔS) against fragment size is shown. (C) A plot of ΔS against size when the end coordinates are shifted by 0 to +5 bp with regard to the genomic coordinates of the preferred end. (D) A plot of ΔS against size when the end coordinates are shifted by 0 to −5 bp with regard to the genomic coordinates of the preferred end.

[Figure 8. Panels A to D for the pooled first trimester samples. (A) Frequency (%) vs. size (bp); (B) cumulative frequency (%) and ΔS vs. size (bp); (C and D) ΔS (%) vs. size (bp). Legend: fetal-preferred, maternal-preferred.]

Fig. 8. (A) Plasma DNA size distributions in a pooled plasma DNA sample from 26 first trimester pregnant women for fragments ending on fetal-preferred ending positions (Set A; blue) and fragments ending on maternal-preferred ending positions (Set X; red). A shorter size distribution was observed for fragments ending on Set A positions compared with those ending on Set X positions. (B) Cumulative plot for the size distributions for the two sets of fragments. The difference in the cumulative frequencies of the two sets of fragments (ΔS) against fragment size is shown. (C) A plot of ΔS against size when the end coordinates are shifted by 0 to +5 bp with regard to the genomic coordinates of the preferred end. (D) A plot of ΔS against size when the end coordinates are shifted by 0 to −5 bp with regard to the genomic coordinates of the preferred end.

Discussion
In this work, we performed ultradeep genome-wide sequencing of the plasma DNA obtained from two pregnant women to 195× and 270× haploid genome coverages. We believe that these are the deepest genome-wide sequencings yet reported for plasma DNA obtained from any one pregnancy. Previous efforts in the genome-wide sequencing of plasma DNA obtained from pregnant women had attempted sequencing depths from 52.7× to 78× haploid genome coverages (8–10). The depth of sequencing achieved in our analyzed case has allowed us to push forward the limit and expand the applications of NIPT.

First, we investigated the possibility of searching for fetal de novo mutations on a genome-wide scale using the maternal plasma DNA sequencing data. It has been reported that there are ∼74 de novo point mutations per person and that the mutation rate increases with paternal age (23). It is now increasingly recognized that de novo mutations play an important role in human genetic diseases and that diseases associated with de novo mutations are not rare collectively (23). Thus, it is clinically relevant to be able to screen for de novo mutations prenatally. Efforts in the noninvasive prenatal detection of fetal de novo mutations are limited to diseases associated with structural abnormalities detectable on ultrasound and where those diseases are known to be associated with hotspot de novo mutations (24). For example, >90% of achondroplasia cases are caused by a de novo mutation in the FGFR3 (fibroblast growth factor receptor 3) gene with no prior familial history of skeletal dysplasia (25). A clinical suspicion for achondroplasia could, therefore, be followed by maternal plasma DNA analysis with the aim to specifically detect fetal FGFR3 mutations (26).

Genome-wide scanning for fetal de novo mutations without a priori information on potentially affected genes or diseases is a huge technological challenge. Previous efforts in this area had reported a PPV of 0.438% at a sensitivity of 38.6% (10). Hence, using the depth of sequencing as a foundation, we used a series of bioinformatics filters taking into account the sequencing error rate, minimizing the realignment errors, and considering the plasma fraction of a putative mutation as well as the size difference between fetal and maternal DNA in maternal plasma. Using such filters, we were able to detect fetal de novo mutations with a PPV of 74% at a sensitivity of 85% for the third trimester case. For the second trimester case, we achieved a PPV of 62% and a sensitivity of 81%. The PPVs, in particular, represent a significant step forward over previous efforts and are at a level that suggests that, with additional improvements, such an approach might eventually have clinical impact. It is noteworthy that the approach successfully detected the de novo BRAF mutation that caused the cardiofaciocutaneous syndrome in the second trimester case. For the analysis of cases with earlier gestational ages and lower fetal DNA fractions, higher sequencing depths would be required to achieve similar levels of performance. We should perhaps add a cautionary note that, although we have shown the technological feasibility of detecting fetal de novo mutations from maternal plasma, the interpretation of the clinical implications of each of these mutations is a great challenge that would require a well-supported clinical genetics infrastructure and continual improvements in our understanding of the human genome.

Second, the depth of maternal plasma DNA sequencing and refined bioinformatics strategies used in this study allow one to determine the maternal inheritance of the fetus on a genome-wide level without resorting to maternal haplotype dosage analysis (8). Using GRAD analysis, the maternal inheritance of the fetus could be resolved in 94.2% of 656,676 heterozygous SNPs in the maternal genome of the third trimester pregnancy and 68% of 775,456 heterozygous SNPs in the genome of the second trimester pregnant woman. In our previous study on noninvasive fetal genome analysis, the maternal inheritance was deduced using RHDO analysis (8). The mean size of the haplotype blocks was 409 kb. For other studies involving the noninvasive fetal genome analysis, the mean size of haplotype blocks with maternal inheritance determined ranged from 300 kb to over 1 Mb (9, 10, 27). For our previous study (8), the maternal inheritance of 7,332 haplotype blocks was successfully determined with an accuracy of 99.1%. The present work, thus, represents an ∼90-fold (i.e., 656,676/7,332 and 716,646/7,332 for the third and second trimester cases, respectively) increase in resolution in elucidating the maternal inheritance of the fetus.

Third, the deep sequencing of maternal plasma DNA samples prepared using non–PCR-amplified libraries has allowed us to investigate if there might be genomic locations that would be preferentially represented at the ends of plasma DNA fragments and whether such ends would exhibit differences depending on their tissue of origin (i.e., from the placenta of the fetus or the mother). Our data indicate that such recurrent ends are indeed present in plasma DNA. For example, our data from the third trimester case revealed that plasma DNA fragments carrying the most prevalent 0.5% of plasma DNA ends represented 3.5% of all DNA fragments and that 25% of the fragments had at least one additional fragment sharing identical ends at both ends. Interestingly, the preferred ending positions identified in the third trimester case were also detectable in 26 first trimester cases, even when conventional PCR-amplified DNA libraries were sequenced to a relatively shallow depth. These observations indicate that the ending sites that we have identified are indeed preferential plasma DNA fragmentation sites that are detectable recurrently across different samples at different gestational ages and with more than one library preparation protocol.

Furthermore, we have found that there were subsets of such recurrent plasma DNA ends that were preferentially associated with fetal or maternal DNA (Figs. 3, 4, and 5). Using such fetal- or maternal-preferred ending sites, one could estimate the fetal DNA fraction in a particular sample (Fig. 6 and Fig. S6). This latter correlation is particularly striking, because the fetal- and maternal-preferred ending sites were originally identified in the third and second trimester cases. However, the abundance of such fetal-specific ending sites correlated with the fetal DNA fractions in the series of independent first trimester cases.

For two cases in the study that were subjected to ultradeep sequencing (270× and 195× haploid genome coverages), it was desirable to use PCR-free library preparations, because the probability of encountering PCR duplicates increased with the depth of sequencing. Furthermore, the use of PCR-free libraries was also the most convincing approach to answer the fundamental question as to whether there were preferred ending sites for plasma DNA, because one could positively identify them without the worry that one was seeing an artefactual result (i.e., PCR duplicates).

For using the recurrent ends to deduce the fetal DNA fractions for the first trimester samples, we had used PCR-amplified sequencing libraries. However, the depth of sequencing was very shallow (median of 15 million reads per case; i.e., 0.8× haploid genome coverage). Hence, one would not expect any of the preferred end sites to appear more than once in a particular sample. Consequently, the filtering step that one normally performed to remove PCR duplicates, leaving one read for subsequent bioinformatic analysis, would not be expected to bias the results. Our demonstration of a good correlation between the F/M ratio (based on preferred ends) and the fetal DNA fraction is a testimony to our belief.

The plasma DNA fragments with the fetal-preferred ending sites had a shorter size distribution than those with the maternal-preferred ending sites (Fig. 7). Most importantly, this size difference was observed in not only the third trimester case in which the recurrent ends were identified but also, 26 independent first trimester cases (Fig. 8). It is also interesting to note that, as we shifted the end coordinates by one to five nucleotides on both sides of the fetal- and maternal-preferred end sites, the size difference rapidly disappeared (Fig. 7 C and D). A similar finding was also observed from 26 first trimester cases (Fig. 8 C and D). These data suggest that there is a high level of specificity in the process of plasma DNA fragmentation to the extent where, in some parts of the genome, the cleavage or breakage occurs preferentially at specific base locations.

There are active recent discussions on the nucleosomal origin of plasma DNA. One clue to such an origin came from work on the high-resolution size profiling of maternal plasma DNA (8, 28). Such work has revealed that the size profile of the total DNA in maternal plasma, in which a predominant proportion is from the pregnant mother, contains a dominant population of plasma DNA molecules with a size of 166 bp. Such data have also shown that, when one focuses on the placentally derived fetal DNA in maternal plasma, one would observe a predominant population with a size of 143 bp. The 166-bp molecules have been postulated to represent molecules containing the nucleosome core plus the linker (8). The 143-bp molecules, however, have been postulated to represent molecules containing the nucleosomal core without the linker. Straver et al. (17) created an artificial “nucleosome track” by pooling maternal plasma DNA sequencing data from multiple cases. They found that the frequency of reads with starting sequences within regions 73 bp upstream and downstream of the deduced nucleosome center showed a positive correlation with the fetal DNA fraction. Outside of the context of pregnancy, Snyder et al. (16) also explored the nucleosomal footprint of plasma DNA. They developed a metric that they called the windowed protection score, which has been defined as the number of molecules spanning a particular genomic window minus those molecules with an end point within the window. Conclusions drawn from those studies are that plasma DNA cleavages or breakages cluster around locations in the genome in relation to positions of nucleosome arrays and open chromatin domains. However, as shown in Figs. 7 and 8, the preferred plasma DNA ending sites identified by this study are exquisitely sensitive to the genomic location. We have, therefore, shown that there is another layer of complexity associated with the nonrandom nature of plasma DNA fragmentation that is beyond factors just related to genome architecture or structural domains. Additional studies are needed to understand the relationship between the various factors and mechanisms resulting in the highly orchestrated and location-specific patterns of plasma DNA fragmentation. We also believe that a catalog of such tissue-specific end sites would provide an alternative approach for elucidating the origin of plasma DNA in addition to methods based on methylation analysis (29) and nucleosome footprinting (16).

In summary, we have developed a second generation approach that produces noninvasive fetal genomes at high resolution using maternal plasma DNA sequencing. We believe that this work has significantly pushed forward the limit of NIPT by showing the feasibility of detecting fetal de novo mutations on a genome-wide level from maternal plasma. We have also substantially increased the resolution in which one could determine the maternal inheritance of the fetus using a nonhaplotype-based approach. Currently, the potential clinical implementation of this approach is limited by the costs involved in sequencing maternal plasma DNA to the depth reported here (195× and above; e.g., over US $40,000 in our center). However, with the continual reduction in the costs of massively parallel sequencing, it is envisioned that this cost barrier would eventually be removed. Finally, our demonstration of preferred end sites in plasma DNA that exhibit specificity of their origin (e.g., the placenta of the fetus) has opened up numerous avenues of investigation. In the context of NIPT, it would allow a relatively simple approach for estimating the fetal DNA fraction. Additional work would need to determine the relative merit between this approach and others previously reported [e.g., using plasma DNA size (28) and nucleosomal profiles (17)]. Actually, these methods could potentially be used in combination for NIPT. It would also be of great interest to explore the presence of such recurrent end positions in clinical and pathological scenarios outside of the pregnancy context (e.g., cancer).
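For readers comparing the preferred-end approach with the windowed protection score of Snyder et al. (16) discussed earlier in this section, the definition given there (molecules spanning a window minus molecules with an end point within it) can be written as a small function. This is a sketch of that verbal definition only; the window width, the inclusive coordinate convention, and the function name are assumptions made for illustration.

```python
def windowed_protection_score(fragments, center, window=120):
    """Count fragments spanning the window minus fragments with an end inside it.

    fragments: iterable of (start, end) inclusive coordinates on one chromosome.
    window:    window width in bp (a placeholder default, treated as a free parameter).
    """
    half = window // 2
    win_start, win_end = center - half, center + half
    spanning = 0
    ending_inside = 0
    for start, end in fragments:
        if start <= win_start and end >= win_end:
            spanning += 1            # fragment covers the whole window
        elif win_start <= start <= win_end or win_start <= end <= win_end:
            ending_inside += 1       # at least one fragment end falls in the window
    return spanning - ending_inside

# Toy fragments around position 200 (window 140..260).
toy = [(100, 300), (150, 260), (190, 400), (500, 600)]
print(windowed_protection_score(toy, center=200))
```

A strongly positive score suggests a protected (for example, nucleosome-bound) region, whereas the preferred-end analysis in this study works at single-base rather than window resolution.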

Materials and Methods
Clinical Samples. The study was approved by the Joint Chinese University of Hong Kong and Hospital Authority New Territories East Cluster Clinical Research Ethics Committee. For each of the two cases whose plasma DNA samples were sequenced to high depths, 20 mL maternal venous peripheral blood was collected; 10 mL paternal venous peripheral blood was also collected. For the third trimester pregnant case, 3 mL cord blood was collected after delivery. For the case with cardiofaciocutaneous syndrome, the placenta was collected after therapeutic abortion. Twenty-six first trimester pregnant women were recruited for fetal DNA fraction analysis.

Sample Processing. Blood samples were centrifuged at 1,600 × g for 10 min at 4 °C. The plasma portion was harvested and recentrifuged at 16,000 × g for 10 min at 4 °C to remove the blood cells. The blood cell portion was recentrifuged at 2,500 × g, and any residual plasma was removed. DNA from the blood cells and that from maternal plasma were extracted with the blood and body fluid protocol of the QIAamp DNA Blood Mini Kit and the QIAamp DSP DNA Blood Mini Kit (Qiagen), respectively. DNA from the placenta was extracted with the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer’s tissue protocol.

DNA Library Construction. For the two cases subjected to deep sequencing, DNA libraries for the genomic DNA samples and the maternal plasma sample were constructed with the TruSeq DNA PCR-Free Library Preparation Kit (Illumina) according to the manufacturer’s protocol, except that one-fifth of the indexed adapter was used for plasma DNA library construction. For the construction of plasma DNA libraries, DNA extracted from 8 mL plasma was used. For the genomic DNA samples, including the mother’s buffy coat DNA, the father’s buffy coat DNA, and the cord blood buffy coat DNA, 1 μg sonicated DNA was used for library construction. The library concentrations ranged from 34 to 58 nM in a 20-μL library. For the first trimester cases, plasma DNA sequencing libraries were constructed using a KAPA Library Preparation Kit (Kapa Biosystems).

Sequencing of DNA Libraries. The libraries for maternal plasma DNA and genomic DNA were sequenced using the HiSeq2500 and the HiSeq1500 Sequencing Systems (Illumina), respectively, using a paired end sequencing protocol. Seventy-five nucleotides were sequenced from each end.

Alignment of Sequencing Data. The paired end sequencing data were analyzed by means of SOAP2 in the paired end mode (20). The paired end reads were aligned to the nonrepeat-masked reference human genome (hg19). Up to two nucleotide mismatches were allowed for the alignment of each end. The genomic coordinates of these potential alignments for the two ends were then analyzed to determine whether any combination would allow the two ends to be aligned to the same chromosome with the correct orientation spanning an insert size ≤600 bp and mapping to a single location in the reference human genome. On average, 86% of the reads were aligned to the reference human genome (hg19). For the detection of fetal de novo mutations from maternal plasma, an additional step of realignment using the BOWTIE2 software was performed (19). All sequenced reads covering the candidate mutation sites and exhibiting the variant alleles were realigned using BOWTIE2.
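The pairing rule described above (same chromosome, correct orientation, insert size ≤600 bp, single location) can be illustrated with a minimal Python sketch. This is not the SOAP2 implementation; the `Hit` record, the function names, and the interpretation of "correct orientation" as a forward read upstream of a reverse read are assumptions made for illustration only.

```python
from typing import List, NamedTuple, Optional, Tuple

READ_LEN = 75      # nucleotides sequenced from each end (from the text)
MAX_INSERT = 600   # maximum insert size in bp (from the text)

class Hit(NamedTuple):
    """Hypothetical candidate alignment of one read end."""
    chrom: str
    start: int    # leftmost aligned coordinate
    strand: str   # "+" or "-"

def pair_span(h1: Hit, h2: Hit) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, insert_size) if the two hits form a proper pair."""
    if h1.chrom != h2.chrom or h1.strand == h2.strand:
        return None
    fwd, rev = (h1, h2) if h1.strand == "+" else (h2, h1)
    if rev.start < fwd.start:          # reads must face each other (assumption)
        return None
    insert = rev.start + READ_LEN - fwd.start
    if insert > MAX_INSERT:
        return None
    return (fwd.chrom, fwd.start, insert)

def unique_proper_pair(hits1: List[Hit], hits2: List[Hit]):
    """Accept the fragment only if exactly one combination is a proper pair."""
    spans = set()
    for a in hits1:
        for b in hits2:
            span = pair_span(a, b)
            if span is not None:
                spans.add(span)
    return next(iter(spans)) if len(spans) == 1 else None
```

A fragment whose two ends yield more than one valid combination is rejected here, which is one possible reading of "mapping to a single location".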

Binomial Mixture Model Analysis for Paternal Inheritance. To deduce the paternal inheritance of the fetus using maternal plasma DNA, we focused on SNPs at which the mother was homozygous (genotype AA) and the father was heterozygous (genotype AB). At these SNPs, the genotype of the fetus would be either AA or AB. Hence, we used a two-component binomial mixture model to fit the observed allelic counts at each SNP locus (21). For each SNP locus i, the counts for the A and B alleles are denoted as a_i and b_i, respectively, and N_i is the total number of reads covering the locus. The binomial mixture model was then used to determine if the fetus would be homozygous (AA) or heterozygous (AB) for each locus i based on the observed allelic counts a_i and b_i, the sequencing error rate, and the fetal DNA fraction (21).
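As an illustration of how the two genotype hypotheses can be compared at one locus, the following Python sketch computes the posterior probability that the fetus is heterozygous. It is a simplified stand-in, not the model of ref. 21: it assumes the fetal DNA fraction is already known, that B-allele reads occur at a rate equal to the sequencing error rate when the fetus is AA and at half the fetal DNA fraction when the fetus is AB, and an equal prior for the two genotypes. The function names and default values are hypothetical.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log of the binomial probability C(n, k) * p^k * (1 - p)^(n - k)."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log1p(-p)

def posterior_fetus_ab(a_i: int, b_i: int, fetal_fraction: float,
                       error_rate: float = 0.003, prior_ab: float = 0.5) -> float:
    """Posterior probability that the fetus is AB given allelic counts a_i, b_i."""
    n_i = a_i + b_i
    # Component 1 (fetus AA): B reads arise only from sequencing errors (assumption).
    log_aa = math.log(1 - prior_ab) + log_binom_pmf(b_i, n_i, error_rate)
    # Component 2 (fetus AB): expected B-allele fraction is f / 2.
    log_ab = math.log(prior_ab) + log_binom_pmf(b_i, n_i, fetal_fraction / 2)
    # Normalize in log space to avoid underflow at high depth.
    m = max(log_aa, log_ab)
    return math.exp(log_ab - m) / (math.exp(log_aa - m) + math.exp(log_ab - m))
```

For example, `posterior_fetus_ab(90, 10, fetal_fraction=0.2)` strongly favors AB, whereas a locus with no B reads at the same depth strongly favors AA.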

GRAD Analysis for Maternal Inheritance. For SNPs in which the mother was heterozygous, we tested whether the two alleles were present in maternal plasma at the same or different concentrations using the sequential probability ratio test (SPRT). An odds ratio of 20 was used for the calculation of the threshold for accepting the null or the alternative hypothesis. The null hypothesis for each SPRT analysis was the absence of dosage imbalance between the read counts for the two maternal alleles. The alternative hypothesis was the overrepresentation of one allele. The calculation of the upper and lower boundaries of the SPRT curves was as previously described (8).
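The decision logic of an SPRT on allelic counts can be sketched as follows. This is a generic Wald-style likelihood ratio test, not a reproduction of the boundary formulas of ref. 8. It assumes that, when the fetus is homozygous for one maternal allele, that allele's expected fraction in plasma is 0.5 + f/2 (f being the fetal DNA fraction), and that the odds ratio of 20 is applied directly as the likelihood ratio threshold. The function name and return labels are hypothetical.

```python
import math

def sprt_classify(count_a: int, count_b: int, fetal_fraction: float,
                  odds: float = 20.0) -> str:
    """Classify a maternal heterozygous SNP as balanced or imbalanced.

    Returns "A_over", "B_over", "balanced", or "unclassified".
    """
    n = count_a + count_b
    major, label = (count_a, "A_over") if count_a >= count_b else (count_b, "B_over")
    q0 = 0.5                         # null: no dosage imbalance
    q1 = 0.5 + fetal_fraction / 2    # alternative: major allele overrepresented
    # Log-likelihood ratio of the alternative against the null.
    llr = major * math.log(q1 / q0) + (n - major) * math.log((1 - q1) / (1 - q0))
    if llr >= math.log(odds):
        return label                 # accept the alternative hypothesis
    if llr <= -math.log(odds):
        return "balanced"            # accept the null hypothesis
    return "unclassified"            # not enough evidence either way
```

Loci with few reads tend to remain unclassified, which is the expected behavior of a sequential test when the evidence is insufficient.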

Dynamic Cutoff Analysis for de Novo Mutation. Dynamic cutoff filtering criteria were developed to distinguish de novo mutations of the fetus from sequencing errors. The sequencing error rate was assumed to be 0.3% (30). The probability that the same variant would be observed in a number of sequenced reads because of sequencing errors (P_err) would be calculated as

$$P_{err} = \binom{N_i}{b_i} \times e \times \left(\frac{e}{3}\right)^{(b_i - 1)},$$

where $\binom{N_i}{b_i}$ represents a mathematical combination function $N_i!/[b_i!(N_i - b_i)!]$, e represents the sequencing error rate, N_i represents the total number of sequenced reads covering the locus i, and b_i represents the number of sequenced reads carrying the variant. The dynamic cutoff values were determined based on a theoretical false-positive rate of <1 in 3 × 10^9. In other words, the theoretical number of false-positive sites caused by sequencing errors would be less than one for a haploid genome.
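The cutoff calculation can be illustrated with a short Python sketch. It evaluates the P_err expression for increasing numbers of variant reads and returns the smallest count at which P_err falls below 1 in 3 × 10^9. The function names are hypothetical, and the search strategy (a linear scan from one read upward) is an assumption of this sketch rather than a description of the original pipeline.

```python
import math

def p_err(n_i: int, b_i: int, e: float = 0.003) -> float:
    """Probability of seeing the same variant in b_i of n_i reads by error alone.

    Implements C(n_i, b_i) * e * (e / 3)^(b_i - 1).
    """
    return math.comb(n_i, b_i) * e * (e / 3) ** (b_i - 1)

def dynamic_cutoff(n_i: int, e: float = 0.003,
                   max_fp_rate: float = 1 / 3e9):
    """Smallest number of variant reads with P_err below the false-positive rate.

    Returns None if no count up to n_i satisfies the threshold.
    """
    for b_i in range(1, n_i + 1):
        if p_err(n_i, b_i, e) < max_fp_rate:
            return b_i
    return None
```

Because the cutoff depends on N_i, loci covered by more reads require more variant-carrying reads before a candidate passes, which is what makes the cutoff dynamic.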

Plasma DNA End Position Analysis. To screen for genomic coordinates that were preferred ending positions for maternally and fetally derived DNA fragments, fetal-specific informative SNP sites (i.e., at which the mother was homozygous and the fetus was heterozygous) and maternal-specific informative SNP sites (i.e., at which the fetus was homozygous and the mother was heterozygous)

Chan et al. PNAS | Published online October 31, 2016 | E8167


were identified. For each informative SNP site, all of the sequencing reads covering the informative SNP sites were analyzed.

For the analysis of SNPs where the mother was homozygous (genotype AA) and the fetus was heterozygous (genotype AB), the A allele would be the “shared allele,” and the B allele would be the fetal-specific allele. The number of sequenced reads carrying the shared allele and the fetal-specific allele would be counted. In the size distribution of plasma DNA, a peak would be observed at 166 bp for both the fetally and maternally derived DNA. If the fragmentation of the plasma DNA had been random, the two ends would be evenly distributed across a region 166 bp upstream and 166 bp downstream of the informative SNP. A P value would be calculated to determine if a particular position has a significantly increased probability for being an end for the reads carrying the shared allele or the fetal-specific allele based on a Poisson probability function:

$$P\ \text{value} = \text{Poisson}(N_{actual}, N_{predict}),$$

where Poisson() is the Poisson probability function, N_actual is the actual number of reads ending at the particular nucleotide, and N_predict is the total number of reads divided by 166.

A P value of <0.01 was used as a cutoff to define preferred ending positions for the reads carrying the fetal-specific allele or the shared allele.

For the SNPs where the mother was heterozygous (AB) and the fetus was homozygous, the reads carrying the maternal-specific allele (B allele) and the shared alleles were analyzed.
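The screening step for preferred ending positions can be sketched in Python as follows. The text does not state whether Poisson() denotes a point probability or a tail probability; this sketch assumes the upper-tail probability of observing at least N_actual ends when N_predict are expected. The function names and the dictionary input format (position mapped to number of read ends) are hypothetical.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events with mean lam (lam > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_upper_tail(k_obs: int, lam: float) -> float:
    """P(X >= k_obs) for X ~ Poisson(lam); the tail choice is an assumption."""
    if k_obs <= 0:
        return 1.0
    return max(0.0, 1.0 - sum(poisson_pmf(k, lam) for k in range(k_obs)))

def preferred_ends(end_counts: dict, total_reads: int,
                   window: int = 166, alpha: float = 0.01) -> list:
    """Positions whose number of read ends is significantly above expectation.

    end_counts maps a genomic position to N_actual, the number of reads
    ending there; N_predict is total_reads / window, as in the text.
    """
    if total_reads <= 0:
        return []
    n_predict = total_reads / window
    return sorted(pos for pos, n_actual in end_counts.items()
                  if poisson_upper_tail(n_actual, n_predict) < alpha)
```

The same function can be applied separately to the reads carrying the fetal-specific (or maternal-specific) allele and to the reads carrying the shared allele.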

Expected Proportion of Fragments Sharing Two Identical Ends. The probability of having n fragments sharing the same ending position on one side, P(n), was calculated using a Poisson probability function assuming that the average size of plasma DNA fragments was 166 bp and that the average sequencing depth was 270×:

$$P(n) = \text{Poisson}\left(n, \frac{270}{166}\right).$$

The probability of two fragments having the same size (P_size) can be calculated from the size distribution of the plasma DNA fragments:

$$P_{size} = \sum_{size=20}^{600} \left[\%(size)\right]^2,$$

where %(size) is the proportion of fragments having the particular size.

Therefore, the overall probability of finding two fragments with identical ends on both sides (P_both) can be calculated as

$$P_{both} = \sum_{n=2}^{\infty} P(n) \times \binom{n}{2} \times P_{size},$$

where $\binom{n}{2}$ represents a mathematical combination function $n!/[2!(n - 2)!]$.
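The three expressions above can be combined in a brief Python sketch. The measured plasma DNA size distribution is not given in this section, so the input `size_fractions` (fragment size mapped to proportion) is a placeholder that a reader would need to supply; the function names and the truncation of the infinite sum at `n_max` are assumptions of this sketch.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events with mean lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def p_size(size_fractions: dict) -> float:
    """Sum of squared size proportions over 20 to 600 bp."""
    return sum(frac ** 2 for size, frac in size_fractions.items()
               if 20 <= size <= 600)

def p_both(size_fractions: dict, depth: float = 270.0,
           mean_size: float = 166.0, n_max: int = 100) -> float:
    """Sum over n >= 2 of P(n) * C(n, 2) * P_size, truncated at n_max."""
    lam = depth / mean_size   # expected fragments sharing one ending position
    pair_term = sum(poisson_pmf(n, lam) * math.comb(n, 2)
                    for n in range(2, n_max + 1))
    return pair_term * p_size(size_fractions)
```

As a consistency check on the truncation, the sum of P(n) × C(n, 2) over all n for a Poisson distribution with mean λ equals λ²/2, so the result should match (270/166)²/2 × P_size to within numerical precision.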

ACKNOWLEDGMENTS. This work was supported by the Hong Kong Research Grants Council Theme-Based Research Scheme T12-403/15. Y.M.D.L. was supported by an endowed chair from the Li Ka Shing Foundation.

1. Lo YM, et al. (1997) Presence of fetal DNA in maternal plasma and serum. Lancet 350(9076):485–487.

2. Chiu RW, et al. (2008) Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci USA 105(51):20458–20463.

3. Chiu RW, et al. (2011) Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: Large scale validity study. BMJ 342:c7401.

4. Bianchi DW, et al.; CARE Study Group (2014) DNA sequencing versus standard prenatal aneuploidy screening. N Engl J Med 370(9):799–808.

5. Norton ME, et al. (2015) Cell-free DNA analysis for noninvasive examination of trisomy. N Engl J Med 372(17):1589–1597.

6. Srinivasan A, Bianchi DW, Huang H, Sehnert AJ, Rava RP (2013) Noninvasive detection of fetal subchromosome abnormalities via deep sequencing of maternal plasma. Am J Hum Genet 92(2):167–176.

7. Yu SC, et al. (2013) Noninvasive prenatal molecular karyotyping from maternal plasma. PLoS One 8(4):e60968.

8. Lo YM, et al. (2010) Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci Transl Med 2(61):61ra91.

9. Fan HC, et al. (2012) Non-invasive prenatal measurement of the fetal genome. Nature 487(7407):320–324.

10. Kitzman JO, et al. (2012) Noninvasive whole-genome sequencing of a human fetus. Sci Transl Med 4(137):137ra76.

11. Fan HC, Wang J, Potanina A, Quake SR (2011) Whole-genome molecular haplotyping of single cells. Nat Biotechnol 29(1):51–57.

12. Kitzman JO, et al. (2011) Haplotype-resolved genome sequencing of a Gujarati Indian individual. Nat Biotechnol 29(1):59–63.

13. Lam KW, et al. (2012) Noninvasive prenatal diagnosis of monogenic diseases by targeted massively parallel sequencing of maternal plasma: Application to β-thalassemia. Clin Chem 58(10):1467–1475.

14. New MI, et al. (2014) Noninvasive prenatal diagnosis of congenital adrenal hyperplasia using cell-free fetal DNA in maternal plasma. J Clin Endocrinol Metab 99(6):E1022–E1030.

15. Zeevi DA, et al. (2015) Proof-of-principle rapid noninvasive prenatal diagnosis of autosomal recessive founder mutations. J Clin Invest 125(10):3757–3765.

16. Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J (2016) Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell 164(1-2):57–68.

17. Straver R, Oudejans CB, Sistermans EA, Reinders MJ (2016) Calculating the fetal fraction for noninvasive prenatal testing based on genome-wide nucleosome profiles. Prenat Diagn 36(7):614–621.

18. Chandrananda D, Thorne NP, Bahlo M (2015) High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA. BMC Med Genomics 8:29.

19. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359.

20. Li R, et al. (2009) SOAP2: An improved ultrafast tool for short read alignment. Bioinformatics 25(15):1966–1967.

21. Jiang P, et al. (2012) FetalQuant: Deducing fractional fetal DNA concentration from massively parallel sequencing of DNA in maternal plasma. Bioinformatics 28(22):2883–2890.

22. Lun FM, et al. (2008) Noninvasive prenatal diagnosis of monogenic diseases by digital size selection and relative mutation dosage on DNA in maternal plasma. Proc Natl Acad Sci USA 105(50):19920–19925.

23. Veltman JA, Brunner HG (2012) De novo mutations in human genetic disease. Nat Rev Genet 13(8):565–575.

24. Chitty LS, et al. (2015) Non-invasive prenatal diagnosis of achondroplasia and thanatophoric dysplasia: Next-generation sequencing allows for a safer, more accurate, and comprehensive approach. Prenat Diagn 35(7):656–662.

25. Vajo Z, Francomano CA, Wilkin DJ (2000) The molecular and genetic basis of fibroblast growth factor receptor 3 disorders: The achondroplasia family of skeletal dysplasias, Muenke craniosynostosis, and Crouzon syndrome with acanthosis nigricans. Endocr Rev 21(1):23–39.

26. Saito H, Sekizawa A, Morimoto T, Suzuki M, Yanaihara T (2000) Prenatal DNA diagnosis of a single-gene disorder from maternal plasma. Lancet 356(9236):1170.

27. Chen S, et al. (2013) Haplotype-assisted accurate non-invasive fetal whole genome recovery through maternal plasma sequencing. Genome Med 5(2):18.

28. Yu SC, et al. (2014) Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing. Proc Natl Acad Sci USA 111(23):8583–8588.

29. Sun K, et al. (2015) Plasma DNA tissue mapping by genome-wide methylation sequencing for noninvasive prenatal, cancer, and transplantation assessments. Proc Natl Acad Sci USA 112(40):E5503–E5512.

30. Ross MG, et al. (2013) Characterizing and measuring bias in sequence data. Genome Biol 14(5):R51.

E8168 | www.pnas.org/cgi/doi/10.1073/pnas.1615800113 Chan et al.
