Prevalence, phenotype and architecture of developmental disorders caused by de novo mutation

The Deciphering Developmental Disorders Study

Abbreviations
PTV: Protein-Truncating Variant
DNM: De Novo Mutation
DD: Developmental Disorder
DDD: Deciphering Developmental Disorders study

Key Words
De novo mutation; Developmental Disease; Seizures; Intellectual Disability; PhenIcons; Average Faces; ANKRD11; ARID1B; KMT2A; DDX3X; ADNP; MED13L; DYRK1A; EP300; SCN2A; SETD5; KCNQ2; MECP2; SYNGAP1; ASXL3; SATB2; TCF4; CDK13; CREBBP; DYNC1H1; FOXP1; PPP2R5D; PURA; CTNNB1; KAT6A; SMARCA2; STXBP1; EHMT1; ITPR1; KAT6B; NSD1; SMC1A; TBL1XR1; CASK; CHD2; CHD4; HDAC8; USP9X; WDR45; AHDC1; CSNK2A1; GNAI1; GNAO1; HNRNPU; KANSL1; KIF1A; MEF2C; PACS1; SLC6A1; CNOT3; CTCF; EEF1A2; FOXG1; GATAD2B; GRIN2B; IQSEC2; POGZ; PUF60; SCN8A; TCF20; BCL11A; BRAF; CDKL5; NFIX; PTPN11; AUTS2; CHAMP1; CNKSR2; DNM1; KCNH1; NAA10; PPM1D; ZBTB18; ZMYND11; ASXL1; COL4A3BP; KCNQ3; MSL3; MYT1L; PDHA1; PPP2R1A; SMAD4; TRIO; WAC; CHD8; GABRB3; KDM5B; PTEN; QRICH1; SET; ZC4H2; ALG13; SCN1A; SUV420H1; SLC35A2
bioRxiv preprint doi: https://doi.org/10.1101/049056; this version posted June 16, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Abstract

Individuals with severe, undiagnosed developmental disorders (DDs) are enriched for damaging de novo mutations (DNMs) in developmentally important genes. We exome sequenced 4,293 families with individuals with DDs, and meta-analysed these data with published data on 3,287 individuals with similar disorders. We show that the most significant factors influencing the diagnostic yield of de novo mutations are the sex of the affected individual, the relatedness of their parents and the age of both father and mother. We identified 94 genes enriched for damaging de novo mutation at genome-wide significance (P < 7 × 10^-7), including 14 genes for which compelling data for causation was previously lacking. We have characterised the phenotypic diversity among these genetic disorders. We demonstrate that, at current cost differentials, exome sequencing has much greater power than genome sequencing for novel gene discovery in genetically heterogeneous disorders. We estimate that 42% of our cohort carry pathogenic DNMs (single nucleotide variants and indels) in coding sequences, with approximately half operating by a loss-of-function mechanism, and the remainder resulting in altered-function (e.g. activating, dominant negative). We established that most haploinsufficient developmental disorders have already been identified, but that many altered-function disorders remain to be discovered. Extrapolating from the DDD cohort to the general population, we estimate that developmental disorders caused by DNMs have an average birth prevalence of 1 in 213 to 1 in 448 (0.22-0.47% of live births), depending on parental age.
Main text

Approximately 2-5% of children are born with major congenital malformations and/or manifest severe neurodevelopmental disorders during childhood [1,2]. While diverse mechanisms can cause such developmental disorders, including gestational infection and maternal alcohol consumption, damaging genetic variation in developmentally important genes has a major contribution. Several recent studies have identified a substantial causal role for DNMs not present in either parent [3-15]. Despite the identification of many developmental disorders caused by DNMs, it is generally accepted that many more such disorders await discovery [15], and the overall contribution of DNMs to developmental disorders is not known. Moreover, some pathogenic DNMs completely ablate the function of the encoded protein, whereas others alter the function of the encoded protein [16]; the relative contributions of these two mechanistic classes is also not known.

We recruited 4,293 individuals to the Deciphering Developmental Disorders (DDD) study [15]. Each of these individuals was referred with severe undiagnosed developmental disorders and most were the only affected family member. We systematically phenotyped these individuals and sequenced the exomes of these individuals and their parents. Analyses of 1,133 of these trios were described previously [15,17]. We generated a high sensitivity set of 8,361 candidate DNMs in coding or splicing sequence (mean of 1.95 DNMs per proband), while removing systematic erroneous calls (Supplementary Table 1). 1,624 genes contained two or more DNMs in unrelated individuals.

Twenty-three percent of individuals had likely pathogenic protein-truncating or missense DNMs within the clinically curated set of genes robustly associated with dominant developmental disorders [17]. We investigated factors associated with whether an individual
had a likely pathogenic DNM in these curated genes (Figure 1A,B). We observed that males had a lower chance of carrying a likely pathogenic DNM (P = 1.8 × 10^-4; OR 0.75, 0.65-0.87 95% CI), as has also been observed in autism [18]. We also observed increased likelihood of having a pathogenic DNM with the extent of speech delay (P = 0.00123), but not other indicators of severity relative to the rest of the cohort. Furthermore, the total genomic extent of autozygosity (due to parental relatedness) was negatively correlated with the likelihood of having a pathogenic DNM (P = 1.7 × 10^-7); for every log10 increase in autozygous length, the probability of having a pathogenic DNM dropped by 7.5%, likely due to increasing burden of recessive causation (Figure 1C). Nonetheless, 6% of individuals with autozygosity equivalent to a first cousin union or greater had a plausibly pathogenic DNM, underscoring the importance of considering de novo causation in all families.

Paternal age has been shown to be the primary factor influencing the number of DNMs in a child [19,20], and thus is expected to be a risk factor for pathogenic DNMs. Paternal age was only weakly associated with likelihood of having a pathogenic DNM (P = 0.016). However, focusing on the minority of DNMs that were truncating and missense variants in known DD-associated genes limits our power to detect such an effect. Analysing all 8,409 high confidence exonic and intronic autosomal DNMs confirmed a strong paternal age effect (P = 1.4 × 10^-10, 1.53 DNMs/year, 1.07-2.01 95% CI), as well as highlighting a weaker, independent, maternal age effect (P = 0.0019, 0.86 DNMs/year, 0.32-1.40 95% CI, Figure 1D,E), as has recently been described in whole genome analyses [21].

We identified genes significantly enriched for damaging DNMs by comparing the observed gene-wise DNM count to that expected under a null mutation model [22], as described previously [15]. We combined this analysis with 4,224 published DNMs in 3,287 affected individuals from thirteen exome or genome sequencing studies (Supplementary Table 2) [3-14] that exhibited a similar excess of DNMs in our curated set of DD-associated genes (Supplementary Figure 1). We found 93 genes with genome-wide significance (P < 5 × 10^-7, Figure 2), 80 of which had prior evidence of DD-association (Supplementary Table 3). We have developed visual summaries of the phenotypes associated with each gene to facilitate clinical use. In addition, we created anonymised average face images from individuals with DNMs in genome-wide significant genes (Figure 2). These images highlight facial dysmorphologies specific to certain genes.

To assess any increase in power to detect novel DD-associated genes, we excluded individuals with likely pathogenic variants in known DD-associated genes [15], leaving 3,158 probands from our cohort, along with 2,955 probands from the meta-analysis studies. In this subset, fourteen genes for which no statistically-compelling prior evidence for DD causation was available achieved genome-wide significance: CDK13, CHD4, CNOT3, CSNK2A1, GNAI1, KCNQ3, MSL3, PPM1D, PUF60, QRICH1, SET, SUV420H1, TCF20, and ZBTB18 (P < 5 × 10^-7, Table 1, Supplementary Figure 4). The clinical features associated with these newly confirmed disorders are summarised in Figure 3, Supplementary Figure 2 and Supplementary Figure 3. QRICH1 would not achieve genome-wide significance without excluding individuals with likely pathogenic variants in DD-associated genes.

In addition to discovering novel DD-associated genes, we identified several new disorders linked to known DD-associated genes, but with different modes of inheritance or molecular mechanisms. We found USP9X and ZC4H2 had a genome-wide significant excess of DNMs in female probands, indicating these genes have X-linked dominant modes of inheritance in addition to previously reported X-linked recessive mode
of inheritance in males [23,24]. In addition, we found truncating mutations in SMC1A were strongly associated with a novel seizure disorder (P = 6.5 × 10^-19), while in-frame/missense mutations in SMC1A with dominant negative effects [25] are a known cause of Cornelia de Lange Syndrome (CdLS). Individuals with truncating mutations in SMC1A lacked the characteristic facial dysmorphology of CdLS.

We then explored two approaches for integrating phenotypic data into disease gene association: statistical assessment of Human Phenotype Ontology (HPO) term similarity between individuals sharing candidate DNMs in the same gene (as we described previously [26]) and phenotypic stratification based on specific clinical characteristics.

Combining genetic evidence and HPO term similarity increased the significance of some known DD-associated genes. However, significance decreased for a larger number of genes causing severe DD but associated with nondiscriminatory HPO terms (Supplementary Figure 5A). Although we did not incorporate categorical phenotypic similarity in the gene discovery analyses described above, the systematic acquisition of phenotypic data on affected individuals within DDD enabled aggregate representations to be created for each gene achieving genome-wide significance. We present these in the form of icon-based summaries of growth and developmental milestones (PhenIcons), heatmaps of the recurrently coded HPO terms and, where sufficient face images were available, an anonymised average facial representation (Supplementary Figure 3).

Twenty percent of individuals had HPO terms which indicated seizures and/or epilepsy. We compared analysis within this phenotypically stratified group with gene-wise analyses of the entire cohort, to see if it increased power to detect known seizure-associated genes (Supplementary Figure 5B). Fifteen seizure-associated genes were genome-wide significant in both the seizure-only and the entire-cohort analyses. Nine seizure-associated genes were genome-wide significant in the entire cohort but not in the seizure subset. Of the 285 individuals with truncating or missense DNMs in known seizure-associated genes, 56% of individuals had no coded terms related to seizures/epilepsy. These findings suggest that the power of increased sample size far outweighs specific phenotypic expressivity due to the shared genetic etiology between individuals with and without epilepsy in our cohort.

The large number of genome-wide significant genes identified in the analyses above allows us to compare empirically different experimental strategies for novel gene discovery in a genetically heterogeneous cohort. We compared the power of exome and genome sequencing to detect genome-wide significant genes, assuming that budget and not samples are limiting, under different scenarios of cost ratios and sensitivity ratios (Figure 4). At current cost ratios (exome costs 30-40% of a genome) and with a plausible sensitivity differential (genome detects 5% more exonic variants than exome [27]) exome sequencing detects more than twice as many genome-wide significant genes. These empirical estimates were consistent with power simulations for identifying dominant loss-of-function genes (Supplementary Figure 6). In summary, while genome sequencing gives greatest sensitivity to detect pathogenic variation in a single individual (or outside of the coding region), exome sequencing is more powerful for novel disease gene discovery (and, analogously, likely delivers lower cost per diagnosis).
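The budget-limited comparison of exome and genome sequencing can be illustrated with a minimal sketch of the sample-size arithmetic. It assumes, following the Methods, a genome cost of 1,000 USD per individual, three sequenced individuals per trio, and that the extra sensitivity of genome sequencing is modelled by scaling the number of genome trios. The function names and the 3,000,000 USD budget are hypothetical, and the sketch compares sample sizes only; it does not reproduce the resampling of the cohort used to count genome-wide significant genes.

```python
# Illustrative sketch: function names and the example budget are assumptions,
# not taken from the study's analysis code.

GENOME_COST_USD = 1000.0   # per individual, as assumed in the Methods
INDIVIDUALS_PER_TRIO = 3   # child plus both parents


def affordable_trios(budget_usd: float, cost_per_individual_usd: float) -> int:
    """Number of complete trios that fit within a fixed budget."""
    return int(budget_usd // (INDIVIDUALS_PER_TRIO * cost_per_individual_usd))


def compare_designs(budget_usd: float, exome_cost_ratio: float,
                    genome_sensitivity: float = 1.05):
    """Return (exome trios, sensitivity-adjusted genome trios).

    exome_cost_ratio is the exome cost as a fraction of the genome cost;
    genome_sensitivity scales the genome trio count to reflect the extra
    coding DNMs a genome is assumed to detect.
    """
    exome_trios = affordable_trios(budget_usd, GENOME_COST_USD * exome_cost_ratio)
    genome_trios = affordable_trios(budget_usd, GENOME_COST_USD)
    return exome_trios, genome_trios * genome_sensitivity


if __name__ == "__main__":
    for ratio in (0.3, 0.4):
        exome, genome = compare_designs(3_000_000, ratio)
        print(f"exome at {ratio:.0%} of genome cost: "
              f"{exome} exome trios vs {genome:.0f} effective genome trios")
```

Under these assumptions the exome design sequences several times more trios for the same budget, which is the mechanism behind its greater power for gene discovery; how that translates into significant genes depends on the per-gene mutation rates and is not captured here.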
Our previous simulations suggested that analysis of a cohort of 4,293 DDD families ought to be able to detect approximately half of all haploinsufficient DD-associated genes at genome-wide significance [15]. Empirically, we have identified 47% (50/107) of haploinsufficient genes previously robustly associated with neurodevelopmental disorders [17]. We hypothesised that genetic testing prior to recruitment into our study may have depleted the cohort of the most clinically recognisable disorders. Indeed, we observed that the genes associated with the most clinically recognisable disorders were associated with a significant, three-fold lower enrichment of truncating DNMs than other DD-associated genes (~40-fold enrichment vs ~120-fold enrichment, Figure 5A). Removing these most recognisable disorders from the analysis, we identified 55% (42/76) of the remaining haploinsufficient DD-associated genes. The known DD-associated haploinsufficient genes that did not reach genome-wide significance were clearly enriched for those with lower mutability, which we would expect to lower power to detect in our analyses. We identified DD-associated genes (e.g. NRXN2) with high mutability, low clinical recognisability and yet no signal of enrichment for DNMs in our cohort (Supplementary Figure 7). Our analyses call into question whether these genes really are associated with haploinsufficient neurodevelopmental disorders and highlight the potential for well-powered gene discovery analyses to refute prior credence regarding disease gene associations.

We estimated the likely prevalence of pathogenic missense and truncating DNMs within our cohort by increasing the stringency of called DNMs until the observed synonymous DNMs equated that expected under the null mutation model (Supplementary Figure 8A), then quantifying the excess of observed missense and truncating DNMs across all genes (Figure 5B). We observed an excess of 576 truncating and 1,220 missense mutations, suggesting 41.8% (1,796/4,293) of the cohort has a pathogenic DNM. This estimate of the number of excess missense and truncating DNMs in our cohort is robust to varying the stringency of DNM calling (Supplementary Figure 8B). The vast majority of synonymous DNMs are likely to be benign, as evidenced by them being distributed uniformly (Figure 5C) among genes irrespective of their tolerance of truncating variation in the general population (as quantified by the probability of being LoF-intolerant (pLI) metric [28]). By contrast, missense and truncating DNMs are significantly enriched in genes with the highest probabilities of being intolerant of truncating variation (Figure 5D). Only 51% (923/1,796) of these excess missense and truncating DNMs are located in DD-associated dominant genes, with the remainder likely to affect genes not yet associated with DDs. A much higher proportion of the excess truncating DNMs (71%) than missense DNMs (42%) affected known DD-associated genes. This suggests that whereas most haploinsufficient DD-associated genes have already been identified, many DD-associated genes characterised by pathogenic missense DNMs remain to be discovered.

Understanding the mechanism of action of a monogenic disorder is an important prerequisite for designing therapeutic strategies [29]. We sought to estimate the relative proportion of altered-function and loss-of-function mechanisms among the excess DNMs in our cohort, by assuming that the vast majority of truncating mutations operate by a loss-of-function mechanism and using two independent approaches to estimate the relative contribution of the two mechanisms among the excess missense DNMs (Methods). First, we used the observed ratio of truncating and missense DNMs within haploinsufficient DD-associated genes to estimate the proportion of the excess missense DNMs that likely act by
loss-of-function (Figure 5C). This approach estimated that 47% (42-51% 95% CI) of excess missense and truncating DNMs operate by loss-of-function, and 53% by altered-function. Second, we took advantage of the different population genetic characteristics of known altered-function and loss-of-function DD-associated genes. Specifically, we observed that these two classes of DD-associated genes are differentially depleted of truncating variation in individuals without overt developmental disorders (pLI metric [28]). We modelled the observed pLI distribution of excess missense DNMs as a mixture of the pLI distributions of known altered-function and loss-of-function DD-associated genes (Figure 5E,F), and estimated that 63% (50-76% 95% CI) of excess missense DNMs likely act by altered-function mechanisms. Incorporating the truncating DNMs operating by a loss-of-function mechanism, this approach estimated that 57% (48-66% 95% CI) of excess missense and truncating DNMs operate by loss-of-function and 43% by altered-function.

We estimated the birth prevalence of monoallelic developmental disorders by using the germline mutation model to calculate the expected cumulative germline mutation rate of truncating DNMs in haploinsufficient DD-associated genes and scaling this upwards based on the composition of excess DNMs in the DDD cohort described above (see Methods), correcting for disorders that are under-represented in our cohort as a result of prior genetic testing (e.g. clinically-recognisable disorders and large pathogenic CNVs identified by prior chromosomal microarray analysis). This gives a mean prevalence estimate of 0.34% (0.31-0.37 95% CI), or 1 in 295 births. By factoring in the paternal and maternal age effects on the mutation rate (Figure 1) we modelled age-specific estimates of birth prevalence (Figure 6) that range from 1 in 448 (both mother and father aged 20) to 1 in 213 (both mother and father aged 45).

In summary, we have shown that de novo mutations account for approximately half of the genetic architecture of severe developmental disorders, and are split roughly equally between loss-of-function and altered-function. Whereas most haploinsufficient DD-associated genes have already been identified, currently many activating and dominant negative DD-associated genes have eluded discovery. This elusiveness likely results from these disorders being individually rarer, being caused by a relatively small number of missense mutations within each gene. Discovery of the remaining dominant developmental disorders requires larger studies and novel, more powerful, analytical strategies for disease-gene association that leverage gene-specific patterns of population variation, specifically the observed depletion of damaging variation. The integration of accurate and complete quantitative and categorical phenotypic data into the analysis will improve the power to identify ultrarare DD with distinctive clinical presentations. We have estimated the mean birth prevalence of dominant monogenic developmental disorders to be around 1 in 295, which is greater than the combined impact of trisomies 13, 18 and 21 [30] and highlights the cumulative population morbidity and mortality imposed by these individually rare disorders.
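The headline proportions quoted in the main text follow from a few counts, and the arithmetic can be checked with a short sketch. It uses only numbers stated in the text (576 excess truncating and 1,220 excess missense DNMs in 4,293 probands; 63% of excess missense DNMs acting by altered function) and, like the text, treats each excess DNM as one affected individual and all truncating DNMs as loss-of-function. The function names are hypothetical.

```python
# Illustrative arithmetic check; function names are assumptions for this sketch.

COHORT_SIZE = 4293
EXCESS_TRUNCATING = 576
EXCESS_MISSENSE = 1220


def fraction_with_pathogenic_dnm() -> float:
    """Excess truncating plus missense DNMs as a fraction of the cohort."""
    return (EXCESS_TRUNCATING + EXCESS_MISSENSE) / COHORT_SIZE


def loss_of_function_share(missense_altered_fraction: float) -> float:
    """Share of all excess DNMs acting by loss-of-function.

    Assumes every truncating DNM is loss-of-function and that the given
    fraction of excess missense DNMs acts by altered function.
    """
    lof = EXCESS_TRUNCATING + (1.0 - missense_altered_fraction) * EXCESS_MISSENSE
    return lof / (EXCESS_TRUNCATING + EXCESS_MISSENSE)


def percent_of_births(one_in_n: float) -> float:
    """Convert a '1 in N' birth prevalence to a percentage of live births."""
    return 100.0 / one_in_n


if __name__ == "__main__":
    print(f"cohort with pathogenic DNM: {fraction_with_pathogenic_dnm():.1%}")
    print(f"loss-of-function share:     {loss_of_function_share(0.63):.0%}")
    for n in (213, 295, 448):
        print(f"1 in {n} = {percent_of_births(n):.2f}% of live births")
```

The conversions agree with the quoted 41.8%, the 57% versus 43% split from the mixture approach, and the 0.22-0.47% prevalence range, up to rounding.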
Methods

Family recruitment

At 24 clinical genetics centers within the United Kingdom (UK) National Health Service and the Republic of Ireland, 4,293 patients with severe, undiagnosed developmental disorders and their parents (4,125 families) were recruited and systematically phenotyped. The study has UK Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research Ethics Committee and GEN/284/12, granted by the Republic of Ireland Research Ethics Committee). Families gave informed consent for participation. Clinical data (growth measurements, family history, developmental milestones, etc.) were collected using a standard restricted-term questionnaire within DECIPHER [31], and detailed developmental phenotypes for the individuals were entered using Human Phenotype Ontology (HPO) terms [32]. Saliva samples for the whole family and blood-extracted DNA samples for the probands were collected, processed and quality controlled as previously described [15].

Exome sequencing

Genomic DNA (approximately 1 μg) was fragmented to an average size of 150 base-pairs (bp) and subjected to DNA library creation using established Illumina paired-end protocols. Adaptor-ligated libraries were amplified and indexed via polymerase chain reaction (PCR). A portion of each library was used to create an equimolar pool comprising eight indexed libraries. Each pool was hybridized to SureSelect ribonucleic acid (RNA) baits (Agilent Human All-Exon V3 Plus with custom ELID C0338371 and Agilent Human All-Exon V5 Plus with custom ELID C0338371) and sequence targets were captured and amplified in accordance with the manufacturer's recommendations. Enriched libraries were subjected to 75-base paired-end sequencing (Illumina HiSeq) following the manufacturer's instructions.

Alignment and calling single nucleotide variants, insertions and deletions

Mapping of short-read sequences for each sequencing lanelet was carried out using the Burrows-Wheeler Aligner (BWA; version 0.59) [33] backtrack algorithm with the GRCh37 1000 Genomes Project phase 2 reference (also known as hs37d5). Sample-level BAM improvement was carried out using the Genome Analysis Toolkit (GATK; version 3.1.1) [34] and SAMtools (version 0.1.19) [35]. This consisted of a realignment of reads around known and discovered indels followed by base quality score recalibration (BQSR), with both steps performed using GATK. Lastly, SAMtools calmd was applied and indexes were created. Known indels for realignment were taken from the Mills Devine and 1000 Genomes Project Gold set and the 1000 Genomes Project phase low-coverage set, both part of the GATK resource bundle (version 2.2). Known variants for BQSR were taken from dbSNP 137, also part of the GATK resource bundle. Finally, single nucleotide variants (SNVs) and indels were called using the GATK HaplotypeCaller (version 3.2.2); this was run in multisample calling mode using the complete dataset. GATK Variant Quality Score Recalibration (VQSR) was then computed on the whole dataset and applied to the individual-sample variant calling format (VCF) files. DeNovoGear (version 0.54) [36] was used to detect SNV, insertion and deletion de novo mutations (DNMs) from child and parental exome data (BAM files).

Variant annotation

Variants in the VCF were annotated with minor allele frequency (MAF) data from a variety of different sources. The MAF annotations used included data from four different populations of the 1000 Genomes Project [37] (AMR, ASN, AFR and EUR), the UK10K cohort, the NHLBI GO Exome Sequencing Project (ESP), the Non-Finnish European (NFE) subset of the Exome Aggregation Consortium (ExAC) and an internal allele frequency generated using unaffected parents from the cohort. Variants in the VCF were annotated with Ensembl Variant Effect Predictor (VEP) [38] based on Ensembl gene build 76. The transcript with the most severe consequence was selected and all associated VEP annotations were based on the predicted effect of the variant on that particular transcript; where multiple transcripts shared the same most severe consequence, the canonical or longest was selected. We included an additional consequence for variants at the last base of an exon before an intron, where the final base is a guanine, since these variants appear to be as damaging as a splice donor variant [26]. We categorized variants into three classes by VEP consequence:
1. protein-truncating variants (PTV): splice donor, splice acceptor, stop gained, frameshift, initiator codon, and conserved exon terminus variant.
2. missense variants: missense, stop lost, inframe deletion, inframe insertion, coding sequence, and protein altering variant.
3. silent variants: synonymous.

De novo mutation filtering

We filtered candidate DNM calls to reduce the false positive rate but maximize sensitivity, based on prior results from experimental validation by capillary sequencing of candidate DNMs [15]. Candidate DNMs were excluded if not called by GATK in the child, or called in either parent, or if they had a maximum MAF greater than 0.01. Candidate DNMs were excluded when the forward and reverse coverage differed between reference and alternative alleles, defined as P < 10^-3 from a Fisher’s exact test of coverage from orientation by allele summed across the child and parents. Candidate DNMs were also excluded if they met two of the following three criteria: 1) an excess of parental alternative alleles within the cohort at the DNM's position, defined as P < 10^-3 under a one-sided binomial test given an expected error rate of 0.002 and the cumulative parental depth; 2) an excess of alternative alleles within the cohort in DNMs in a gene, defined as P < 10^-3 under a one-sided binomial test given an expected error rate of 0.002 and the cumulative depth, or 3) both parents had one or more reads supporting the alternative allele. If, after filtering, more than one variant was observed in a given gene for a particular trio, only the variant with the highest predicted functional impact was kept (protein truncating > missense > silent). Source code for filtering candidate DNMs can be found here: https://github.com/jeremymcrae/denovoFilter
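
The filtering rules above can be sketched in a few lines. This is a minimal illustration and not the code in the denovoFilter repository: the `Candidate` fields and function names are hypothetical, the strand-bias P-value is taken as a precomputed input rather than derived from a Fisher's exact test, and "met two of the three criteria" is interpreted as "at least two".

```python
# Illustrative sketch of the candidate-DNM filter; all names are assumptions.
import math
from dataclasses import dataclass

ERROR_RATE = 0.002    # expected per-read error rate stated in the text
P_THRESHOLD = 1e-3    # significance cut-off used by each test
MAX_MAF = 0.01        # maximum minor allele frequency allowed


def binom_upper_tail(successes: int, trials: int, p: float) -> float:
    """One-sided binomial P(X >= successes) for X ~ Binomial(trials, p)."""
    if successes <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(successes, trials + 1):
        # log of C(trials, i) * p^i * (1 - p)^(trials - i)
        log_term = (math.lgamma(trials + 1) - math.lgamma(i + 1)
                    - math.lgamma(trials - i + 1)
                    + i * log_p + (trials - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


@dataclass
class Candidate:
    called_in_child: bool
    called_in_parent: bool
    max_maf: float
    strand_bias_p: float        # assumed to come from a Fisher's exact test
    site_parental_alt: int      # parental alt reads in the cohort at this site
    site_parental_depth: int    # cumulative parental depth at this site
    gene_alt: int               # alt reads in the cohort across DNMs in the gene
    gene_depth: int             # cumulative depth across those sites
    both_parents_have_alt_read: bool


def passes_filter(c: Candidate) -> bool:
    """Apply the hard exclusions, then the two-of-three soft criteria."""
    if not c.called_in_child or c.called_in_parent or c.max_maf > MAX_MAF:
        return False
    if c.strand_bias_p < P_THRESHOLD:
        return False
    flags = [
        binom_upper_tail(c.site_parental_alt, c.site_parental_depth, ERROR_RATE) < P_THRESHOLD,
        binom_upper_tail(c.gene_alt, c.gene_depth, ERROR_RATE) < P_THRESHOLD,
        c.both_parents_have_alt_read,
    ]
    return sum(flags) < 2
```

A candidate with 20 parental alternative reads in 1,000 reads at the site fails the first soft criterion on its own, but is only excluded once a second criterion is also met.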

De novo mutation validation

For candidate DNMs of interest, primers were designed to amplify 150-250 bp products centered around the site of interest. Default primer3 design settings were used with the following adjustments: GC clamp = 1, human mispriming library used. Site-specific primers were tailed with Illumina adapter sequences. PCR products were generated with JumpStart AccuTaq LA DNA polymerase (Sigma Aldrich), using 40 ng genomic DNA as template. Amplicons were tagged with Illumina PCR primers along with unique barcodes enabling multiplexing of 96 samples. Barcodes were incorporated using Kapa HiFi mastermix (Kapa Biosystems). Samples were pooled and sequenced down one lane of the Illumina MiSeq, using 250 bp paired end reads. An in-house analysis pipeline extracted the read count per site and classified inheritance status per variant using a maximum likelihood approach.

Individuals with likely pathogenic variants

We previously screened 1,133 individuals for variants that contribute to their disorder [15,17]. All candidate variants in the 1,133 individuals were reviewed by consultant clinical geneticists for relevance to the individuals’ phenotypes. Most diagnosable pathogenic variants occurred de novo in dominant genes, but a small proportion also occurred in recessive genes or under other inheritance modes. DNMs within dominant DD-associated genes were very likely to be classified as the pathogenic variant for the individuals’ disorder. Due to the time required to review individuals and their candidate variants, we did not conduct a similar review in the remainder of the 4,293 individuals. Instead we defined likely pathogenic variants as candidate DNMs found in autosomal and X-linked dominant DD-associated genes, or candidate DNMs found in hemizygous DD-associated genes in males. 1,136 individuals in the 4,293 cohort had variants either previously classified as pathogenic [15,17], or had a likely pathogenic DNM.

Gene-wise assessment of DNM significance

Gene-specific germline mutation rates for different functional classes were computed [15,22] for the longest transcript in the union of transcripts overlapping the observed DNMs in that gene. We evaluated the gene-specific enrichment of PTV and missense DNMs by computing its statistical significance under a null hypothesis of the expected number of DNMs given the gene-specific mutation rate and the number of considered chromosomes [22]. We also assessed clustering of missense DNMs within genes [15], as expected for DNMs operating by activating or dominant negative mechanisms. We did this by calculating simulated dispersions of the observed number of DNMs within the gene. The probability of simulating a DNM at a specific codon was weighted by the trinucleotide sequence-context [15,22]. This allowed us to estimate the probability of the observed degree of clustering given the null model of random mutations. Fisher’s method was used to combine the significance testing of missense + PTV DNM enrichment and missense DNM clustering. We defined a gene as significantly enriched for DNMs if the PTV enrichment P-value or the combined missense P-value was less than 7 × 10^-7, which represents a Bonferroni corrected P-value of 0.05 adjusted for 4 × 18500 tests (2 × consequence classes tested × protein coding genes).
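
The gene-wise test can be sketched as follows. This is a minimal illustration under stated assumptions: the enrichment P-value is modelled as a Poisson upper tail with the expected count as its mean (the text specifies only a null of the expected number of DNMs), the clustering P-value is taken as a given input because it comes from simulation, and all function names are hypothetical. Fisher's method for two P-values uses the closed form of the chi-squared survival function with four degrees of freedom.

```python
# Illustrative sketch of the gene-wise significance test; names are assumptions.
import math

N_TESTS = 4 * 18500                      # number of tests stated in the text
GENOME_WIDE_THRESHOLD = 0.05 / N_TESTS   # Bonferroni-corrected 0.05


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summed from the tail upwards."""
    if observed <= 0:
        return 1.0
    # Probability mass at `observed`, computed in log space for stability.
    term = math.exp(-expected + observed * math.log(expected)
                    - math.lgamma(observed + 1))
    total, k = 0.0, observed
    while True:
        total += term
        k += 1
        term *= expected / k
        # Stop once terms are decreasing and no longer change the sum.
        if k > expected and term <= total * 1e-17:
            break
    return min(total, 1.0)


def fisher_combine(p1: float, p2: float) -> float:
    """Fisher's method for two P-values: chi-squared survival with 4 d.f."""
    product = p1 * p2
    if product == 0.0:
        return 0.0
    return min(product * (1.0 - math.log(product)), 1.0)


def is_genome_wide_significant(ptv_p: float, missense_enrichment_p: float,
                               missense_clustering_p: float) -> bool:
    """Significant if the PTV test or the combined missense test passes."""
    combined_missense_p = fisher_combine(missense_enrichment_p, missense_clustering_p)
    return min(ptv_p, combined_missense_p) < GENOME_WIDE_THRESHOLD
```

For example, `poisson_upper_tail(observed, expected)` would be called with the number of PTV DNMs seen in a gene and the number expected from its mutation rate and the number of chromosomes considered. The computed threshold, 0.05 / 74,000, is slightly below the rounded 7 × 10^-7 quoted in the text.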

Composite face generation

Families were given the option to have photographs of the affected individual(s) uploaded within DECIPHER [31]. Using images of individuals with DNMs in the same gene we generated de-identified realistic average faces (composite faces). Faces were detected using a discriminately trained deformable part model detector [39]. The annotation algorithm identified a set of 36 landmarks per detected face [40] and was trained on a manually annotated dataset of 3100 images [41]. The average face mesh was created by the Delaunay triangulation of the average constellation of facial landmarks for all patients with a shared genetic disorder. The averaging algorithm is sensitive to left-right facial asymmetries across multiple patients. For this purpose, we use a template constellation of landmarks based on the average constellations of 2000 healthy individuals [41]. For each patient, we align the constellation of landmarks to the template with respect to the points along the middle of the face and compute the Euclidean distances between each landmark and its corresponding pair on the template. The faces are mirrored such that the half of the face with the greater difference is always on the same side. The dataset used for this work may contain multiple photos for one patient. To avoid biasing the average face mesh towards these individuals, we computed an average face for each patient and use these personal averages to compute the final average face. Finally, to avoid any image in the composite dominating from variance in illumination between images, we normalised the intensities of pixel values within the face to an average value across all faces in each average. The composite faces were examined manually to confirm successful ablation of any individually identifiable features.

Assessing power of incorporating phenotypic information

We previously described a method to assess phenotypic similarity by HPO terms among groups of individuals sharing genetic defects in the same gene [26]. We examined whether incorporating this statistical test improved our ability to identify dominant genes at genome-wide significance. Per gene, we tested the phenotypic similarity of individuals with DNMs in the gene. We combined the phenotypic similarity P-value with the genotypic P-value per gene (the minimum P-value from the DDD-only and meta-analysis) using Fisher’s method. We examined the distribution of differences in P-value between tests without the phenotypic similarity P-value and tests that incorporated the phenotypic similarity P-value.

Many (854, 20%) of the DDD cohort experience seizures. We investigated whether testing within the subset of individuals with seizures improved our ability to find associations for seizure specific genes. A list of 102 seizure-associated genes was curated from three sources, a gene panel for Ohtahara syndrome, a currently used clinical gene panel for epilepsy and a panel derived from DD-associated genes [17]. The P-values from the seizure subset were compared to P-values from the complete cohort.

Assessing power of exome vs genome sequencing

We compared the expected power of exome sequencing versus genome sequencing to identify disease genes. Within the DDD cohort, 55 dominant DD-associated genes achieve genome-wide significance when testing for enrichment of DNMs within genes. We did not
incorporate missense DNM clustering due to the large computational requirements for assessing clustering in many replicates. We assumed a cost of 1,000 USD per individual for genome sequencing. We allowed the cost of exome sequencing to vary relative to genome sequencing, from 10-100%. We calculated the number of trios that could be sequenced under these scenarios. Estimates of the improved power of genome sequencing to detect DNMs in the coding sequence are around 1.05-fold [27] and we increased the number of trios by 1.0-1.2-fold to allow this. We sampled as many individuals from our cohort as the number of trios and counted which of the 55 DD-associated genes still achieved genome-wide significance for DNM enrichment. We ran 1000 simulations of each condition and obtained the mean number of genome-wide significant genes for each condition.

Associations with presence of likely pathogenic de novo mutations

We tested whether phenotypes were associated with the likelihood of having a likely pathogenic DNM. Categorical phenotypes (e.g. sex coded as male or female) were tested by Fisher’s exact test while quantitative phenotypes (e.g. duration of gestation coded in weeks) were tested with logistic regression, using sex as a covariate.

We investigated whether having autozygous regions affected the likelihood of having a diagnostic DNM. Autozygous regions were determined from genotypes in every individual, to obtain the total length per individual. We fitted a logistic regression for the total length of autozygous regions on whether individuals had a likely pathogenic DNM. To illustrate the relationship between length of autozygosity and the occurrence of a likely pathogenic DNM, we grouped the individuals by length and plotted the proportion of individuals in each group with a DNM against the median length of the group.

The effects of parental age on the number of DNMs were assessed using 8,409 high confidence (posterior probability of DNM > 0.5) unphased coding and noncoding DNMs in 4,293 individuals. A Poisson multiple regression was fit on the number of DNMs in each individual with both maternal and paternal age at the child’s birth as covariates. The model was fit with the identity link and allowed for overdispersion. This model used exome-based DNMs, and the analysis was scaled to the whole genome by multiplying the coefficients by a factor of 50, based on ~2% of the genome being well covered in our data (exons + introns).

Excess of de novo mutations by consequence

We identified the threshold for posterior probability of DNM at which the number of observed candidate synonymous DNMs equalled the number of expected synonymous DNMs. Candidate DNMs with scores below this threshold were excluded. We also examined the likely sensitivity and specificity of this threshold based on validation results for DNMs within a previous publication [15] in which comprehensive experimental validation was performed on 1,133 trios that comprise a subset of the families analysed here. The numbers of expected DNMs per gene were calculated per consequence from expected mutation rates per gene and the 2,407 male and 1,886 females in the cohort. We calculated
the excess of DNMs for missense and PTVs as the ratio of numbers of observed DNMs versus expected DNMs, as well as the difference of observed DNMs minus expected DNMs.

Ascertainment bias within dominant neurodevelopmental genes
We identified 150 autosomal dominant haploinsufficient genes that affected neurodevelopment within our curated developmental disorder gene set. Genes affecting neurodevelopment were identified where the affected organs included the brain, or where HPO phenotypes linked to defects in the gene included either an abnormality of brain morphology (HP:0012443) or cognitive impairment (HP:0100543) term. The 150 genes were classified for ease of clinical recognition of the syndrome from gene defects by two consultant clinical geneticists. Genes were rated from 1 (least recognisable) to 5 (most recognisable). Categories 1 and 2 contained 5 and 22 genes respectively, and so were combined in later analyses. The remaining categories had more than 33 genes per category. The ratio of observed loss-of-function DNMs to expected loss-of-function DNMs was calculated for each recognisability category, along with 95% confidence intervals from a Poisson distribution given observed counts.

Proportion of de novo mutations with loss-of-function mechanism
The observed excess of missense/inframe indel DNMs is composed of a mixture of DNMs with loss-of-function mechanisms and DNMs with altered-function mechanisms. We found that the excess of PTV DNMs within dominant haploinsufficient DD-associated genes had a greater skew towards genes with high intolerance for loss-of-function variants than the excess of missense DNMs in dominant non-haploinsufficient genes. We binned genes by constraint decile of the probability of being loss-of-function intolerant [28] and calculated the observed excess of missense DNMs in each bin. We modelled this binned distribution as a two-component mixture with the components representing DNMs with a loss-of-function or function-altering mechanism. We identified the optimal mixing proportion for the loss-of-function and altered-function DNMs from the lowest goodness-of-fit (from a spline fitted to the sum-of-squares of the differences per decile) to missense/inframe indels in all genes across a range of mixtures.
The excess of DNMs with a loss-of-function mechanism was calculated as the excess of DNMs with a VEP loss-of-function consequence, plus the proportion of the excess of missense DNMs at the optimal mixing proportion.
We independently estimated the proportions of loss-of-function and altered-function. We counted PTV and missense/inframe indel DNMs within dominant haploinsufficient genes to estimate the proportion of excess DNMs with a loss-of-function mechanism, but which were classified as missense/inframe indel. We estimated the proportion of excess DNMs with a loss-of-function mechanism as the PTV excess plus the PTV excess multiplied by the proportion of loss-of-function classified as missense.

Prevalence of developmental disorders from dominant de novo mutations
We estimated the birth prevalence of monoallelic developmental disorders by using the germline mutation model. We calculated the expected cumulative germline mutation rate
of truncating DNMs in 238 haploinsufficient DD-associated genes. We scaled this upwards based on the composition of excess DNMs in the DDD cohort using the ratio of excess DNMs (n=1816) to DNMs within dominant haploinsufficient DD-associated genes (n=412). Around 10% of DDs are caused by de novo CNVs [42,43], which are underrepresented in our cohort as a result of prior genetic testing. If included, the excess DNM in our cohort would increase by 21%, therefore we scaled the prevalence estimate upwards by this factor.
Mothers aged 29.9 and fathers aged 29.5 have children with 77 DNMs per genome on average [20]. We calculated the mean number of DNMs expected under different combinations of parental ages, given our estimates of the extra DNMs per year from older mothers and fathers. We scaled the prevalence to different combinations of parental ages using the ratio of expected mutations at a given age combination to the number expected at the mean cohort parental ages.
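The prevalence scaling described above can be sketched as a short calculation. This is only an illustrative reading of the text, not the authors' code: the constants marked "from text" are quoted in the Methods, while the truncating-DNM rate, the per-year age coefficients and the mean cohort parental ages are not stated in this section and appear here as hypothetical inputs. The function names are likewise invented for the example.

```python
# Constants quoted in the Methods text
EXCESS_ALL = 1816        # from text: excess DNMs in the DDD cohort
EXCESS_HI = 412          # from text: DNMs within dominant haploinsufficient DD-associated genes
CNV_UPLIFT = 0.21        # from text: increase in excess if de novo CNVs were included
MEAN_DNMS = 77.0         # from text: DNMs per genome at the reference parental ages
REF_MOTHER_AGE = 29.9    # from text
REF_FATHER_AGE = 29.5    # from text
EXOME_TO_GENOME = 50     # from text: factor applied to exome-based coefficients


def baseline_prevalence(ptv_rate):
    """Scale a cumulative truncating-DNM rate (hypothetical input) to all
    pathogenic DNMs, then upwards again for de novo CNVs."""
    return ptv_rate * (EXCESS_ALL / EXCESS_HI) * (1 + CNV_UPLIFT)


def expected_dnms(mother_age, father_age, maternal_coef, paternal_coef):
    """Expected DNMs per genome for a pair of parental ages.

    maternal_coef and paternal_coef are exome-scale extra DNMs per year of
    parental age (hypothetical inputs), scaled to the genome by a factor of 50.
    """
    extra = EXOME_TO_GENOME * (
        maternal_coef * (mother_age - REF_MOTHER_AGE)
        + paternal_coef * (father_age - REF_FATHER_AGE)
    )
    return MEAN_DNMS + extra


def prevalence_at_ages(ptv_rate, mother_age, father_age,
                       cohort_mother_age, cohort_father_age,
                       maternal_coef, paternal_coef):
    """Rescale the baseline prevalence by the ratio of expected DNMs at the
    given ages to expected DNMs at the mean cohort parental ages."""
    ratio = (
        expected_dnms(mother_age, father_age, maternal_coef, paternal_coef)
        / expected_dnms(cohort_mother_age, cohort_father_age,
                        maternal_coef, paternal_coef)
    )
    return baseline_prevalence(ptv_rate) * ratio


if __name__ == "__main__":
    # Every argument below is a hypothetical placeholder, not a study estimate.
    for ages in [(20.0, 20.0), (30.0, 32.0), (45.0, 45.0)]:
        p = prevalence_at_ages(5e-4, ages[0], ages[1], 30.0, 32.0, 0.02, 0.03)
        print(ages, round(p, 5))
```

Keeping the three scaling steps as separate functions makes it easy to see which factor moves the estimate when one of the assumed inputs changes.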
References
1. Sheridan, E. et al. Risk factors for congenital anomaly in a multiethnic birth cohort: an analysis of the Born in Bradford study. Lancet 382, 1350-9 (2013).
2. Ropers, H.H. Genetics of early onset cognitive impairment. Annu Rev Genomics Hum Genet 11, 161-87 (2010).
3. De Ligt, J. et al. Diagnostic exome sequencing in persons with severe intellectual disability. The New England Journal of Medicine 367, 1921-9 (2012).
4. De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209-215 (2014).
5. Epi4K Consortium & Epilepsy Phenome/Genome Project. De novo mutations in epileptic encephalopathies. Nature 501, 217-21 (2013).
6. EuroEPINOMICS-RES Consortium, Epilepsy Phenome/Genome Project & Epi4K Consortium. De novo mutations in synaptic transmission genes including DNM1 cause epileptic encephalopathies. Am J Hum Genet 95, 360-70 (2014).
7. Fromer, M. et al. De novo mutations in schizophrenia implicate synaptic networks. Nature 506, 179-184 (2014).
8. Gilissen, C. et al. Genome sequencing identifies major causes of severe intellectual disability. Nature 511, 344-7 (2014).
9. Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216-221 (2014).
10. Iossifov, I. et al. De Novo Gene Disruptions in Children on the Autistic Spectrum. Neuron 74, 285-299 (2012).
11. O’Roak, B.J. et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 1-7 (2012).
12. Rauch, A. et al. Range of genetic mutations associated with severe non-syndromic sporadic intellectual disability: an exome sequencing study. Lancet 380, 1674-82 (2012).
13. Sanders, S.J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature 485, 237-41 (2012).
14. Zaidi, S. et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220-3 (2013).
15. The Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223-228 (2015).
16. Wilkie, A.O. The molecular basis of genetic dominance. J Med Genet 31, 89-98 (1994).
17. Wright, C.F. et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. The Lancet (2014).
18. Jacquemont, S. et al. A higher mutational burden in females supports a "female protective model" in neurodevelopmental disorders. Am J Hum Genet 94, 415-25 (2014).
19. Kong, A. et al. Rate of de novo mutations and the importance of father's age to disease risk. Nature 488, 471-5 (2012).
20. Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat Genet 48, 126-33 (2016).
21. Wong, W.S. et al. New observations on maternal age effect on germline de novo mutations. Nat Commun 7, 10486 (2016).
22. Samocha, K.E. et al. A framework for the interpretation of de novo variation in human disease. Nature Genetics 46, 944-950 (2014).
23. Hirata, H. et al. ZC4H2 mutations are associated with arthrogryposis multiplex congenita and intellectual disability through impairment of central and peripheral synaptic plasticity. Am J Hum Genet 92, 681-95 (2013).
24. Homan, C.C. et al. Mutations in USP9X are associated with X-linked intellectual disability and disrupt neuronal cell migration and growth. Am J Hum Genet 94, 470-8 (2014).
25. Liu, J. et al. SMC1A expression and mechanism of pathogenicity in probands with X-Linked Cornelia de Lange syndrome. Hum Mutat 30, 1535-42 (2009).
26. Akawi, N. et al. Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families. Nature Genetics 47, 1363-1369 (2015).
27. Meynert, A.M., Ansari, M., FitzPatrick, D.R. & Taylor, M.S. Variant detection sensitivity and biases in whole genome and exome sequencing. BMC Bioinformatics 15, 247 (2014).
28. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. bioRxiv X, XX-XX (2015).
29. Boycott, K.M., Vanstone, M.R., Bulman, D.E. & Mackenzie, A.E. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nature Reviews Genetics 14, 681-91 (2013).
30. Springett, A. et al. Congenital Anomaly Statistics 2011: England and Wales. (2013).
31. Bragin, E. et al. DECIPHER: database for the interpretation of phenotype-linked plausibly pathogenic sequence and copy-number variation. Nucleic Acids Res 42, D993-D1000 (2014).
32. Köhler, S. et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. American Journal of Human Genetics 85, 457-464 (2009).
33. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009).
34. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297-303 (2010).
35. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).
36. Ramu, A. et al. DeNovoGear: de novo indel and point mutation discovery and phasing. Nature Methods 10, 985-7 (2013).
37. Abecasis, G.R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65 (2012).
38. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 26, 2069-70 (2010).
39. Felzenszwalb, P.F., Girshick, R.B., McAllester, D. & Ramanan, D. Object detection with discriminatively trained part-based models. IEEE transactions on pattern analysis and machine intelligence 32, 1627-45 (2010).
40. Xiong, X. & De la Torre, F. Supervised Descent Method and Its Applications to Face Alignment. in 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 532-539 (IEEE, Portland, OR, 2013).
41. Ferry, Q. et al. Diagnostically relevant facial gestalt information from ordinary photos. eLife 3, e02020-e02020 (2014).
42. Cooper, G.M. et al. A copy number variation morbidity map of developmental delay. Nat Genet 43, 838-46 (2011).
43. Sagoo, G.S. et al. Array CGH in patients with learning disability (mental retardation) and congenital anomalies: updated systematic review and meta-analysis of 19 studies and 13,926 subjects. Genet Med 11, 139-46 (2009).
Acknowledgments
We thank the families for their participation and patience. We are grateful to the Exome Aggregation Consortium for making their data available. The DDD study presents independent research commissioned by the Health Innovation Challenge Fund (grant HICF-1009-003), a parallel funding partnership between the Wellcome Trust and the UK Department of Health, and the Wellcome Trust Sanger Institute (grant WT098051). The views expressed in this publication are those of the author(s) and not necessarily those of the Wellcome Trust or the UK Department of Health. The study has UK Research Ethics Committee approval (10/H0305/83, granted by the Cambridge South Research Ethics Committee and GEN/284/12, granted by the Republic of Ireland Research Ethics Committee). The research team acknowledges the support of the National Institutes for Health Research, through the Comprehensive Clinical Research Network. The authors wish to thank the Sanger Human Genome Informatics team, the Sample Management team, the Illumina High-Throughput team, the New Pipeline Group team, the DNA pipelines team and the Core Sequencing team for their support in generating and processing the data. D.R.F. is funded through an MRC Human Genetics Unit program grant to the University of Edinburgh. Finally we gratefully acknowledge the contribution of two esteemed DDD clinical collaborators, John Tolmie and Louise Brueton, who died in the course of the study.
Author Contributions
Jeremy F McRae1, Stephen Clayton1, Tomas W Fitzgerald1, Joanna Kaplanis1, Elena Prigmore1, Diana Rajan1, Alejandro Sifrim1, Stuart Aitken2, Nadia Akawi1, Mohsan Alvi3, Kirsty Ambridge1, Daniel M Barrett1, Tanya Bayzetinova1, Philip Jones1, Wendy D Jones1, Daniel King1, Netravathi Krishnappa1, Laura E Mason1, Tarjinder Singh1, Adrian R Tivey1, Munaza Ahmed4, Uruj Anjum5, Hayley Archer6, Ruth Armstrong7, Jana Awada1, Meena Balasubramanian8, Siddharth Banka9, Diana Baralle4, Angela Barnicoat10, Paul Batstone11, David Baty12, Chris Bennett13, Jonathan Berg12, Birgitta Bernhard14, A Paul Bevan1, Maria Bitner-Glindzicz10, Edward Blair15, Moira Blyth13, David Bohanna16, Louise Bourdon14, David Bourn17, Lisa Bradley18, Angela Brady14, Simon Brent1, Carole Brewer19, Kate Brunstrom10, David J Bunyan4, John Burn17, Natalie Canham14, Bruce Castle19, Kate Chandler9, Elena Chatzimichali1, Deirdre Cilliers15, Angus Clarke6, Susan Clasper15, Jill Clayton-Smith9, Virginia Clowes14, Andrea Coates13, Trevor Cole16, Irina Colgiu1, Amanda Collins4, Morag N Collinson4, Fiona Connell20, Nicola Cooper16, Helen Cox16, Lara Cresswell21, Gareth Cross22, Yanick Crow9, Mariella D'Alessandro11, Tabib Dabir18, Rosemarie Davidson23, Sally Davies6, Dylan de Vries1, John Dean11, Charu Deshpande20, Gemma Devlin19, Abhijit Dixit22, Angus Dobbie13, Alan Donaldson24, Dian Donnai9, Deirdre Donnelly18, Carina Donnelly9, Angela Douglas25, Sofia Douzgou9, Alexis Duncan23, Jacqueline Eason22, Sian Ellard19, Ian Ellis25, Frances Elmslie5, Karenza Evans6, Sarah Everest19, Tina Fendick20, Richard Fisher17, Frances Flinter20, Nicola Foulds4, Andrew Fry6, Alan Fryer25, Carol Gardiner23, Lorraine Gaunt9, Neeti Ghali14, Richard Gibbons15, Harinder Gill26, Judith Goodship17, David Goudie12, Emma Gray1, Andrew Green26, Philip Greene2, Lynn Greenhalgh25, Susan Gribble1, Rachel Harrison22, Lucy Harrison4, Victoria Harrison4, Rose Hawkins24, Liu He1, Stephen Hellens17, Alex Henderson17, Sarah Hewitt13, Lucy Hildyard1, Emma Hobson13, Simon Holden7, Muriel Holder14, Susan Holder14, Georgina Hollingsworth10, Tessa Homfray5, Mervyn Humphreys18, Jane Hurst10, Ben Hutton1, Stuart Ingram8, Melita Irving20, Lily Islam16, Andrew Jackson2, Joanna Jarvis16, Lucy Jenkins10, Diana Johnson8, Elizabeth Jones9, Dragana Josifova20, Shelagh Joss23, Beckie Kaemba21, Sandra Kazembe21, Rosemary Kelsell1, Bronwyn Kerr9, Helen Kingston9, Usha
Kini15, Esther Kinning23, Gail Kirby16, Claire Kirk18, Emma Kivuva19, Alison Kraus13, Dhavendra Kumar6, V. K Ajith Kumar10, Katherine Lachlan4, Wayne Lam2, Anne Lampe2, Caroline Langman20, Melissa Lees10, Derek Lim16, Cheryl Longman23, Gordon Lowther23, Sally A Lynch26, Alex Magee18, Eddy Maher2, Alison Male10, Sahar Mansour5, Karen Marks5, Katherine Martin22, Una Maye25, Emma McCann27, Vivienne McConnell18, Meriel McEntagart5, Ruth McGowan11, Kirsten McKay16, Shane McKee18, Dominic J McMullan16, Susan McNerlan18, Catherine McWilliam11, Sarju Mehta7, Kay Metcalfe9, Anna Middleton1, Zosia Miedzybrodzka11, Emma Miles9, Shehla Mohammed20, Tara Montgomery17, David Moore2, Sian Morgan6, Jenny Morton16, Hood Mugalaasi6, Victoria Murday23, Helen Murphy9, Swati Naik16, Andrea Nemeth15, Louise Nevitt8, Ruth Newbury-Ecob24, Andrew Norman16, Rosie O'Shea26, Caroline Ogilvie20, Kai-Ren Ong16, Soo-Mi Park7, Michael J Parker8, Chirag Patel16, Joan Paterson7, Stewart Payne14, Daniel Perrett1, Julie Phipps15, Daniela T Pilz23, Martin Pollard1, Caroline Pottinger27, Joanna Poulton15, Norman Pratt12, Katrina Prescott13, Sue Price15, Abigail Pridham15, Annie Procter6, Hellen Purnell15, Oliver Quarrell8, Nicola Ragge16, Raheleh Rahbari1, Josh Randall1, Julia Rankin19, Lucy Raymond7, Debbie Rice12, Leema Robert20, Eileen Roberts24, Jonathan Roberts7, Paul Roberts13, Gillian Roberts25, Alison Ross11, Elisabeth Rosser10, Anand Saggar5, Shalaka Samant11, Julian Sampson6, Richard Sandford7, Ajoy Sarkar22, Susann Schweiger12, Richard Scott10, Ingrid Scurr24, Ann Selby22, Anneke Seller15, Cheryl Sequeira14, Nora Shannon22, Saba Sharif16, Charles Shaw-Smith19, Emma Shearing8, Debbie Shears15, Eamonn Sheridan13, Ingrid Simonic7, Roldan Singzon14, Zara Skitt9, Audrey Smith13, Kath Smith8, Sarah Smithson24, Linda Sneddon17, Miranda Splitt17, Miranda Squires13, Fiona Stewart18, Helen Stewart15, Volker Straub17, Mohnish Suri22, Vivienne Sutton25, Ganesh Jawahar Swaminathan1, Elizabeth Sweeney25, Kate Tatton-Brown5, Cat Taylor8, Rohan Taylor5, Mark Tein16, I Karen Temple4, Jenny Thomson13, Marc Tischkowitz7, Susan Tomkins24, Audrey Torokwa4, Becky Treacy7, Claire Turner19, Peter Turnpenny19, Carolyn Tysoe19, Anthony Vandersteen14, Vinod Varghese6, Pradeep Vasudevan21, Parthiban Vijayarangakannan1, Julie Vogt16, Emma Wakeling14, Sarah Wallwark7, Jonathon Waters10, Astrid Weber25, Diana Wellesley4, Margo Whiteford23, Sara Widaa1, Sarah Wilcox7, Emily Wilkinson1, Denise Williams16, Nicola Williams23, Louise Wilson10, Geoff Woods7, Christopher Wragg24, Michael Wright17, Laura Yates17, Michael Yau20, Chris Nellåker28,29,30, Michael J Parker31, Helen V Firth1,7,32, Caroline F Wright1,32, David R FitzPatrick1,2,32, Jeffrey C Barrett1,32, Matthew E Hurles1,32
1 Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
2 MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Western General Hospital, Edinburgh, EH4 2XU, UK
3 Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, UK
4 Wessex Clinical Genetics Service, University Hospital Southampton, Princess Anne Hospital, Coxford Road, Southampton, SO16 5YA, UK and Wessex Regional Genetics Laboratory, Salisbury NHS Foundation Trust, Salisbury District Hospital, Odstock Road, Salisbury, Wiltshire, SP2 8BJ, UK and Faculty of Medicine, University of Southampton, Building 85, Life Sciences Building, Highfield Campus, Southampton, SO17 1BJ, UK
5 South West Thames Regional Genetics Centre, St George's Healthcare NHS Trust, St George's, University of London, Cranmer Terrace, London, SW17 0RE, UK
6 Institute Of Medical Genetics, University Hospital Of Wales, Heath Park, Cardiff, CF14 4XW, UK and Department of Clinical Genetics, Block 12, Glan Clwyd Hospital, Rhyl, Denbighshire, LL18 5UJ, UK
7 East Anglian Medical Genetics Service, Box 134, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, CB2 0QQ, UK
8 Sheffield Regional Genetics Services, Sheffield Children's NHS Trust, Western Bank, Sheffield, S10 2TH, UK
9 Manchester Centre for Genomic Medicine, St Mary's Hospital, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester M13 9WL, UK
10 North East Thames Regional Genetics Service, Great Ormond Street Hospital for Children NHS Foundation Trust, Great Ormond Street Hospital, Great Ormond Street, London, WC1N 3JH, UK
11 North of Scotland Regional Genetics Service, NHS Grampian, Department of Medical Genetics Medical School, Foresterhill, Aberdeen, AB25 2ZD, UK
12 East of Scotland Regional Genetics Service, Human Genetics Unit, Pathology Department, NHS Tayside, Ninewells Hospital, Dundee, DD1 9SY, UK
13 Yorkshire Regional Genetics Service, Leeds Teaching Hospitals NHS Trust, Department of Clinical Genetics, Chapel Allerton Hospital, Chapeltown Road, Leeds, LS7 4SA, UK
14 North West Thames Regional Genetics Centre, North West London Hospitals NHS Trust, The Kennedy Galton Centre, Northwick Park And St Mark's NHS Trust Watford Road, Harrow, HA1 3UJ, UK
15 Oxford Regional Genetics Service, Oxford Radcliffe Hospitals NHS Trust, The Churchill Old Road, Oxford, OX3 7LJ, UK
16 West Midlands Regional Genetics Service, Birmingham Women's NHS Foundation Trust, Birmingham Women's Hospital, Edgbaston, Birmingham, B15 2TG, UK
17 Northern Genetics Service, Newcastle upon Tyne Hospitals NHS Foundation Trust, Institute of Human Genetics, International Centre for Life, Central Parkway, Newcastle upon Tyne, NE1 3BZ, UK
18 Northern Ireland Regional Genetics Centre, Belfast Health and Social Care Trust, Belfast City Hospital, Lisburn Road, Belfast, BT9 7AB, UK
19 Peninsula Clinical Genetics Service, Royal Devon and Exeter NHS Foundation Trust, Clinical Genetics Department, Royal Devon & Exeter Hospital (Heavitree), Gladstone Road, Exeter, EX1 2ED, UK
20 South East Thames Regional Genetics Centre, Guy's and St Thomas' NHS Foundation Trust, Guy's Hospital, Great Maze Pond, London, SE1 9RT, UK
21 Leicestershire Genetics Centre, University Hospitals of Leicester NHS Trust, Leicester Royal Infirmary (NHS Trust), Leicester, LE1 5WW, UK
22 Nottingham Regional Genetics Service, City Hospital Campus, Nottingham University Hospitals NHS Trust, The Gables, Hucknall Road, Nottingham NG5 1PB, UK
23 West of Scotland Regional Genetics Service, NHS Greater Glasgow and Clyde, Institute Of Medical Genetics, Yorkhill Hospital, Glasgow, G3 8SJ, UK
24 Bristol Genetics Service (Avon, Somerset, Gloucs and West Wilts), University Hospitals Bristol NHS Foundation Trust, St Michael's Hospital, St Michael's Hill, Bristol, BS2 8DT, UK
25 Merseyside and Cheshire Genetics Service, Liverpool Women's NHS Foundation Trust, Department of Clinical Genetics, Royal Liverpool Children's Hospital Alder Hey, Eaton Road, Liverpool, L12 2AP, UK
26 National Centre for Medical Genetics, Our Lady's Children's Hospital, Crumlin, Dublin 12, Ireland
27 Department of Clinical Genetics, Block 12, Glan Clwyd Hospital, Rhyl, Denbighshire, Wales, LL18 5UJ, UK
28 Nuffield Department of Obstetrics & Gynaecology, University of Oxford, Level 3, Women's Centre, John Radcliffe Hospital, Oxford, OX3 9DU, UK
29 Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Old Road Campus Research Building, Oxford, OX3 7DQ, UK
30 Big Data Institute, University of Oxford, Roosevelt Drive, Oxford, OX3 7LF, UK
31 The Ethox Centre, Nuffield Department of Population Health, University of Oxford, Old Road Campus, Oxford, OX3 7LF, UK
32 These authors jointly supervised this work.
Patient recruitment and phenotyping: M. Ahmed, U.A., H.A., R.A., M. Balasubramanian, S. Banka, D. Baralle, A. Barnicoat, P.B., D. Baty, C. Bennett, J. Berg, B.B., M.B-G., E.B., M. Blyth, D. Bohanna, L. Bourdon, D. Bourn, L. Bradley, A. Brady, C. Brewer, K.B., D.J.B., J. Burn, N. Canham, B.C., K.C., D.C., A. Clarke, S. Clasper, J.C-S., V.C., A. Coates, T.C., A. Collins, M.N.C., F.C., N. Cooper, H.C., L.C., G.C., Y.C., M.D., T.D., R.D., S. Davies, J.D., C. Deshpande, G.D., A. Dixit, A. Dobbie, A. Donaldson, D. Donnai, D. Donnelly, C. Donnelly, A. Douglas, S. Douzgou, A. Duncan, J.E., S. Ellard, I.E., F.E., K.E., S. Everest, T.F., R.F., F.F., N.F., A. Fry, A. Fryer, C.G., L. Gaunt, N.G., R.G., H.G., J.G., D.G., A.G., P.G., L. Greenhalgh, R. Harrison, L. Harrison, V.H., R. Hawkins, S. Hellens, A.H., S. Hewitt, E.H., S. Holden, M. Holder, S. Holder, G.H., T.H., M. Humphreys, J.H., S.I., M.I., L.I., A.J., J.J., L.J., D. Johnson, E.J., D. Josifova, S.J., B. Kaemba, S.K., B. Kerr, H.K., U.K., E. Kinning, G.K., C.K., E. Kivuva, A.K., D. Kumar, V.A.K., K.L., W.L., A.L., C. Langman, M.L., D.L., C. Longman, G.L., S.A.L., A. Magee, E. Maher, A. Male, S. Mansour, K. Marks, K. Martin, U.M., E. McCann, V. McConnell, M.M., R.M., K. McKay, S. McKee, D.J.M., S. McNerlan, C.M., S. Mehta, K. Metcalfe, Z.M., E. Miles, S. Mohammed, T.M., D.M., S. Morgan, J.M., H. Mugalaasi, V. Murday, H. Murphy, S.N., A. Nemeth, L.N., R.N-E., A. Norman, R.O., C.O., K-R.O., S-M.P., M.J. Parker, C. Patel, J. Paterson, S. Payne, J. Phipps, D.T.P., C. Pottinger, J. Poulton, N.P., K.P., S. Price, A. Pridham, A. Procter, H.P., O.Q., N.R., J. Rankin, L. Raymond, D. Rice, L. Robert, E. Roberts, J. Roberts, P.R., G.R., A.R., E. Rosser, A. Saggar, S. Samant, J.S., R. Sandford, A. Sarkar, S. Schweiger, R. Scott, I. Scurr, A. Selby, A. Seller, C.S., N.S., S. Sharif, C.S-S., E. Shearing, D.S., E. Sheridan, I. Simonic, R. Singzon, Z.S., A. Smith, K.S., S. Smithson, L.S., M. Splitt, M. Squires, F.S., H.S., V. Straub, M. Suri, V. Sutton, E. Sweeney, K.T-B., C. Taylor, R.T., M. Tein, I.K.T., J.T., M. Tischkowitz, S.T., A.T., B.T., C. Turner, P.T., C. Tysoe, A.V., V.V., P. Vasudevan, J.V., E. Wakeling, S. Wallwark, J.W., A.W., D. Wellesley, M. Whiteford, S. Wilcox, D. Williams, N.W., L.W., G.W., C.W., M. Wright, L.Y., M.Y., H.V.F., D.R.F.
Sample and data processing: S. Clayton, T.W.F., E.P., D. Rajan, K.A., D.M.B., T.B., P.J., N.K., L.E.M., A.R.T., A.P.B., S. Brent, E.C., I.C., E.G., S.G., L. Hildyard, B.H., R.K., D.P., M.P., J. Randall, G.J.S., S. Widaa, E. Wilkinson
Validation experiments: J.F.M., E.P., D. Rajan, A. Sifrim, N.K., C.F.W.
Study design: M.J. Parker, H.V.F., C.F.W., D.R.F., J.C.B., M.E.H.
Method development and data analysis: J.F.M., S. Clayton, T.W.F., J.K., E.P., D. Rajan, A. Sifrim, S.A., N.A., M. Alvi, P.J., W.D.J., D. King, T.S., J.A., D.d.V., L. He, R.R., G.J.S., P. Vijayarangakannan, C.N., H.V.F., C.F.W., D.R.F., J.C.B., M.E.H.
Data interpretation: J.F.M., H.V.F., C.F.W., D.R.F., J.C.B., M.E.H.
Writing: J.F.M., C.F.W., D.R.F., M.E.H.
Experimental and analytical supervision: M.J. Parker, H.V.F., C.F.W., D.R.F., J.C.B., M.E.H.
Project Supervision: M.E.H.
Author Information
Exome sequencing data are accessible via the European Genome-phenome Archive (EGA) under accession EGAS00001000775. Details of DD-associated genes are available at www.ebi.ac.uk/gene2phenotype. M.E.H. is a co-founder of, and holds shares in, Congenica Ltd, a genetics diagnostic company. Correspondence and requests for materials should be addressed to M.E.H ([email protected]).
Tables
Table 1: Genes achieving genome-wide significant statistical evidence without previous compelling evidence for being developmental disorder genes. The numbers of unrelated individuals with independent de novo mutations (DNMs) are given for protein truncating variants (PTV) and missense variants. If any additional individuals were in other cohorts, that number is given in brackets. The P-value reported is the minimum P-value from the testing of the DDD dataset or the meta-analysis dataset. The subset providing the P-value is also listed. Mutations are considered clustered if the P-value for proximity clustering of DNMs is less than 0.01.
Gene | Missense | PTV | P-value | Test | Clustering
CDK13 | 10 | 1 | 3.2 x 10^-19 | DDD | Yes
GNAI1 | 7(1) | 1 | 2.1 x 10^-13 | DDD | No
CSNK2A1 | 7 | 0 | 1.4 x 10^-12 | DDD | Yes
PPM1D | 0 | 5(1) | 6.3 x 10^-12 | Meta | No
CNOT3 | 5 | 2(1) | 5.2 x 10^-11 | DDD | Yes
MSL3 | 0 | 4 | 2.2 x 10^-10 | DDD | No
KCNQ3 | 4(3) | 0 | 3.4 x 10^-10 | Meta | Yes
ZBTB18 | 1(1) | 4 | 1.4 x 10^-9 | DDD | No
PUF60 | 4(1) | 3 | 2.6 x 10^-9 | DDD | No
TCF20 | 1 | 5 | 2.7 x 10^-9 | DDD | No
SUV420H1 | 0(2) | 2(3) | 2.9 x 10^-9 | Meta | No
CHD4 | 8(1) | 1 | 7.6 x 10^-9 | DDD | No
SET | 0 | 3 | 1.2 x 10^-7 | DDD | No
QRICH1 | 0 | 3(1) | 3.6 x 10^-7 | Meta | No
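The reporting rule in the Table 1 caption (take the smaller of the DDD and meta-analysis P-values, record which dataset supplied it, and flag clustering below 0.01) can be written as a small helper. This is a sketch under assumptions: the function name and argument layout are invented for illustration, the separate per-dataset P-values are not given in the table, and the genome-wide threshold is the P < 7 x 10^-7 value quoted elsewhere in the manuscript.

```python
GENOME_WIDE_THRESHOLD = 7e-7   # threshold quoted in the manuscript
CLUSTERING_THRESHOLD = 0.01    # from the table caption


def summarise_gene(p_ddd, p_meta, p_clustering):
    """Return (reported P-value, test label, clustered?, genome-wide significant?).

    p_ddd and p_meta are the P-values from the DDD-only and meta-analysis
    tests; p_clustering is the proximity-clustering P-value. All three are
    hypothetical inputs here.
    """
    if p_ddd <= p_meta:
        p_min, test = p_ddd, "DDD"
    else:
        p_min, test = p_meta, "Meta"
    clustered = p_clustering < CLUSTERING_THRESHOLD
    significant = p_min < GENOME_WIDE_THRESHOLD
    return p_min, test, clustered, significant


if __name__ == "__main__":
    # Hypothetical P-values, for illustration only
    print(summarise_gene(1e-12, 1e-9, 0.5))
```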
Supplementary Tables
Table provided in external spreadsheet.
Supplementary Table 1: Table of de novo mutations (DNM) in the 4,293 DDD individuals. The table includes sex, chromosome, position, reference and alternate alleles, HGNC symbol, VEP consequence, posterior probability of DNM and validation status where available. Individual IDs are available on request. This list excludes the sites that failed validations, but includes sites that passed validation (confirmed), sites that were uncertain (uncertain), and sites that were not tested by secondary validation (NA). Genome positions are given as GRCh37 coordinates.
Supplementary Table 2: Details of cohorts used in meta-analyses. This includes numbers of individuals by sex and publication details.
Phenotype | Year | Male | Female | Note | Citation
Intellectual disability | 2012 | 47 | 53 | | De Ligt, et al. [3]
Autism spectrum disorder | 2012 | 314 | 29 | subset of Iossifov, et al. [9] | Iossifov, et al. [10]
Autism spectrum disorder | 2012 | 151 | 58 | subset of Iossifov, et al. [10] | O’Roak, et al. [11]
Intellectual disability | 2012 | 19 | 32 | | Rauch, et al. [12]
Autism spectrum disorder | 2012 | 157 | 68 | subset of Iossifov, et al. [9] | Sanders, et al. [13]
Seizures | 2013 | 156 | 108 | subset of EuroEPINOMICS-RES Consortium, et al. [6] | Epi4K Consortium and Epilepsy Phenome/Genome Project [5]
Congenital heart disease | 2013 | 220 | 142 | | Zaidi, et al. [14]
Seizures | 2014 | 54 | 38 | | EuroEPINOMICS-RES Consortium, et al. [6]
Schizophrenia | 2014 | 308 | 317 | | Fromer, et al. [7]
Intellectual disability | 2014 | 0 | 0 | subset of De Ligt, et al. [3] | Gilissen, et al. [8]
Autism spectrum disorder (normal IQ) | 2014 | 1099 | 74 | Counts are for individuals with IQ >= 70. | Iossifov, et al. [9]
Autism spectrum disorder | 2014 | 446 | 112 | Probands with IQ < 70. | Iossifov, et al. [9]
Autism spectrum disorder | 2014 | 1192 | 253 | Counts are extrapolated from the sex ratio of individuals with de novo mutations. | De Rubeis, et al. [4]
Table provided in external spreadsheet.
Supplementary Table 3: Genes with genome-wide significant statistical evidence to be developmental disorder genes. The numbers of unrelated individuals with independent de novo mutations (DNMs) are given for protein truncating variants (PTV) and missense variants. If any additional individuals were in other cohorts, that number is given in brackets. The P-value reported is the minimum P-value from the testing of the DDD dataset or the meta-analysis dataset. The subset providing the P-value is also listed. Mutations are considered clustered if the P-value for proximity clustering of DNMs is less than 0.01.
Figures
Figure 1: Association of phenotypes with presence of likely pathogenic de novo mutations (DNMs). A) Odds ratios and 95% confidence intervals (CI) for binary phenotypes. Odds ratios above 1 are associated with increased risk of pathogenic DNMs when the phenotype is present. P-values are given for a Fisher’s Exact test. B) Beta coefficients and 95% CI from logistic regression of quantitative phenotypes versus presence of a pathogenic DNM. All phenotypes aside from length of autozygous regions were corrected for gender as a covariate. The developmental milestones (age to achieve first words, walk independently, sit independently and social smile) were log-scaled before regression. The growth parameters (height, birthweight and occipitofrontal circumference (OFC)) were evaluated as absolute distance from the median. C) Relationship between length of autozygous regions and chance of having a pathogenic DNM. The regression line is plotted as the dark gray line. The 95% confidence interval for the regression is shaded gray. The autozygosity lengths expected under different degrees of consanguineous unions are shown as vertical dashed lines. n, number of individuals in each autozygosity group. D) Relationship between age of fathers at birth of child and number of high confidence DNMs. n, number of high confidence DNMs. E) Relationship between age of mothers at birth of child and number of high confidence DNMs. n, number of high confidence DNMs.
[Figure 1 image.
Panel A: odds ratio (x-axis, 0.6 to 1.4) for binary phenotypes, grouped as pre-natal and post-natal. Labelled P-values: assisted reproduction P = 0.584; abnormal scan P = 0.071; bleeding P = 0.346; maternal illness P = 0.278; neonatal intensive care P = 0.190. The remaining two P-values (0.0358 and 0.000182) belong to feeding problems and male sex.
Panel B: beta (x-axis, -0.2 to 0.4) for quantitative phenotypes, grouped as age, growth and developmental milestones: autozygosity length, mother's age, father's age, gestation, age at assessment, OFC, birthweight, height, phenotypic terms (n), social smile, sat independently, walked independently, first words. P-values as printed, in the same sequence: 1.7 x 10^-7, 0.0626, 0.0164, 0.164, 0.0248, 0.147, 0.715, 0.699, 0.0444, 0.307, 0.399, 0.0274, 0.00123.
Panel C: x-axis, summed length of autozygosity (bp), 10^7 to 10^8; y-axis, proportion with pathogenic de novo mutation, 0.00 to 0.35. Vertical dashed lines: 1st cousin, 2nd cousin, 3rd cousin. Autozygous length groups (Mb): >0-10 (n=3165), 10-20 (n=745), 20-100 (n=129), 100-1000 (n=203).
Panel D: x-axis, father's age (years), 20 to 50; y-axis, high confidence mutations (n), 1.5 to 3.0.
Panel E: x-axis, mother's age (years), 20 to 40; y-axis, high confidence mutations (n), 1.5 to 3.0.]
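The Figure 1A statistics (an odds ratio with a 95% CI and a Fisher's Exact test on a binary phenotype) can be illustrated with a minimal sketch. The counts below are hypothetical and are not taken from the study, and the Woolf (log-scale) interval is an assumption, since the caption does not state how the confidence intervals were computed.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) CI for the 2x2 table [[a, b], [c, d]].

    Rows: phenotype present / absent. Columns: pathogenic DNM yes / no.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(total, col1)

    p_observed = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(x_min, x_max + 1)
               if prob(x) <= p_observed * (1 + 1e-7))

# Hypothetical counts, for illustration only.
table = (60, 40, 40, 60)
odds_ratio, lower, upper = odds_ratio_ci(*table)
p_value = fisher_exact_two_sided(*table)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f}), P = {p_value:.3g}")
```

An odds ratio above 1 with an interval that excludes 1 corresponds to a phenotype plotted to the right of the reference line in panel A.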
Figure 2: Genes exceeding genome-wide significance. Manhattan plot of combined P-values across all tested genes. The red dashed line indicates the threshold for genome-wide significance (P < 7 x 10^-7). Genes exceeding this threshold have HGNC symbols labelled. Composite facial images from individuals with DNMs in selected genes are included for the six most-significantly associated genes.
[Figure 2 image: Manhattan plot. x-axis: chromosome (1 to 22, X); y-axis: -log10(P), 0 to 70.
Labelled genes, in plotted order: AHDC1, POGZ, GATAD2B, KDM5B, KCNH1, ZBTB18, HNRNPU, MYT1L, BCL11A, SCN2A, SCN1A, SATB2, KIF1A, ITPR1, SETD5, BRPF1, SLC6A1, CTNNB1, FOXP1, TBL1XR1, TRIO, BTF3, COL4A3BP, MEF2C, PURA, NSD1, SYNGAP1, PPP2R5D, ARID1B, CDK13, AUTS2, GNAI1, BRAF, KAT6A, KCNQ3, PUF60, SMARCA2, STXBP1, DNM1, SET, EHMT1, WAC, KAT6B, PTEN, PACS1, SUV420H1, KMT2A, CHD4, GRIN2B, SCN8A, PTPN11, MED13L, LRRC43, CHAMP1, CHD8, FOXG1, DYNC1H1, GABRB3, CHD2, CREBBP, GNAO1, CTCF, ANKRD11, CHD3, KANSL1, PPM1D, ASXL3, SMAD4, TCF4, NFIX, PPP2R1A, CNOT3, CSNK2A1, ASXL1, ADNP, KCNQ2, EEF1A2, DYRK1A, EP300, TCF20, MSL3, CDKL5, PDHA1, CNKSR2, USP9X, DDX3X, CASK, SLC35A2, WDR45, IQSEC2, SMC1A, ZC4H2, HDAC8, ALG13, NAA10, MECP2.
Composite facial images: SYNGAP1, ARID1B, KMT2A, DDX3X, ANKRD11, ADNP.]
Figure 3: Phenotypic summary of genes without previous compelling evidence. Phenotypes are grouped by type. The first group indicates counts of individuals with DNMs per gene by sex (m: male, f: female), and by functional consequence (nsv: nonsynonymous variant, PTV: protein-truncating variant). The second group indicates mean values for growth parameters: birthweight (bw), height (ht), weight (wt), occipitofrontal circumference (OFC). Values are given as standard deviations from the healthy population mean derived from ALSPAC data. The third group indicates the mean age for achieving developmental milestones: age of first social smile, age of first sitting unassisted, age of first walking unassisted and age of first speaking. Values are given in months. The final group summarises Human Phenotype Ontology (HPO)-coded phenotypes per gene, as counts of HPO-terms within different clinical categories.
[Figure 3 image, shown here as a table. Column groups: Mutations (probands, n): n, f, m, nsv, PTV. Growth (Z-score): bw, ht, wt, OFC. Development (months; shaded mild, moderate or severe for delayed development): smile, sit, walk, speak. Clinical features (terms, n): face, heart, skel, skin/hair/teeth, neurodev, eye, abdo.]

Gene        n   f   m  nsv  PTV     bw     ht     wt    OFC  smile   sit  walk  speak  face  heart  skel  skin/hair/teeth  neurodev  eye  abdo
CDK13      12  11   1   11    1  -0.49  -2.01  -1.05  -1.67   1.75    12    24     22    36     13    18               10        22    3     0
CHD4        9   2   7    8    1  -0.87  -0.37   0.24  -0.15      3  11.5    24     21    14     11     7                3        10    3     4
CSNK2A1     8   4   4    8    0  -0.06  -1.43  -0.92  -2.18   1.75  11.5    30     30    14      0     4                4        10    0     0
GNAI1       8   6   2    7    0   0.53  -0.98   -0.4  -2.57   3.25    10    30  117.5    16      0    11                2        12    0     2
CNOT3       7   4   3    5    2  -0.34  -1.82  -0.99  -0.78   1.88  11.5  23.5     45    10      0    13                4        12    2     2
PUF60       7   4   3    4    3  -0.82  -2.66  -1.89  -1.59    1.5    12    23     24    11      2    10                4         6    2     4
TCF20       7   3   4    2    5   0.07   0.87   1.06  -0.33   2.75     8    19     30     7      0     5                2        11    2     0
PPM1D       5   2   3    0    5  -1.37  -2.64  -2.55  -2.53   3.25    12    24     22     8      0    13                2         6    5     2
ZBTB18      5   2   3    1    4   0.75  -0.75  -0.66  -2.73   7.75  10.5    23     30     4      0     0                5         5    0     0
KCNQ3       4   3   1    4    0    0.3   0.24   0.11  -2.96   1.75    18    21     48     3      0     0                0         5    0     0
MSL3        4   3   1    0    4  -0.73  -1.47  -0.17   0.59      3    18  23.5     30     6      0     2                0         4    0     0
QRICH1      3   2   1    0    3  -0.73  -2.36  -1.88   -3.6    2.5    10    22     24     3      0     7                2         4    0     0
SET         3   1   2    0    3  -0.62  -1.15  -0.09  -0.66      3  11.5    27     36     2      0     0                0         2    2     0
SUV420H1    2   1   1    0    2   0.98   1.82   0.88   1.93   1.75    10    19     18     4      0     2                0         2    0     0
Figure 4: Power of genome versus exome sequencing to discover dominant genes associated with developmental disorders. The power was estimated at three different fixed budgets (1 million (M) USD, 2M and 3M) and a range of relative sensitivities for genomes versus exomes to detect de novo mutations. The numbers of genes identifiable by exome sequencing are shaded blue, whereas the numbers of genes identifiable by genome sequencing are shaded green. The regions where exome sequencing costs 30-40% of genome sequencing are shaded with a grey background, which corresponds to the price differential in 2016.
[Figure 4 image: three panels, one per budget ($1M, $2M, $3M). x-axis: relative cost of exome to genome, 0.2 to 1.0; y-axis: dominant genes at genomewide significance, 0 to 40. Legend: exome; genome sensitivity 1.00, 1.05, 1.10, 1.15, 1.20.]
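The fixed-budget comparison in Figure 4 rests on simple arithmetic: at a given budget, a cheaper assay buys a larger cohort. The sketch below illustrates only that step. The per-family genome cost is a hypothetical figure chosen for the example (the caption gives only the 30-40% relative cost), and the power simulation itself is not reproduced.

```python
def affordable_families(budget_usd, cost_per_family_usd):
    """Number of whole families that can be sequenced within the budget."""
    return budget_usd // cost_per_family_usd

# Hypothetical cost of genome sequencing per family, in USD (assumption).
GENOME_COST_PER_FAMILY = 3000

for budget in (1_000_000, 2_000_000, 3_000_000):
    genomes = affordable_families(budget, GENOME_COST_PER_FAMILY)
    for exome_cost_percent in (30, 40):  # relative cost range from the caption
        exome_cost = GENOME_COST_PER_FAMILY * exome_cost_percent // 100
        exomes = affordable_families(budget, exome_cost)
        print(f"${budget:,}: exome at {exome_cost_percent}% of genome cost -> "
              f"{exomes} exome families vs {genomes} genome families "
              f"({exomes / genomes:.2f}x)")
```

Under these assumed costs, the exome cohort is roughly 2.5 to 3.3 times the size of the genome cohort, which is the advantage that any extra genome sensitivity would need to offset.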
Figure 5: Excess of de novo mutations (DNMs). A) Enrichment ratios of observed to expected loss-of-function DNMs by clinical recognisability for dominant haploinsufficient neurodevelopmental genes as judged by two consultant clinical geneticists. B) Enrichment of DNMs by consequence normalised relative to the number of synonymous DNMs. C) Proportion of excess DNMs with loss-of-function or altered-function mechanisms. Proportions are derived from numbers of excess DNMs by consequence, and numbers of excess truncating and missense DNMs in dominant haploinsufficient genes. D) Enrichment ratios of observed to expected DNMs by pLI constraint quantile for loss-of-function, missense and synonymous DNMs. Counts of DNMs in each lower and upper half of the quantiles are provided. E) Normalised excess of observed to expected DNMs by pLI constraint quantile. This includes missense DNMs within all genes, loss-of-function including missense DNMs in dominant haploinsufficient genes and missense DNMs in dominant nonhaploinsufficient genes (genes with dominant negative or activating mechanisms). F) Proportion of excess missense DNMs with a loss-of-function mechanism. The red dashed line indicates the proportion in observed excess DNMs at the optimal goodness-of-fit. The histogram shows the frequencies of estimated proportions from 1000 permutations, assuming the observed proportion is correct.
[Figure 5 image.
Panel A: x-axis, clinical recognisability (Low, Mild, Moderate, High; Low to Moderate grouped as Cryptic, High as Distinctive); y-axis, enrichment (observed/expected), 0 to 160.
Panel B: x-axis, consequence (synonymous, missense, loss-of-function); y-axis, enrichment (observed/expected), 0.0 to 2.5. Printed counts: loss-of-function n=968, excess=576; missense n=3853, excess=1220; synonymous n=1236, excess=-5.
Panel C: bars for excess DNMs (PTV, missense), DNMs in HI genes (PTV, missense) and inferred mechanism of excess DNMs (LoF, altered function).
Panel D: x-axis, constraint quantile (low to high, 0.0 to 0.95 in steps of 0.05); y-axis, enrichment (observed/expected). Counts in the lower and upper halves: loss-of-function n=189 and n=777; missense n=1461 and n=2354; synonymous n=589 and n=558.
Panel E: x-axis, constraint quantile (low to high; bins 0.0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.7, 0.7-0.8, 0.8-0.9, 0.9-1.0); y-axis, normalised enrichment (observed - expected). Series: all genes, missense; haploinsufficient genes, loss-of-function; nonhaploinsufficient genes, missense.
Panel F: x-axis, proportion of missense as loss-of-function, 0.2 to 0.5; y-axis, frequency, 0 to 250.]
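As a worked illustration of the observed/expected enrichment plotted in Figure 5B, the sketch below uses the n and excess values printed in that panel (968 and 576 for loss-of-function, 3853 and 1220 for missense, 1236 and -5 for synonymous). It assumes that the expected count is simply the observed count minus the excess; the figure may apply a further normalisation to synonymous DNMs that is not modelled here.

```python
# (observed DNMs, excess DNMs) per consequence class, as printed in Figure 5B.
panel_b_counts = {
    "loss-of-function": (968, 576),
    "missense": (3853, 1220),
    "synonymous": (1236, -5),
}

def enrichment(observed, excess):
    """Return (expected, observed/expected), assuming expected = observed - excess."""
    expected = observed - excess
    return expected, observed / expected

for consequence, (observed, excess) in panel_b_counts.items():
    expected, ratio = enrichment(observed, excess)
    print(f"{consequence}: observed={observed}, expected={expected}, "
          f"enrichment={ratio:.2f}")
```

Under this assumption the synonymous ratio sits close to 1, which is the behaviour expected of a class used as the neutral baseline.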
Figure 6: Prevalence of live births with developmental disorders caused by dominant de novo mutations (DNMs). The prevalence within the general population is provided as percentage for combinations of parental ages, extrapolated from the maternal and paternal rates of DNMs. Distributions of parental ages within the DDD cohort and the UK population are shown at the matching parental axis.
[Figure 6 image: heatmap of prevalence (%), colour scale 0.25 to 0.45. x-axis: paternal age (years), 20 to 45; y-axis: maternal age (years), 20 to 45. Marginal density plots (0.00 to 0.06) show the parental age distributions for the UK population and the DDD cohort. Grid of prevalence values (%) as printed in the heatmap:

0.24 0.26 0.28 0.29 0.31 0.33 0.35 0.37 0.39
0.25 0.27 0.29 0.31 0.32 0.34 0.36 0.38 0.40
0.26 0.28 0.30 0.32 0.34 0.35 0.37 0.39 0.41
0.27 0.29 0.31 0.33 0.35 0.36 0.38 0.40 0.42
0.28 0.30 0.32 0.34 0.36 0.38 0.39 0.41 0.43
0.29 0.31 0.33 0.35 0.37 0.39 0.40 0.42 0.44
0.30 0.32 0.34 0.36 0.38 0.40 0.42 0.43 0.45
0.31 0.33 0.35 0.37 0.39 0.41 0.43 0.45 0.46
0.32 0.34 0.36 0.38 0.40 0.42 0.44 0.46 0.47
]
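The percentages in Figure 6 and the "1 in N" prevalence figures quoted in the abstract (1 in 213 to 1 in 448, given there as 0.22-0.47% of live births) are two expressions of the same quantity. A minimal sketch of the conversion, with function names chosen for this example:

```python
def one_in_n_to_percent(n):
    """Convert a '1 in n' birth prevalence to a percentage of live births."""
    return 100.0 / n

def percent_to_one_in_n(percent):
    """Convert a percentage of live births to the n of a '1 in n' prevalence."""
    return 100.0 / percent

for n in (213, 448):
    print(f"1 in {n} = {one_in_n_to_percent(n):.2f}% of live births")

# The highest value in the Figure 6 grid, converted back to '1 in n'.
print(f"0.47% is about 1 in {percent_to_one_in_n(0.47):.0f}")
```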
Supplementary Figures
Supplementary Figure 1: Proportion of individuals with a de novo mutation (DNM) likely to be pathogenic. These only included individuals with protein altering or protein truncating DNMs in dominant or X-linked dominant developmental disorder (DD) associated genes, or males with DNMs in hemizygous DD-associated genes. The proportions given are for those individuals with any DNMs rather than the total number of individuals in each subset. Cohorts included in the DNM meta-analyses are shaded blue.
[Supplementary Figure 1 image: bar chart. x-axis: cohort (intellectual disability, DDD, epilepsy, autism spectrum disorder, normal IQ autism spectrum disorder, schizophrenia, congenital heart disease); y-axis: proportion with likely pathogenic de novo mutation, 0.00 to 0.45.]
Supplementary Figure 2: Phenotypic summary of individuals with de novo mutations in genes achieving genomewide significance. Phenotypes are grouped by type. The first group indicates counts of individuals with DNMs per gene by sex (m: male, f: female), and by functional consequence (nsv: nonsynonymous variant, PTV: protein-truncating variant). The second group indicates mean values for growth parameters: birthweight (bw), height (ht), weight (wt), occipitofrontal circumference (OFC). Values are given as standard deviations from the healthy population mean derived from ALSPAC data. The third group indicates the mean age for achieving developmental milestones: age of first social smile, age of first sitting unassisted, age of first walking unassisted and age of first speaking. Values are given in months. The final group summarises Human Phenotype Ontology (HPO)-coded phenotypes per gene, as counts of HPO-terms within different clinical categories.
[Supplementary Figure 2 image: per-gene summary for the genes achieving genomewide significance, from ANKRD11 to SLC35A2, in the same layout as Figure 3. Column groups: Mutations (probands, n; 0 to 50): n, f, m, nsv, PTV. Growth (Z-score): bw, ht, wt, OFC. Development (months; shaded mild, moderate or severe for delayed development): smile, sit, walk, speak. Clinical features (terms, n): face, heart, skel, skin/hair/teeth, neurodev, eye, abdo.
Example rows:
STXBP1: n 11, f 7, m 4, nsv 6, PTV 5; bw 0.31, ht -0.17, wt 0.85, OFC -0.35; smile 2, sit 11, walk 27, speak 48; face 0, heart 0, skel 3, skin/hair/teeth 0, neurodev 15, eye 3, abdo 0.
SMARCA2: n 10, f 7, m 3, nsv 10, PTV 0; bw -0.08, ht -0.56, wt 0.02, OFC -0.34; smile 1.5, sit 12, walk 24, speak 30; face 4, heart 0, skel 11, skin/hair/teeth 7, neurodev 11, eye 0, abdo 0.]
Supplementary Figure 3: Example of an icon-, heatmap- and image-based summary of the quantitative, categorical and average face for each of the genes exceeding genome-wide significance. This uses data on the 17 individuals with de novo mutations (DNMs) in EP300. A separate pdf file containing these “phenicons” for all genes is provided. Each has up to three parts. The left hand half of each page provides visual representations of the gene name, the number of individuals with de novo mutations in that gene, sex ratio, gestation (in weeks), anthropometric data (z scores for birthweight, height, weight and occipital-frontal head circumference (OFC)) and developmental milestones (in months for attainment of social smile, sitting unaided, walking unaided and first clear words) from individuals with DNMs in the gene. The scaled cartoon figure shows the height, weight and OFC, with the colour of the head, trunk and height graded, grey representing a z score of 0, red increasingly negative scores and green increasingly positive scores. For each metric a scatter plot is given above the indicator bar representing the measurement for each individual. Where more than four values are available, two density plots are given below the bar: the grey plot represents the data for all individuals in the 94-gene set and the coloured plot represents the data for the gene in question. In EP300 the OFC measurements are shifted significantly to the left compared to the whole group. For the z score data mean values are provided and for the developmental data median values are given above the bar. The top panel on the left hand side of the page summarises the key Human Phenotype Ontology (HPO) terms for each gene. The HPO terms in the individuals were selected, including the ancestral terms. Terms that are rarer in the 4,293 individuals rank higher, adjusted by the number of individuals with DNMs who had the term. The heatmaps are shaded by the number of individuals with each term. The heatmaps exclude terms that rank lower than a descendant term (excluding more general terms if a more specific term occurred first), and terms where fewer than 25% of individuals had the term, or, in genes with fewer than 8 individuals, terms with fewer than two individuals. The bottom panel on the right hand half of the page summarises the facial photographs from individuals with DNMs in each gene. The averaged face images are only available for selected genes, based on the availability of sufficient high-quality facial photographs of individuals for each gene. The whole image was generated using a custom R script employing grid based graphics.
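The frequency rule that the caption gives for the heatmaps (drop terms seen in fewer than 25% of individuals, or, for genes with fewer than 8 individuals, terms seen in fewer than two) can be written as a small predicate. This is a sketch of that one rule only: the function name and the example term counts are hypothetical, the authors' implementation was a custom R script, and the rank-based exclusion of ancestral terms is not modelled.

```python
def keep_term_in_heatmap(individuals_with_term, individuals_with_dnm):
    """Apply the frequency filter described for the HPO heatmaps.

    Genes with fewer than 8 individuals: keep terms seen in at least 2.
    Otherwise: keep terms seen in at least 25% of individuals.
    """
    if individuals_with_dnm < 8:
        return individuals_with_term >= 2
    # Integer form of individuals_with_term / individuals_with_dnm >= 0.25.
    return 4 * individuals_with_term >= individuals_with_dnm

# Hypothetical term counts for a gene with 17 individuals (as for EP300).
for count in (2, 4, 5, 9):
    print(count, keep_term_in_heatmap(count, 17))

# Hypothetical term counts for a gene with 5 individuals.
for count in (1, 2):
    print(count, keep_term_in_heatmap(count, 5))
```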
Supplementary Figure 4: Dispersion of de novo mutations and domains for each novel gene. A) CDK13, B) CHD4, C) CNOT3, D) CSNK2A1, E) GNAI1, F) KCNQ3, G) MSL3, H) PPM1D, I) PUF60, J) QRICH1, K) SET, L) SUV420H1, M) TCF20 and N) ZBTB18.
[Supplementary Figure 4 image: one protein diagram per gene, showing annotated domains and the positions of de novo mutations. Labels per panel:]
A) CDK13 (1512 aa). Annotations: protein kinase, active site, ATP binding site. DNMs: G714R, G717R, G717R, V719G, K734R, R751Q, E792V, N842S, N842S, N842S, R860Q, V874L, c.2898-1G>A.
B) CHD4 (1940 aa). Annotations: PHD-finger, Chromo/chromo shadow, CHDNT (NUC034), CHDCT2 (NUC038), SNF2 family N-terminal, Helicase C-terminal. DNMs: K1752K, V1636I, R1127Q, R1068H, L1009ins, M954V, I871T, S851Y, Q715*, R645W, C467Y, R341S.
C) CNOT3 (753 aa). Annotations: NOT2/NOT3/NOT5, CCR4-Not N-terminal. DNMs: E20Q, L48V, R188C, R188H, R188H, P244fs, V660fs, Q694*.
D) CSNK2A1 (391 aa). Annotations: protein kinase, active site, ATP binding site. DNMs: R312W, K198R, K198R, K198R, F197I, R191Q, I174M, R80H.
E) GNAI1 (354 aa). Annotations: Alpha G protein (transducin) signature; G-protein alpha subunit, group I. DNMs: Q52P, Q172del, Q172del, E186del, Q204R, K270R, K270R, I319T, V332E.
F) KCNQ3 (872 aa). Annotations: intramembrane, cytoplasmic and extracellular regions; Ankyrin-G binding site; KCNQ voltage-gated potassium channel. DNMs: G553R, A356T, R236C, R230C, R230C, R230C, R227Q.
G) MSL3 (521 aa). Annotations: Chromo/chromo shadow, MRG, RNA binding activity-knot of a chromodomain. DNMs: Y189fs, L314fs, A340fs, F460fs.
H) PPM1D (605 aa). Annotations: PPM-type phosphatase. DNMs: D397fs, P418fs, E424fs, E424fs, W427*, W427*.
I) PUF60 (559 aa). Annotations: RNA recognition motif; Poly-U binding splicing factor, half-pint. DNMs: H526fs, G491R, G491E, T311fs, R298W, c.604-2A>G, E181K, E176ins, D159N.
J) QRICH1 (776 aa). Annotations: Protein of unknown function DUF3504. DNMs: R652fs, R652*, Q47*, Q46fs.
K) SET (290 aa). Annotations: Nucleosome assembly protein (NAP). DNMs: R57fs, R57fs, K154fs.
L) SUV420H1 (885 aa). Annotations: Histone-lysine N-methyltransferase, Suvar4-20; SET. DNMs: R783*, A513V, c.977+0G>A, W264S, R187*, Y185fs, R143C, A74fs.
M) TCF20 (1960 aa). Annotations: PHD-like zinc-binding. DNMs: R1907*, L1838fs, K1173Q, Q1127fs, Y1009*, G199fs.
N) ZBTB18 (531 aa). Annotations: BTB/POZ, Zinc finger. DNMs: G208*, P212fs, Q271*, E350fs, R464H, H475H, R495G.
Supplementary Figure 5: Effect of clustering by phenotype on the ability to identify genomewide significant genes. A) Comparison of P-values derived from genotypic information alone versus P-values that incorporate genotypic information and phenotypic similarity. B) Comparison of P-values from tests in the complete DDD cohort versus tests in the subset with seizures. Genes that were previously linked to seizures are shaded blue.
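[Supplementary Figure 5 image.
Panel A: density plot. x-axis: delta P (combined minus genotypic), -4 to 4; y-axis: density, 0.00 to 0.40.
Panel B: scatter plot. x-axis: -log10(P all probands), 0 to 60; y-axis: -log10(P seizure probands), 0 to 60. Legend: all genes, known seizure genes.]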
Supplementary Figure 6: Simulated estimates of power to detect loss-of-function genes in the genome at different cohort sizes, given fixed budgets.
[Supplementary Figure 6 image: three panels, one per budget ($1M, $2M, $3M). x-axis: relative cost of exome to genome, 0.2 to 1.0; y-axis: power, 0.0 to 0.5. Legend: exome; genome sensitivity 1.00, 1.05, 1.10, 1.15, 1.20.]
Supplementary Figure 7: Neurodevelopmental genes classified by clinical recognisability were compared for the gene-wise significance versus the expected number of mutations per gene. Points are shaded by recognisability category. Genes have been separated into two plots, one plot with genes for cryptic disorders with low, mild or moderate clinical recognisability, and one plot with genes for distinctive disorders with high clinical recognisability.
[Supplementary Figure 7 image: two scatter plots, titled "Cryptic disorders" and "Distinctive disorders". x-axis: expected loss-of-function mutations, 0.00 to 0.12; y-axis: log10(P), 0 to 60.]
Supplementary Figure 8: Stringency of de novo mutation (DNM) filtering. A) Sensitivity and specificity of DNM validations within sets filtered on varying thresholds of DNM quality (posterior probability of DNM). The analysed DNMs were restricted to sites identified within the earlier 1133 trios (ref. 15), where all candidate DNMs underwent validation experiments. The labelled value is the quality threshold at which the number of candidate synonymous DNMs equals the number of expected synonymous mutations under a null germline mutation rate. B) Excess of missense and loss-of-function DNMs at varying DNM quality thresholds. The DNM excess is adjusted for the sensitivity and specificity at each threshold.
[Supplementary Figure 8 image.
Panel A: x-axis: positive predictive value, 0.86 to 0.98; y-axis: sensitivity, 0.5 to 1.0. Labelled point: threshold 0.00781.
Panel B: x-axis: quality threshold (posterior probability(DNM)), 0.0 to 1.0; y-axis: excess of de novo mutations, 0 to 2500.]
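The calibration described for panel A (choosing the quality threshold at which the number of candidate synonymous DNMs matches the number expected under a null mutation rate) can be sketched as follows. The posterior probabilities and the expected count are hypothetical values for illustration, and the function is not the authors' implementation. With tied scores or a non-integer expectation an exact match may not exist, so the sketch keeps the top-ranked candidates up to the rounded expected count.

```python
def synonymous_calibrated_threshold(synonymous_posteriors, expected_synonymous):
    """Return the posterior-probability threshold that retains as many
    candidate synonymous DNMs as are expected under the null model.

    Candidates with posterior >= threshold are retained. Returns None if
    no candidates should be retained.
    """
    ranked = sorted(synonymous_posteriors, reverse=True)
    n_keep = min(len(ranked), int(round(expected_synonymous)))
    if n_keep == 0:
        return None
    return ranked[n_keep - 1]

# Hypothetical posterior probabilities for candidate synonymous DNMs.
candidates = [0.999, 0.98, 0.95, 0.6, 0.2, 0.01, 0.005, 0.001]
expected = 5  # hypothetical expected number of synonymous DNMs

threshold = synonymous_calibrated_threshold(candidates, expected)
retained = [p for p in candidates if p >= threshold]
print(f"threshold = {threshold}, retained = {len(retained)} of {len(candidates)}")
```

Calibrating on synonymous sites relies on them being effectively neutral, so an excess of synonymous candidates over expectation is read as false positives rather than as signal.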