
Last updated November 8, 2016

AMP T2D Knowledge Portal Submitter and Analysis Guide for Data at the DCC

Contents

Executive Summary
  Summary of Milestones for Data Submission to the Data Coordinating Center
  Contacting the DCC
Introduction
AMP and the AMP T2D Knowledge Portal Overview
  Types of Data Requested for the AMP T2D Knowledge Portal
  Overview of Data Aggregation and Analysis Process
  Policies and Data Use
  Submitting Data that Cannot Enter the United States
  Data Transfer Agreement
Preparing for Data Submission to the Portal
  Required and Requested Files
    1. AMP DCC Data Intake Form
    2. Analysis result files
    3. Primary Genotype Data File Types
    4. Intensity files for SNP array data
    5. Read files for sequencing data
    6. Phenotype Data
Overview of the Data Intake, Analysis, and Deposition Process
  Data Transfer
  Description of Project/Cohort
  Summary Statistics Only
  Data QC and Analysis at DCC
    QC Process at the DCC
    Association Analysis Process at the DCC
  Data Deposition and Release
Publication Policy
Appendix A: AMP DCC Data Intake to Data Deposit in AMP T2D Knowledge Portal
Appendix B: AMP DCC Data Intake Form
Appendix C: Phenotype Submission
Appendix D: Detailed Overview of QC Process at the DCC
  Quality Control Process at the DCC
  Initial Data Review
  Ancestry Inference, Clustering, and Outlier Detection
  Sample Metric Outlier Detection
  Pedigree Reconstruction
  QC Report


Executive Summary

The AMP T2D Knowledge Portal is a web-based portal for the type 2 diabetes scientific community that is transforming the way research communities share and visualize genetic data and facilitating new disease discoveries. To enable scientists to use these new tools on their datasets and to increase the power of the data on the knowledge portal, the AMP T2D Data Coordinating Center (DCC) is bringing in new datasets for deposition into the AMP T2D Knowledge Portal. For every dataset submitted to the DCC, the results of the analysis performed at the DCC are uploaded to the knowledge portal, where they can be viewed by knowledge portal users. Individual-level data will not be shared on the knowledge portal. This document is a guide for studies that are interested in depositing their array, whole exome sequencing, or whole genome sequencing data into the portal. Other data types will be accepted in the future. Please see Figure 1 below for a brief overview of the submission process. Note that the association analysis will be done only on datasets with individual-level data.

Submitting your data to the DCC will be an interactive process between your analyst/PI and our analysis team. The data intake team at the DCC will review the QC and association analyses with the submitter before any data is uploaded onto the portal and will work with the data submitter to resolve any issues found with the data. The analysis process is intended to be iterative, and the data submitter and DCC will decide together the order and timeline for the association analysis.

Once an analysis is ready for submission to the knowledge base, the analysis will go live in the portal on the next release. Once the data is live on the portal, our publication policy comes into effect. The data will initially enter Early Access Period 1 for months 0-3 and Early Access Period 2 for months 3-6. During the first 6 months on the portal, data will be flagged as Early Access and fall under the guidelines of the Fort Lauderdale principles. After the data has been on the portal for 6 months, the open access period will start for the data.
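As a rough illustration, the early access timeline can be written as a small Python function. The function name is hypothetical, and the handling of the boundaries is an assumption: the guide gives months 0-3 and 3-6 without saying which period month 3 itself belongs to, so this sketch assigns a boundary month to the later stage.

```python
def access_stage(months_on_portal: float) -> str:
    """Return the access stage of a dataset given the months since it went live.

    Assumption: a boundary month (3 or 6) counts toward the later stage.
    """
    if months_on_portal < 0:
        raise ValueError("months_on_portal must be non-negative")
    if months_on_portal < 3:
        return "Early Access Period 1"
    if months_on_portal < 6:
        return "Early Access Period 2"
    return "Open Access"


for months in (0, 2.5, 4, 6):
    print(months, access_stage(months))
```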


Figure 1. Overview of AMP T2D Knowledge Portal Data Submission at DCC


Summary of Milestones for Data Submission to the Data Coordinating Center

1. Signed DTA executed by both the submitter's institution and the Broad Institute, serving as DCC.
2. Submitter has prepared necessary files for transfer to the DCC.
3. Genetic data uploaded to secure transfer site. (Individual-level data only)
4. Initial phenotype data uploaded to secure transfer site. (Individual-level data only)
5. Precomputed analyses uploaded to secure transfer site. (Required for summary statistics submission and strongly recommended for individual-level data submission)
6. Project info shared with submitter before being loaded to the portal.
7. Submitter and DCC approve project description.
8. Results of compliance check and analysis that will be shown on portal shared with submitter. (Summary statistics only)
9. Submitter and DCC approve deposition of summary-level analyses on the portal. (Summary statistics only)
10. Analysis results shared with submitter. (Individual-level data only)
11. Project information that will be loaded on the portal shared with submitter.
12. Submitter and DCC approve QC'ed data. (Individual-level data only)
13. Submitter and DCC approve association analysis. (Individual-level data only)
14. Submitter and DCC approve project information.
15. Analysis goes live on portal.

Contacting the DCC

To get started on this process, please reach out to the Data Coordinating Center at the Broad Institute by emailing us here: amp-dcc-data-submission@broadinstitute.com. Please tell us about the dataset you'd like to submit and any concerns you have about depositing your data. A member of the data intake team will reply with additional information and guide you through the submission process.

Introduction

Welcome to the AMP T2D Submission Guideline! Bringing in new data to the knowledge portal increases the value of the AMP T2D Knowledge Portal for the type 2 diabetes research community and allows the submitter to see their data in the context of hundreds of thousands of type 2 diabetes and control samples gathered from around the world. If you haven't yet, please check out the portal here: http://www.type2diabetesgenetics.org/. All you need to get started is a Google login.

This document outlines the process of submitting data to the AMP T2D Data Coordinating Center (DCC) at the Broad Institute in Cambridge, MA, USA and will serve as a guide to submitters throughout the process. The process mapped out below begins with getting your Data Transfer Agreement signed and ends with the deposition of your data in the portal. It reviews the process, roles, and responsibilities of the DCC and the data submitter. In addition to reviewing the information below, we encourage you to reach out to your project manager with any issues or questions you encounter during your submission process. Each process has defined milestones that highlight significant progress in getting the data ready for deposition.

If you haven't started the process yet and are interested in depositing your data into the AMP T2D knowledge portal, please contact the DCC at the Broad Institute by emailing us here: amp-dcc-data-submission@broadinstitute.com. If you are unable to send your data to the USA for any reason, you have the option of submitting your data through a federated node. Please contact the DCC for more information.

AMP and the AMP T2D Knowledge Portal Overview

The Accelerating Medicines Partnership (AMP) effort is a public-private partnership between the National Institutes of Health (NIH), 10 pharmaceutical companies, and multiple non-profit organizations that joined together to transform the way researchers identify and validate therapeutic targets for several diseases, including type 2 diabetes. To read more about the AMP initiative and to see who's involved, please visit: https://www.nih.gov/research-training/accelerating-medicines-partnership-amp/type-2-diabetes

The AMP type 2 diabetes (AMP T2D) consortium is a collaboration of a number of AMP-funded investigators from around the world, including the Broad Institute, University of Oxford, and University of Michigan. The goal of the AMP T2D consortium is to create a knowledge portal using genetic and phenotypic data generated from type 2 diabetics and controls across multiple populations in order to bring forth discoveries in the genetic architecture of type 2 diabetes and to facilitate the development of new therapeutic targets for treating this disease. Using the genetic data collected from researchers around the world in an interactive web portal environment, researchers are able to ask questions of the data and see summary-level results. You can also search for your gene, variant, or region of interest and see whether any of the AMP T2D knowledge portal datasets show association with type 2 diabetes or related traits.

The AMP T2D knowledge portal will continue working to improve the value of the portal for the type 2 diabetes community. To this end, the AMP T2D consortium will be working to add new datasets to the portal and improve the web-based tools used for analysis within the portal. We will be updating the community on our progress through our mailing list and Twitter feed, which you can sign up for on the home page of the portal: http://www.type2diabetesgenetics.org/home/portalHome.

Types of Data Requested for the AMP T2D Knowledge Portal

The Data Coordinating Center is currently able to accept array data, whole exome sequencing data, and whole genome sequencing data that can be transferred to the United States. We are building the capacity to accept other data types, such as gene expression, metabolomics, and epigenetic data.


Overview of Data Aggregation and Analysis Process

As a submitter to the knowledge portal, we know it's important for you to understand how your data will be handled once it is at the DCC. Only analytical results, and not individual-level data, will be accessible through the portal. We anticipate that multiple versions of results, of increasing detail and harmonization with other datasets, will be released to the portal over time.

Each cohort/project being submitted to the knowledge portal will have the appropriate analytical results identified and harmonized with existing analyses in the portal through a collaborative process between an analyst at the DCC and an analyst at your institution. The analytical results that are prioritized will depend on the phenotype data available, the value the analysis adds to the knowledge portal, and any special requests made by the data submitter.

The analysis will be released to the portal in 3 stages: Early Access Phase 1, Early Access Phase 2, and Open Access Period. Early Access Phase 1 gets the data analysis uploaded and available on the portal with limited QC. The subsequent revisions of the results will occur in Early Access Phase 2, which will aim to (a) address any inconsistencies identified by the initial harmonization process, (b) apply more uniform QC across all datasets in the portal, (c) compute additional statistics desired in the portal but not available in the initial version, and (d) enable on-demand interactive analyses of your data. For these revisions we will require the original genotype and phenotype data. Additionally, we will require data in as unprocessed a format as possible, in order to facilitate harmonization and quality control. Once the data is QC'ed and complete, the Open Access Period will begin for your data. We expect the time from the start of Early Access Phase 1 to the beginning of the Open Access Period to be 6 months.

Policies and Data Use

We are committed to ensuring that collaborators submitting genetic data to the AMP T2D knowledge portal understand how the data will be used after transfer to the DCC at the Broad Institute. By sending your data to the Broad Institute for upload into the knowledge portal, the data submitter and DCC are agreeing to the following:

1. Throughout this process, the Broad is committed to protecting your data, both in transit and while the data is on our servers.

2. We will only be able to receive de-identified data that can be transferred to and stored at the DCC. We will have options available for those who cannot submit data to the United States.

3. Individual data will be stored on our secure servers and only accessed for QC and analysis purposes related to the AMP T2D knowledge portal.

4. Individual data will never be posted directly to the portal. Only summary-level metrics are available to portal users.

5. Summary-level analysis of the submitted data will be posted to the knowledge portal and available to users. This includes p-values, odds ratio, minor allele frequency, effect, direction of effect, allele frequencies across ethnicities, and other analyses that are deemed appropriate by the AMP T2D Knowledge Portal team and AMP T2D consortium.

6. Users of the portal will be able to create custom queries and view summary-level results for those queries. This will include displaying results for specific projects/cohorts.

7. The Broad will QC and analyze your data for T2D and related traits in partnership with the submitter. This is a collaborative process, so the submitter will get to view the analysis before it is uploaded to the portal.

8. The Broad may send genotype data submitted to the portal to the Michigan Imputation Server for imputation. This is a free service hosted by the University of Michigan that allows us to use the Haplotype Reference Consortium panel for imputation. The University of Michigan is a key member of the AMP T2D consortium that is funded by the NIH to develop the AMP T2D knowledge portal. For additional information on the imputation server, please visit: https://imputationserver.sph.umich.edu/index.html.

The policies related to the data in the AMP T2D knowledge portal, including data use for the knowledge portal users, can be found here: http://www.type2diabetesgenetics.org/informational/policies#.

Submitting Data that Cannot Enter the United States

Our AMP T2D-funded collaborators at the University of Oxford are currently building a capability to ingest, QC, and harmonize data at EBI. If data can't leave Europe or enter the United States, you can still submit your data to the knowledge portal through this method. EBI will perform the same functions as the DCC and will work with you to submit your data to the knowledge portal.

Data Transfer Agreement

Before we begin transferring data we need a signed and executed Data Transfer Agreement (DTA). You will receive the DTA via email from the DCC project manager, or you can find it on the knowledge portal here: http://www.type2diabetesgenetics.org/informational/policies. This document should be reviewed by your institution's legal counsel before signing, and any edits made will need to be signed off by the legal counsel at the Broad. The document states that, as a data contributor to the AMP T2D Portal, you agree to transfer your data to the DCC (Broad Institute) and that you have the approval to do so. Although not covered in this document, a similar DTA will be necessary to transfer data to a Federated Node in cases where the data cannot enter the United States.

Milestone:

1. Signed DTA executed by both the submitter's institution and the Broad Institute, serving as DCC.

Preparing for Data Submission to the Portal

While we work towards getting a DTA in place, the data submitter can begin preparing their files for submission to the DCC. The sections below outline the information we need to get your data uploaded to the portal. For a summary table of the information needed, please see Table 1 below.

If your data is unable to leave your site or come to the Broad Institute, located in Cambridge, MA, USA, then depositing your data in a Federated Node will still allow you to contribute your data to the knowledge portal. Please contact the DCC for more information.

Required and Requested Files

Below are guidelines for the types and desired formats of datasets transferred to the DCC. As a general rule, we encourage you to submit as much data and as many results as possible, and to annotate your files with as much information as is feasible. This information will be extremely helpful as our analysts start the QC and analysis process on your data. We understand that different sites will have different data types and different abilities to transform among data formats, and we are happy to work with you to facilitate this process on a case-by-case basis.

1. AMP DCC Data Intake Form. This form is required in order to submit your data to the DCC. The form will be sent via email; please contact your project manager or amp-dcc-data-submission@broadinstitute.com if you have not received it. For additional details on the type of information needed, please refer to Appendix B.

2. Analysis result files. These files are optional, but any analytical results that you transfer will help us expedite and verify our analysis. Any number of files can be provided. For each file, the following is required:

• A tab- or comma-delimited file, with a header row followed by one row for every variant in the results file. The header row can contain any number of columns. Mandatory columns include the chromosome, position, effect allele (with respect to which any phenotypic effect is measured), and non-effect allele. All alleles should be aligned to the forward strand of the genome, the version of which should be specified in the annotation data (see below). Additional desired columns include minor allele frequency, p-value of association with one or more traits, estimated odds ratio or effect size, case/control counts, and number of analyzed samples. If multiple statistics are available across multiple traits (e.g. T2D vs. glucose) or across multiple sample groupings (e.g. all samples vs. only samples of a given ancestry), they can be included in a single file or split across multiple files. The set of variants need not be identical across different result files.

• Annotation data describing the meaning of each column are required. These should be human readable. The annotations can be embedded in the results file or provided as a separate document.
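A submitter might want to check a results file against the mandatory columns before uploading it. The following is a minimal sketch, not a DCC tool: the guide names the required fields but not their column labels, so the labels in `MANDATORY_COLUMNS`, the function name, and the example row (made-up placeholder values) are all assumptions.

```python
import csv
import io

# Hypothetical labels for the four mandatory fields; the guide does not fix
# the exact header names, which the annotation data is meant to describe.
MANDATORY_COLUMNS = ("chromosome", "position", "effect_allele", "non_effect_allele")


def missing_mandatory_columns(handle, delimiter="\t"):
    """Return the mandatory columns that are absent from the header row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip().lower() for name in header}
    return [col for col in MANDATORY_COLUMNS if col not in present]


# Placeholder content only; these are not real association results.
example = io.StringIO(
    "chromosome\tposition\teffect_allele\tp_value\n"
    "1\t12345\tA\t0.5\n"
)
print(missing_mandatory_columns(example))
```

Pass `delimiter=","` for comma-delimited files.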


3. Primary Genotype Data File Types. In order to ensure the continued use of your data in the portal as demand for additional statistics and analyses grows, we request the following files encoding the genotypes of each sample. These genotype files will be used to compute statistics that are unavailable from the analysis files, which will be added to the portal in subsequent data versions.

• Genotype files in VCF or PLINK format are required. We will accept either format, provided that strand information is clearly annotated. Note that VCF files have a clear distinction between reference and alternate alleles, while alleles can be flipped by some PLINK analyses.

a. The VCF file format is available at XXX.

b. Information about the PLINK file format is available at XXX. We recommend transferring bed/bim/fam files, which can be created by PLINK.

• Lists of QC+ samples and variants that were advanced to your final analysis are optional. Providing these will ensure that we can recompute statistics concordant with those that you produced in your analysis. If you do not provide them, we will perform our own QC, which will likely be similar (but not identical) to yours.

• Documentation of your original analysis plan is also optional. Any human-readable document describing the motivations of your analysis, the statistical methods employed, and any parameter settings will also help us to replicate your analysis. A methods section of a paper, if sufficiently detailed, will also suffice.
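Since the recommended PLINK transfer consists of three files that share a prefix (bed, bim, and fam), a quick completeness check before upload can catch a missing member. This is a sketch under assumptions: the function name and the example file names are hypothetical, and the check works on a list of file names rather than on the files themselves.

```python
# The three members of a PLINK binary fileset, which share one prefix.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def missing_plink_members(prefix, filenames):
    """Return the bed/bim/fam file names for `prefix` that are not in `filenames`."""
    present = set(filenames)
    return [prefix + suffix for suffix in PLINK_SUFFIXES if prefix + suffix not in present]


# Hypothetical directory listing in which the .bim file was left out.
print(missing_plink_members("cohort", ["cohort.bed", "cohort.fam", "README.txt"]))
```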

4. Intensity files for SNP array data. Ultimately, it may be necessary for us to have access to the raw data used to call genotypes. This will assist with quality control (for example, examining evidence that a rare variant has accurate genotypes), as well as harmonization (for example, ensuring that all variants are called using similar procedures). Thus, although not essential for the first version of your data to appear on the portal, the following files are required to complete the data transfer process in its entirety.

• Raw intensity files (idat or the equivalent). For SNP array genotyping data, any file format that lists normalized X/Y intensity values for each sample is acceptable. When submitting IDAT, please remember to send both of the intensity files for each sample.

a. Example file formats accepted by the Sanger for a similar project are at XXX.

b. A guide for the file formats used by zCall (a clustering algorithm for exome chip) is available at XXX.

• Cluster and manifest files to accompany the raw intensity files. For Illumina IDAT files, these two files are required for the necessary downstream analysis. They should be available from and familiar to the platform that produced your original genotype calls. The manifest file describes the samples that were genotyped; the cluster file records any information that was used to better cluster the intensities for each SNP.
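The reminder to send both intensity files per sample can be checked mechanically. The sketch below assumes the common Illumina naming in which each sample has one file ending in `_Grn.idat` and one ending in `_Red.idat`; that naming, the function name, and the example file names are assumptions, so adapt the suffixes if your platform names files differently.

```python
from collections import defaultdict

# Assumption: one "<sample>_Grn.idat" and one "<sample>_Red.idat" per sample.
CHANNELS = ("Grn", "Red")


def samples_missing_a_channel(filenames):
    """Map each incomplete sample to the channels with no IDAT file listed."""
    seen = defaultdict(set)
    for name in filenames:
        for channel in CHANNELS:
            suffix = f"_{channel}.idat"
            if name.endswith(suffix):
                # Strip the channel suffix to recover the sample identifier.
                seen[name[: -len(suffix)]].add(channel)
    return {
        sample: [c for c in CHANNELS if c not in channels]
        for sample, channels in seen.items()
        if len(channels) < len(CHANNELS)
    }


# Hypothetical listing: sample s2 has only its green-channel file.
print(samples_missing_a_channel(["s1_Grn.idat", "s1_Red.idat", "s2_Grn.idat"]))
```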

5. Read files for sequencing data. Similar to intensity files for SNP array data, read files are required for sequencing data. We will use these to run "joint variant calling" across all samples at the DCC, for maximum sensitivity and accuracy of variant calls. Since variant call sets from sequence data include novel variants and alleles, re-processing raw data is even more important in sequencing experiments than in SNP array experiments; the ExAC paper (available at XXX) outlines some of the rationale for this.

• BAM or CRAM files for each sample are required for sequencing experiments. These files are the standard format for storing read data and should be produced by your sequencing platform. We would prefer raw, unaligned BAM files.

a. Information on the BAM file format is available at XXX.

b. BAM files can be created from FASTQ files, as described at XXX.
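Before uploading read files, it can help to confirm that each file really is BAM or CRAM and not, for example, a FASTQ file with the wrong extension. The sketch below inspects only the first bytes of a file: a CRAM file starts with the ASCII bytes `CRAM`, and a BAM file is gzip-compatible (BGZF) data whose decompressed content starts with `BAM\1`. The function name is hypothetical, and this is a quick sanity check, not a full format validation.

```python
import gzip
import zlib


def sniff_read_format(head: bytes) -> str:
    """Guess 'BAM', 'CRAM', or 'unknown' from the leading bytes of a file."""
    if head[:4] == b"CRAM":
        return "CRAM"
    if head[:2] == b"\x1f\x8b":  # gzip magic number, shared by BGZF blocks
        try:
            # wbits=31 tells zlib to expect a gzip header; read 4 bytes of output.
            start = zlib.decompressobj(wbits=31).decompress(head, 4)
        except zlib.error:
            return "unknown"
        if start == b"BAM\x01":
            return "BAM"
    return "unknown"


# Stand-in inputs: a gzip-wrapped BAM magic string and a bare CRAM magic string.
print(sniff_read_format(gzip.compress(b"BAM\x01")))
print(sniff_read_format(b"CRAM\x03\x00"))
```

In practice `head` would be the first few kilobytes read from the file in binary mode.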

6. Phenotype Data. This is required alongside submission of genotype and/or sequencing data. The official document with full instructions will be emailed to you. For an idea of the variables requested, please see Appendix C. If you have a specific variable that is not in this list but is relevant to type 2 diabetes or related conditions, let us know and we can include it for your submission.

For a summary view of what is needed for your data submission, please see Table 1 below.

Table 1. Summary of files accepted for data submission into the AMP T2D Portal

File Type                               Genotyping Submission   Sequencing Submission
AMP DCC Data Intake Form                Required                Required
Analysis Results                        Optional                Optional
Annotation Data                         Optional                Optional
Genotype Files (VCF or PLINK)           Required                Required (VCF)
List of QC+ samples and variants        Optional                Optional
Analysis Plan Documentation             Optional                Optional
Raw Intensity Files                     Required                N/A
Cluster File                            Required                N/A
Manifest File                           Required                N/A
Sequencing Read Files (BAM or CRAM)     N/A                     Required
Phenotype Data                          Required                Required
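Table 1 can also be encoded as a small lookup so that a submitter can list what is still outstanding for a given submission type. This is only a sketch: the dictionary copies the table's entries, while the function names and the `"genotyping"` / `"sequencing"` keys are hypothetical.

```python
# Each entry: file type -> (genotyping submission, sequencing submission),
# copied from Table 1.
REQUIREMENTS = {
    "AMP DCC Data Intake Form": ("Required", "Required"),
    "Analysis Results": ("Optional", "Optional"),
    "Annotation Data": ("Optional", "Optional"),
    "Genotype Files (VCF or PLINK)": ("Required", "Required (VCF)"),
    "List of QC+ samples and variants": ("Optional", "Optional"),
    "Analysis Plan Documentation": ("Optional", "Optional"),
    "Raw Intensity Files": ("Required", "N/A"),
    "Cluster File": ("Required", "N/A"),
    "Manifest File": ("Required", "N/A"),
    "Sequencing Read Files (BAM or CRAM)": ("N/A", "Required"),
    "Phenotype Data": ("Required", "Required"),
}

COLUMN = {"genotyping": 0, "sequencing": 1}


def required_files(submission):
    """List the file types marked Required for the given submission type."""
    index = COLUMN[submission]
    return [name for name, status in REQUIREMENTS.items()
            if status[index].startswith("Required")]


def missing_required(submission, provided):
    """List required file types that are not among those already provided."""
    have = set(provided)
    return [name for name in required_files(submission) if name not in have]


print(missing_required("sequencing", ["AMP DCC Data Intake Form", "Phenotype Data"]))
```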

Milestone:

1. Submitter has prepared necessary files for transfer to the DCC.

Overview of the Data Intake, Analysis, and Deposition Process

When you are ready to start submitting files to the DCC for deposition into the AMP T2D portal, email amp_dcc_data_submission@broadinstitute.org and we will set up a secure transfer portal for you to upload your files. We will be using an ASPERA site, which will come with detailed instructions on how to upload the files. Once the ASPERA site is created, there are 30 days to upload data before the site expires. If it becomes necessary to extend that timeline, please let us know so we can extend the life of the ASPERA site.

The data transfer process is outlined step by step below. For a full picture of data intake to deposition, please see Appendix A.

Data Transfer

The data transfer process starts once the submitter and the DCC at the Broad Institute have all necessary documentation in place and are ready to begin physically transferring the data to the DCC. During these steps, if individual-level data is being provided, the data submitter will transfer the phenotypic and genetic data to the portal. This includes the raw data and any available precomputed analyses. For sites from which we are receiving summary statistics only, we ask for the precomputed analyses to be sent to the DCC. Please see Figure 2 below for an overview of the data intake process at the DCC.

Regardless of which type of submission is being sent, we ask that each site complete a data intake form, as noted above. The purpose of the form is to inform the DCC of the data being submitted and to help us create a project/cohort description for this data on the portal.

Figure 2. Data Transfer Process at DCC


Milestones:

1. Genetic data uploaded to secure transfer site. (Individual-level data only)
2. Initial phenotype data uploaded to secure transfer site. (Individual-level data only)
3. Precomputed analyses uploaded to secure transfer site. (Required for summary statistics submission and strongly recommended for individual-level data submission)

DescriptionofProject/CohortEachprojectandcohortwithdataincludedintheAMPT2Dportalwillhaveadescriptionoftheprojectand/orcohortthatissubmittingdata.ThisdescriptionwillbecreatedbytheContentManagerattheDCCusingtheprojectinformationprovidedbythesubmitterontheDataIntakeForm.Duringthisprocess,thesubmitterwillhavetheopportunitytoprovidefeedbackonthedescriptionoftheirstudy.

Figure 3. Description of Project and Cohort Information Submission Process at DCC

Milestones:

1. Project info shared with submitter before being loaded to the portal.
2. Submitter and DCC approve project description.

Summary Statistics Only

If you are not able to share raw data with the portal, the portal can accept summary-level statistics for posting. In this instance, the DCC would take the summary-level information that you have generated, securely store the data, and perform a data compliance check. Once the compliance check has been completed, the DCC will share the results with the submitter and confirm that we can proceed with depositing the data to the portal.


Figure 4. Summary Statistics Only Data Submission Process at DCC

Milestones:

1. Results of compliance check and analysis that will be shown on portal shared with submitter.
2. Submitter and DCC approve deposition of summary-level analyses on the portal.

Data QC and Analysis at DCC

Datasets with individual-level data being submitted to the DCC will undergo secure data storage, QC, and association analysis at the DCC. During this process the DCC will work with the data submitter to create an analysis plan that will be used to drive the future analyses and create datasets within the project/cohort that will be deposited into the portal. A dataset in this context refers to a specific set of samples paired with specific phenotype(s). We expect each data submission to contain a number of datasets, and we will work with the data submitters to prioritize the datasets for submission to the AMP T2D portal. Once the analysis is completed, the DCC will reach out to the submitter and review the results of the analysis. Results will not be uploaded to the portal until both the data submitter and the DCC affirm that the data is ready to share.

Figure 5. Data QC and Analysis Process for incoming data to the DCC

The DCC has compiled a list of standard single-variant association analyses that will be used as a guide for creating the analysis plan with the submitter. Each analysis plan will be unique to each site, depending on the phenotype variables that are available and the value each analysis adds to the portal.

Milestones:


1. Analysis results shared with submitter.
2. Project information that will be loaded on the portal shared with submitter.
3. Submitter and DCC approve QC'ed data.
4. Submitter and DCC approve association analysis.
5. Submitter and DCC approve project information.

QC Process at the DCC

The QC process at the DCC is vital to harmonizing the data being added to the AMP T2D Knowledge Portal. The goal of our QC is to identify artifacts, ensure statistics can be computed consistently, and help the DCC understand the data being submitted. This process is carried out on individual-level data provided to the DCC. It is performed by an analyst who works with all data being loaded into the knowledge portal and who runs the QC using an automated and consistent process.

The analyst at the DCC will compute metrics adjusted for ancestry and other confounders and then exclude outlier samples, which are potentially artifacts. The QC completed for data destined for the knowledge portal tends to be conservative, since we are aiming to ensure high-quality data for users. Once this QC has been completed, we will provide a report to share with the data submitters. An example QC report can be found in the AMP T2D Knowledge Portal: CAMP QC Report. For full details on the QC process, please see Appendix D.

Association Analysis Process at the DCC

Association analysis at the DCC is an interactive process between the analyst at the DCC and the analyst at the submitting site. The initial analysis performed will consist of a set number of traits decided upon by the DCC and the data submitter. As a guide for our submitters, the DCC recommends focusing on some initial traits of relevance to type 2 diabetes for the initial analysis done at the DCC. Please see Table 2 for a list of recommended traits.

Table 2. Standard T2D traits for possible DCC association analysis

Categories of traits | Example related phenotype variables
Type 2 Diabetes status | T2D status, T2D age of diagnosis
Cardiometabolic | Systolic blood pressure, Diastolic blood pressure, Hypertension status
Lipids | HDL cholesterol, LDL cholesterol, Triglycerides, Total cholesterol
Glycemic | Insulin, glucose (2 hr, fasting, and/or random), HbA1c
Anthropometric | BMI, age, weight, waist-hip ratio
Kidney Function | Creatinine, Urinary albumin

Once an initial analysis has been completed at the DCC, we will send the data submitter an analysis report for their review and comments. An example report can be found on the AMP T2D Knowledge Portal: CAMP Analysis Report.


Data Deposition and Release

Once the analyses and the project/cohort description have been reviewed and approved by both the data submitter and the DCC, the data will be deposited onto the knowledge portal base before going live on the portal.

Figure 6. Data Deposition Process at DCC

The data will go live with the next quarterly portal release. Releases occur in February, May, August, and November.

Milestone:

1. Analysis goes live on portal.

Publication Policy

Once your data is live on the knowledge portal, submitters are protected by a 6-month early access period that is subject to the guidelines of the Ft. Lauderdale principles. This 6-month period is broken down into a 3-month Early Access Phase 1, where data is live on the portal with limited QC, and a 3-month Early Access Phase 2, where the data is fully integrated into the portal. All data in either of the Early Access Periods will be flagged to knowledge portal users who are viewing the data. Please see Figure 7 for the scheduled data releases.
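The timing described above can be sketched in a few lines of Python. This is only an illustration, not the DCC's scheduling tool: the guide gives the release months but not the day of the month, so the sketch assumes each release falls on the first of the month, and the function names (`next_release`, `add_months`, `early_access_schedule`) are hypothetical.

```python
from datetime import date

# Quarterly release months stated in this guide: February, May, August, November.
RELEASE_MONTHS = (2, 5, 8, 11)


def next_release(approved: date) -> date:
    """Return the first quarterly release on or after the approval date.

    Assumption: releases happen on day 1 of the release month.
    """
    for year in (approved.year, approved.year + 1):
        for month in RELEASE_MONTHS:
            candidate = date(year, month, 1)
            if candidate >= approved:
                return candidate
    raise AssertionError("unreachable: a release always exists within a year")


def add_months(d: date, months: int) -> date:
    """Shift a first-of-month date forward by a whole number of months."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def early_access_schedule(approved: date) -> dict:
    """Phase boundaries for a dataset approved on the given date."""
    release = next_release(approved)
    return {
        "phase_1_start": release,                    # live with limited QC
        "phase_2_start": add_months(release, 3),     # fully integrated
        "open_access_start": add_months(release, 6), # early access ends
    }
```

Under these assumptions, a dataset approved in mid-December would begin Early Access Phase 1 with the February release, move to Phase 2 in May, and leave early access in August.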


For more information on the Ft. Lauderdale policies, please visit: https://www.genome.gov/pages/research/wellcomereport0303.pdf.

Submitted data will be made live on the knowledge portal over a number of dataset freezes. Since we will be running association analyses over time and adding them to the portal, each dataset will be defined as a set of genetic data associated with specific traits. Any analysis of additional traits, samples, or data will be considered a new dataset and will start again in Early Access Period 1. For example, if for the initial analysis the data submitter and DCC chose to run an association analysis on 3,000 exome chip samples using type 2 diabetes status, that would constitute one dataset and would start the Early Access Period on the next scheduled release of the portal. If the same 3,000 exome chip samples were later analyzed for BMI, fasting glucose, and fasting insulin, that would create a new dataset that would start in the Early Access Period, even if the type 2 diabetes analysis of the same samples is already in the open access period.
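One way to picture the dataset rule is to treat a dataset as the pair (set of samples, set of traits). The sketch below is a hypothetical illustration of that idea only; the class name, method name, and release labels are assumptions and do not describe how the DCC tracks datasets internally.

```python
class EarlyAccessTracker:
    """Records the release at which each (samples, traits) dataset first went live."""

    def __init__(self):
        self._first_release = {}

    def register(self, samples, traits, release_label):
        """Register an analysis; return True if it counts as a new dataset.

        A dataset is identified by its exact set of samples and set of traits,
        so changing either one yields a new key and a new early access clock.
        """
        key = (frozenset(samples), frozenset(traits))
        is_new = key not in self._first_release
        if is_new:
            self._first_release[key] = release_label
        return is_new


# Mirrors the example in the text: the same 3,000 samples analyzed for
# different traits form two separate datasets.
samples = [f"S{i:04d}" for i in range(3000)]  # placeholder sample IDs
tracker = EarlyAccessTracker()
first = tracker.register(samples, ["T2D status"], "release A")
second = tracker.register(
    samples, ["BMI", "fasting glucose", "fasting insulin"], "release B"
)
repeat = tracker.register(samples, ["T2D status"], "release C")
```

Here `first` and `second` are both new datasets, each starting its own early access period, while `repeat` matches an existing dataset and does not restart the clock.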

Figure 7. Scheduled AMP T2D Portal Releases for Data Submission


Appendix A: AMP DCC Data Intake to Data Deposit in AMP T2D Knowledge Portal

Figure 8 is a complete flowchart outlining the DCC's data intake process, starting at the point where the data is legally and physically prepared to be transferred by the submitter to the DCC and ending with the data being live in the portal. This document contains 5 subsections of work that are grouped together to create the larger process. The subsections are discussed in more detail in the main document.

Figure 8. Flowchart of AMP DCC Data Intake to Data Deposition in AMP T2D Knowledge Portal


Appendix B: AMP DCC Data Intake Form

For a copy of the AMP DCC Data Intake Form to complete, please contact your project manager or amp-dcc-data-submission@broadinstitute.com. To give you an idea of the type of information needed, we have included screenshots of the form below. The information will be given for the project on the first tab (see Figure 9) and by cohort(s) in the following tabs (see Figure 10).

In the first tab, we ask for general information about the file types you are submitting along with project information, including a project description, any study-specific covariates that were used during your analysis, special analysis requests, and other places the data can be found (e.g., dbGaP, EGA, or a project website).

The following tabs allow the data submitter to give additional details on the cohorts that make up the project. We expect that some projects will have one cohort, while others will have several.

Figure 9. AMP DCC Data Intake Form Project Information

Figure 10. AMP DCC Data Intake Form Cohort Information


Appendix C: Phenotype Submission

The AMP T2D consortium has determined a number of traits that will be useful for understanding your data and performing relevant analyses that can be shared on the knowledge portal. We ask that you submit any of these variables that are available for your data, and please let us know if you have a unique variable that we should be including. This list is meant for informational purposes only. Please see the AMP Phenotype Variable Info sheet emailed to you for additional instructions and information.

Table 3. AMP T2D Knowledge Portal Phenotype Variables

Category | Variable | Format
ID variables | Study ID | character
ID variables | Sample ID used in genotype dataset (if different) | character
ID variables | dbGaP sample ID (if existing) | character
ID variables | Study ID of father | character
ID variables | Study ID of mother | character
Demographics | Race | character
Demographics | Race - open text description | character
Demographics | Ethnicity | character
Demographics | Sex | Please code values as "Male" and "Female"
Demographics | Year of birth | 4-digit integer
Type 2 Diabetes (T2D) status variables | T2D status based on self-report (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D status based on history of healthcare provider diagnosis (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D medication status (1=Yes; 0=No) | integer (1=yes; 0=no)
Type 2 Diabetes (T2D) status variables | T2D status based on fasting glucose level (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D status based on HbA1c (1=T2D; 0=not T2D) | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | Glucose tolerance status based on oral glucose tolerance test (OGTT) | character
Type 2 Diabetes (T2D) status variables | T2D status defined in a way other than one of the approaches above (1=T2D; 0=not T2D) - e.g. a combination of the above that can't be separated into individual variables | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D status with unknown definition (1=T2D; 0=not T2D) - e.g. where a T2D status variable is available but there is no documentation on how it was defined | integer (1=T2D; 0=not T2D)
Type 2 Diabetes (T2D) status variables | T2D treatment with insulin or analogs | integer (1=yes; 0=no)
Type 2 Diabetes (T2D) status variables | T2D treatment with non-insulin medication | integer (1=yes; 0=no)
Type 2 Diabetes (T2D) status variables | T2D age of diagnosis (for those that are affected) (years) | nn.nnn
Type 2 Diabetes (T2D) status variables | Time interval, in years, between diagnosis of diabetes and beginning of treatment with insulin | integer
Type 2 Diabetes (T2D) status variables | This is an open text variable to indicate types of diabetes other than Type 2 (or unclear if Type 2). Examples include: "Type 1 diabetes", "MODY", "LADA", "Gestational diabetes", "Diabetes known to be caused by other processes such as cystic fibrosis, hemochromatosis or pancreatic surgery", "Diabetes status only available during pregnancy". | text
Blood biomarkers | Fasting plasma glucose (mmol/l) | nn.nn
Blood biomarkers | Fasting insulin (mU/l) | nnn.n
Blood biomarkers | OGTT 2-hour fasting glucose (mmol/l) | nn.nn
Blood biomarkers | OGTT 2-hour fasting Insulin (mU/l) | nnn.n
Blood biomarkers | Random glucose (i.e. not fasting or unknown fasting) (mmol/l) | nn.nn
Blood biomarkers | Fasting C-peptide (nmol/l) | nn.nn
Blood biomarkers | HbA1c (fraction, %) | nnn.n
Blood biomarkers | HbA1c (mmol/mol) | nn.nn
Blood biomarkers | Glutamic Acid Decarboxylase Autoantibodies (GADAb) | integer (1=positive; 0=negative)
Blood biomarkers | Islet Cell Autoantibodies | integer (1=positive; 0=negative)
Blood biomarkers | Anti-insulin Autoantibodies | integer (1=positive; 0=negative)
Blood biomarkers | ZNT8 Autoantibodies | integer (1=positive; 0=negative)
Blood biomarkers | Serum creatinine (umol/L) | nnn.n
Blood biomarkers | Adiponectin (ug/ml) | nn.nn
Blood biomarkers | Leptin (ng/ml) | nnn.n
Blood biomarkers | Total cholesterol (mmol/l) | nn.nn
Blood biomarkers | LDL cholesterol (mmol/l) (if measured directly, missing if not) | nn.nn
Blood biomarkers | Calculated LDL cholesterol (mmol/l) (using Friedewald equation) | nn.nn
Blood biomarkers | HDL cholesterol (mmol/l) | nn.nn
Blood biomarkers | Triglycerides (mmol/l) | nn.nn
Blood biomarkers | Any lipid lowering medication status (1=yes, 0=no) | integer (1=yes; 0=no)
Blood biomarkers | Statin medication status (1=yes, 0=no) | integer (1=yes; 0=no)
Anthropometry | Height (centimeters) | nnn.n
Anthropometry | Weight (kg) | nnn.n
Anthropometry | Hip circumference (centimeters) | nnn.n
Anthropometry | Waist circumference (centimeters) | nnn.n
Blood pressure and hypertension | Systolic blood pressure (mmHg) | nnn.n
Blood pressure and hypertension | Diastolic blood pressure (mmHg) | nnn.n
Blood pressure and hypertension | Hypertension status (1=yes, 0=no) | integer (1=yes; 0=no)
Blood pressure and hypertension | Hypertension medication status (1=yes, 0=no) | integer (1=yes; 0=no)
Urine measures | Urinary creatinine (mg/dL) | nn.nn
Urine measures | Urinary albumin (mg/dL) | nn.nn
Urine measures | Urinary albumin to creatinine ratio (mg/g) | nn.nn
Smoking status | Current smoking status (1=yes, 0=no) | integer (1=yes; 0=no)
Smoking status | Ever smoking status (1=yes, 0=no) | integer (1=yes; 0=no)
Reproductive and exogenous hormone use | Menopausal status | character
Reproductive and exogenous hormone use | Current use of any female hormones (1=yes, 0=no) | integer (1=yes; 0=no)
Reproductive and exogenous hormone use | Current use of, specifically, peri- or post-menopausal hormone use (i.e. not including contraceptives) (1=yes, 0=no) | integer (1=yes; 0=no)
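Before uploading, a submitter may find it useful to check a phenotype file against the codings in Table 3. The following is a minimal sketch under stated assumptions: it expects a comma-separated file with a header row, the column names (`sex`, `year_of_birth`, `t2d_self_report`, `hypertension_status`) are hypothetical and not prescribed by this guide, and empty cells are treated as missing rather than invalid. Please follow the AMP Phenotype Variable Info sheet for the actual file layout.

```python
import csv
import io

# Hypothetical column names for a few of the 0/1-coded variables in Table 3.
BINARY_COLUMNS = ("t2d_self_report", "hypertension_status")


def check_row(row: dict) -> list:
    """Return a list of coding problems found in one phenotype record."""
    problems = []

    sex = row.get("sex", "")
    if sex and sex not in ("Male", "Female"):
        problems.append(f"sex must be 'Male' or 'Female', got {sex!r}")

    year = row.get("year_of_birth", "")
    if year and not (len(year) == 4 and year.isascii() and year.isdigit()):
        problems.append(f"year_of_birth must be a 4-digit integer, got {year!r}")

    for column in BINARY_COLUMNS:
        value = row.get(column, "")
        if value and value not in ("0", "1"):
            problems.append(f"{column} must be 0 or 1, got {value!r}")

    return problems


def check_text(text: str) -> dict:
    """Map file line numbers (header is line 1) to the problems on that line."""
    report = {}
    reader = csv.DictReader(io.StringIO(text))
    for line_number, row in enumerate(reader, start=2):
        problems = check_row(row)
        if problems:
            report[line_number] = problems
    return report
```

The check is deliberately narrow: it covers only sex, year of birth, and 0/1 status flags, and it reports problems instead of correcting them so that any recoding stays with the submitter.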


Appendix D: Detailed Overview of QC Process at the DCC

Quality Control Process at the DCC

All data submitted to the DCC will be processed through comprehensive sample and variant quality control algorithms to promote harmonization with existing data on the portal. Since genotype data is likely to exhibit unique patterns of ancestry and classes of variants, we have developed algorithms for detecting major lines of ancestry and for identifying outliers among various sample metrics. Sample QC will be performed using bi-allelic variants only.

Initial Data Review

Initially, when the DCC receives your data, it will be checked for duplicates and any cryptic relatedness that may result from contamination or data collection errors. Duplicates and cryptic relatedness will be identified using a combination of pairwise identity by descent and a robust algorithm for calculating pairwise kinship in the presence of population stratification. Should any concerns arise, the submitter may be contacted in order to investigate possible causes and issues that might be corrected prior to continuing with sample QC.

Ancestry Inference, Clustering, and Outlier Detection

After an agreement to proceed with QC, we will infer major lines of ancestry. Our approach consists of projecting your data onto principal components (PCs) derived from a collection of common ancestry-informative variants in 1000 Genomes Project data. The PCs are then used as features in a Gaussian Mixture Modeling algorithm to cluster the samples according to their ancestry. Any samples that cannot be included in any of the subsets, due to their unique ancestry or bad genotyping, are flagged as outliers.

Sample Metric Outlier Detection

During clustering, metrics for each sample will be calculated. Which metrics are calculated will vary depending on the type of data received. Some of the more recognizable metrics are transition/transversion rate, call rate, and the number of singletons called. For each sample metric, we will calculate the residuals resulting from regressing the metric on principal components of ancestry. Then we will calculate principal components on those adjusted metrics. Gaussian Mixture Modeling is employed again at this stage, both on the principal components of the adjusted metrics and on each of the individual adjusted metrics. Any samples that do not cluster using these two approaches will be flagged as outliers.
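The adjustment step, regressing a sample metric on ancestry principal components and keeping the residuals, can be illustrated with ordinary least squares. The sketch below is not the DCC's pipeline: it uses only the Python standard library, the function names are hypothetical, and it stops at the residuals, leaving out the subsequent PCA and Gaussian Mixture Modeling steps.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def residualize(metric, pcs):
    """Residuals from regressing one sample metric on ancestry PCs.

    metric: one value per sample (for example, call rate).
    pcs:    one sequence of PC scores per sample.
    An intercept column is added, so the residuals sum to zero.
    """
    design = [[1.0] + list(row) for row in pcs]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, metric)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    return [y - f for y, f in zip(metric, fitted)]
```

A metric that is fully explained by ancestry yields residuals near zero, so only the variation that ancestry does not account for remains available for outlier detection.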

Pedigree Reconstruction

If your data is found to have pairs of related samples, pedigree reconstruction will be performed.

QC Report

Upon completion of sample QC, a report will be provided to the submitter to facilitate the creation of a suitable analysis plan.


Figure 11. AMP T2D Quality Control Process