Germline variant calling and joint genotyping
Applying the joint discovery workflow with HaplotypeCaller + GenotypeGVCFs
You are here in the GATK Best Practices workflow for germline variant discovery

[Figure: GATK Best Practices workflow in three phases. Data Pre-processing: Raw Reads ➔ Map to Reference (BWA mem) ➔ Mark Duplicates & Sort (Picard) ➔ Indel Realignment ➔ Base Recalibration ➔ Analysis-Ready Reads (BWA and Picard are marked as non-GATK tools). >> Variant Discovery: Var. Calling HC in ERC mode ➔ Genotype Likelihoods ➔ Joint Genotyping ➔ Raw Variants (SNPs, Indels). >> Callset Refinement: Variant Recalibration (separately per variant type), Genotype Refinement, Variant Annotation, Variant Evaluation ➔ Analysis-Ready Variants ➔ look good? use in project / troubleshoot.]
A scalable workflow for joint variant discovery
The new GVCF workflow solves both problems and yields the same results:
• Incremental over time
• Scalable over sample size
Tools involved in the workflow
• Identify potential variants in each sample ➔ HaplotypeCaller
• Perform joint genotyping on the cohort ➔ GenotypeGVCFs
Key HaplotypeCaller features

What it does:
• Calls SNP and indel variants simultaneously
• Performs local re-assembly to identify haplotypes
• Reference confidence model enables detection of low-frequency variants
• Joint-discovery workflow (reference confidence model, GVCFs)
• Handles RNAseq natively
• Handles non-diploid organisms and pooled samples

What it doesn't do:
• Somatic variant calling (use MuTect2 instead!)
How HaplotypeCaller works in 4 simple steps
Step 1: Identify Active Regions
• Sliding window along the reference
• Count mismatches, indels and soft clips
➔ Measure of entropy
Over threshold: trigger "Active Region" to be processed
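A minimal sketch of the "sliding window + threshold" idea, assuming a precomputed per-position count of mismatches, indels and soft clips. The input list, the window size and the threshold are all hypothetical. HaplotypeCaller's real activity measure is probability-based, so this only illustrates the shape of the step.

```python
from typing import List, Tuple


def find_active_regions(events: List[int], window: int = 5,
                        threshold: int = 3) -> List[Tuple[int, int]]:
    """Return half-open [start, end) regions where a window's event count exceeds threshold.

    events[i] is a hypothetical per-position count of mismatches, indels and
    soft clips seen in reads overlapping reference position i.
    """
    n = len(events)
    active = [False] * n
    running = sum(events[:window])  # event count of the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one base: drop the leftmost position, add the new rightmost one
            running += events[start + window - 1] - events[start - 1]
        if running > threshold:
            for i in range(start, start + window):
                active[i] = True
    # Merge runs of consecutive active positions into regions
    regions: List[Tuple[int, int]] = []
    i = 0
    while i < n:
        if active[i]:
            j = i
            while j < n and active[j]:
                j += 1
            regions.append((i, j))
            i = j
        else:
            i += 1
    return regions


events = [0, 0, 0, 0, 0, 0, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(find_active_regions(events))  # [(3, 11)]
```

Every window that crosses the threshold marks all of its positions, so the reported region is wider than the three noisy positions themselves. That gives the assembly step some flanking sequence to work with.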
Step 2: Assemble plausible haplotypes
• Local realignment via graph assembly
• Traverse graph to collect most likely haplotypes
• Align haplotypes to ref using Smith-Waterman
➔ Likely haplotypes + candidate variant sites
You can make HC output the reassembled reads and selected haplotypes using the -bamout parameter.
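The Smith-Waterman step can be illustrated with a textbook local alignment. This sketch assumes a linear gap penalty and made-up scores (match +2, mismatch -1, gap -2). These are not HaplotypeCaller's own parameters. It aligns a toy haplotype carrying a 1-bp deletion back to its reference.

```python
from typing import Tuple


def smith_waterman(ref: str, hap: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, aligned ref, aligned hap)."""
    rows, cols = len(ref) + 1, len(hap) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if ref[i - 1] == hap[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best-scoring cell until the score drops to zero
    ref_aln, hap_aln = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if ref[i - 1] == hap[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ref_aln.append(ref[i - 1])
            hap_aln.append(hap[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            # Reference base missing from the haplotype: a deletion
            ref_aln.append(ref[i - 1])
            hap_aln.append("-")
            i -= 1
        else:
            # Extra haplotype base: an insertion
            ref_aln.append("-")
            hap_aln.append(hap[j - 1])
            j -= 1
    return best, "".join(reversed(ref_aln)), "".join(reversed(hap_aln))


score, ref_row, hap_row = smith_waterman("ATCGATCATAGCTAGCTGCG", "ATCGACATAGCTAGCTGCG")
print(score)    # 36
print(ref_row)  # ATCGATCATAGCTAGCTGCG
print(hap_row)  # ATCGA-CATAGCTAGCTGCG
```

The gap in the haplotype row marks the candidate deletion site that later steps will genotype.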
Example assembly graph produced by HaplotypeCaller
• Previous alignments are ignored
• The graph is built from every k-mer (subsequence of length k) present in the reads
• Most likely paths through the graph are scored
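Here is a toy version of the assembly graph. Nodes are k-mers, edges join k-mers that overlap by k-1 bases, and edge weights count how often each transition appears in the input sequences. Only the sequences are used, not their alignments. Scoring a path by the product of outgoing-edge frequencies is an assumption made for illustration and is not HaplotypeCaller's exact scoring. The sequences and k are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Dict[str, int]]


def build_kmer_graph(seqs: List[str], k: int) -> Graph:
    """Nodes are k-mers; an edge joins consecutive k-mers (overlap of k-1 bases).
    Edge weights count how many times each transition occurs in the input."""
    graph: Graph = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(len(s) - k):
            graph[s[i:i + k]][s[i + 1:i + k + 1]] += 1
    return graph


def enumerate_paths(graph: Graph, source: str, sink: str) -> List[Tuple[str, float]]:
    """Depth-first search over cycle-free source->sink paths, each scored by the
    product of its outgoing-edge frequencies (a simplified scoring assumption)."""
    results: List[Tuple[str, float]] = []

    def dfs(node: str, seq: str, score: float, seen: Set[str]) -> None:
        if node == sink:
            results.append((seq, score))
            return
        out = graph.get(node, {})
        total = sum(out.values())
        for nxt, weight in out.items():
            if nxt not in seen:  # skip cycles
                dfs(nxt, seq + nxt[-1], score * weight / total, seen | {nxt})

    dfs(source, source, 1.0, {source})
    return sorted(results, key=lambda p: p[1], reverse=True)


ref = "ACGTTGCA"
reads = ["ACGTAGCA"] * 3 + ["ACGTTGCA"]  # hypothetical reads: 3 of 4 carry T->A
graph = build_kmer_graph([ref] + reads, k=3)
for hap, score in enumerate_paths(graph, "ACG", "GCA"):
    print(hap, round(score, 2))
```

The reference and the alternative sequence become two branches of the same graph. The better-supported branch scores higher (0.6 versus 0.4 here).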
Graph assembly recovers indels and removes artifacts
[Figure: NA12878 original read data compared with HaplotypeCaller output (validated). The original read data shows multiple caller artifacts that are hard to filter out, since they are well supported by read data.]
Graph assembly resolves complexity caused by mapper limitations
[Figure: original BWA alignments of the reads against the reference, with their consensus sequence.]
The same event can be represented by the mapper in two different ways, at random:
• [+A][T->C]
• [T->A][+C]
HaplotypeCaller will settle on one representation ➔ cleaner output call
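To see why assembly removes the ambiguity, apply both edit descriptions to the same reference. They produce the identical haplotype sequence. Once calls are derived from assembled haplotypes rather than from per-read alignments, there is only one sequence to represent. The reference GGTAA and the edit encoding are hypothetical.

```python
from typing import List, Tuple

Edit = Tuple[str, int, str]  # (kind, 0-based reference position, bases)


def apply_edits(ref: str, edits: List[Edit]) -> str:
    """Return the haplotype implied by a list of edits against the reference.
    'ins' inserts bases immediately before ref[pos]; 'sub' replaces ref[pos]."""
    inserts = {pos: bases for kind, pos, bases in edits if kind == "ins"}
    subs = {pos: bases for kind, pos, bases in edits if kind == "sub"}
    out = []
    for pos, base in enumerate(ref):
        out.append(inserts.get(pos, ""))
        out.append(subs.get(pos, base))
    out.append(inserts.get(len(ref), ""))  # insertion after the last base
    return "".join(out)


ref = "GGTAA"  # hypothetical reference; the T at index 2 is the ambiguous site
rep1 = [("ins", 2, "A"), ("sub", 2, "C")]  # [+A][T->C]
rep2 = [("sub", 2, "A"), ("ins", 3, "C")]  # [T->A][+C]
print(apply_edits(ref, rep1), apply_edits(ref, rep2))  # GGACAA GGACAA
```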
Bonus perk of haplotype calling: free physical phasing
Two new sample-level annotations: PID (phase identifier) and PGT (phased genotype)
Step 3: Score haplotypes using PairHMM
• Calculate haplotype likelihoods given the reads
• PairHMM aligns each read to each haplotype
➔ Likelihood of the haplotype given reads
PairHMM uses base qualities to score alignments
PairHMM states: (M) match, (Ix) insertion, (Iy) deletion
Transition probabilities (derived from BQSR):
• δ = gap open penalty
• ε = gap continuation
• 1 - 2δ = base matches and continues
• 1 - ε = gap closes (the base after an insertion or a deletion returns to the match state)
➔ likelihoods of the haplotypes given the reads ➔ stored in a matrix with reads as rows and haplotypes as columns, where A_ij = probability of haplotype j vs. read i
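Below is a minimal sketch of the PairHMM forward algorithm using the three states and transitions listed above. It makes several assumptions:

• Phred base qualities give the mismatch probability, split evenly across the three other bases.
• Insertion and deletion states emit with probability 1.
• The gap-open (δ) and gap-continuation (ε) values are fixed placeholders instead of per-base BQSR values.
• The read may start anywhere on the haplotype.

```python
from typing import List


def pairhmm_likelihood(read: str, quals: List[int], hap: str,
                       gap_open: float = 1e-3, gap_cont: float = 0.1) -> float:
    """Forward algorithm: P(read | haplotype), summed over all alignments.

    gap_open plays the role of delta and gap_cont of epsilon (placeholder constants).
    """
    n, m = len(read), len(hap)
    match_to_match = 1 - 2 * gap_open  # 1 - 2*delta
    gap_to_match = 1 - gap_cont        # 1 - epsilon
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    I = [[0.0] * (m + 1) for _ in range(n + 1)]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Let the read start at any haplotype position by spreading mass over row 0 of D
    for j in range(m + 1):
        D[0][j] = 1.0 / m
    for i in range(1, n + 1):
        err = 10 ** (-quals[i - 1] / 10)  # Phred quality -> error probability
        for j in range(1, m + 1):
            emit = 1 - err if read[i - 1] == hap[j - 1] else err / 3
            M[i][j] = emit * (match_to_match * M[i - 1][j - 1]
                              + gap_to_match * (I[i - 1][j - 1] + D[i - 1][j - 1]))
            # Insertion consumes a read base only; deletion consumes a haplotype base only
            I[i][j] = gap_open * M[i - 1][j] + gap_cont * I[i - 1][j]
            D[i][j] = gap_open * M[i][j - 1] + gap_cont * D[i][j - 1]
    # The whole read must be consumed; it may end at any haplotype position
    return sum(M[n][j] + I[n][j] for j in range(1, m + 1))


read, quals = "ACGTACGT", [30] * 8
same = pairhmm_likelihood(read, quals, "TTACGTACGTTT")
diff = pairhmm_likelihood(read, quals, "TTACGAACGTTT")
print(same > diff)  # True
```

Running this for every read against every haplotype fills the reads × haplotypes matrix described above.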
Step 4: Genotype each sample at each potential variant site
• Determine most likely alleles for each sample
• Based on support for haplotypes (from PairHMM)
• Evaluated over reads from each sample
➔ Genotype calls for each sample
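Transforming support for haplotypes into support for alleles

Reference: ATCGATCATAGCTAGCTGCG
Haplotype 1: ATCGA-CATAGCTAGCTGCG
Haplotype 2: ATGGATCATAGCTTGCTGCG
Haplotype 3: ATCGA-CATAGCTTGCTGCG

Likelihoods of each haplotype (R = reference) given each read*:

| Reads | R | 1 | 2 | 3 |
|---|---|---|---|---|
| 1 | 0.01 | 0.02 | 0.03 | 0.04 |
| 2 | 0.09 | 0.06 | 0.07 | 0.08 |
| 3 | 0.10 | 0.11 | 0.01 | 0.02 |

Take the highest probability among the haplotypes, given the reads, that contain the allele (for each variant position). At the T deletion, allele - is carried by haplotypes 1 and 3 and allele T by R and haplotype 2, which gives:

| Reads | - | T |
|---|---|---|
| 1 | 0.04 | 0.03 |
| 2 | 0.08 | 0.09 |
| 3 | 0.11 | 0.10 |

*These numbers are made up to give a sense of how the process works. In reality the numbers would be much smaller.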
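The "max over carrying haplotypes" rule can be written directly against the example. The aligned haplotype strings and the made-up likelihoods come from the example above. The function name and the 0-based site index (5 for the T deletion, 13 for the A/T SNP) are assumptions of this sketch.

```python
from typing import Dict, List

# Aligned haplotypes and made-up read x haplotype likelihoods from the example
ALIGNED = {
    "R": "ATCGATCATAGCTAGCTGCG",
    "1": "ATCGA-CATAGCTAGCTGCG",
    "2": "ATGGATCATAGCTTGCTGCG",
    "3": "ATCGA-CATAGCTTGCTGCG",
}
READ_BY_HAP = [
    [0.01, 0.02, 0.03, 0.04],  # read 1: columns R, 1, 2, 3
    [0.09, 0.06, 0.07, 0.08],  # read 2
    [0.10, 0.11, 0.01, 0.02],  # read 3
]


def allele_likelihoods(matrix: List[List[float]], aligned: Dict[str, str],
                       site: int) -> List[Dict[str, float]]:
    """For each read, keep the best likelihood among haplotypes carrying each allele at site."""
    names = list(aligned)  # column order of the matrix
    alleles = sorted({aligned[h][site] for h in names})
    table = []
    for row in matrix:
        table.append({
            allele: max(lik for h, lik in zip(names, row) if aligned[h][site] == allele)
            for allele in alleles
        })
    return table


for read_no, row in enumerate(allele_likelihoods(READ_BY_HAP, ALIGNED, site=5), start=1):
    print(read_no, row)
```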
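And finally, a bit of Bayesian math

Bayesian model (excerpt: 4 SNP calling, 4.1 Simple genotype likelihoods for presentations):

$$\Pr\{G \mid D\} = \frac{\Pr\{G\}\,\Pr\{D \mid G\}}{\sum_i \Pr\{G_i\}\,\Pr\{D \mid G_i\}} \qquad \text{[Bayes' rule]}$$

Here $\Pr\{G\}$ is the prior of the genotype and $\Pr\{D \mid G\}$ is the likelihood of the genotype. Under the diploid assumption, $G = H_1 H_2$ and

$$\Pr\{D \mid G\} = \prod_j \left( \frac{\Pr\{D_j \mid H_1\}}{2} + \frac{\Pr\{D_j \mid H_2\}}{2} \right)$$

where $\Pr\{D \mid H\}$ is the haploid likelihood function.

4.1.1 SNP haploid likelihood

$$\Pr\{D_j \mid H\} = \Pr\{D_j \mid b\} \qquad \text{[single base pileup]}$$

$$\Pr\{D_j \mid b\} = \begin{cases} 1 - \epsilon_j & D_j = b, \\ \epsilon_j & \text{otherwise.} \end{cases}$$

4.1.2 Indel haploid likelihood

$$\Pr\{D_j \mid H\} = \sum_{\text{alignments } \alpha \text{ of } D_j \text{ to } H} \Pr\{D_j, \alpha\}$$

4.2 Genotype likelihoods

$$\Pr\{D_i \mid GT_i\} = \prod_j \Pr\{D_{i,j} \mid GT_i\}$$

$$\Pr\{D_{i,j} \mid GT_i = AB\} = \frac{\Pr\{D_{i,j} \mid A\} + \Pr\{D_{i,j} \mid B\}}{2}$$

$$\Pr\{D_{i,j} \mid B\} = \begin{cases} 1 - \epsilon_{i,j} & D_{i,j} = B, \\ \epsilon_{i,j} \cdot \Pr\{B \text{ is true} \mid D_{i,j} \text{ is miscalled}\} & \text{otherwise.} \end{cases}$$

Just plug in the numbers! Per-read allele likelihoods: read 1: - 0.04, T 0.03; read 2: - 0.08, T 0.09; read 3: - 0.11, T 0.10.

➔ Determines the most likely genotype of the sample at each site where there is evidence of variation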
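Plugging the per-read allele likelihoods into the diploid formula and Bayes' rule gives a posterior for each genotype. The flat prior is an assumption of this sketch; GATK uses its own genotype priors. With the made-up numbers, the result is only illustrative.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import Dict, List, Optional, Tuple

Genotype = Tuple[str, str]


def genotype_posteriors(read_allele_liks: List[Dict[str, float]],
                        priors: Optional[Dict[Genotype, float]] = None) -> Dict[Genotype, float]:
    alleles = sorted(read_allele_liks[0])
    genotypes = list(combinations_with_replacement(alleles, 2))
    if priors is None:
        priors = {g: 1.0 / len(genotypes) for g in genotypes}  # flat prior (assumption)
    # Diploid likelihood: Pr{D|G=AB} = prod_j (Pr{D_j|A}/2 + Pr{D_j|B}/2)
    lik = {g: prod(r[g[0]] / 2 + r[g[1]] / 2 for r in read_allele_liks) for g in genotypes}
    # Bayes' rule: prior * likelihood, normalised over all genotypes
    evidence = sum(priors[g] * lik[g] for g in genotypes)
    return {g: priors[g] * lik[g] / evidence for g in genotypes}


reads = [{"-": 0.04, "T": 0.03}, {"-": 0.08, "T": 0.09}, {"-": 0.11, "T": 0.10}]
posteriors = genotype_posteriors(reads)
for g, p in posteriors.items():
    print("/".join(g), round(p, 3))
```

With these numbers, the three genotypes end up close, at roughly 0.377, 0.334 and 0.289 for -/-, -/T and T/T. This is why real data, with many more reads, matters for a confident call.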
HaplotypeCaller recap: reads in / variants out
BAM ➔ HaplotypeCaller ➔ VCF
This is all you need for a single-sample or traditional multi-sample analysis
For joint discovery: emit GVCF + add joint genotyping step
• Run HC in GVCF mode to emit a GVCF
• Run GenotypeGVCFs to re-genotype samples with the multi-sample model
GVCF includes <NON_REF> allele + genotype likelihoods for joint genotyping
The symbolic allele <NON_REF> stands for all non-called but possible non-reference alleles
END = end position of the hom-ref band
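This sketch shows how a GVCF data line can be classified. A record whose only ALT is <NON_REF> and which carries an END tag describes a hom-ref band from POS to END. Any other ALT is a called allele. The example lines are hypothetical, and only the first eight VCF columns are parsed.

```python
from typing import Dict, List, Union

NON_REF = "<NON_REF>"


def parse_gvcf_record(line: str) -> Dict[str, Union[str, int, bool, List[str]]]:
    """Classify one GVCF data line as a hom-ref band or a site with called alleles."""
    chrom, pos, _, ref, alt, _, _, info = line.rstrip("\n").split("\t")[:8]
    info_map = {}
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        info_map[key] = value
    called = [a for a in alt.split(",") if a != NON_REF]
    start = int(pos)
    # END marks the last position of a hom-ref band; otherwise the record spans REF
    end = int(info_map["END"]) if "END" in info_map else start + len(ref) - 1
    return {
        "chrom": chrom,
        "start": start,
        "end": end,
        "called_alts": called,
        "is_ref_block": not called and "END" in info_map,
    }


# Hypothetical GVCF lines (tab-separated; FORMAT/sample columns abbreviated)
band = "20\t10000000\t.\tT\t<NON_REF>\t.\t.\tEND=10000116\tGT:GQ\t0/0:99"
site = "20\t10000117\t.\tC\tT,<NON_REF>\t262.77\t.\tDP=23\tGT:GQ\t0/1:99"
print(parse_gvcf_record(band))
print(parse_gvcf_record(site))
```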
GVCFs are valid VCFs with extra information
Multiple GVCFs combined form a squared-off matrix of genotypes
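This sketch illustrates what "squaring off" means. Take every site where any sample has a non-hom-ref call, then fill in each sample's genotype there. Use the sample's hom-ref band when one covers the site, and no-call (./.) when nothing does. The record format and sample data are hypothetical. The sketch only shows the matrix shape: GenotypeGVCFs recomputes genotypes from the stored likelihoods rather than copying them.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, int, str]  # (start, end, genotype); hom-ref bands span start..end


def square_off(gvcfs: Dict[str, List[Record]]) -> Dict[int, Dict[str, str]]:
    """Build a sites x samples genotype matrix from per-sample GVCF-like records."""
    # Every site where at least one sample has a non-hom-ref call
    sites = sorted({start for recs in gvcfs.values()
                    for start, _, gt in recs if gt != "0/0"})
    matrix: Dict[int, Dict[str, str]] = {}
    for site in sites:
        row = {}
        for sample, recs in gvcfs.items():
            # Use the record covering this site; no covering record means no data
            row[sample] = next((gt for start, end, gt in recs if start <= site <= end), "./.")
        matrix[site] = row
    return matrix


gvcfs = {  # hypothetical per-sample records
    "sample1": [(100, 149, "0/0"), (150, 150, "0/1"), (151, 300, "0/0")],
    "sample2": [(100, 249, "0/0"), (250, 250, "1/1"), (251, 300, "0/0")],
    "sample3": [(100, 200, "0/0")],
}
for site, row in square_off(gvcfs).items():
    print(site, row)
```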
The joint discovery workflow in practice
Analysis-ready BAM file (one per sample) ➔ HaplotypeCaller ➔ raw gVCF file (one per sample) ➔ GenotypeGVCFs ➔ raw VCF file
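HaplotypeCaller, run once per sample:

    java -jar GenomeAnalysisTK.jar \
        -T HaplotypeCaller \
        -R human.fasta \
        -I sample1.bam \
        -o sample1.g.vcf \
        -ERC GVCF

Optionally add -L exome_targets.intervals to restrict calling to the target intervals (e.g. for exomes).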
GenotypeGVCFs, run once on all the sample GVCFs:

    java -jar GenomeAnalysisTK.jar \
        -T GenotypeGVCFs \
        -R human.fasta \
        -V sample1.g.vcf \
        -V sample2.g.vcf \
        -V sampleN.g.vcf \
        -o output.vcf
If you have more than 200 samples, first combine the GVCFs in batches using CombineGVCFs, then pass the combined GVCFs to GenotypeGVCFs.
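This sketch applies the batching advice. It builds one CombineGVCFs command per batch of at most 200 GVCFs, then a single GenotypeGVCFs command over the combined files. The argument lists mirror the GATK 3 commands above. CombineGVCFs taking -R, repeated -V and -o in the same way is an assumption, and the batch file names are hypothetical. The commands are only built, not run.

```python
from typing import List

GATK = ["java", "-jar", "GenomeAnalysisTK.jar"]


def build_joint_commands(gvcfs: List[str], reference: str = "human.fasta",
                         batch_size: int = 200) -> List[List[str]]:
    """Return argv lists: CombineGVCFs per batch (only above batch_size), then GenotypeGVCFs."""
    commands: List[List[str]] = []
    if len(gvcfs) > batch_size:
        inputs = []
        for n, start in enumerate(range(0, len(gvcfs), batch_size), start=1):
            out = f"batch{n}.g.vcf"  # hypothetical output name
            cmd = GATK + ["-T", "CombineGVCFs", "-R", reference]
            for gvcf in gvcfs[start:start + batch_size]:
                cmd += ["-V", gvcf]
            commands.append(cmd + ["-o", out])
            inputs.append(out)
    else:
        inputs = list(gvcfs)
    final = GATK + ["-T", "GenotypeGVCFs", "-R", reference]
    for gvcf in inputs:
        final += ["-V", gvcf]
    commands.append(final + ["-o", "output.vcf"])
    return commands


cohort = [f"sample{i}.g.vcf" for i in range(1, 451)]
for cmd in build_joint_commands(cohort):
    print(" ".join(cmd[:5]), "...", cmd.count("-V"), "inputs")
```

For 450 samples, this produces three CombineGVCFs calls (200, 200 and 50 inputs) followed by one GenotypeGVCFs call over the three batch files. Each argv list could be passed to subprocess.run once it has been checked against the GATK version in use.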
And that is how we can scale joint discovery to eleventy thousand samples
The new GVCF workflow solves both problems and yields the same results:
• Incremental over time
• Scalable over sample size
Further reading
• http://www.broadinstitute.org/gatk/guide/best-practices
• http://www.broadinstitute.org/gatk/guide/article?id=1237
• https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php
• https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_GenotypeGVCFs.php