
Page 1: Alignment post-processing and variant callingteaching.healthtech.dtu.dk/material/22126/2020/3_post... · 2020-01-07 · 5. juni 2019 DTU Sundhedsteknologi Alignment post-processing


Alignment post-processing and variant calling

Gabriel Renaud, Associate Professor

Section of Bioinformatics, Technical University of Denmark

[email protected]

DTU Health Technology Bioinformatics


DTU Sundhedsteknologi (DTU Health Technology), 5 June 2019. Alignment post-processing and variant calling

Heterozygosity

M: TACAAATAT

P: TACAGATAT


Heterozygosity

[Figure: M and P sequences, labelled "segregating sites"]


Heterozygosity

ind#A
M: TACAAATAT
P: TACAGATAT

ind#B
M: TACAGATCT
P: TACAGATCT


Homozygous variant

ind#A
M: TACAAATAT
P: TACAGATAT

ind#B
M: TACAGATCT
P: TACAGATCT


Homozygous invariant

ind#A
M: TACAAATAT
P: TACAGATAT

ind#B
M: TACAGATCT
P: TACAGATCT
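The heterozygosity idea from these slides can be sketched in a few lines: compare the maternal (M) and paternal (P) copies of one individual position by position and report where they differ. This is only an illustration; the function name `heterozygous_sites` is made up for this example, and the sequences are the ones shown for ind#A and ind#B.

```python
def heterozygous_sites(maternal: str, paternal: str) -> list[int]:
    """Return 1-based positions where the two copies carry different bases."""
    if len(maternal) != len(paternal):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (m, p) in enumerate(zip(maternal, paternal)) if m != p]


# Sequences as shown on the slides (M copy, P copy).
ind_a = ("TACAAATAT", "TACAGATAT")
ind_b = ("TACAGATCT", "TACAGATCT")

het_a = heterozygous_sites(*ind_a)  # sites where ind#A is heterozygous
het_b = heterozygous_sites(*ind_b)  # empty list: ind#B is homozygous everywhere
print(het_a, het_b)
```

With these inputs, ind#A is heterozygous at a single position, while ind#B carries identical copies throughout.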


Genotyping

reference: AAACAGATCCCGCTGGGTTT

ind#X: TACAAATAT, TACAGATAT

Which of the 10 possible genotypes is the most likely?
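The count of 10 follows from a diploid individual carrying an unordered pair of the four bases: 4 homozygous plus 6 heterozygous combinations. A minimal sketch that enumerates them (the variable names are chosen for this example):

```python
from itertools import combinations_with_replacement

BASES = "ACGT"

# Unordered pairs of bases: 4 homozygous + 6 heterozygous genotypes.
genotypes = ["".join(pair) for pair in combinations_with_replacement(BASES, 2)]

homozygous = [g for g in genotypes if g[0] == g[1]]
heterozygous = [g for g in genotypes if g[0] != g[1]]
print(len(genotypes), genotypes)
```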


Joint Genotyping

reference

ind#X: TACAAATAT, TACAGATAT

reference

ind#Y: TACAGATCT, TACAGATCT


Menu

• Introduction
• Alignment post-processing
• From aligned reads to genomic variation
• Variant effect


Generalized NGS analysis


In principle it is very simple…


But reality is slightly more complex…

How much of what we observe is actually due to errors (noise), and how much represents real genomic variation?


Recommended workflow [1]

Mapping & alignment (.bam)

Alignment post-processing:
• Alignment sorting, filtering and indexing (.sort.flt.bam)
• Local realignment (.realign.bam)
• Base quality score recalibration (recalibrated.bam)
• Duplicate marking/removal (.rmdup.bam)

Alignment statistics

Variant calling & genotyping (.vcf)

Variant filtering

Variant effect

File formats: BAM, Variant Call Format (VCF)


ALIGNMENT POST-PROCESSING


Local realignment [19,20]

Each read is aligned to the reference genome independently to cut down on the computational cost. But this means that many reads spanning indels are likely to be misaligned. We see this as regions containing indels as well as clusters of mismatching bases.


Local realignment [19,20]

✧ How does it work?
✧ Step 1: Identify regions likely in need of realignment:
   1 – At least one read contains an indel.
   2 – A cluster of mismatching bases exists.
   3 – An already known indel segregates at the site.
✧ Step 2: Perform a multiple sequence realignment in this region, such that the number of mismatching bases is minimized across all reads.

[Figure: read pileup before (RAW) and after (REALIGNED) local realignment]
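Step 1 can be sketched as a simple check over the reads in a candidate region. This is a toy illustration of the three criteria listed above, not the GATK implementation: the `Read` class, the function names, and the cluster thresholds (`min_count`, `window`) are all assumptions made for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Read:
    start: int   # 1-based leftmost reference position
    cigar: str   # e.g. "8M2D10M"
    mismatches: list[int] = field(default_factory=list)  # reference positions of mismatching bases


def has_indel(cigar: str) -> bool:
    """Criterion 1: the alignment contains an insertion (I) or deletion (D)."""
    return bool(re.search(r"\d+[ID]", cigar))


def has_mismatch_cluster(positions, min_count: int = 3, window: int = 10) -> bool:
    """Criterion 2: at least min_count mismatches fall within window bases (hypothetical thresholds)."""
    pos = sorted(positions)
    return any(pos[i + min_count - 1] - pos[i] < window
               for i in range(len(pos) - min_count + 1))


def region_needs_realignment(reads, region_start, region_end, known_indels=()) -> bool:
    indel_read = any(has_indel(r.cigar) for r in reads)
    cluster = has_mismatch_cluster([p for r in reads for p in r.mismatches])
    # Criterion 3: a previously known indel lies inside the region.
    known = any(region_start <= p <= region_end for p in known_indels)
    return indel_read or cluster or known


reads = [Read(100, "20M"), Read(105, "8M2D10M")]
flagged = region_needs_realignment(reads, 100, 125)
print(flagged)
```

Step 2, the multiple sequence realignment itself, is considerably more involved and is not attempted here.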


Duplicate marking/removal

Why do we have duplicates in our data?
• The PCR amplification step included in the majority of NGS library construction techniques can introduce duplicates in the data.

Why do we need to remove these?
• They will bias our variant calls.
• PCR errors are propagated and sampled many times, giving rise to false-positive (FP) variant calls.

Basic concepts of the duplicate marking algorithm:

1. Identify genomic position and strand for the 5’-most bases.

2. Mark reads that are duplicates of each other.

3. Within a group of duplicate reads, the read with the highest sum of base quality scores is retained.

http://picard.sourceforge.net/

[Figure: read pileup Before and After duplicate removal]

But it is not perfect…
❖ Does not account for sequencing errors.
❖ Does not account for natural duplicates.
❖ Does not account for duplicate reads with different mapping locations.
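The three basic steps can be sketched as follows. This is a simplified single-end illustration, not Picard's implementation (which, for paired reads, also considers the mate); the `AlignedRead` class and `mark_duplicates` function are hypothetical names introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AlignedRead:
    name: str
    chrom: str
    five_prime_pos: int    # reference position of the 5'-most base
    strand: str            # "+" or "-"
    base_quals: list[int]  # Phred-scaled base qualities
    is_duplicate: bool = False


def mark_duplicates(reads):
    # Steps 1 and 2: group reads sharing chromosome, 5' position and strand.
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.five_prime_pos, read.strand)].append(read)
    # Step 3: keep the read with the highest sum of base qualities per group.
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.base_quals))
        for read in group:
            read.is_duplicate = read is not best
    return reads


reads = mark_duplicates([
    AlignedRead("r1", "chr1", 1000, "+", [30, 30, 20]),
    AlignedRead("r2", "chr1", 1000, "+", [35, 35, 30]),
    AlignedRead("r3", "chr1", 1000, "-", [10, 10, 10]),
])
kept = [r.name for r in reads if not r.is_duplicate]
print(kept)
```

Note how `r3` is kept even though it has low qualities: it sits on the other strand, so it is not in the same duplicate group as `r1` and `r2`.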


Base quality score recalibration? [2,3]

◆ Base-calling algorithms produce per-base quality scores by using noise estimates from image analysis.

◆ The raw Phred-scaled quality scores produced by base-calling algorithms may not accurately reflect the true base-calling error rates.

◆ Obtaining well-calibrated base quality scores is important, as SNP and genotype calling depend on both the base calls and the per-base quality scores.


Base quality score recalibration [6]

Base quality scores are systematically biased.

[Figure: plot with regions labelled "underestimated" and "overestimated"]


Base quality score recalibration [2,3]

• How it works!

1. Collect information regarding the following features of the bases:
   • Reported quality score
   • Position within the read (machine cycle)
   • Dinucleotide context (current base plus previous base)

2. Count the number of times a site was a mismatch to the reference (excluding known polymorphic sites).

3. Estimate the new quality score as the Phred-scaled ratio:

$$Q_{\text{new}} = -10 \log_{10}\left(\frac{\#\text{ of reference mismatches} + 1}{\#\text{ of observed bases} + 2}\right)$$

Example: We observed A [AA, pos. 35, Q20] 1,000,000 times. A in this context mismatches the reference 1,000 times. This gives us: Q value = -10 log10((1,000+1)/(1,000,000+2)) ≈ Q30

Except for East Asians and Europeans, human diversity is very poorly sampled!
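The arithmetic of the example can be checked with a short sketch. The function name `recalibrated_quality` is made up for this illustration; the formula is the one given in step 3.

```python
import math


def recalibrated_quality(mismatches: int, observations: int) -> float:
    """Phred-scale the empirical mismatch rate, with the +1/+2 pseudocounts from the slide."""
    error_rate = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(error_rate)


# Bases reported as Q20 in context [AA, pos. 35]: 1,000 mismatches in 1,000,000 observations.
q = recalibrated_quality(1_000, 1_000_000)
print(round(q, 2))  # very close to 30
```

So bases in this context were reported at Q20 (1 error in 100) but empirically behave like Q30 (about 1 error in 1,000), which means the reported quality was an underestimate.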


Base quality score recalibration [2,3]

To work we need:
• East Asian or European (as in mostly West European) samples
• WGS
• Sufficient coverage

My biased opinion:
• Just don’t bother


FROM BAM TO VCF

From aligned reads to genomic variation


Variant calling & genotyping [4]

◆ Aims

◆ Variant calling: Identify polymorphic sites, i.e. sites that differ from the reference.

◆ Genotyping: Determine the genotype of a given individual at such sites.

◆ Early methods

◆ Work by simply counting the alleles at each site and then identifying a variant using simple cutoff rules.
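A counting-based caller of the kind described under "Early methods" might look like the sketch below. The function name and all thresholds (minimum depth of 8, heterozygous call when the non-reference fraction lies between 20% and 80%) are illustrative assumptions, not values taken from the slides.

```python
from collections import Counter


def call_by_counting(pileup: str, ref_base: str, min_depth: int = 8,
                     het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Call a genotype at one site from the bases observed in the reads."""
    counts = Counter(pileup.upper())
    depth = sum(counts.values())
    if depth < min_depth:
        return "no call"
    alt_counts = {b: n for b, n in counts.items() if b != ref_base}
    if not alt_counts:
        return "homozygous reference"
    # Consider only the most frequent non-reference base.
    alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
    frac = alt_n / depth
    if frac < het_low:
        return "homozygous reference"
    if frac <= het_high:
        return f"heterozygous {ref_base}/{alt_base}"
    return f"homozygous variant {alt_base}/{alt_base}"


print(call_by_counting("TTTTTGGGGG", "T"))
print(call_by_counting("GGGGGGGGTG", "T"))
print(call_by_counting("TTTTTTTTTG", "T"))
```

Fixed cutoffs like these ignore base and mapping qualities, which is the limitation the probabilistic methods address.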


Identify variants

What if this is human DNA?

What if this is bacterial DNA?

---TCGTCGTGGTTGAACTGACGTACCGTTCCCTGAGGCTTAT---

TCCTCGTGGTTGAACGGACGTCGTGGTTGAACGGAC

CGTGGTTGAACGGACGTAGTGGTTGAACGGACGTACAG

GTTGAACGGACGTACCGTTCCCTGTGAACGGACGTACCGTTCCGAACGGACGTACCGTTCCCTGAGGCACGGACGTACAGTTCCCTGGACGTACCGTTCC

GTACCCTGAG--TTATCCCTGAG--TTA

CCTGAGGCTTAT


Variant calling & genotyping

◆ Several Bayesian genotyping methods available
◆ Use the information on base counts, base qualities, mapping quality
◆ Calculate genotype likelihoods

UnifiedGenotyper (GATK) [9]

HaplotypeCaller (GATK) [10]

FreeBayes [11]

Samtools [21]

GraphTyper


Variant calling & genotyping

• Probabilistic methods – a Bayesian genotyper [5]

• The posterior probability of each genotype given the pileup of sequence reads is given by Bayes’ theorem:

Hardy–Weinberg principle
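The equation on this slide did not survive extraction. For reference, the standard form of Bayes' theorem for a genotype $G$ and the observed pileup $D$ is given below; the symbols $G$, $D$, $G_i$ and $f$ are notation chosen here and are not taken from the slide. The sum runs over all candidate genotypes. The second line shows one common choice of prior at a biallelic site with alternative-allele frequency $f$, which is presumably what the "Hardy–Weinberg principle" label refers to.

```latex
P(G \mid D) = \frac{P(D \mid G)\, P(G)}{\sum_{i} P(D \mid G_i)\, P(G_i)}

P(\text{ref/ref}) = (1 - f)^2, \qquad
P(\text{ref/alt}) = 2 f (1 - f), \qquad
P(\text{alt/alt}) = f^2
```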


Determine for each possible genotype

Q20 = 0.01: they share what is left

Q10 = 0.1: they share what is left
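A minimal sketch of a genotype likelihood calculation follows. It assumes a simple and widely used error model, which may differ from the one on the original slide: a base with error probability e is correct with probability 1 - e, and "they share what is left" is read here as the three other bases each receiving e/3. Each read is assumed to come from either allele of a diploid genotype with probability 1/2. All function names are made up for this example.

```python
from itertools import combinations_with_replacement


def phred_to_error(q: float) -> float:
    """Q20 -> 0.01, Q10 -> 0.1."""
    return 10 ** (-q / 10)


def base_prob(observed: str, allele: str, error: float) -> float:
    # The true base gets 1 - e; the three other bases share what is left.
    return 1 - error if observed == allele else error / 3


def genotype_likelihoods(pileup):
    """pileup: list of (base, phred_quality) tuples observed at one site."""
    likelihoods = {}
    for a1, a2 in combinations_with_replacement("ACGT", 2):
        lik = 1.0
        for base, qual in pileup:
            e = phred_to_error(qual)
            # Each read samples one of the two alleles with equal probability.
            lik *= 0.5 * base_prob(base, a1, e) + 0.5 * base_prob(base, a2, e)
        likelihoods[a1 + a2] = lik
    return likelihoods


pileup = [("A", 20), ("A", 20), ("G", 10), ("A", 20)]
likelihoods = genotype_likelihoods(pileup)
best = max(likelihoods, key=likelihoods.get)
print(best, likelihoods[best])
```

These are likelihoods only; combining them with a prior, as in Bayes' theorem, gives the posterior probability of each genotype.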


This is “GQ” in VCF
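The figure this label pointed at is missing from the transcript. As a hedged illustration, the sketch below treats GQ as the Phred-scaled probability that the called genotype is wrong, computed from genotype posteriors. This is an assumption about what the slide showed: real callers differ in detail, and the function name and example numbers are invented.

```python
import math


def genotype_quality(posteriors: dict[str, float]) -> tuple[str, float]:
    """Return the most probable genotype and a Phred-scaled quality for that call."""
    total = sum(posteriors.values())
    normalized = {g: p / total for g, p in posteriors.items()}
    best = max(normalized, key=normalized.get)
    # Probability that the call is wrong; floor it to avoid log10(0).
    p_wrong = max(1.0 - normalized[best], 1e-10)
    return best, -10 * math.log10(p_wrong)


# Hypothetical posteriors for a biallelic A/G site.
best, gq = genotype_quality({"AA": 0.001, "AG": 0.989, "GG": 0.010})
print(best, round(gq, 1))
```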


Variant filtration (soft) [18]

Variant quality score recalibration (GATK)

How do we remove false positive calls?

Use known polymorphic sites to estimate what a real variant and a false variant “look like”.

Learn what the known sites (= truth set) look like in our data.

Evaluate on all our data, and filter sites that look different!


Variant filtration (hard)

• Hard filtering [16] (when VQSR is not possible*)
   – Variant quality score / depth
   – Mapping quality
   – Strand bias (the variant being seen only on the forward strand or only on the reverse strand)
   – Depth

• Some recommendations [17]

*Number of samples < 30, or if you're doing targeted resequencing of a small region or working with a non-model organism.

10 or more samples!
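Hard filtering amounts to applying fixed cutoffs to per-variant annotations. The sketch below assumes a variant represented as a dictionary with the keys `QD` (variant quality / depth), `MQ` (mapping quality), `FS` (a Phred-scaled strand bias score) and `DP` (depth). The thresholds and filter names are illustrative assumptions: the QD, MQ and FS defaults resemble commonly quoted starting points, the DP cutoff is purely hypothetical, and all of them should be tuned to the data set.

```python
def hard_filter(info: dict, min_qd: float = 2.0, min_mq: float = 40.0,
                max_fs: float = 60.0, min_dp: int = 10) -> list[str]:
    """Return the names of the filters a variant fails, or ["PASS"]."""
    failed = []
    if info["QD"] < min_qd:
        failed.append("LowQD")
    if info["MQ"] < min_mq:
        failed.append("LowMQ")
    if info["FS"] > max_fs:
        failed.append("StrandBias")
    if info["DP"] < min_dp:
        failed.append("LowDP")
    return failed or ["PASS"]


good = hard_filter({"QD": 15.2, "MQ": 59.0, "FS": 1.3, "DP": 35})
bad = hard_filter({"QD": 1.1, "MQ": 28.0, "FS": 75.0, "DP": 4})
print(good, bad)
```

Recording which filters failed, rather than silently dropping the site, mirrors how the FILTER column of a VCF is typically used.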


Variant annotation [8]

Link: http://www.ensembl.org/info/docs/variation/vep/vep_script.html


I did not cover...

● Polyploid
● Phasing:

TACAAATAT
TAGAAACAT

vs

TACAAACAT
TAGAAATAT
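The phasing example can be made concrete with a small sketch: both haplotype pairs above produce exactly the same unphased genotypes, so genotype calls alone cannot tell them apart. The function name `unphased_genotypes` is made up for this illustration.

```python
def unphased_genotypes(hap1: str, hap2: str) -> list[frozenset]:
    """Per-site genotype as an unordered set of alleles (phase information discarded)."""
    return [frozenset((a, b)) for a, b in zip(hap1, hap2)]


phasing_1 = ("TACAAATAT", "TAGAAACAT")
phasing_2 = ("TACAAACAT", "TAGAAATAT")

genotypes_1 = unphased_genotypes(*phasing_1)
genotypes_2 = unphased_genotypes(*phasing_2)

same_genotypes = genotypes_1 == genotypes_2
het_sites = [i + 1 for i, g in enumerate(genotypes_1) if len(g) == 2]
print(same_genotypes, het_sites)
```

The two heterozygous sites are identical in both cases; what differs is which alleles travel together on the same chromosome copy.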


References

1) http://gatkforums.broadinstitute.org/discussion/1186/best-practice-variant-detection-with-the-gatk-v4-for-release-2-0
2) http://gatkforums.broadinstitute.org/discussion/44/base-quality-score-recalibration-bqsr
3) http://www.youtube.com/watch?v=L4D1dwES9s8
4) Nielsen R, Paul JS, Albrechtsen A, Song YS: Genotype and SNP calling from next-generation sequencing data. Nature reviews Genetics 2011, 12(6):443-451.
5) McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M et al: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research 2010, 20(9):1297-1303.
6) DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M et al: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics 2011, 43(5):491-498.
7) http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
8) http://www.ensembl.org/info/docs/variation/vep/index.html
9) http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_genotyper_UnifiedGenotyper.html
10) http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_haplotypecaller_HaplotypeCaller.html
11) http://arxiv.org/abs/1207.3907
12) http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantrecalibration_VariantRecalibrator.html
13) http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantrecalibration_ApplyRecalibration.html
14) http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_varianteval_VariantEval.html
15) http://www.broadinstitute.org/gatk/guide/article?id=1247
16) http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_filters_VariantFiltration.html
17) http://www.broadinstitute.org/gatk/guide/article?id=1186
18) http://www.broadinstitute.org/gatk/guide/article?id=39
19) http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_indels_RealignerTargetCreator.html
20) http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_indels_IndelRealigner.html
21) Heng Li, Jue Ruan, Richard Durbin. Mapping short DNA sequencing reads and calling variants using mapping quality scores, Genome Research, 2008


Exercise time!

http://teaching.healthtech.dtu.dk/22126/index.php/Postprocess_exercise