Finding and Calling Genome Variants

Post on 08-Oct-2020


Page 1: Finding and Calling Genome Variantsbarc.wi.mit.edu/education/hot_topics/GenomeVariants_Jul... · 2017. 7. 20.  · 1000 Genomes Project • Extension of the HapMap in 2008 to catalogue

Finding and Calling Genome Variants


Outline
•  Genome variants overview
•  Mining variants from databases
   –  dbSNP
   –  HapMap
   –  1000 Genomes
   –  Disease/Clinical variants databases
•  Calling variants using your own data
   –  GATK best practices
   –  Samtools (mpileup/bcftools)


Genomic Variation
•  Population genetics
   –  Measure/explain diversity/heritability
•  Disease susceptibility
   –  GWAS
   –  Biomarkers
•  Variants may cause a particular trait
   –  Regulatory element (e.g. promoter, enhancer, 3’ UTR, etc.)
   –  Protein coding sequence (e.g. silent, missense, or nonsense mutation)

Palstra, RJ. et al (2012); http://evolution.berkeley.edu/evolibrary/article/mutations_06


Genomic Variation: Sequence vs Structural Variation

http://www.ensembl.org/info/genome/variation

•  Sequence Variants

Type          Description                              Example (Reference/Alternative)
SNP           Single Nucleotide Polymorphism           Ref: ...TTGACGTA...   Alt: ...TTGGCGTA...
Insertion     Insertion of one or several nucleotides  Ref: ...TTGACGTA...   Alt: ...TTGATGCGTA...
Deletion      Deletion of one or several nucleotides   Ref: ...TTGACGTA...   Alt: ...TTGGTA...
Substitution  A sequence alteration where the length   Ref: ...TTGACGTA...   Alt: ...TTGTAGTA...
              of the change in the variant is the
              same as that of the reference.

•  Structural Variants (>50 bases)

Type           Description                                    Example (Reference/Alternative)
CNV            Copy Number Variation: increases or decreases  "Gain" of one copy; "Loss" of one copy
               the copy number of a given region
Inversion      A continuous nucleotide sequence is inverted
               in the same position
Translocation  A region of nucleotide sequence that has
               translocated to a new position
               (e.g. BCR-ABL fusion gene)
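
As a rough illustration of the sequence variant types listed on this slide, the Python sketch below classifies a Ref/Alt pair after trimming the flanking bases that both alleles share. The helper names `trim_shared_flanks` and `classify_variant` are hypothetical and do not come from any tool named in these slides; structural variants are not handled.

```python
def trim_shared_flanks(ref, alt):
    """Strip bases shared at the right end, then at the left end, of both alleles."""
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref, alt


def classify_variant(ref, alt):
    """Return the sequence-variant type for a reference/alternative allele pair."""
    ref, alt = trim_shared_flanks(ref.upper(), alt.upper())
    if not ref and not alt:
        return "no change"
    if not ref:
        return "insertion"
    if not alt:
        return "deletion"
    if len(ref) == len(alt):
        # Same length: one base is a SNP, several bases a substitution.
        return "SNP" if len(ref) == 1 else "substitution"
    return "indel"  # length-changing replacement, not one of the slide's four types


# The four Ref/Alt examples from the slide, all against the same reference.
for alt in ("TTGGCGTA", "TTGATGCGTA", "TTGGTA", "TTGTAGTA"):
    print(alt, classify_variant("TTGACGTA", alt))
```

Trimming the shared context first is a design choice of this sketch: the slide shows each allele with its flanking sequence, so comparing raw lengths alone would not separate a SNP from a multi-base substitution.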


Genome Variation: Individual and Population
•  Single Nucleotide Polymorphisms (SNP)
   –  MAF* > 1%: common SNP
   –  MAF* < 1%: rare SNP
   –  Some definitions use 5% as threshold
•  On average one variant every 1200 bases (based on HapMap)

*Minor Allele Frequency
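
The common/rare split above can be sketched in a few lines of Python. This is a minimal illustration: the function names and the 200-allele sample are hypothetical, and treating a MAF of exactly 1% as rare is an assumption, since the slide does not say which side the boundary falls on.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele among the observed alleles."""
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0  # monomorphic site: no minor allele observed
    return counts[1][1] / sum(n for _, n in counts)


def classify_snp(maf, threshold=0.01):
    """Label a SNP 'common' or 'rare'; pass threshold=0.05 for the 5% definition."""
    # Assumption: a MAF exactly at the threshold is treated as rare.
    return "common" if maf > threshold else "rare"


# Hypothetical sample: 100 diploid individuals, so 200 alleles at one site.
alleles = ["A"] * 197 + ["G"] * 3
maf = minor_allele_frequency(alleles)
print(round(maf, 3), classify_snp(maf), classify_snp(maf, threshold=0.05))
```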


Genome Variation: Reference

Organism     Description/Strain                                 Assembly*
Human        DNA isolated from WBC of 4 anonymous individuals   GRCh37/GRCh38
             (2 males and 2 females). However, the majority
             of the sequence came from one of the male donors
Mouse        C57BL/6J                                           GRCm37/GRCm38
C. elegans   N2                                                 WormBase vWS220
Fruit fly    ISO1                                               BDGP Release 5
Yeast        S288C                                              SGD Feb 2011
A. thaliana  Col ecotype                                        TAIR10

*Available in /nfs/genomes


Describing/Annotating Variants
•  General guidelines*
   –  no position 0
   –  range indicated by “_” (e.g. 586_591)
•  DNA
   –  g.957A>T (to include chromosome use chr9:g.957A>T)
   –  g.413delG
   –  g.451_452insT
   –  In CDS,
      -  c.23G>C
      -  +1 is A of ATG (start codon); -1 is the previous/upstream nucleotide
      -  “*” marks positions after the stop codon (e.g. *1 is the first nucleotide 3’ of the stop codon)
•  RNA
   –  r.957a>u
•  Protein (three/one letter aa)
   –  p.His78Gln

*For complete list/guidelines see hgvs.org

Chr  Position  Ref  Alt  Source  g.change:rsID:Depth=AvgSampleReadDepth:FunctionGVS:hgvsProteinVariant
16   89824989  G    T    EVS     g.89824989G>T:rs140823801:Depth=141:missense:p.Q993K
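
To make the notation concrete, here is a small Python sketch that parses only the simplest case shown above, a single-base substitution such as g.957A>T. The pattern and the function name `parse_substitution` are assumptions made for illustration; the full rules (deletions, insertions, ranges, negative and “*” positions in CDS) are defined at hgvs.org and are not covered here.

```python
import re

# Matches e.g. "chr9:g.957A>T", "c.23G>C" or "r.957a>u".
SUBSTITUTION = re.compile(
    r"^(?:(?P<reference>[^:]+):)?"   # optional reference sequence, e.g. chr9
    r"(?P<kind>[gcr])\."             # g. genomic DNA, c. coding DNA, r. RNA
    r"(?P<position>[1-9]\d*)"        # 1-based position; there is no position 0
    r"(?P<ref>[ACGTUacgtu])>(?P<alt>[ACGTUacgtu])$"
)


def parse_substitution(text):
    """Return the parts of a simple substitution, or None if the text is not one."""
    match = SUBSTITUTION.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    parts["position"] = int(parts["position"])
    return parts


for example in ("chr9:g.957A>T", "c.23G>C", "r.957a>u", "g.413delG", "g.0A>T"):
    print(example, parse_substitution(example))
```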


Genome Variation Databases: dbSNP
•  Repository for SNPs and short sequence variation (<50 bases)
•  Current build: dbSNP 150 (Feb 2017)
   –  Approx. 135M validated rs#’s for human
      -  Mostly germline mutations (smaller subset of somatic)
      -  Contains rare variants as well
   –  Various organisms (Support for non-human organisms ending Sept 1st.)
•  Each SNP, or record, is identified by an rs# that includes
   –  Summary attributes
   –  NCBI resources (linked to ClinVar, GenBank, etc.)
   –  External resources (linked to OMIM and NHGRI GWAS)
•  Submissions are made from public laboratories and private organizations (ss#’s), and identical records are clustered into a single record (rs#’s).
•  rsid is the same for different assemblies (e.g. GRCh37/38), but chromosomal coordinates may differ!


Hands-on: dbSNP
•  Mining variants from databases
•  Finding SNPs for your favorite gene in dbSNP


Genome Variation Databases: 1000 Genomes Project
•  Extension of the HapMap in 2008 to catalogue genetic variation by sequencing at least 1000 participants
•  Discover population level human genetic variations
•  Initially consisted of whole genome low coverage (4X) and high coverage exome (20X) sequencing
•  VCF format was developed, and initially maintained, for the project
•  Phase 3 contains WGS data for 2504 individuals across 26 populations.

http://www.internationalgenome.org/


Mining Disease/Clinical Variants

Database                           Link
Catalog of Published GWAS (NHGRI)  https://www.ebi.ac.uk/gwas/
GWAS Central                       gwascentral.org
ClinVar (NCBI)                     ncbi.nlm.nih.gov/clinvar
PheGenI (NCBI)                     ncbi.nlm.nih.gov/gap/phegeni
SNPedia                            snpedia.com


Mining Disease/Clinical Variants in Cancer: COSMIC
•  http://cancer.sanger.ac.uk/cosmic
•  Catalog of Somatic Mutations in Cancer (COSMIC) created in 2005
•  v70 (Aug 2014) had ~2M coding point mutations
•  Datasets are curated from published literature and other databases (e.g. TCGA, ICGC)
•  Available in both GRCh37/38 coordinates
•  Tools/Features
   –  Cancer Gene Census (currently 572 genes)
   –  Browser: Cancer/Cell Line
   –  COSMIC Mart (similar to BioMart)


Calling variants from sequence data requires 3 broad steps
1) Prepare data: QC, align, SAM->BAM, sort, remove PCR duplicates
2) Call variants: base quality score recalibration, variant call, quality filtering
3) Annotate for function: snpEff, HaploReg, GTEx


[Workflow diagram: 1) Prepare data, starting with QC reads & align to reference]


Check read quality with fastqc
(http://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

Align reads to reference genome
•  Use a sensitive (gapped) aligner to account for large indels (BWA, http://bio-bwa.sourceforge.net/)*.

*See BaRC SOPs for usage instructions.
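
For readers unfamiliar with what a read-quality check looks at, the sketch below decodes the quality line of a FASTQ record into Phred scores and averages them. It assumes the Phred+33 encoding used by current Illumina output; the function names are hypothetical and this is not how FastQC itself is implemented.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(symbol) - offset for symbol in quality_line]


def mean_quality(quality_line):
    """Average Phred score of one read."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores)


# Hypothetical 4-base read: two Q40 bases ("I") followed by two Q20 bases ("5").
print(phred_scores("II55"), mean_quality("II55"))
```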


[Workflow diagram: 1) Prepare data, QC reads & align to reference, then Picard Tools]


Convert SAM -> BAM and sort reads by coordinates (https://broadinstitute.github.io/picard/)
•  Picard Tools: AddOrReplaceReadGroups
   –  SO=coordinate <- sorts mapped reads by coordinate.
•  Picard Tools: MarkDuplicates
   –  This command flags all duplicate reads in the file.
   –  This flag is recognized by samtools mpileup and GATK HaplotypeCaller.
   –  By default, reads with this flag will be ignored.
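
The duplicate flag mentioned above is one bit (0x400, decimal 1024) of the FLAG column in a SAM record. The sketch below shows how a downstream tool could test that bit and skip flagged reads; the function names and the two example records are hypothetical, and real pipelines would normally use samtools or a library such as pysam rather than splitting lines by hand.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def is_duplicate(flag):
    """True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def drop_duplicates(sam_lines):
    """Keep header lines and every alignment whose duplicate bit is not set."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            kept.append(line)  # header lines carry no FLAG column
        elif not is_duplicate(int(line.split("\t")[1])):
            kept.append(line)
    return kept


# Hypothetical records: the second read has FLAG 99 + 1024 = 1123.
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "read2\t1123\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
]
print(len(drop_duplicates(records)))
```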


[Workflow diagram: 2) Call variants, via either Samtools or GATK after Picard Tools]


[Workflow diagram: assess quality with Samtools mpileup (Samtools route) or base quality score recalibration, BQSR (GATK route)]


Samtools mpileup
•  The mpileup command scans every position supported by an aligned read and records the possible genotypes.
•  Moreover, every time a mapped read has a mismatch to the reference genome, it incorporates information such as the number of reads that share the mismatch, the quality of the base at that position, and the expected sequencing error rates.
•  It then computes the probability that each of these genotypes is truly present in the sample.
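
The idea of scoring candidate genotypes from the bases piled up at one position can be sketched with a simplified textbook model. This is not the model samtools implements; it assumes a diploid sample, independent reads, and sequencing errors spread evenly over the three other bases, and the function names are hypothetical.

```python
import math


def allele_probability(observed, allele, error):
    """P(observed base | true allele), with errors split evenly over the other 3 bases."""
    return 1.0 - error if observed == allele else error / 3.0


def genotype_log_likelihoods(bases, quals, ref, alt):
    """log10 likelihood of the pileup under each diploid genotype."""
    genotypes = {"ref/ref": (ref, ref), "ref/alt": (ref, alt), "alt/alt": (alt, alt)}
    result = {}
    for name, (allele1, allele2) in genotypes.items():
        total = 0.0
        for base, qual in zip(bases, quals):
            error = 10 ** (-qual / 10)  # Phred score -> error probability
            # Each read comes from either chromosome copy with probability 0.5.
            total += math.log10(
                0.5 * allele_probability(base, allele1, error)
                + 0.5 * allele_probability(base, allele2, error)
            )
        result[name] = total
    return result


# Hypothetical pileup: three reads support A, three support G, all at Q30.
likelihoods = genotype_log_likelihoods("AAAGGG", [30] * 6, ref="A", alt="G")
print(max(likelihoods, key=likelihoods.get))
```

With an even split of reference and alternative bases, the heterozygous genotype receives the highest likelihood, because either homozygous genotype would have to explain half of the reads as sequencing errors.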


Base quality score recalibration* (BQSR)
•  Quality scores produced by sequencers are subject to systematic technical error that may lead to over- or under-estimated base quality scores.
•  BQSR is a process that applies machine learning to model these errors empirically and adjust the quality scores accordingly.
•  For example, for a given run, when two A nucleotides in a row are called, the next base called had a 1% higher rate of error. So any base call that comes after AA in a read should have its quality score reduced by 1%.

*https://gatkforums.broadinstitute.org/gatk/discussion/44/base-quality-score-recalibration-bqsr
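
The arithmetic behind recalibration is the Phred relation Q = -10·log10(p). The sketch below converts between scores and error probabilities and derives an empirical quality from counted mismatches. It is only an illustration of the principle: the +1/+2 pseudocounts and the function names are assumptions, and GATK's actual model uses several covariates at once.

```python
import math


def phred_to_error(quality):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-quality / 10)


def error_to_phred(probability):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(probability)


def empirical_quality(mismatches, observations):
    """Quality estimated from observed mismatches in one group of bases."""
    # Assumption: +1/+2 pseudocounts keep the estimate finite when mismatches == 0.
    return error_to_phred((mismatches + 1) / (observations + 2))


# Hypothetical group: bases reported as Q30 that follow "AA", with 9 mismatches
# among 998 observations, are closer to Q20 in practice.
print(phred_to_error(30), empirical_quality(9, 998))
```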


Calling Variants


Calling Variants: Questionable Calls


[Workflow diagram: call variants with bcftools call (Samtools route) or HaplotypeCaller (GATK route)]


bcftools call
•  The bcftools call command uses the genotype likelihoods generated from samtools mpileup to call variants, and outputs all identified variants in variant call (VCF) format.


GATK HaplotypeCaller
•  When HaplotypeCaller encounters a read-mapped region showing signs of variation, it discards the existing mapping information and completely reassembles the reads in that region.
•  This allows the HaplotypeCaller to be more accurate when calling regions that are traditionally difficult to call, for example when they contain different types of variants close to each other.
•  For each region, it performs a pairwise alignment of each read against each haplotype. This produces a matrix of likelihoods of haplotypes. The most likely allele for each position is assigned.
•  HaplotypeCaller is able to correctly handle the splice junctions that make RNAseq a challenge for most variant callers.


VCF Format
•  Variant Call Format (VCF); BCF -> binary version of VCF
•  Text file format with meta-information and header lines, followed by data lines containing information about a position in the genome.

www.1000genomes.org
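
The three kinds of lines in a VCF file can be told apart by their prefix: `##` for meta-information, a single `#` for the column header, and anything else for data. The sketch below is a minimal reader built on that rule; the function name `parse_vcf` is hypothetical, the example record reuses the variant from the annotation slide, and its QUAL value is made up for illustration.

```python
def parse_vcf(lines):
    """Split VCF text into meta-information lines, column names and data records."""
    meta, columns, records = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            meta.append(line)                 # meta-information line
        elif line.startswith("#"):
            columns = line[1:].split("\t")    # header line: #CHROM POS ID ...
        elif columns is None:
            raise ValueError("data line found before the #CHROM header line")
        else:
            records.append(dict(zip(columns, line.split("\t"))))
    return meta, columns, records


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "16\t89824989\trs140823801\tG\tT\t50\tPASS\tDP=141",  # QUAL is illustrative
]
meta, columns, records = parse_vcf(example)
print(records[0]["ID"], records[0]["REF"], records[0]["ALT"])
```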


[Workflow diagram: filter variants with vcftools vcf-annotate (Samtools route) or Variant Quality Score Recalibration, VQSR (GATK route)]


Vcftools vcf-annotate
•  Vcftools vcf-annotate is a way to hard filter your called variants using “standard” quality thresholds or through user-specified thresholds.
   –  vcf-annotate -f + myFile.vcf > myFile_annot.vcf
   –  “+” applies several filters with default values, e.g.
      -  Qual   INT  Minimum value of the QUAL field [10]
      -  MinDP  INT  Minimum read depth [2]
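
To show what a hard filter does, the sketch below applies the two default thresholds listed above (QUAL of at least 10, read depth of at least 2) to one record. It is a stand-in written for illustration, not the vcf-annotate implementation: the function name is hypothetical, and reading depth from an INFO key named DP is an assumption about how the caller reported it.

```python
def failed_filters(record, min_qual=10.0, min_depth=2):
    """Return the names of the hard filters that a VCF record fails."""
    failed = []
    if record["QUAL"] != "." and float(record["QUAL"]) < min_qual:
        failed.append("Qual")
    # INFO is a ';'-separated list of KEY=VALUE pairs and bare flags.
    info = dict(item.split("=", 1) for item in record["INFO"].split(";") if "=" in item)
    if "DP" in info and int(info["DP"]) < min_depth:
        failed.append("MinDP")
    return failed


# Hypothetical records given as column-name -> value dictionaries.
print(failed_filters({"QUAL": "50", "INFO": "DP=141"}))
print(failed_filters({"QUAL": "5", "INFO": "DP=1;AF=0.5"}))
```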


GATK Variant quality score recalibrator (VQSR)
•  VQSR assigns a well-calibrated probability to each variant call in a call set, which can be used to filter for high quality variants.
•  VQSR achieves this by taking a reference set it assumes to be “true” variants (HapMap) and building a distribution of their quality metrics. This is used to build a model of what a “true” variant should look like.
•  This model then assigns a recalibration quality score to your variants. The higher this score, the greater its fit to the “true” model.
•  The tool allows for the setting of “tranches”, or thresholds that theoretically allow you to recover 100%, 99%, 90%, etc. of the true variants in the training set. You can filter your results on this metric to achieve greater/reduced specificity/sensitivity.
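
The tranche idea can be illustrated without the statistical model itself: given the recalibrated scores of the truth-set variants, a tranche is the score cut-off that still retains a chosen fraction of them. The sketch below assumes the scores are already available as plain numbers; the function names and all score values are hypothetical, and GATK's own tranche computation is more involved.

```python
import math


def tranche_threshold(truth_scores, sensitivity):
    """Lowest score cut-off that still keeps `sensitivity` of the truth variants."""
    ranked = sorted(truth_scores, reverse=True)
    # round() guards against floating-point noise in the product before ceil().
    keep = max(1, math.ceil(round(sensitivity * len(ranked), 9)))
    return ranked[keep - 1]


def apply_tranche(call_scores, threshold):
    """Keep the calls scoring at or above the tranche threshold."""
    return [score for score in call_scores if score >= threshold]


# Hypothetical scores for ten truth-set variants.
truth = [9.0, 7.5, 6.0, 4.2, 3.1, 2.0, 1.5, 0.3, -1.0, -4.0]
cutoff = tranche_threshold(truth, 0.90)
print(cutoff, apply_tranche([5.0, -2.0, 0.0], cutoff))
```

Lowering the target sensitivity raises the cut-off, which trades recovered true variants for fewer false positives.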


GATK Variant quality score recalibrator (VQSR) cont’d
•  Caveats:
   –  This procedure must be performed for SNPs and INDELs separately.
   –  It does not work for organisms for which no “true/training” data sets are available.
   –  The power of this method is dependent on the # of reads. Exome and/or low coverage experiments may produce many low-quality variant calls.
•  See the GATK best practices for more information on applying this method
   –  https://software.broadinstitute.org/gatk/documentation/article.php?id=2805


[Workflow diagram: assess for rare/common variants]


Annotate common SNPs in your data
•  Bedtools intersect can be used to annotate variants from your call set that overlap with variants found in dbSNP.
   –  intersectBed -wao -split -a A_reads.bt2.sorted_unique.raw.vcf -b SNP146.bed > A_reads.bt2.sorted_unique.annotated.vcf
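
What the overlap amounts to for single-base variants is a lookup by chromosome and position, with one detail worth spelling out: BED intervals are 0-based and half-open, while VCF positions are 1-based. The sketch below shows that conversion with hypothetical function names and a made-up BED line built from the rsID on the annotation slide; bedtools handles far more cases (multi-base features, split alignments) than this does.

```python
def load_known_sites(bed_lines):
    """Map (chrom, 1-based position) -> name for single-base BED features."""
    known = {}
    for line in bed_lines:
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        if int(end) - int(start) == 1:
            # BED start is 0-based, so the 1-based VCF position is start + 1.
            known[(chrom, int(start) + 1)] = name
    return known


def annotate_calls(calls, known):
    """Append the known ID, or '.', to each (chrom, pos, ref, alt) call."""
    return [call + (known.get((call[0], call[1]), "."),) for call in calls]


# Hypothetical inputs: one call matches a known site, the other does not.
known = load_known_sites(["chr16\t89824988\t89824989\trs140823801"])
calls = [("chr16", 89824989, "G", "T"), ("chr16", 89825000, "C", "A")]
for row in annotate_calls(calls, known):
    print(row)
```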


Workflow overview
1) Prepare data
   –  QC reads & align to reference
   –  Picard Tools
2) Call variants (Samtools route / GATK route)
   –  Assess quality: Samtools mpileup / Base quality score recalibration (BQSR)
   –  Call variants: bcftools call / HaplotypeCaller
   –  Filter variants: vcftools vcf-annotate / Variant Quality Score Recalibration (VQSR)
   –  Assess for rare/common variants
3) Annotate for function
   –  Variant annotations


Calling Variants: Annotation
•  Annotate variants with (functional) consequence, e.g. chr12:g.25232372A>G is a missense variant
•  Popular tools include snpEff and Variant Effect Predictor (VEP) from Ensembl
•  Choice of gene annotation may affect variant annotation
   –  RefSeq
   –  Ensembl
   –  GENCODE


Annotation of non-coding variation
•  HaploReg: http://archive.broadinstitute.org/mammals/haploreg/haploreg.php
•  SNPs can be visualized with
   –  Chromatin state and protein binding annotation from the Roadmap Epigenomics and ENCODE projects.
   –  Sequence conservation across mammals, the effect of SNPs on regulatory motifs, and the effect of SNPs on expression from eQTL studies.


Hands-on: HaploReg
•  Identifying the potential function of non-coding variants.


Hands-on: Samtools: Examine called variants
•  Analyze called variants in IGV.


BaRC SOP
•  Variant calling using Samtools and GATK. Manipulating/interpreting VCF files

http://barcwiki/wiki/SOPs under Variant calling and analysis


Resources For Mining Variants

Database                           Link
dbSNP                              www.ncbi.nlm.nih.gov/SNP
HapMap                             hapmap.ncbi.nlm.nih.gov
1000 Genomes                       1000genomes.org
UK10K                              uk10k.org
Exome Variant Server (EVS)         evs.gs.washington.edu/EVS
Personal Genome Project (Harvard)  personalgenomes.org
ExAC Browser (Broad)               exac.broadinstitute.org


Resources For Mining Variants: Cancer

Database                                          Link
International Cancer Genome Consortium (ICGC)     icgc.org
Catalogue of Somatic Mutation in Cancer (COSMIC)  cancer.sanger.ac.uk
cBioPortal for Cancer Genomics                    cbioportal.org
Cancer Cell Line Encyclopedia (CCLE)              broadinstitute.org/ccle


Resources For Mining Variants: Plants
•  1001 Genomes (A. thaliana 1001 strains)
   –  1001genomes.org
•  1000 Plants (1KP) (large-scale gene sequencing of at least 1000 plant species)
   –  www.onekp.com


Variant Calling workflow
•  Please see our Variant Calling walkthrough exercise here:
   –  http://jura.wi.mit.edu/bio/education/hot_topics/GenomeVariants_Jul2017/Genome_Variant_calling_walkthrough.txt
•  Within you will find the commands required for calling variants with both samtools and GATK.
