com par 25jun14
• Xin-zhuan Su
• Sittiporn Pattaradilokrat
• Sethu Nair
• Yanwei Qi
• Gordon Bullen
  NIH/NIAID – Malaria Functional Genomics Section
• Sebastian Gurevich
  McGill University
• Philip Awadalla
  University of Montreal
Funding: National Institutes of Health; Canadian Institutes of Health Research
https://github.com/parasite-genomics/Pipelines (version 2.0 coming in July)
ComPar: Genome Assembly, Variant Mapping, and Validation Pipelines
Martine Zilversmit
http://www.slideshare.net/zmartine1/com-par-25jun14
• BASH-scripted pipelines
• Accurate variant prediction:
  – SNPs
  – Small indels
  – Large indels (>17 bp)
  – Focused regions of extreme divergence (35–70% amino acid identity)
• In silico variant validation
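In silico validation amounts to comparing predicted variant calls against an independently validated set (for example, dye-terminator sequenced variants) and counting true and false positives. The sketch below is a minimal illustration under stated assumptions: a variant is identified by an exact (chromosome, position, ref, alt) match, and the function name `validate_calls` is hypothetical, not part of the ComPar pipelines.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical site key: (chromosome, 1-based position, ref allele, alt allele)
Site = Tuple[str, int, str, str]


def validate_calls(predicted: Iterable[Site],
                   validated: Iterable[Site]) -> Dict[str, float]:
    """Classify predicted calls against a validated set by exact site match."""
    pred = set(predicted)
    truth = set(validated)
    tp = len(pred & truth)   # predicted and confirmed
    fp = len(pred - truth)   # predicted but not confirmed
    fn = len(truth - pred)   # confirmed but missed
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}
```

Exact matching is the simplest convention; matching indels whose positions can shift with left/right alignment would need normalization first.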
Parameters:
- Quality metric and cutoff
- Number of variants per cluster
- Maximum distance between variants within a cluster
- Maximum distance between smaller clusters to merge into an HDR
Finding Highly Divergent Regions – HDR Program
[Diagram: VCF file → false positive variants and true positive variants → HDR file]
HDR file:
- Size of HDR
- Position of HDR
- Variants contained
Implemented in Python; runs stand-alone (interactively) or as a step in a pipeline.
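The four HDR parameters suggest a filter-cluster-merge procedure. The sketch below shows one way such a procedure could work; it is not the ComPar HDR program. The record layout, the function name `find_hdrs`, and all default values are hypothetical, chosen only to make the example runnable.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Variant(NamedTuple):
    """One variant call; a hypothetical minimal subset of a VCF record."""
    chrom: str
    pos: int
    qual: float


class HDR(NamedTuple):
    chrom: str
    start: int
    end: int
    variants: Tuple[Variant, ...]

    @property
    def size(self) -> int:
        # Length in bp, counting both the first and last variant positions
        return self.end - self.start + 1


def find_hdrs(variants: Sequence[Variant], min_qual: float = 30.0,
              min_variants: int = 3, max_gap: int = 50,
              max_merge_gap: int = 200) -> List[HDR]:
    """Hypothetical HDR finder; parameter names and defaults are illustrative."""
    # 1. Quality metric and cutoff: drop low-quality calls, sort by position
    kept = sorted((v for v in variants if v.qual >= min_qual),
                  key=lambda v: (v.chrom, v.pos))

    # 2. Cluster consecutive variants that are at most max_gap bp apart
    clusters: List[List[Variant]] = []
    for v in kept:
        prev = clusters[-1][-1] if clusters else None
        if prev is not None and prev.chrom == v.chrom and v.pos - prev.pos <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    # Keep only clusters with at least min_variants variants
    clusters = [c for c in clusters if len(c) >= min_variants]

    # 3. Merge neighbouring clusters separated by at most max_merge_gap bp
    merged: List[List[Variant]] = []
    for c in clusters:
        last = merged[-1][-1] if merged else None
        if (last is not None and last.chrom == c[0].chrom
                and c[0].pos - last.pos <= max_merge_gap):
            merged[-1].extend(c)
        else:
            merged.append(list(c))

    # 4. Report position, size, and contained variants for each HDR
    return [HDR(c[0].chrom, c[0].pos, c[-1].pos, tuple(c)) for c in merged]
```

Clustering before merging means an isolated pair of variants cannot join an HDR on its own; whether the real program behaves this way is not stated in the slides.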
[Figure: Comparing 2 Plasmodium Genomes. Dye-terminator sequenced variation, 50 basepair sliding window. Axes: number of variants vs. position on "chromosome".]
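The plots count variants in 50 bp windows along a chromosome. A minimal sketch of that count, assuming 1-based positions; the function name `window_counts` is hypothetical, and the default step (equal to the window size, giving non-overlapping windows) is an assumption, since the slides do not state the step.

```python
from bisect import bisect_left, bisect_right
from typing import Iterable, List, Tuple


def window_counts(positions: Iterable[int], chrom_length: int,
                  window: int = 50, step: int = 50) -> List[Tuple[int, int]]:
    """Return (window_start, variant_count) pairs along one chromosome.

    step == window gives non-overlapping windows; a smaller step
    gives overlapping (sliding) windows.
    """
    pos = sorted(positions)
    counts = []
    for start in range(1, chrom_length + 1, step):
        end = start + window - 1  # the last window may run past chrom_length
        # Number of positions p with start <= p <= end
        n = bisect_right(pos, end) - bisect_left(pos, start)
        counts.append((start, n))
    return counts
```

Sorting once and using binary search keeps each window lookup logarithmic, which matters for genome-scale variant lists.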
[Figure: Comparing 2 Plasmodium Genomes. Predicted variants, no filtering based on quality metrics. Two panels: number of variants vs. position on "chromosome".]
[Figure: Comparing 2 Plasmodium Genomes. Two panels: number of variants vs. position on "chromosome".]
[Figure: Predicted variants, filtering based on quality score ≥ 30 cutoff. Two panels: number of variants vs. position on "chromosome".]
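A quality-score cutoff keeps only calls whose VCF QUAL column (the sixth tab-separated field) meets the threshold. A minimal sketch, assuming plain-text VCF input; `filter_by_qual` is a hypothetical name, and calls with a missing QUAL (".") are dropped, which is a choice rather than anything the slides specify.

```python
from typing import Iterable, Iterator


def filter_by_qual(vcf_lines: Iterable[str],
                   min_qual: float = 30.0) -> Iterator[str]:
    """Yield header lines and records with QUAL >= min_qual."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL column
        if qual != "." and float(qual) >= min_qual:
            yield line
```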
[Figure: Comparing 2 Plasmodium Genomes. Filtering based on consensus quality (FQ) ≤ -100 cutoff. Two panels: number of variants vs. position on "chromosome".]
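Filtering on consensus quality works on a value stored in the INFO column rather than QUAL. The slides do not say which caller produced FQ; this sketch assumes it appears as an `FQ=value` entry in the INFO field (the eighth column), as in VCF output from older samtools/bcftools releases, to our knowledge. The function names `info_value` and `filter_by_fq` are hypothetical, and records without FQ are dropped by choice.

```python
from typing import Iterable, Iterator, Optional


def info_value(info: str, key: str) -> Optional[float]:
    """Return the numeric value of key=value in a VCF INFO string, if present."""
    for entry in info.split(";"):
        k, sep, v = entry.partition("=")
        if k == key and sep:
            return float(v)
    return None


def filter_by_fq(vcf_lines: Iterable[str],
                 max_fq: float = -100.0) -> Iterator[str]:
    """Yield header lines and records whose INFO FQ is <= max_fq."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fq = info_value(fields[7], "FQ")  # INFO column
        if fq is not None and fq <= max_fq:
            yield line
```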
[Figure: Comparing 2 Plasmodium Genomes. Highly-divergent regions (HDRs); quality ≥ 30 variants without consensus quality ≥ -100. Two panels: number of variants vs. position on "chromosome".]
Characteristics of Highly Divergent Regions
Example genes:
- Histone acetyltransferase GCN5, putative (GCN5)
- RNA-binding protein NOB1, putative
- DNA repair protein, putative
Percent identity (33X / By265 / N67):
- 44.4% / 55.6% / 66.7%
- 41.4% / 79.3% / 51.7%
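Percent identity over an alignment is the share of compared columns where the two residues match. Several conventions exist for the denominator; this minimal sketch assumes columns containing a gap in either sequence are excluded, which may differ from how the values above were computed. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aln_a, aln_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared
```

Counting gapped columns in the denominator instead would lower identity for indel-rich regions, so the convention should be stated alongside any reported value.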