Human Genome Variants Analysis. Marin Vargas, Sergio Paul. January 2014


Page 1: Human Genome Variants Analysis - Bioinformaticsmolsim.sci.univr.it/2014_bioinfo2/genomica/04_Variant_analysis.pdf · Variant Analysis - Case of study 1 o We analyzed the exome of

Human Genome Variants Analysis

Marin Vargas, Sergio Paul

January 2014


Why make predictive diagnosis of a genetic disease using the whole genome?

o Genome sequencing is the only way to get all genetic information.

o The cost of whole genome sequencing has decreased drastically.

o Next Generation Sequencing (NGS) technology has been remarkably successful in finding the causes of Mendelian and rare diseases.

o With personalized medicine we can get targeted therapies.


An SNV might cause a genetic disorder

Example: sickle cell anemia

SNV: Single Nucleotide Variant
SNP: Single Nucleotide Polymorphism

o If an SNV is present in at least 1% of the population → SNP.
o If an SNV is present in less than 1% of the population → mutation.
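The 1% rule above can be written as a small function. This is a minimal sketch: the function name `classify_snv` and the example frequencies are illustrative assumptions, and only the 1% threshold comes from the slide.

```python
def classify_snv(allele_frequency, threshold=0.01):
    """Label a single nucleotide variant by its population frequency.

    threshold=0.01 is the 1% cut-off stated on the slide.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    # At or above the threshold the variant counts as a polymorphism (SNP);
    # below it, the slide calls it a mutation.
    return "SNP" if allele_frequency >= threshold else "mutation"


# Hypothetical frequencies, chosen only to exercise both branches.
print(classify_snv(0.12))    # SNP
print(classify_snv(0.0004))  # mutation
```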


From Genome Variants to patient’s genetic disease diagnosis

DNA extraction → DNA sequencing (genome or exome) → Variant calling against the reference genome (bioinformatics process) → Variant analysis (bioinformatics process)


DNA extraction

o The “Centro Genomica Funzionale” (CGF) of Prof. Delledonne has an Illumina HiSeq 1000 sequencer with Next Generation Sequencing (NGS) technology.


DNA sequencing

(Figure: the DNA is loaded into the Next Generation Sequencing (NGS) sequencer, which produces reads.)

o Sequencing an entire genome reliably requires high coverage.

o The human genome (~3 Gb) needs more than 90 Gb of sequenced bases.

o The human exome (~135 Mb) needs more than 4 Gb of sequenced bases.
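The two figures above both correspond to roughly 30x mean coverage, that is, each base of the target is read about 30 times on average. A minimal sketch of the arithmetic, assuming coverage is simply sequenced bases divided by target size (the function name is illustrative):

```python
def mean_coverage(sequenced_bases, target_size):
    """Average number of times each target base is sequenced."""
    return sequenced_bases / target_size


GENOME_SIZE = 3e9    # ~3 Gb, from the slide
EXOME_SIZE = 135e6   # ~135 Mb, from the slide

genome_cov = mean_coverage(90e9, GENOME_SIZE)  # 90 Gb of sequenced bases
exome_cov = mean_coverage(4e9, EXOME_SIZE)     # 4 Gb of sequenced bases

print(f"genome: {genome_cov:.1f}x")
print(f"exome:  {exome_cov:.1f}x")
```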


DNA sequencing result

o The result of NGS is a file in FASTQ format containing all the paired-end reads (each read is ~100 bases long) of the sequenced exome/genome.

(Figure: example FASTQ record with the sequence line labelled.)
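A FASTQ record takes four lines: an identifier starting with `@`, the sequence, a `+` separator, and one quality character per base. The sketch below reads such records and computes a mean base quality. The two example reads are made up, and the Phred offset of 33 and the cut-off of 30 are assumptions chosen for illustration, not values from the slides.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed record: {header!r}")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (offset 33 is an assumption)."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# Two invented reads: the first is high quality, the second starts badly.
EXAMPLE = "@read1/1\nACGTACGT\n+\nIIIIIIII\n@read2/1\nTTGCA\n+\n##III\n"

records = list(read_fastq(io.StringIO(EXAMPLE)))
kept = [record for record in records if mean_phred(record[2]) >= 30]

for read_id, sequence, quality in kept:
    print(read_id, sequence, mean_phred(quality))
```

A real pipeline would read from the sequencer's files rather than a string and would usually trim or filter reads with a dedicated tool.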


Variant Calling

o Quality filtering of the raw NGS reads.

o Alignment to the reference genome (BWA).

o Variant calling (GATK).

Example VCF file:
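Each data line of a VCF file has eight fixed tab-separated columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. The sketch below splits one such line into a dictionary. The record is invented for illustration and is not the example from the slide, and the parser ignores header lines and per-sample columns.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    # ALT may list several alternative alleles separated by commas.
    record["ALT"] = record["ALT"].split(",")
    # INFO is a ';'-separated list of key=value pairs or bare flags.
    record["INFO"] = dict(
        item.split("=", 1) if "=" in item else (item, True)
        for item in record["INFO"].split(";")
    )
    return record


# Hypothetical record: an A-to-G change at position 12345 of chr1.
line = "chr1\t12345\t.\tA\tG\t50.0\tPASS\tDP=35;AF=0.5"
record = parse_vcf_line(line)
print(record["CHROM"], record["POS"], record["REF"], record["ALT"], record["INFO"])
```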


Alignment to the reference genome by BWA

Burrows-Wheeler Aligner
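BWA takes its name from the Burrows-Wheeler transform, which it uses to index the reference genome so that reads can be located quickly. The sketch below computes the transform naively by sorting all rotations of a short string. It only illustrates what the transform is and is not how BWA builds its index on a genome-sized reference.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform of `text`, computed by sorting all rotations."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    text += sentinel  # marks the end of the string and sorts before letters
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("banana"))  # annb$aa
```

The transformed string groups identical characters together and can be searched without storing every rotation, which is what makes it useful for aligning millions of short reads.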


Variant Calling by GATK


Variant Analysis

• Genome ~ 5,000,000 variants
• Exome ~ 200,000 variants

The variant analysis is based on different filtering levels:

o Allele frequency (to remove common variants).
o Location and effect: keep variants in exonic and splice regions that are not synonymous (to remove variants with no obvious effect on the protein).
o Probability that the mutation is harmful and that the affected conserved sites are probably very important for the protein's functionality (biological interpretation).
o Homozygous and heterozygous mutations are treated differently.
o The expected result is a few biologically relevant variants.
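The filtering levels above can be sketched as successive checks on annotated variants. Everything in this example is an assumption made for illustration: the field names, the six invented variants, the 1% frequency cut-off and the 0.8 damage-score cut-off. Real annotation tools use their own columns and scores.

```python
# Invented, already-annotated variants (all names and values are hypothetical).
variants = [
    {"gene": "GENE_A", "allele_freq": 0.25, "region": "exonic",
     "effect": "nonsynonymous", "damage_score": 0.90, "genotype": "het"},
    {"gene": "GENE_B", "allele_freq": 0.001, "region": "intronic",
     "effect": "none", "damage_score": 0.10, "genotype": "het"},
    {"gene": "GENE_C", "allele_freq": 0.002, "region": "exonic",
     "effect": "synonymous", "damage_score": 0.05, "genotype": "hom"},
    {"gene": "GENE_D", "allele_freq": 0.0005, "region": "exonic",
     "effect": "nonsynonymous", "damage_score": 0.95, "genotype": "hom"},
    {"gene": "GENE_E", "allele_freq": 0.003, "region": "splicing",
     "effect": "splicing", "damage_score": 0.40, "genotype": "het"},
    {"gene": "GENE_F", "allele_freq": 0.0, "region": "exonic",
     "effect": "nonsynonymous", "damage_score": 0.85, "genotype": "het"},
]


def passes_filters(variant, max_freq=0.01, min_damage=0.8):
    """Apply the filtering levels in order; thresholds are assumptions."""
    if variant["allele_freq"] >= max_freq:
        return False  # common variant
    if variant["region"] not in {"exonic", "splicing"}:
        return False  # outside exonic and splice regions
    if variant["effect"] == "synonymous":
        return False  # no obvious effect on the protein
    return variant["damage_score"] >= min_damage  # predicted harmful


candidates = [variant for variant in variants if passes_filters(variant)]

# Zygosity is reported separately because it changes the interpretation.
for variant in candidates:
    print(variant["gene"], variant["genotype"])
```

Applied to the invented list, the checks leave two candidates out of six, mirroring the reduction from hundreds of thousands of variants to a few relevant ones.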


Variant Analysis Software

o Free academic software:

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Human Genome Interpreter

o Paid software:


Variant Analysis - Case study 1

o We analyzed the exome of a 20-year-old man with Alagille syndrome and acute lymphoblastic leukemia.

o More than 280,000 variants were analyzed using KNOME software.

o We found putative damaging mutations in genes such as PAX5 (R38H) and NOTCH1 (K1821N), which might be strongly related to the observed disease.

o A publication on this work is in preparation.
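The labels in parentheses follow the usual shorthand for amino acid substitutions: the original residue, its position in the protein, and the new residue, so R38H means arginine at position 38 replaced by histidine. A minimal sketch that decodes this notation; the function name is illustrative and only simple one-letter substitutions are handled.

```python
import re

# Standard one-letter amino acid codes.
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartate",
    "C": "cysteine", "Q": "glutamine", "E": "glutamate", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}


def parse_substitution(change):
    """Decode e.g. 'R38H' into (original residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if match is None:
        raise ValueError(f"not a simple substitution: {change!r}")
    original, position, new = match.groups()
    if original not in AMINO_ACIDS or new not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {change!r}")
    return AMINO_ACIDS[original], int(position), AMINO_ACIDS[new]


for gene, change in [("PAX5", "R38H"), ("NOTCH1", "K1821N")]:
    original, position, new = parse_substitution(change)
    print(f"{gene}: {original} at position {position} replaced by {new}")
```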


Variant Analysis - Case study 2

We analyzed two families:

o The first family had a phenotype of a neuromuscular disease together with an inflammatory disease.
o Their exomes were analyzed using ANNOVAR.
o We found a mutation in the son’s NEB gene (I6512T), correlated with nemaline myopathy.

o The second family had palmoplantar keratoderma (PK) and Charcot-Marie-Tooth disease type 2 (CMT).
o Their exomes were analyzed using ANNOVAR.
o We found a mutation in the KRT1 gene (F200L) in the family members with PK and a mutation in the MPZ gene (S44F) in those with CMT; no mutations were found in these genes in the healthy family members.


www.molecularlab.it, Oct 2013

In search of the genes responsible for diseases

THE DOSE PROJECT STUDIES THE DIFFERENCES BETWEEN HAVING A SMALLER OR LARGER QUANTITY OF GENES AND THE CONSEQUENCES THIS HAS FOR HEALTH

The information age we are living in has provided the tools that allow biology to produce an enormous number of DNA sequences from many different species. Modern technology has made DNA sequencing simpler, cheaper, and more reliable, with enormous benefits for the diagnosis and treatment of diseases.

In the past the difficulty lay in collecting genetic data; today the challenge is making sense of it.

"We are taking an evolutionary approach to making sense of DNA sequences, which means that we examine the way genes have evolved in order to better understand how they work."

With her EU-funded DOSE project ("Dosage sensitive genes in evolution and disease"), McLysaght is studying differences in gene dosage, that is, having a smaller or larger quantity of a gene, and the consequences this has for health. "Variations in the quantity of a gene between individuals, known as dosage variations, are a relatively recent discovery and are sometimes implicated in disease," explains McLysaght. In simple terms, she has adopted an evolutionary approach to discover which dosage variations are acceptable and which are probably involved in human diseases.

"By observing evolution we can understand which variations in DNA are acceptable and which are unacceptable," explains McLysaght. "The DNA changes that proved unacceptable during evolution are probably the same ones that cause disease today."

Genome-wide deserts for copy number variation in vertebrates (published article)