Human Genome Variants Analysis
Marin Vargas, Sergio Paul
January 2014
Why make predictive diagnosis of a genetic disease using the whole genome?
o Genome sequencing is the only way to obtain all genetic information.
o The cost of whole genome sequencing has decreased drastically.
o Next Generation Sequencing (NGS) technology has been remarkably successful in finding the causes of Mendelian and rare diseases.
o Personalized medicine makes targeted therapies possible.
An SNV might cause a genetic disorder, e.g. sickle cell anemia.
SNV: Single Nucleotide Variant
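To make the sickle cell example concrete, the sketch below applies a single-base change to a codon and translates both versions with the standard genetic code. The classic sickle cell substitution in the β-globin gene (HBB) changes the codon GAG (glutamate) to GTG (valine). The function name `snv_effect` is hypothetical and only illustrates the idea.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}  # '*' marks a stop codon


def snv_effect(codon: str, position: int, alt_base: str) -> tuple[str, str]:
    """Return (reference, alternate) amino acids for a single-base change.

    position is 0-based within the codon (hypothetical helper).
    """
    mutated = codon[:position] + alt_base + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[mutated]


ref_aa, alt_aa = snv_effect("GAG", 1, "T")  # GAG -> GTG
print(ref_aa, "->", alt_aa)  # E -> V (Glu -> Val)
```

A change that leaves the amino acid unchanged (e.g. GAG → GAA, both glutamate) would be a synonymous variant.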
o If an SNV is present in at least 1% of the population → SNP.
o If an SNV is not present in at least 1% of the population → mutation.
SNP: Single Nucleotide Polymorphism
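The 1% rule above can be written as a small classifier. This is a minimal sketch that assumes the population frequency is given as a fraction. The helper `allele_frequency` also assumes a diploid locus, with two alleles per individual. Both function names are hypothetical.

```python
def allele_frequency(alt_allele_count: int, n_individuals: int) -> float:
    """Fraction of chromosomes carrying the variant, assuming a diploid locus."""
    return alt_allele_count / (2 * n_individuals)


def classify_snv(population_frequency: float, threshold: float = 0.01) -> str:
    """'SNP' if the variant reaches the 1% threshold, otherwise 'mutation'."""
    if not 0.0 <= population_frequency <= 1.0:
        raise ValueError("population_frequency must be a fraction between 0 and 1")
    return "SNP" if population_frequency >= threshold else "mutation"


print(classify_snv(allele_frequency(30, 1000)))  # SNP (1.5%)
print(classify_snv(allele_frequency(3, 1000)))   # mutation (0.15%)
```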
From Genome Variants to patient’s genetic disease diagnosis
DNA Extraction → DNA Sequencing (Genome or Exome) → Variant Calling against the Genome reference (Bioinformatics Process) → Variant Analysis (Bioinformatics Process)
DNA extraction
o The “Centro Genomica Funzionale” (CGF) of Prof. Delledonne has an Illumina HiSeq 1000 sequencer, which uses Next Generation Sequencing (NGS) technology.
DNA sequencing with Next Generation Sequencing (NGS): reads
o Sequencing the entire genome reliably requires high coverage.
o The human genome (~3 Gb) needs more than 90 Gb of sequence (about 30× coverage).
o The human exome (~135 Mb) needs more than 4 Gb of sequence (about 30× coverage).
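The coverage figures come from dividing the amount of sequence by the target size. For the genome, 90 Gb / 3 Gb = 30×. For the exome, 4 Gb / 135 Mb ≈ 30×. The sketch below reproduces that arithmetic. It assumes that "Gb" means gigabases of sequenced bases and that reads are 100 bases long, as stated below. The helper names are hypothetical.

```python
import math


def mean_coverage(sequenced_bases: float, target_size: float) -> float:
    """Average depth: total sequenced bases divided by the size of the target."""
    return sequenced_bases / target_size


def reads_needed(sequenced_bases: float, read_length: int = 100) -> int:
    """Number of reads of a given length that add up to the sequenced bases."""
    return math.ceil(sequenced_bases / read_length)


GENOME_SIZE = 3e9    # ~3 Gb human genome
EXOME_SIZE = 135e6   # ~135 Mb human exome

print(f"genome: {mean_coverage(90e9, GENOME_SIZE):.1f}x")  # genome: 30.0x
print(f"exome:  {mean_coverage(4e9, EXOME_SIZE):.1f}x")    # exome:  29.6x
print(reads_needed(90e9))                                    # 900000000
```

With paired-end sequencing, those 900 million reads correspond to 450 million read pairs.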
Load into sequencer
DNA sequencing result
o The result of NGS is a file in FASTQ format containing all the paired-end reads (each about 100 bases long) of the sequenced exome/genome.
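A FASTQ record has four lines: a header starting with `@`, the bases, a `+` separator, and one quality character per base. The sketch below assumes unwrapped records and the common Phred+33 quality encoding. It reads the records and converts each quality character into a Phred score Q, which corresponds to an error probability of 10^(−Q/10). The function names and the tiny example read are illustrative only.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming 4 unwrapped lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode quality characters (Phred+33 by default) into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Illustrative record, much shorter than a real ~100-base read.
example = io.StringIO("@read1/1\nGATTACA\n+\nIIIII##\n")
for record in read_fastq(example):
    print(record.name, record.sequence, phred_scores(record.quality))
# read1/1 GATTACA [40, 40, 40, 40, 40, 2, 2]
```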
Variant Calling
o Quality filtering of raw NGS reads.
o Alignment to the reference genome (BWA).
o Variants calling (GATK).
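The slides do not say which quality filter was used. The sketch below shows one common approach, and it is not the authors' pipeline. A read is trimmed from its 3' end while bases fall below a Phred threshold. It is then discarded if it becomes too short or if its mean quality is too low. The thresholds (Phred 20, minimum length 50) and the function names are hypothetical.

```python
from typing import Optional


def trim_3prime(sequence: str, quality: str, min_q: int = 20,
                offset: int = 33) -> tuple[str, str]:
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality[:end]


def quality_filter(sequence: str, quality: str, min_q: int = 20,
                   min_len: int = 50, offset: int = 33) -> Optional[tuple[str, str]]:
    """Return the trimmed read if it passes, otherwise None.

    Thresholds are illustrative assumptions, not values from the original pipeline.
    """
    seq, qual = trim_3prime(sequence, quality, min_q, offset)
    if len(seq) < max(min_len, 1):
        return None  # too short after trimming
    mean_q = sum(ord(ch) - offset for ch in qual) / len(qual)
    return (seq, qual) if mean_q >= min_q else None


read = "ACGT" * 25                # 100 bases
good_tail = "I" * 90 + "#" * 10   # Q40 bases followed by ten Q2 bases
kept = quality_filter(read, good_tail)
print(len(kept[0]) if kept else "discarded")  # 90
print(quality_filter(read, "+" * 100))         # None ('+' is Q10)
```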
Example VCF file:
Alignment to the reference genome by BWA (Burrows-Wheeler Aligner)
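BWA is built on the Burrows-Wheeler transform (BWT) of the reference. The sketch below shows the core idea only: it builds the transform and counts exact matches of a read with FM-index backward search. BWA's real index and its inexact, gapped alignment are far more elaborate. The function names and the toy reference are illustrative.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (quadratic, fine for a demo)."""
    text += "$"  # sentinel, sorts before A/C/G/T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(last_column: str):
    """C[ch]: number of characters smaller than ch; occ[ch][i]: count of ch in L[:i]."""
    alphabet = sorted(set(last_column))
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += last_column.count(ch)
    occ = {ch: [0] * (len(last_column) + 1) for ch in alphabet}
    for i, symbol in enumerate(last_column):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (symbol == ch)
    return c_table, occ


def count_matches(pattern: str, last_column: str) -> int:
    """Count exact occurrences of pattern using backward search."""
    c_table, occ = build_fm_index(last_column)
    top, bottom = 0, len(last_column)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):       # extend the match one character to the left
        if ch not in c_table:
            return 0
        top = c_table[ch] + occ[ch][top]
        bottom = c_table[ch] + occ[ch][bottom]
        if top >= bottom:
            return 0
    return bottom - top


reference = "GATTACAGATTACA"
last_column = bwt(reference)
print(count_matches("GATTACA", last_column))  # 2
print(count_matches("CAG", last_column))      # 1
```

Backward search never scans the reference itself. Each extra pattern character costs two table lookups, which is what makes BWT-based aligners fast on a 3 Gb genome.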
Variant Calling by GATK
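Variant callers such as GATK write their output as VCF. A data line has tab-separated columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample. The sketch below parses one such line and reads the sample's genotype (GT) to tell homozygous from heterozygous calls, a distinction the variant analysis relies on. The record is made up, and the function names are hypothetical.

```python
from typing import NamedTuple


class VcfCall(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alts: list[str]
    genotype: str
    zygosity: str


def zygosity(genotype: str) -> str:
    """Interpret a VCF GT value such as 0/1, 1|1 or ./. for a diploid sample."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return "no call"
    if len(set(alleles)) > 1:
        return "heterozygous"
    return "homozygous reference" if alleles[0] == "0" else "homozygous alternate"


def parse_vcf_line(line: str, sample_index: int = 0) -> VcfCall:
    """Parse one tab-separated VCF data line (not a ## or #CHROM header line)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    format_keys = fields[8].split(":")              # e.g. GT:DP
    sample_values = fields[9 + sample_index].split(":")
    sample = dict(zip(format_keys, sample_values))
    gt = sample["GT"]
    return VcfCall(chrom, int(pos), ref, alt.split(","), gt, zygosity(gt))


# Made-up record for illustration only.
line = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=30\tGT:DP\t0/1:30"
call = parse_vcf_line(line)
print(call.chrom, call.pos, call.ref, call.alts, call.zygosity)
# chr1 12345 A ['G'] heterozygous
```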
Variant Analysis
• Genome ~ 5,000,000 variants
• Exome ~ 200,000 variants
Variant analysis is based on several filtering levels:
o Allele frequency (common variants are filtered out).
o Genomic region and effect: variants in exonic and splice regions are kept, and synonymous variants (no obvious effect on the protein) are discarded.
o Biological interpretation: the probability that the mutation is harmful, with conserved sites considered likely to be very important for protein function.
o Zygosity: homozygous and heterozygous mutations are treated differently.
o The expected result is a few biologically relevant variants.
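The filtering levels above can be sketched as a cascade of list filters. The record fields are hypothetical, loosely modeled on the kind of columns an annotation tool produces. The damaging-score threshold is also an assumption. The 1% cutoff matches the SNP definition given earlier. This is not the exact procedure used in the case studies.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedVariant:
    gene: str
    population_af: float   # allele frequency in reference populations (0-1)
    region: str            # e.g. "exonic", "splicing", "intronic"
    effect: str            # e.g. "synonymous", "nonsynonymous"; may be "" for splice sites
    damaging_score: float  # hypothetical 0-1 score, higher = more likely harmful
    conserved: bool        # whether the site is conserved across species
    zygosity: str          # "heterozygous" or "homozygous"


def filter_variants(variants, max_af=0.01, min_damaging=0.5):
    """Apply the filtering levels in order and group survivors by zygosity."""
    # Level 1: allele frequency, drop common variants (>= 1% counts as a SNP).
    rare = [v for v in variants if v.population_af < max_af]
    # Level 2: keep exonic/splice-region variants, drop synonymous ones.
    coding = [v for v in rare
              if v.region in {"exonic", "splicing"} and v.effect != "synonymous"]
    # Level 3: biological interpretation, likely damaging and at a conserved site.
    harmful = [v for v in coding if v.damaging_score >= min_damaging and v.conserved]
    # Level 4: keep homozygous and heterozygous candidates apart.
    by_zygosity = {"homozygous": [], "heterozygous": []}
    for v in harmful:
        by_zygosity.setdefault(v.zygosity, []).append(v)
    return by_zygosity


candidates = [
    AnnotatedVariant("GENE_A", 0.20, "exonic", "nonsynonymous", 0.9, True, "heterozygous"),
    AnnotatedVariant("GENE_B", 0.001, "exonic", "synonymous", 0.9, True, "heterozygous"),
    AnnotatedVariant("GENE_C", 0.001, "exonic", "nonsynonymous", 0.95, True, "homozygous"),
]
result = filter_variants(candidates)
print({k: [v.gene for v in vs] for k, vs in result.items()})
# {'homozygous': ['GENE_C'], 'heterozygous': []}
```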
Variant Analysis Software
o Free academic software:
Functional annotation of genetic variants from high-throughput sequencing data
Human Genome Interpreter
o Paid software:
Variant Analysis - Case study 1
o We analyzed the exome of a 20-year-old man with Alagille syndrome and acute lymphoblastic leukemia.
o More than 280,000 variants were analyzed using KNOME software.
o We found putative damaging mutations in genes such as PAX5 (R38H) and NOTCH1 (K1821N), which may be strongly related to the observed disease.
o A publication on this work is in preparation.
Variant Analysis - Case study 2
o We analyzed two families:
▪ The first family had a phenotype of a neuromuscular disease together with an inflammatory disease.
▪ Their exomes were analyzed using ANNOVAR.
▪ We found a mutation in the son's NEB gene (I6512T) associated with nemaline myopathy.
▪ The second family had palmoplantar keratoderma (PK) and Charcot-Marie-Tooth disease type 2 (CMT).
▪ Their exomes were analyzed using ANNOVAR.
▪ We found a mutation in the KRT1 gene (F200L) in the family members with PK and a mutation in the MPZ gene (S44F) in those with CMT; no mutations in these genes were found in the healthy members.
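Mutations such as R38H, K1821N, I6512T, F200L and S44F use one-letter amino-acid codes: reference residue, protein position, new residue. The sketch below converts them to the HGVS-style three-letter protein notation, assuming each one is a single missense substitution at the given protein position. The function name is hypothetical.

```python
import re

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reference residue, protein position, alternate residue (e.g. R38H).
MISSENSE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def describe_missense(change: str) -> str:
    """Convert one-letter notation like 'R38H' into 'p.Arg38His'."""
    match = MISSENSE.match(change)
    if match is None:
        raise ValueError(f"not a simple missense change: {change!r}")
    ref, position, alt = match.groups()
    return f"p.{THREE_LETTER[ref]}{position}{THREE_LETTER[alt]}"


for change in ["R38H", "K1821N", "I6512T", "F200L", "S44F"]:
    print(change, "->", describe_missense(change))
# first line printed: R38H -> p.Arg38His
```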
www.molecularlab.it, October 2013
In search of the genes responsible for diseases
THE DOSE PROJECT STUDIES THE DIFFERENCES BETWEEN HAVING A LOWER OR HIGHER QUANTITY OF GENES AND THE CONSEQUENCES THIS HAS FOR HEALTH
The information age we are living in has given biology the tools to produce an enormous number of DNA sequences from many different species. Modern technology has made DNA sequencing simpler, cheaper and more reliable, with enormous benefits for the diagnosis and treatment of diseases.
In the past the difficulty lay in collecting genetic data; today the challenge is making sense of it.
“We are taking an evolutionary approach to making sense of DNA sequences; this means we examine the way genes have evolved in order to better understand how they work.”
…
With her EU-funded project DOSE ("Dosage sensitive genes in evolution and disease"), McLysaght is studying the differences between gene doses, that is, having a lower or higher quantity of a gene, and the consequences this has for health. "Variations in the quantity of a gene between individuals (dosage variations) are a relatively recent discovery and are sometimes implicated in disease," explains McLysaght. Put simply, she has adopted an evolutionary approach to discover which dosage variations are acceptable and which are probably involved in human diseases.
…
“By observing evolution we can understand which DNA variations are acceptable and which are not,” explains McLysaght. “The DNA changes that proved unacceptable during evolution are probably the same ones that cause diseases today.”
Genome-wide deserts for copy number variation in vertebrates (published article)