One man's *1 is another man's *13? Trouble with nomenclatures in personalized medicine
Asst.-Prof. Mag. Dr. Matthias Samwald
CeMSIIS, Medical University of Vienna
SUMMER SCHOOL: GENOMIC MEDICINE – Bridging research and the clinic, May 6 2016, Portoroz, Slovenia
Funded by Austrian Science Fund (FWF): [P 25608-N15]
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 668353 (KB and MS).
What's the problem?
We simulated the accuracy of various targeted, low-cost assays suitable for pre-emptive testing compared to next-gen sequencing
Venn diagram displaying the numbers and overlaps of polymorphisms covered by constrained views derived from four pharmacogenomic assays. DMET: derived from the Affymetrix DMET™ Plus assay, VERA: Illumina VeraCode® ADME Core Panel, TAQM: TaqMan® OpenArray® PGx Panel, FLOR: University of Florida and Stanford Custom Array.
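The region sizes in a Venn diagram like this one can be derived by grouping each polymorphism by the exact set of assays that cover it. The sketch below assumes each assay is given as a plain set of polymorphism identifiers; the function name and the toy coverage data are hypothetical and do not reflect the real panel contents.

```python
from collections import Counter


def venn_regions(assays):
    """Count polymorphisms per Venn region.

    assays: dict mapping an assay name to the set of polymorphism IDs it covers.
    Returns a Counter keyed by frozenset of assay names; each key is one Venn
    region, i.e. the polymorphisms covered by exactly those assays and no others.
    """
    membership = {}
    for assay, polymorphisms in assays.items():
        for p in polymorphisms:
            membership.setdefault(p, set()).add(assay)
    return Counter(frozenset(names) for names in membership.values())


# Hypothetical toy coverage, NOT the real DMET/VERA/TAQM/FLOR contents
toy = {
    "DMET": {"v1", "v2", "v3", "v6"},
    "VERA": {"v2", "v3"},
    "TAQM": {"v3", "v4"},
    "FLOR": {"v5"},
}
regions = venn_regions(toy)
print(regions[frozenset({"DMET"})])  # 2 (v1 and v6 are covered by DMET only)
```

Keying on the full membership set means every polymorphism lands in exactly one region, so the region counts always sum to the size of the union.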
Fraction of tested genes with aberrant haplotype calls when using a restricted assay instead of next-gen sequencing. Based on the full genome sequences of 2504 persons. Manuscript currently under review at ‘Pharmacogenomics’.
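Why a restricted assay can produce such aberrations can be shown with a toy model: if an assay does not probe the position that defines a haplotype, a carrier of that haplotype looks identical to the reference and receives the default call. This is a simplified illustration, not the simulation method of the cited study. Assumptions: single phased haplotypes, hypothetical positions and star-allele definitions (not real pharmacogene definitions), and a "fewest defining variants wins" tie-break between indistinguishable haplotypes.

```python
# Toy star-allele definitions: each haplotype is defined by the alternate alleles
# it carries at hypothetical positions. These are NOT real pharmacogene definitions.
HAPLOTYPES = {
    "*1": frozenset(),              # reference haplotype, no alternate alleles
    "*2": frozenset({"pos100"}),
    "*13": frozenset({"pos250"}),
}
ALL_POSITIONS = frozenset({"pos100", "pos250"})  # what full sequencing sees


def call_haplotype(observed_alts, covered):
    """Call one (phased) haplotype from the alternate alleles an assay can see.

    Assumed calling rule: keep every haplotype whose defining variants match the
    observations at the covered positions; if several are indistinguishable,
    return the one with the fewest defining variants (here the *1 default).
    """
    seen = observed_alts & covered
    candidates = [name for name, alts in HAPLOTYPES.items() if alts & covered == seen]
    if not candidates:
        return None  # observed pattern matches no defined haplotype
    return min(candidates, key=lambda name: len(HAPLOTYPES[name]))


def discordance_rate(samples, covered):
    """Fraction of samples whose restricted-assay call differs from the full-sequence call."""
    wrong = sum(call_haplotype(s, covered) != call_haplotype(s, ALL_POSITIONS) for s in samples)
    return wrong / len(samples)


assay = frozenset({"pos100"})    # hypothetical assay that does not probe pos250
carrier = frozenset({"pos250"})  # a *13 carrier in this toy model
print(call_haplotype(carrier, ALL_POSITIONS))  # *13
print(call_haplotype(carrier, assay))          # *1
```

In this toy setting the same person is a *13 by sequencing and a *1 by the assay, which is the situation the talk's title alludes to.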
Where to go from here?
Allele Registry project
From the lab: experimental mnemonic nomenclature
• Idea: Experiment with human-friendly nomenclature
  o No human committee
  o Less cryptic alphanumeric descriptors
• Synthetic pseudo-words can encode a lot of information
• CVCVCV pattern examples (C = consonant, V = vowel):
  o binoru
  o nivudi
  o pekuvo
  o jutoxu
  o hacifi
  o dejula
• A CVCVCV pseudo-word (counting Y as a vowel: 20 consonants, 6 vowels) can denote 20 × 6 × 20 × 6 × 20 × 6 = 1,728,000 variants
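The count is a product of per-position alphabet sizes, so a CVCVCV word behaves like a three-digit number in base 120 (20 consonants × 6 vowels per syllable). The sketch below shows one possible bijection between integers and pseudo-words; the alphabetical letter ordering and the function names are assumptions made for illustration, not part of the proposal.

```python
import string

VOWELS = "aeiouy"  # Y counted as a vowel, as on the slide
CONSONANTS = "".join(ch for ch in string.ascii_lowercase if ch not in VOWELS)  # 20 letters
BASE = len(CONSONANTS) * len(VOWELS)  # 120 possible CV syllables


def capacity(n_syllables):
    """Number of distinct words with n CV syllables (CVCVCV -> 3 syllables)."""
    return BASE ** n_syllables


def encode(index, n_syllables=3):
    """Map an integer in [0, capacity) to a pseudo-word; each CV syllable is one base-120 digit."""
    if not 0 <= index < capacity(n_syllables):
        raise ValueError("index out of range")
    syllables = []
    for _ in range(n_syllables):
        index, digit = divmod(index, BASE)
        c, v = divmod(digit, len(VOWELS))
        syllables.append(CONSONANTS[c] + VOWELS[v])
    return "".join(reversed(syllables))  # most significant syllable first


def decode(word):
    """Inverse of encode: read the word two characters (one syllable) at a time."""
    index = 0
    for i in range(0, len(word), 2):
        digit = CONSONANTS.index(word[i]) * len(VOWELS) + VOWELS.index(word[i + 1])
        index = index * BASE + digit
    return index


print(capacity(3))              # 1728000
print(encode(0))                # bababa
print(encode(decode("binoru")))  # binoru
```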
Algorithm (no human curation / committee)
• Take a large dataset of variant data as reference (e.g., 1000 Genomes, 100,000 Genomes, 1M genomes…)
• Create a list of genomic loci and the variants observed at each (some loci may have more than two possible variants)
• For each gene:
  o For each locus:
      Sort the observed variants by frequency and define the most frequently observed variant as ‘wild type’;
      remove these wild-type variants from the table used to construct the mnemonics (they are considered the default)
  o Sort loci by the frequency of the most frequent non-wild-type variant at each locus
  o Assign mnemonics to each variant systematically, starting with the shortest mnemonic strings (i.e., 2-character tuples)
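The steps above can be sketched as follows. Several details are assumptions not specified on the slide: the per-gene input format (locus mapped to allele frequencies), mnemonics drawn from CV syllables in alphabetical order, variants within a locus ordered by decreasing frequency, and ties broken by input order. The locus names and frequencies in the example are invented.

```python
from itertools import count, product

VOWELS = "aeiouy"
CONSONANTS = "".join(ch for ch in "abcdefghijklmnopqrstuvwxyz" if ch not in VOWELS)
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS]  # the 120 two-character tuples


def mnemonic_stream():
    """Yield mnemonics shortest first: 'ba', 'be', ..., 'zy', then 'baba', 'babe', ..."""
    for n in count(1):
        for combo in product(SYLLABLES, repeat=n):
            yield "".join(combo)


def assign_mnemonics(gene_loci):
    """Build the mnemonic table for one gene.

    gene_loci: {locus: {allele: frequency}} from a reference dataset
    (hypothetical input format). Returns (wild_type, codebook), where
    codebook maps each mnemonic to a (locus, allele) pair.
    """
    wild_type, non_wild_type = {}, {}
    for locus, freqs in gene_loci.items():
        ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
        wild_type[locus] = ranked[0][0]    # most frequent allele = default, gets no mnemonic
        non_wild_type[locus] = ranked[1:]  # remaining variants, most frequent first
    # Order loci by the frequency of their most frequent non-wild-type variant
    loci = sorted(
        (locus for locus, alts in non_wild_type.items() if alts),
        key=lambda locus: non_wild_type[locus][0][1],
        reverse=True,
    )
    codes = mnemonic_stream()
    codebook = {}
    for locus in loci:
        for allele, _freq in non_wild_type[locus]:
            codebook[next(codes)] = (locus, allele)
    return wild_type, codebook


toy_gene = {
    "site_a": {"C": 0.90, "T": 0.10},
    "site_b": {"G": 0.60, "A": 0.35, "T": 0.05},
    "site_c": {"A": 1.00},  # monomorphic: no mnemonic needed
}
wild_type, codebook = assign_mnemonics(toy_gene)
print(codebook)  # {'ba': ('site_b', 'A'), 'be': ('site_b', 'T'), 'bi': ('site_a', 'T')}
```

Because the generator exhausts all 120 single syllables before moving to two-syllable codes, the most frequent non-wild-type variants receive the shortest names.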
Example mnemonic code sequences
VKORC1: cy-do-du | be-do-du
CYP2D6: nai / nai-pek
CYP2D6: nai / be-wi / nai-pek (copy number variation)
TPMT: be-fu-fy | ba-bi-fi-tek
Mnemonic code + reference to the variants/regions covered by the assay = automatic decompression into the full sequence / genotype result
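Decompression can be sketched as: start every locus covered by the assay at its wild-type allele, then let each mnemonic override the locus it encodes. Reading "|" as the separator between two haplotypes and "-" as the separator between mnemonic tokens is an interpretation of the VKORC1 example; the function names and the toy codebook are hypothetical.

```python
def decompress_haplotype(code, codebook, wild_type, covered_loci):
    """Expand one mnemonic haplotype (e.g. 'ba-bi') into an allele at every covered locus.

    Every covered locus starts at its wild-type allele; each mnemonic token then
    overrides the locus it encodes. An empty code means 'all wild type'.
    """
    alleles = {locus: wild_type[locus] for locus in covered_loci}
    for token in filter(None, (t.strip() for t in code.split("-"))):
        locus, allele = codebook[token]  # KeyError for unknown mnemonics
        if locus not in alleles:
            raise ValueError(f"{token!r} refers to {locus!r}, which the assay does not cover")
        alleles[locus] = allele
    return alleles


def decompress_diplotype(code, codebook, wild_type, covered_loci):
    """Split 'hap1 | hap2' and expand both haplotypes."""
    return [decompress_haplotype(h, codebook, wild_type, covered_loci) for h in code.split("|")]


# Hypothetical toy codebook and reference alleles
codebook = {"ba": ("site_b", "A"), "be": ("site_b", "T"), "bi": ("site_a", "T")}
wild_type = {"site_a": "C", "site_b": "G", "site_c": "A"}
covered = ["site_a", "site_b", "site_c"]
print(decompress_diplotype("bi | ba", codebook, wild_type, covered))
# [{'site_a': 'T', 'site_b': 'G', 'site_c': 'A'}, {'site_a': 'C', 'site_b': 'A', 'site_c': 'A'}]
```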
Sets of co-occurring SNP variants could automatically be assigned identifiers of their own and combined with individual SNP variant identifiers
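One way this could work is sketched below, under assumptions that go beyond the slide: only whole observed variant combinations are counted (a real system might mine frequent sub-combinations instead), a hypothetical `min_count` threshold decides which sets earn an identifier, set identifiers come from a separate hypothetical pool so they cannot clash with single-variant mnemonics, and encoding replaces known sets greedily, largest first.

```python
from collections import Counter


def assign_set_ids(haplotypes, min_count, id_stream):
    """Give an identifier of its own to each frequently co-occurring variant set.

    haplotypes: iterable of frozensets of single-variant mnemonics, one per observed
    haplotype. Whole combinations with at least two members seen at least min_count
    times get the next identifier from id_stream (most frequent first).
    """
    counts = Counter(h for h in haplotypes if len(h) >= 2)
    return {combo: next(id_stream) for combo, n in counts.most_common() if n >= min_count}


def encode_haplotype(variants, set_ids):
    """Greedily replace known co-occurring sets (largest first), then list leftover variants."""
    remaining = set(variants)
    parts = []
    for combo in sorted(set_ids, key=len, reverse=True):
        if combo <= remaining:
            parts.append(set_ids[combo])
            remaining -= combo
    return "-".join(parts + sorted(remaining))


observed = [frozenset({"ba", "bi"})] * 3 + [frozenset({"ba", "bi", "be"}), frozenset({"bo"})]
set_ids = assign_set_ids(observed, min_count=2, id_stream=iter(["kaz", "mop"]))  # hypothetical IDs
print(list(set_ids.values()))                         # ['kaz']
print(encode_haplotype({"ba", "bi", "be"}, set_ids))  # kaz-be
```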
Currently creating a humble proof of concept based on 1000 Genomes data
Local team (Medical University of Vienna)
Asst.-Prof. Mag. Dr. Matthias Samwald (PI)
Dr. Kathrin Blagec
Mag. Sebastian Hofer
Hong Xu, BSc
Wolfgang Kuch
Web
http://samwald.info/
http://safety-code.org/
http://upgx.eu
Thanks!
• Reference: Matthias Samwald, Kathrin Blagec, Sebastian Hofer and Robert R. Freimuth. “Analysing the potential for incorrect haplotype calls with different pharmacogenomic assays in different populations: a simulation based on 1000 Genomes data.” Pharmacogenomics, September 30, 2015. doi:10.2217/pgs.15.108
• Code availability: The curated resources and the IPython notebooks are available at https://gitlab.com/medication-safety/ms-ipython
Further info