One man's *1 is another man's *13? Trouble with nomenclatures in personalized medicine

Trouble with nomenclatures in personalized medicine

Asst.-Prof. Mag. Dr. Matthias Samwald
CeMSIIS, Medical University of Vienna

SUMMER SCHOOL: GENOMIC MEDICINE – Bridging research and the clinic, May 6 2016, Portoroz, Slovenia

One man's *1 is another man's *13?

Funded by Austrian Science Fund (FWF): [P 25608-N15]

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 668353 (KB and MS).

What’s the problem?

We simulated the accuracy of various targeted, low-cost assays suitable for pre-emptive testing compared to next-gen sequencing

Venn diagram displaying the numbers and overlaps of polymorphisms covered by constrained views derived from four pharmacogenomic assays. DMET: derived from the Affymetrix DMET™ Plus assay, VERA: Illumina VeraCode® ADME Core Panel, TAQM: TaqMan® OpenArray® PGx Panel, FLOR: University of Florida and Stanford Custom Array.


Fraction of tested genes resulting in aberrations in haplotype calling with restricted assay compared to next-gen sequencing. Based on full genome sequences of 2504 persons. Manuscript currently under review at ‘Pharmacogenomics’.

Where to go from here?

Allele Registry project

From the lab: experimental mnemonic nomenclature

• Idea: Experiment with human-friendly nomenclature
o No human committee
o Less cryptic alphanumeric descriptors


• Synthetic pseudo-words can encode a lot of information

• CVCVCV pattern examples (C = consonant, V = vowel):
o binoru
o nivudi
o pekuvo
o jutoxu
o hacifi
o dejula

• CVCVCV tuple (Y as vowel) can denote: 20 * 6 * 20 * 6 * 20 * 6 = 1 728 000 variants
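The count above can be checked with a few lines of Python. This is a minimal sketch, not the project's code: the alphabet split (20 consonants, 6 vowels with Y counted as a vowel) follows the slide, while the function names `count_pseudo_words`, `pseudo_words` and `is_cv_word` are made up for illustration.

```python
from itertools import product

# Alphabet split as on the slide: Y is treated as a vowel.
CONSONANTS = "bcdfghjklmnpqrstvwxz"  # 20 letters
VOWELS = "aeiouy"                    # 6 letters


def count_pseudo_words(n_syllables):
    """Number of distinct words made of n consonant-vowel syllables."""
    return (len(CONSONANTS) * len(VOWELS)) ** n_syllables


def pseudo_words(n_syllables):
    """Lazily yield every word made of n consonant-vowel syllables."""
    syllables = [c + v for c, v in product(CONSONANTS, VOWELS)]
    for combo in product(syllables, repeat=n_syllables):
        yield "".join(combo)


def is_cv_word(word):
    """True if the word strictly alternates consonant, vowel, consonant, ..."""
    if not word or len(word) % 2:
        return False
    return all(
        ch in (CONSONANTS if i % 2 == 0 else VOWELS)
        for i, ch in enumerate(word)
    )


print(count_pseudo_words(3))  # 20 * 6 * 20 * 6 * 20 * 6
```

Each additional syllable multiplies the number of available identifiers by 120, so short words are enough for the most frequent variants.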

Algorithm (no human curation / committee)

• Take one of the usual large datasets containing variant data (1000 Genomes, 100.000 Genomes, 1M genomes…) as reference

• Create list of genome loci and variants observed there (some loci might have more than 2 possible variants)

• For each gene:
o For each locus:
  - Sort observed variants by frequency and define the most frequently observed variant as ‘wild type’
  - Remove the wild-type variants from the table used for constructing the mnemonics (they are considered to be the default)
o Sort loci based on the frequency of the most frequent non-wild-type variant of each locus
o Assign mnemonics to each variant systematically, starting with shorter mnemonic strings (i.e., 2-character tuples)
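The steps above can be sketched in Python for a single gene. This is an illustrative sketch under stated assumptions, not the proof-of-concept implementation: the input layout (`{locus: {variant: frequency}}`), the function names, and the toy loci are hypothetical. The slide does not say in which direction loci are sorted; the sketch assumes descending order so that the most common variants receive the shortest mnemonics. It also uses only consonant-vowel syllables.

```python
from itertools import count, product

CONSONANTS = "bcdfghjklmnpqrstvwxz"  # Y counted as a vowel
VOWELS = "aeiouy"
SYLLABLES = [c + v for c, v in product(CONSONANTS, VOWELS)]


def mnemonic_stream():
    """Yield mnemonics from short to long: 'ba', 'be', ..., then 'baba', ..."""
    for length in count(1):
        for combo in product(SYLLABLES, repeat=length):
            yield "".join(combo)


def assign_mnemonics(gene_loci):
    """Assign mnemonics to the non-wild-type variants of one gene.

    gene_loci: {locus: {variant: frequency}} (hypothetical input layout).
    Returns (wild_types, mnemonics), where mnemonics maps
    (locus, variant) -> mnemonic string.
    """
    wild_types = {}
    candidates = []  # (locus, non-wild-type variants sorted by frequency)
    for locus, freqs in gene_loci.items():
        ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
        wild_types[locus] = ranked[0][0]  # most frequent variant is the default
        if len(ranked) > 1:
            candidates.append((locus, ranked[1:]))

    # Assumption: loci with the most frequent non-wild-type variant come first.
    candidates.sort(key=lambda item: item[1][0][1], reverse=True)

    names = mnemonic_stream()
    mnemonics = {}
    for locus, variants in candidates:
        for variant, _freq in variants:
            mnemonics[(locus, variant)] = next(names)
    return wild_types, mnemonics


# Toy data, invented for illustration only.
toy_gene = {
    "locus_1": {"A": 0.90, "G": 0.10},
    "locus_2": {"C": 0.60, "T": 0.35, "G": 0.05},
    "locus_3": {"T": 1.00},
}
wild_types, mnemonics = assign_mnemonics(toy_gene)
print(wild_types)
print(mnemonics)
```

Because the procedure is deterministic for a given reference dataset, anyone running it on the same data would arrive at the same identifiers, which is what removes the need for a curating committee.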

Example mnemonic code sequences

VKORC1: cy-do-du | be-do-du
CYP2D6: nai / nai-pek
CYP2D6: nai / be-wi / nai-pek (copy number variation)
TPMT: be-fu-fy | ba-bi-fi-tek

Mnemonic code + reference to variants/regions covered by assay = automatically decompress to full sequence / genotype result
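The decompression idea can be illustrated as follows. This is a hedged sketch only: the codebook, the wild-type table, the locus names and the function `decompress` are hypothetical, and the sketch handles a single hyphen-separated code, not the `|` and `/` separators shown in the examples.

```python
def decompress(code, codebook, wild_types, assay_loci):
    """Expand a hyphen-separated mnemonic code into {locus: allele}.

    code:       e.g. "ba-bi"; an empty string means "all wild type".
    codebook:   {mnemonic: (locus, variant)} (hypothetical lookup table).
    wild_types: {locus: default allele}.
    assay_loci: loci actually covered by the assay that produced the code.
    """
    # Every covered locus starts at its default (wild-type) allele.
    result = {locus: wild_types[locus] for locus in assay_loci}
    for mnemonic in filter(None, code.split("-")):
        locus, variant = codebook[mnemonic]
        if locus not in result:
            raise ValueError(f"{mnemonic!r} refers to a locus the assay does not cover")
        result[locus] = variant
    return result


# Toy tables, invented for illustration only.
toy_codebook = {"ba": ("locus_2", "T"), "bi": ("locus_1", "G")}
toy_wild_types = {"locus_1": "A", "locus_2": "C", "locus_3": "T"}
toy_assay = ["locus_1", "locus_2"]

print(decompress("ba-bi", toy_codebook, toy_wild_types, toy_assay))
print(decompress("", toy_codebook, toy_wild_types, toy_assay))
```

Loci outside `assay_loci` are left out of the result on purpose: the assay did not test them, so the code says nothing about them.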

Sets of co-occurring SNP variants could automatically be assigned identifiers of their own and combined with individual SNP variant identifiers

Currently creating a humble proof-of-concept based on 1000 Genomes data

Local team (Medical University of Vienna)
• Asst.-Prof. Mag. Dr. Matthias Samwald (PI)
• Dr. Kathrin Blagec
• Mag. Sebastian Hofer
• Hong Xu, BSc
• Wolfgang Kuch

Web
• http://samwald.info/
• http://safety-code.org/
• http://upgx.eu

Thanks!


Further info

• Reference: Matthias Samwald, Kathrin Blagec, Sebastian Hofer and Robert R. Freimuth. “Analysing the potential for incorrect haplotype calls with different pharmacogenomic assays in different populations: a simulation based on 1000 Genomes data.” Pharmacogenomics, September 30, 2015. doi:10.2217/pgs.15.108

• Code availability: The curated resources and the IPython notebooks are available at https://gitlab.com/medication-safety/ms-ipython

