Which Drug Did You Mean?


Uploaded by chris-southan on 11 May 2015


DESCRIPTION

BioIT workshop 2012

TRANSCRIPT

Page 1: Which Drug Did You Mean ?

[1]

Which Drug Did You Mean? Resolving the linkage spaghetti between semantic names, structures, bioactivity and mixtures

Christopher Southan

ChrisDS Consulting, Göteborg, Sweden

Prepared for BioIT, Boston, April 2012, Track 14, Tuesday

See also

http://cdsouthan.blogspot.se/2012/06/will-real-bosinhib-please-stand-up-take.html


[2]

History of Drug Names

Approximate timelines

• Compound registration system: structure and ID

• Patent: IUPAC name or image

• Internal code name(s), externally blinded

• Code name(s) > structure declared externally

• Journal papers

• International Non-proprietary Name (INN)

• INN indexed in MeSH

• USAN, BAN, JAN

• Brand name(s)

• Combination brand


[3]

History of Atorvastatin

• 1985: (3R,5R)-7-[2-(4-fluorophenyl)-3-phenyl-4-(phenylcarbamoyl)-5-(propan-2-yl)-1H-pyrrol-1-yl]-3,5-dihydroxyheptanoic acid IUPAC

• ~1987: Parke-Davis internal code number CI-981

• ~1995: Atorvastatin [INN:BAN]; atorvastatin calcium [USAN]; atorvastatin calcium trihydrate INN (error?); atorvastatina (Spain)

• 1997: Lipitor (brand name); Faboxim (Argentina), Zurinel (Chile), etc.

• 2004: Caduet (brand name), combining Norvasc (amlodipine besylate) and Lipitor (atorvastatin calcium)

• 2012: atorvastatin calcium, generic (Ranbaxy)

• 2012: amlodipine besylate and atorvastatin calcium, generic (Ranbaxy)
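For a database, a name history like this becomes a synonym-resolution problem. Below is a minimal Python sketch. It assumes a hypothetical, hand-built table that groups the names listed above under one parent key, whereas a real resolver would draw on curated synonym lists. The normalization step (case folding, accent stripping, dropping spaces and hyphens) lets "CI 981", "ci-981" and "CI-981" resolve to the same entry.

```python
from __future__ import annotations

import unicodedata

# Hypothetical synonym table built from the names in the atorvastatin history.
# Folding salt and hydrate names onto the parent is a deliberate, lossy choice.
SYNONYMS = {
    "atorvastatin": [
        "CI-981", "atorvastatin calcium", "atorvastatin calcium trihydrate",
        "atorvastatina", "Lipitor", "Faboxim", "Zurinel",
    ],
    "amlodipine besylate + atorvastatin calcium": ["Caduet"],
}


def normalize(name: str) -> str:
    """Case-fold, strip accents, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.casefold() if c.isalnum())


def build_index(synonyms: dict[str, list[str]]) -> dict[str, str]:
    """Map each normalized name to its parent, refusing ambiguous names."""
    index: dict[str, str] = {}
    for parent, names in synonyms.items():
        for name in [parent, *names]:
            key = normalize(name)
            if index.get(key, parent) != parent:
                raise ValueError(f"{name!r} maps to {index[key]!r} and {parent!r}")
            index[key] = parent
    return index


INDEX = build_index(SYNONYMS)


def resolve(name: str) -> str | None:
    """Return the parent name for a synonym, or None if it is unknown."""
    return INDEX.get(normalize(name))


print(resolve("ci 981"))   # atorvastatin
print(resolve("CADUET"))   # amlodipine besylate + atorvastatin calcium
```

Folding "atorvastatin calcium" onto the parent makes retrieval easier, but it erases the salt distinction that matters for assays and labels. A stricter index would keep each salt as its own key and link it to the parent.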


[4]

Causes of Drug Linkage Spaghetti (I)

• Tautomer/stereo multiplexing and structure interconversion differences (e.g. complex antibiotics)

• Popular structures > 100s of submitters > many vendors > more noise

• Opaque ecosystem of primary submitters, secondary linkers, declared circularity, cryptic circularity, and submitters having independent portals with different rules

• Older drugs accumulate 100s of synonyms and database x-refs, with errors

• Accumulated wet-assay results depend on which public screening collections the drug has been in, and for how long

• Deprecated structures are not always refreshed between databases globally

• Pro-drugs, metabolites or tested combinations rarely have explicit x-refs


[5]

Causes of Drug Linkage Spaghetti (II)

• Literature extractions flowing into drug databases (including MeSH) can have:
– Author errors and a paucity of standards in the primary report
– No quality filtration at the result level
– Curation errors and different annotation rules
– No discrimination between independent de-novo checking and annotation recycling

• Large-scale patent-extraction feeds into databases bring in:
– Forests of analogues with no data links
– High redundancy for drugs and leads
– Structural differences between pipeline outputs
– Opportunistic permutations of salts and mixtures
– Opportunistic virtual deuteration of all best-selling drugs

• Drug discovery operations use many drugs as reference compounds in their internal screening collections. This entails:
– Name > structure cross-mapping across internal, public and commercial sources
– Integration of internal and external data for the same drugs
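Cross-mapping internal reference compounds to public records usually comes down to a join on some structure key. Below is a minimal sketch. It assumes each record already carries such a key, for example a standard InChIKey. Every internal code, external identifier and key in it is a hypothetical placeholder.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical internal screening-collection entries: (internal code, structure key).
internal = [("ACME-0001", "KEY-A"), ("ACME-0002", "KEY-B")]

# Hypothetical public cross-references: (external identifier, structure key).
public = [
    ("PubChem CID 111", "KEY-A"),
    ("CHEMBL222", "KEY-A"),
    ("PubChem CID 333", "KEY-C"),
]


def cross_map(internal, public):
    """Attach every public identifier that shares a structure key with an internal code."""
    by_key: dict[str, list[str]] = defaultdict(list)
    for ext_id, key in public:
        by_key[key].append(ext_id)
    # .get avoids inserting empty entries into the defaultdict for unmatched keys
    return {code: by_key.get(key, []) for code, key in internal}


print(cross_map(internal, public))
```

The join is only as good as the key. If internal and public sources standardize salts, stereo or tautomers differently, the same drug will fail to match, or distinct forms will collapse into one.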


[6]

Atorvastatin

• The scale of its links provides a good cross-section of the problems

• Relationship cross-mappings and the PubChem toolbox facilitate navigation through the links

• External submissions each get a substance ID (SID); these are merged into compound records (CIDs) via chemistry rules (see PubChem documentation)

• This drug has accumulated years of submissions from different sources, BioAssay entries and pharmacology literature links

• The parent CID 60823 has:
– 99 synonyms
– 6 stereo forms
– 70 canonically related structures
– 449 substance records


[7]

What is Atorvastatin? For Patients


[8]

Atorvastatin - for Informaticians

PubChem CID 60823

Wikipedia

ChemSpider 54810

DrugBank APRD00055

CHEMBL1487

CAS 134523-00-5

PubChem submissions include:

• (3R,5R): CID 60823

• (5R): CID 51052072

• (3R): CID 21029434

• (3S,5R): CID 6093359

• (3S,5S): CID 62976

• No stereo: CID 2250

Query: Same, Isotopes for PubChem Compound (Select 60823)
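Stereo variants like these can be grouped because a standard InChIKey puts a hash of the connectivity layer in its first 14-character block. The remaining layers, such as stereo and isotopes, are hashed into the second block. Below is a minimal sketch that groups records by that first block, roughly analogous to a "same connectivity" relation. The CIDs are the ones listed above. The InChIKeys are made-up placeholders that only mimic the real layout, and CID 0 is a hypothetical unrelated record.

```python
from __future__ import annotations

from collections import defaultdict

# CIDs from the slide; keys are placeholders with the InChIKey layout
# (14-char connectivity block, 10-char block, protonation flag).
RECORDS = {
    60823: "AAAAAAAAAAAAAA-BBBBBBBBSA-N",     # (3R,5R)
    51052072: "AAAAAAAAAAAAAA-CCCCCCCCSA-N",  # (5R)
    21029434: "AAAAAAAAAAAAAA-DDDDDDDDSA-N",  # (3R)
    6093359: "AAAAAAAAAAAAAA-EEEEEEEESA-N",   # (3S,5R)
    62976: "AAAAAAAAAAAAAA-FFFFFFFFSA-N",     # (3S,5S)
    2250: "AAAAAAAAAAAAAA-GGGGGGGGSA-N",      # no stereo
    0: "ZZZZZZZZZZZZZZ-GGGGGGGGSA-N",         # hypothetical unrelated compound
}


def same_connectivity(records: dict[int, str]) -> dict[str, list[int]]:
    """Group CIDs whose InChIKeys share the first (connectivity) block."""
    groups: dict[str, list[int]] = defaultdict(list)
    for cid, key in records.items():
        groups[key.split("-")[0]].append(cid)
    return {block: sorted(cids) for block, cids in groups.items()}


for block, cids in same_connectivity(RECORDS).items():
    print(block, cids)
```

InChIKey blocks are hashes, so two keys that share a first block very probably share connectivity, but this is not guaranteed. Standard InChI's mobile-hydrogen handling also means that some tautomers fall into the same group.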


[9]

Name Retrieval Specificity (I)


[10]

Name Retrieval Specificity (II)

The misspelling "atorvastin" appears in a DailyMed link but not in the synonyms
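Approximate string matching against a known vocabulary can catch misspellings such as "atorvastin" before a name search silently returns nothing. Below is a minimal sketch using Python's standard difflib module. The vocabulary and the 0.8 similarity cutoff are illustrative assumptions.

```python
import difflib

# Hypothetical vocabulary; in practice a synonym list exported from a drug database.
VOCAB = ["atorvastatin", "amlodipine", "pravastatin", "rosuvastatin", "simvastatin"]


def suggest(term: str, vocab=VOCAB, cutoff: float = 0.8) -> list:
    """Return up to three vocabulary entries that closely resemble term, best first."""
    return difflib.get_close_matches(term.lower(), vocab, n=3, cutoff=cutoff)


print(suggest("atorvastin"))  # ['atorvastatin']
```

A list like this is for curators to confirm, not for silent auto-correction. Close matches between real drugs, such as the other statins, are exactly where automatic substitution would do harm.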


[11]

Drug BioAssay Data: Splitting by Submitted Structure Differences

AIDs 406848-53 in ChEMBL (antimalarial assays that specified the salt)

Mainly uHTS and counterscreens from Scripps & Burnham

ChEMBL antimalarial strain assays (also specifying the salt), in vivo results, plus three target links

Mainly qHTS from NCGC, no hits
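One way to stop assay data splitting across salt records is to roll results up to a parent compound while keeping the tested form as provenance. Below is a minimal sketch. It assumes a parent map for the atorvastatin salt-form CIDs listed on the salt slides (60822, 656846). The assay identifiers are hypothetical placeholders, not real AIDs.

```python
from __future__ import annotations

from collections import defaultdict

# Salt/hydrate CIDs mapped to the parent (CIDs as listed on the salt slides).
PARENT = {60823: 60823, 60822: 60823, 656846: 60823}

# (tested CID, assay ID) pairs; assay IDs are hypothetical placeholders.
TESTED = [(60823, "AID-X"), (60822, "AID-Y"), (656846, "AID-Z"), (60822, "AID-X")]


def assays_by_parent(tested, parent=PARENT):
    """Group assays under the parent CID, keeping which form was actually tested."""
    merged: dict[int, set[tuple[str, int]]] = defaultdict(set)
    for cid, aid in tested:
        # CIDs missing from the parent map are treated as their own parent
        merged[parent.get(cid, cid)].add((aid, cid))
    return {p: sorted(pairs) for p, pairs in merged.items()}


print(assays_by_parent(TESTED))
```

Keeping the tested CID next to each assay means results from assays that specified the salt stay distinguishable after the roll-up.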


[12]

Pharmacological Activity in vivo Is ~70% from Active Metabolites, i.e. not Atorvastatin Itself

CID 9851106

CID 9808225

CID 60823

The CID carries a Hazardous Substances Data Bank x-ref, but there are no direct links to the metabolites (yet). Only one in vitro assay result exists for CID 9808225.
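With explicit parent-to-metabolite cross-references, a query for the drug could also surface data on its active metabolites. Below is a minimal sketch of such a relationship table. It treats the two CIDs shown with the parent on this slide as its metabolites. The relation name "metabolite_of" is a hypothetical label, not any database's schema.

```python
from __future__ import annotations

# (subject CID, relation, object CID); relation labels are hypothetical.
RELATIONS = [
    (9851106, "metabolite_of", 60823),
    (9808225, "metabolite_of", 60823),
]


def with_metabolites(cid: int, relations=RELATIONS) -> list[int]:
    """Return the drug CID followed by any CIDs recorded as its metabolites."""
    metabolites = sorted(s for s, rel, o in relations if rel == "metabolite_of" and o == cid)
    return [cid, *metabolites]


print(with_metabolites(60823))  # [60823, 9808225, 9851106]
```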


[13]

Salt Confusion (I) Atorvastatin Calcium

CID 60822: Mw 1155, CAS 134523-03-8

CID 656846: Mw 1209, CAS 344423-98-9

CID 11227182: Mw 598

INN = atorvastatin; USAN/BAN = atorvastatin calcium

FDA package insert label: hemicalcium trihydrate
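The three Mw values above are all consistent with the free acid C33H35FN2O5, so a quick stoichiometry check is a useful way to triage salt records. Below is a worked sketch using rounded standard atomic weights:

- Two atorvastatin anions per calcium give 1155.
- Adding three waters gives the 1209 of a hemicalcium trihydrate.
- 598 matches a single anion plus one calcium, a stoichiometry whose charges do not balance.

```python
from __future__ import annotations

# Standard atomic weights (rounded conventional values).
WEIGHT = {"C": 12.011, "H": 1.008, "F": 18.998, "N": 14.007, "O": 15.999, "Ca": 40.078}


def mw(formula: dict[str, int]) -> float:
    """Molecular weight from an element-count dictionary."""
    return sum(WEIGHT[element] * count for element, count in formula.items())


ACID = {"C": 33, "H": 35, "F": 1, "N": 2, "O": 5}  # atorvastatin free acid
ANION = {**ACID, "H": 34}                          # carboxylate: one H removed
WATER = {"H": 2, "O": 1}

hemicalcium = 2 * mw(ANION) + WEIGHT["Ca"]         # (C33H34FN2O5)2Ca, charges balance
trihydrate = hemicalcium + 3 * mw(WATER)           # hemicalcium trihydrate
one_to_one = mw(ANION) + WEIGHT["Ca"]              # one anion per Ca2+, charges do not balance

for label, value in [("free acid", mw(ACID)), ("hemicalcium", hemicalcium),
                     ("trihydrate", trihydrate), ("1:1", one_to_one)]:
    print(f"{label:12s} {value:8.1f}")
```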


[14]

Salt Confusion (II): What gets to Patients

CID 53252956

CID 656846

CID 23665101

No INNs, USANs or clinical trials entries for these salts


[15]

Mixtures: Problematic All Round

• The atorvastatin parent (CID 60823) has 379 mixture SIDs and 147 mixture CIDs, permuted from 122 component CIDs

• Of the 122 components, 58 have a MeSH pharmacology tag, 92 have BioAssay results, 70 are in DrugBank, 101 are in ChEMBL, and 47 are below Mw 200 (and thus probably salts, not drugs)

• Of the 147 mixture CIDs, only the 2 atorvastatin dimers have assay results or pharmacology, so none of the drug mixtures have direct data links

• None of the mixture CIDs are in DrugBank, and only atorvastatin calcium is in ChEMBL

• 138 of the 147 have been extracted from patents by Derwent/Thomson and are unlikely to get data links

• The few important drug combinations that do have data and/or trial results are difficult to identify

• Tested drug mixtures rarely get public code names; some get trade names, but never INNs

• Chemistry rules may split mixtures and synonyms in databases

• PubMed "Drug Combinations"[MeSH Term] = 54,186, but no SID or CID links

• Mixture components can be designated with a space, /, + or "co"
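The "below Mw 200, probably salts" rule can be applied mechanically to mixture components. Below is a minimal sketch using Caduet's components with approximate molecular weights. The 200 cutoff comes from the slide; the component table is illustrative.

```python
from __future__ import annotations

# Cutoff from the slide: components below ~200 are "probably salts not drugs".
SALT_MW_CUTOFF = 200.0

# Caduet components with approximate molecular weights (free forms).
CADUET = {
    "atorvastatin": 558.6,
    "amlodipine": 408.9,
    "benzenesulfonic acid": 158.2,  # besylate counter-ion
    "calcium": 40.1,
}


def split_components(components: dict[str, float], cutoff: float = SALT_MW_CUTOFF):
    """Split mixture components into (likely drugs, likely salt formers) by Mw."""
    drugs = sorted(name for name, mw in components.items() if mw >= cutoff)
    salts = sorted(name for name, mw in components.items() if mw < cutoff)
    return drugs, salts


print(split_components(CADUET))
```

The cutoff is a heuristic with known failure modes. Small drugs such as metformin (Mw about 129) fall below it, so every component it flags as a salt still needs checking.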


[16]

The Famous Polypill: A Fuzzy term

CID 44602839; Thomson Pharma; 18 clinicaltrials.gov entries, but only partial component links

Aspirin 81 mg, enalapril 2.5 mg, atorvastatin 20 mg and hydrochlorothiazide 12.5 mg (polypill): PMID 21647425; Australian New Zealand Clinical Trials Registry ACTRN12607000099426

DrugBank and TTD negative
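Combination components are written with assorted delimiters. One pragmatic step before matching trial entries to components is to reduce each combination label to a canonical, order-independent key. Below is a minimal sketch with two assumptions. First, only "/", "+", commas and the word "and" are treated as delimiters, because plain spaces and "co" also occur inside single names such as "atorvastatin calcium" or "co-trimoxazole". Second, doses written in mg are stripped.

```python
import re

# Delimiters treated as component separators; spaces and "co" are deliberately excluded.
_DELIMS = re.compile(r"\s*(?:/|\+|,|\band\b)\s*", re.IGNORECASE)
# Doses such as "81 mg" or "12.5 mg" are dropped so labels with and without doses match.
_DOSE = re.compile(r"\s*\d+(?:\.\d+)?\s*mg\b", re.IGNORECASE)


def combination_key(label: str) -> str:
    """Return a lowercase, sorted, dose-free key for a combination label."""
    parts = (_DOSE.sub("", part).strip().lower() for part in _DELIMS.split(label))
    return " + ".join(sorted(p for p in parts if p))


polypill = "aspirin 81 mg, enalapril 2.5 mg, atorvastatin 20 mg and hydrochlorothiazide 12.5 mg"
print(combination_key(polypill))
print(combination_key("Hydrochlorothiazide/Atorvastatin + Enalapril/Aspirin"))
```

Both calls produce the same key, so differently written records of the same polypill can be matched. The key says nothing about doses or salt forms, which would need separate fields.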


[17]

Caduet: an Approved Combination

http://clinicaltrials.gov/ct2/show/NCT01107743

DrugBank, Wikipedia


[18]

Submitter Synonym Noise in PubChem


[19]

A more Recent Combination

But QA149 is negative in PubChem, DrugBank and TTD


[20]

Spaghetti Is Resolvable but Errors Are Tough: Will the Real LX4211 Please Stand Up?

http://cenblog.org/the-haystack/2012/03/liveblogging-first-time-disclosures-from-acssandiego/

See also: http://cdsouthan.blogspot.se/2012/03/live-chemical-structure-blogging-but.html


[21]

Summary

• You can navigate the linkage spaghetti in name, synonym, structure, bioactivity and mixture space, but this needs perspicacity and circumspection.

• The current drug information ecosystem, with its multiple stakeholders, seems destined to remain "fuzzy".

• Beyond the informatics challenges, the consequences, particularly of frank errors, could be more serious.

• WHO INNs and naming stems play a key positive role, but:
– No open authoritative database, only 7000 PDF entries (!)
– No transparent coordination between USAN, FDA, MeSH, national offices or clinical trials registries
– Susceptible to commercial flanking tactics

• Drug combinations have a bright pharmacological future but a difficult informatics one.

• The fuzz includes scientific challenges (e.g. complex structures, dynamic tautomerism, active metabolites, formulation differences, and a paucity of standardised and comparable activity data).

• Efforts are being made to improve the situation, including from the databases represented in this Workshop session.


[22]

Questions Welcome

ChrisDS Consulting: http://www.cdsouthan.info/Consult/CDS_cons.htm

Mobile: +46(0)702-530710, Skype: cdsouthan


Twitter: http://twitter.com/#!/cdsouthan

Blog: http://cdsouthan.blogspot.com/

LinkedIN: http://www.linkedin.com/in/cdsouthan

Website: http://www.cdsouthan.info/CDS_prof.htm

Publications: http://www.citeulike.org/user/cdsouthan/publications/order/year

Citations: http://scholar.google.com/citations?user=y1DsHJ8AAAAJ&hl=en

Presentations: http://www.slideshare.net/cdsouthan

FYI: A short piece on identifying the names and molecular details of drugs in clinicaltrials.gov:

http://www.samedanltd.com/magazine/13/issue/166/article/3152