
Page 1: Developing CAS Products for Substructure Searching …acscinf.org/docs/meetings/222nm/presentations/222nm06.pdf · Developing CAS Products for Substructure Searching by Chemists

Developing CAS Products for Substructure Searching by Chemists

Linda Toler



Kurt Loening Symposium, August 2001

Developing CAS Products for Substructure Searching

- Evolution of the CAS Registry
- Development of substructure searching for CAS products
- Future challenges


Evolution of the CAS Registry


Evolution of the CAS Registry

- Names and molecular formulas (MFs)
- Fragment Codes and Linear Notations


Building the CAS Registry

Late 1950s

Dyson linear notation

Register Number


Building the CAS Registry

- 1959: Dyson linear notation
- 1965: Registry I (Morgan Connection Table)


Sample of a CAS Connection Table

CAS Registry Number: 125417-03-0

Rank #  : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Atom    : C C C C C C C C C O C C C C C
Bond to : 8 8 1 4 4 4 5 5 6 6 7 7 11 13 10 12 8 14 9 15
Bond is : -- -- -- -* -* -* -* =* -* -* =* -* -* -* RC -* RC -* RC -*
Mol Form: C14 H20 O
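To make the idea of a connection table and Morgan-style ranking concrete, here is a minimal Python sketch. It stores a structure as an atom list plus bonded pairs and computes extended-connectivity values by repeatedly summing neighbor values until the number of distinct values stops growing. The function name, the 0-based numbering, and the n-pentane example are assumptions made for illustration; this is not the CAS registration code, and a real canonical numbering would also need tie-breaking by atom and bond type.

```python
from collections import defaultdict


def extended_connectivity(n_atoms, bonds):
    """Return Morgan-style extended-connectivity values for each atom.

    n_atoms: number of non-hydrogen atoms, numbered 0..n_atoms-1 (assumption)
    bonds:   iterable of (i, j) pairs of bonded atoms
    """
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Start from the simple connectivity (number of attached atoms).
    values = [len(neighbors[i]) for i in range(n_atoms)]
    distinct = len(set(values))

    while True:
        # Each atom's new value is the sum of its neighbors' current values.
        new_values = [sum(values[j] for j in neighbors[i]) for i in range(n_atoms)]
        new_distinct = len(set(new_values))
        if new_distinct <= distinct:
            # No further discrimination gained: keep the previous values.
            return values
        values, distinct = new_values, new_distinct


# Hypothetical example: the carbon chain of n-pentane, C0-C1-C2-C3-C4.
pentane_bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
pentane_ec = extended_connectivity(5, pentane_bonds)
print(pentane_ec)
```

Atoms with equal values (here the two chain ends, and the two atoms next to them) are symmetry-equivalent candidates; the central atom gets the highest value and would be ranked first.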


Building the CAS Registry

- 1959: Dyson linear notation
- 1965: Registry I (Morgan Connection Table)
  - Covered organic molecules
  - CAS Registry Number
  - “Normalized” ring bonds


Building the CAS Registry

- 1959: Dyson linear notation
- 1965: Registry I (Morgan Connection Table)
- 1968: Registry II
  - Covered all compound classes
  - Standardized stereo descriptors
  - “Normalized” tautomer bonds


Building the CAS Registry

- 1959: Dyson linear notation
- 1965: Registry I (Morgan Connection Table)
- 1968: Registry II
- 1973: Registry III
  - Streamlined internal handling and display of connection table information


CAS Registry Handles New Chemistry

- Superconducting substances of the 1980s
  - registered ranges of element compositions
  - registered non-stoichiometric compositions

RN 301237-56-9 REGISTRY
CN Barium calcium mercury rhenium oxide (Ba4Ca1.5Hg2Re0.5O8.5) (9CI) (CA INDEX NAME)
MF Ba . Ca . Hg . O . Re
AF Ba4 Ca1.5 Hg2 O8.5 Re0.5

Component | Ratio | Component Registry Number
==========+=======+==========================
O         | 8.5   | 17778-80-2
Ca        | 1.5   | 7440-70-2
Ba        | 4     | 7440-39-3
Re        | 0.5   | 7440-15-5
Hg        | 2     | 7439-97-6
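As a rough sketch of how a non-stoichiometric record like the one above could be searched by composition range, the Python below parses the AF field into element ratios and tests them against per-element bounds. The function names and the range-query format are assumptions for illustration only; the slides do not describe how CAS implements this.

```python
import re

# One token of an AF field: an element symbol followed by an optional ratio.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_af(af_field):
    """Turn 'Ba4 Ca1.5 ...' into {'Ba': 4.0, 'Ca': 1.5, ...}."""
    composition = {}
    for token in af_field.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized token: {token!r}")
        symbol, ratio = match.groups()
        # A missing ratio is treated as 1 (assumption).
        composition[symbol] = float(ratio) if ratio else 1.0
    return composition


def in_ranges(composition, ranges):
    """True if every queried element is present within [low, high]."""
    return all(
        symbol in composition and low <= composition[symbol] <= high
        for symbol, (low, high) in ranges.items()
    )


# AF field copied from the record shown on this slide.
record = parse_af("Ba4 Ca1.5 Hg2 O8.5 Re0.5")
print(record)
print(in_ranges(record, {"Ca": (1, 2), "Re": (0, 1)}))
```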


CAS Registry Handles New Chemistry

- “Textual” descriptions of stereochemistry became outdated and difficult to use
- Upgraded connection tables to include stereo parity

RN 350021-96-4 REGISTRY
CN Benzeneethanol, .beta.-[(2-furanylmethyl)[(1R)-1-methyl-2-propynyl]amino]-, (.beta.R)- (9CI) (CA INDEX NAME)

Absolute stereochemistry.


CAS Registry Handles New Chemistry

- Emphasis on biomolecules mushroomed in the 1990s
- Macromolecules represented via one-letter codes for their basic building blocks

RN 349518-49-6 REGISTRY
CN G protein-coupled receptor 35 (mouse fragment) (9CI) (CA INDEX NAME)
FS PROTEIN SEQUENCE
SQL 93

SEQ  1 AHMVWANLAV FVICFLPLHV VLTVQVSLNL NTCAARDTFS RALSITGKLS
    51 DTNCCLDAIC YYYMAREFQE ASKPATSSNT PHKSQDSQIL SLT


CAS Registry Today

World’s Largest, Most Diverse Substance Collection

- "Organic": 45%
- Sequences: 43%
- Coordination compounds: 5%
- Polymers: 3%
- Alloys: 2%
- "Inorganics": 2%

>32,400,000 records


CAS Registry Today

Known for the Quality and Integrity of Its Structural Information

<Pictures of CAS Registry staff>


CAS Registry Today

An International Resource for Substance Identification

- CAS Registry Numbers are used throughout the world to identify substances
  - Databases
  - Handbooks
  - Government regulatory agencies
  - Consumer products


CAS Registry Today

Largest publicly available structure searchable compound collection

- Connection Tables: >18.3M
- Biosequences: >13.9M


Substructure Searching in the CAS Registry


“Substructure” Searching Can Mean Different Things

- Compounds with structures
  - answers contain specified structural characteristics
- Alloys and non-stoichiometric inorganics
  - answers are compounds of various composition ranges
- Protein and nucleic acid sequences
  - answers are sequences containing the same string of building-block residues
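For the sequence case in the list above, "substructure" matching reduces to finding a residue string inside a longer one. The Python sketch below uses the 93-residue fragment from the RN 349518-49-6 record shown earlier in this talk; the function name and the 1-based position convention are assumptions for illustration, not a description of the CAS search engine.

```python
def find_motif(sequence, motif):
    """Return the 1-based start positions of every (possibly overlapping) hit."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)
        # Step one residue forward so overlapping hits are also reported.
        start = sequence.find(motif, start + 1)
    return positions


# Sequence blocks copied from the SEQ field of RN 349518-49-6.
blocks = (
    "AHMVWANLAV FVICFLPLHV VLTVQVSLNL NTCAARDTFS RALSITGKLS "
    "DTNCCLDAIC YYYMAREFQE ASKPATSSNT PHKSQDSQIL SLT"
)
gpr35_fragment = blocks.replace(" ", "")

print(len(gpr35_fragment))
print(find_motif(gpr35_fragment, "YYYM"))
print(find_motif(gpr35_fragment, "YY"))
```

The length check against the record's SQL field (93) is a cheap way to confirm that the blocks were joined without losing residues.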


“Substructure” Searching Can Mean Different Things

- Compounds with structures
  - answers contain specified structural characteristics


What Should a Good Substructure Search System Offer?

- Query Input
  - Easy-to-use query input mechanism
  - Flexible query definition options: -C; -CH3; R=Me, Et, or n-Pr; Ak
- Retrieval
  - Quick, comprehensive retrieval of all matching compounds
  - Tools for dealing with registration “idiosyncrasies”


Query Input Methods Evolved Over Time

- Screens
- “Commands”
- Drawing


Query Input via Screens

=> SCREEN 1867 AND 42 1199 AND 745 AND 1033 AND 1139 AND 1142 AND 1707 AND 1831

1867 TR DDDDDD
42   AS C-C*C*C-C-O
1199 AA C -1C -1O -2O
745  BS A *2A *1A *1A *1A *1A *1A


Query Input via Commands

=> STR
:GRAPH R6, 1 C1, 3 C1, 4 C3, 5 C1, 6 C1, 9 C1
:NODE 8 OH, 7 14 10 O, 11 12 13 AK
:BOND ALL S, 1-7 2-3 9-14 D
:RSP I, CON 11 12 13 E1, DIS

<Structure diagram produced by the commands: numbered nodes 1-14, with OH at node 8, O at nodes 7, 10, and 14, and Ak at nodes 11, 12, and 13>


Query Input Offline via Drawing


What Should a Good Substructure Search System Offer?

- Query Input
  - Easy-to-use query input mechanism
  - Flexible query definition options: -C; -CH3; R=-CH3, Et, or n-Pr; Ak
- Retrieval
  - Quick, comprehensive retrieval of all matching compounds
  - Tools for dealing with registration “idiosyncrasies”


The CAS Substructure Search System Evolved Over Time

- 1960s: Development of prototype SSS system
- 1980: Substructure searching via screens
- 1981: Substructure searching via structure diagrams


Substructure Searching Is a Two-step Process

Structure Compilation -> Screen Generation -> Screen Search -> Candidate Answers -> “Iterative” Search
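The two steps on this slide can be sketched in Python under some loud assumptions: screens are modeled here as a simple set of element symbols and bonded element pairs (the real CAS screen dictionary is far richer), and the "iterative" step is modeled as a plain atom-by-atom backtracking match that ignores bond orders and hydrogens. All names and the tiny three-compound file are hypothetical.

```python
def make_mol(atoms, bonds):
    """Build a small connection table: atom symbols plus an adjacency map."""
    adj = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return {"atoms": atoms, "adj": adj}


def generate_screens(mol):
    """Step 1a: derive coarse features (elements and bonded element pairs)."""
    screens = set(mol["atoms"])
    for i, nbrs in mol["adj"].items():
        for j in nbrs:
            screens.add("-".join(sorted((mol["atoms"][i], mol["atoms"][j]))))
    return frozenset(screens)


def screen_search(query, structure_file):
    """Step 1b: keep only structures that carry every query screen."""
    wanted = generate_screens(query)
    return [name for name, mol in structure_file.items()
            if wanted <= generate_screens(mol)]


def iterative_match(query, target):
    """Step 2: atom-by-atom backtracking check of one candidate."""
    n_query = len(query["atoms"])

    def extend(mapping):
        if len(mapping) == n_query:
            return True
        q = len(mapping)  # next query atom to place
        for t, symbol in enumerate(target["atoms"]):
            if t in mapping.values() or symbol != query["atoms"][q]:
                continue
            # Every already-mapped query neighbor must be bonded to t.
            if all(mapping[n] in target["adj"][t]
                   for n in query["adj"][q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]
        return False

    return extend({})


# Hypothetical query C-C-O and a hypothetical three-compound file.
query = make_mol(["C", "C", "O"], [(0, 1), (1, 2)])
structure_file = {
    "ethanol": make_mol(["C", "C", "O"], [(0, 1), (1, 2)]),
    "dimethyl ether": make_mol(["C", "O", "C"], [(0, 1), (1, 2)]),
    "ethylaminomethanol": make_mol(["C", "C", "N", "C", "O"],
                                   [(0, 1), (1, 2), (2, 3), (3, 4)]),
}

candidates = screen_search(query, structure_file)
answers = [name for name in candidates
           if iterative_match(query, structure_file[name])]
print(candidates)
print(answers)
```

In this toy file the screen step discards dimethyl ether cheaply, while the third compound passes the screens (it has C-C and C-O bonds) and is only rejected by the atom-by-atom step. That is the point of the split: screens are fast but only necessary conditions.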


The CAS Substructure Search System Evolved Over Time

- 1960s: Development of prototype SSS system
- 1980: Substructure searching via screens
- 1981: Substructure searching via structure diagrams
- 1990: Substructure searching extended to Markush

Markush specification:

- R = unsaturated alkyl of 1-4 carbon atoms
- R = methyl, ethyl, or n-propyl

Query structure:

<Query structure diagrams with a C node and an Ak node>
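One way to picture matching a query node against a Markush definition is sketched below in Python. An R-group definition is either an enumerated set of groups (methyl, ethyl, n-propyl) or a generic class (alkyl with a carbon range and saturation), and a query node is either a specific group or the generic "Ak". The class names, the matching rules, and the allyl example are assumptions for illustration; the slides do not specify how the CAS Markush search resolves these cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Group:
    """A specific substituent (hypothetical minimal description)."""
    name: str
    carbons: int
    saturated: bool


@dataclass(frozen=True)
class GenericAlkyl:
    """A generic Markush class such as 'unsaturated alkyl of 1-4 carbon atoms'."""
    min_carbons: int
    max_carbons: int
    saturated: Optional[bool] = None  # None accepts either

    def covers(self, group):
        in_range = self.min_carbons <= group.carbons <= self.max_carbons
        return in_range and self.saturated in (None, group.saturated)


METHYL = Group("methyl", 1, True)
ETHYL = Group("ethyl", 2, True)
N_PROPYL = Group("n-propyl", 3, True)
ALLYL = Group("allyl", 3, False)
AK = "Ak"  # generic query node: any alkyl

# The two R definitions from the slide.
R_ENUMERATED = frozenset({METHYL, ETHYL, N_PROPYL})
R_GENERIC = GenericAlkyl(min_carbons=1, max_carbons=4, saturated=False)


def node_matches(query, definition):
    """True if the query node can match the Markush R definition."""
    if isinstance(definition, GenericAlkyl):
        # Generic vs generic is accepted; specific groups must be covered.
        return query == AK or definition.covers(query)
    # Enumerated definition: Ak matches if any member exists.
    return (query == AK and len(definition) > 0) or query in definition


print(node_matches(ETHYL, R_ENUMERATED), node_matches(ETHYL, R_GENERIC))
print(node_matches(AK, R_ENUMERATED), node_matches(ALLYL, R_GENERIC))
```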


What Should a Good Substructure Search System Offer?

- Query Input
  - Easy-to-use query input mechanism
  - Flexible query definition options: -C; -CH3; R=-CH3, Et, or n-Pr; Ak
- Retrieval
  - Quick, comprehensive retrieval of all matching compounds
  - Tools for dealing with registration “idiosyncrasies”


Structural Representations Present Search Challenges

- Salts


Structural Representations Present Search Challenges

- Enol-keto tautomers


Structural Representations Present Search Challenges

- Pyrazoles


CAS Designed Search Tools/Systems to Address Structuring Conventions

- Query input tools to allow for all relevant structural characteristics
- Algorithms that handle structural representation conventions


STN Searchers Have a Variety of Tools To Allow for Different Structural Representations

Answers:

Unspecified bonds

Connectivity


SciFinder Search Algorithms Handle Many Structure Conventions


What Should a Good Substructure Search System Offer?

- Query Input
  - Easy-to-use query input mechanism
  - Flexible query definition options: -C; -CH3; R=-CH3, Et, or n-Pr; Ak
- Retrieval
  - Quick, comprehensive retrieval of all matching compounds
  - Tools for dealing with registration “idiosyncrasies”


What Does the Future Hold?


Customers Want More!

- Chemical substance retrieval
  - Similar structures
  - “Shape” matching
- Converting information into knowledge
  - Tools for discovering “structural relationships”
  - Tools for mining the diversity in the CAS Registry for relevant substances
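The slide does not say how "similar structures" would be scored. One common choice, used here purely as an assumption, is the Tanimoto coefficient over feature sets such as screens: the number of shared features divided by the number of features present in either structure. The Python sketch below ranks a hypothetical file of screen sets against a query; the screen numbers and compound names are made up.

```python
def tanimoto(a, b):
    """Shared features divided by total distinct features (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def rank_by_similarity(query_screens, file_screens):
    """Return (score, name) pairs, best score first, ties broken by name."""
    scored = [(tanimoto(query_screens, screens), name)
              for name, screens in file_screens.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical screen numbers for a query and three filed structures.
query_screens = {1, 2, 3}
file_screens = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {7}}

ranking = rank_by_similarity(query_screens, file_screens)
print(ranking)
```

Unlike a substructure search, which returns only exact containment hits, a ranked list like this lets the searcher choose a cutoff.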


What Does the Future Hold?

“To remain relevant to the work of scientists in the twenty-first century…, CAS information technology must keep pace with the evolution of the chemical sciences, including related biological sciences, and remain adaptive enough to accommodate the unexpected and exciting developments that undoubtedly lie ahead.”

From: “Chemical Abstracts Service Information System”, Encyclopedia of Computational Chemistry, John Wiley & Sons, November 1998.


Acknowledgements

Weisgerber, D. W. (1997) Chemical Abstracts Service Chemical Registry System: History, Scope, and Impacts. Journal of the American Society for Information Science 48(4) p. 349-360

Fisanick, W., Amaral, N. J., Metanomski, W. V., Shively, E. R., Soukup, K. M., Stobaugh, R. E. (1998) Chemical Abstracts Service Information System. Encyclopedia of Computational Chemistry p. 277-315