Developing CAS Products for Substructure Searching by Chemists

Linda Toler

Kurt Loening Symposium, August 2001

Developing CAS Products for Substructure Searching

• Evolution of the CAS Registry
• Development of substructure searching for CAS products
• Future challenges

Evolution of the CAS Registry

Names and MFs

Fragment Codes and Linear Notations

Building the CAS Registry

Late 1950s

Dyson linear notation

Register Number

Building the CAS Registry

1959: Dyson linear notation
1965: Registry I, Morgan Connection Table

Sample of a CAS Connection Table

CAS Registry Number: 125417-03-0

Rank #  : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Atom    : C C C C C C C C C O C C C C C
Bond to : 8 8 1 4 4 4 5 5 6 6 7 7 11 13 10 12 8 14 9 15
Bond is : -- -- -- -* -* -* -* =* -* -* =* -* -* -* RC -* RC -* RC -*
Mol Form: C14 H20 O
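The rank numbers in a connection table like the one above come from a canonical atom ordering. As a rough illustration only, the following is a minimal sketch of the extended-connectivity idea behind Morgan-style numbering; it is not the CAS implementation, and the function name, the input format, and the n-pentane example are assumptions made for this sketch.

```python
from collections import defaultdict


def extended_connectivity(bonds, n_atoms):
    """Return Morgan-style extended connectivity values for atoms 1..n_atoms.

    bonds is a list of (atom, atom) pairs between non-hydrogen atoms.
    """
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Round 0: plain connectivity (count of attached non-hydrogen atoms).
    values = {i: len(neighbors[i]) for i in range(1, n_atoms + 1)}
    n_classes = len(set(values.values()))

    while True:
        # Each atom's new value is the sum of its neighbors' current values.
        new_values = {i: sum(values[j] for j in neighbors[i]) for i in values}
        new_n_classes = len(set(new_values.values()))
        if new_n_classes <= n_classes:
            # No further discrimination: keep the previous round's values.
            return values
        values, n_classes = new_values, new_n_classes


# Hypothetical example: the carbon skeleton of n-pentane, C1-C2-C3-C4-C5.
pentane_bonds = [(1, 2), (2, 3), (3, 4), (4, 5)]
ec = extended_connectivity(pentane_bonds, 5)
start_atom = max(ec, key=ec.get)  # numbering would begin at the highest value
print(ec, start_atom)
```

In this sketch the central carbon receives the highest value, so a canonical numbering would start there and proceed outward; tie-breaking among equivalent atoms is left out.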

Building the CAS Registry

1959: Dyson linear notation
1965: Registry I, Morgan Connection Table
- Covered organic molecules
- CAS Registry Number
- “Normalized” ring bonds

Building the CAS Registry

1959: Dyson linear notation
1965: Registry I, Morgan Connection Table
1968: Registry II
- Covered all compound classes
- Standardized stereo descriptors
- “Normalized” tautomer bonds

Building the CAS Registry

1959: Dyson linear notation
1965: Registry I, Morgan Connection Table
1968: Registry II
1973: Registry III
- Streamlined internal handling and display of connection table information

CAS Registry Handles New Chemistry

• Superconducting substances of the 1980s
  – registered ranges of element compositions
  – registered non-stoichiometric compositions

RN 301237-56-9 REGISTRY
CN Barium calcium mercury rhenium oxide (Ba4Ca1.5Hg2Re0.5O8.5) (9CI) (CA INDEX NAME)
MF Ba . Ca . Hg . O . Re
AF Ba4 Ca1.5 Hg2 O8.5 Re0.5

Component     |       Ratio        | Component
              |                    | Registry Number
==============+====================+===================
O             |        8.5         | 17778-80-2
Ca            |        1.5         | 7440-70-2
Ba            |         4          | 7440-39-3
Re            |        0.5         | 7440-15-5
Hg            |         2          | 7439-97-6
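Registering component ratios makes it possible to search by composition range. The following is a minimal sketch of such a test, using the ratios from the record above; the dictionary-based query format and the function name are hypothetical and do not represent STN query syntax.

```python
# Component ratios copied from the record above (RN 301237-56-9).
record = {"Ba": 4.0, "Ca": 1.5, "Hg": 2.0, "O": 8.5, "Re": 0.5}


def matches_ranges(composition, ranges):
    """True if every queried element is present within its (low, high) ratio range."""
    return all(
        element in composition and low <= composition[element] <= high
        for element, (low, high) in ranges.items()
    )


# Hypothetical query: element -> (lowest ratio, highest ratio), inclusive.
query = {"Hg": (1.0, 2.0), "Re": (0.1, 1.0), "O": (8.0, 9.0)}
print(matches_ranges(record, query))                 # True
print(matches_ranges(record, {"Cu": (1.0, 3.0)}))    # False: no copper component
```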

CAS Registry Handles New Chemistry

• “Textual” descriptions of stereochemistry became outdated and difficult to use
• Upgraded connection tables to include stereo parity

RN 350021-96-4 REGISTRY
CN Benzeneethanol, .beta.-[(2-furanylmethyl)[(1R)-1-methyl-2-propynyl]amino]-, (.beta.R)- (9CI) (CA INDEX NAME)

Absolute stereochemistry.

CAS Registry Handles New Chemistry

• Emphasis on biomolecules mushroomed in the 1990s
• Macromolecules represented via one-letter codes for their basic building blocks

RN 349518-49-6 REGISTRY
CN G protein-coupled receptor 35 (mouse fragment) (9CI) (CA INDEX NAME)
FS PROTEIN SEQUENCE
SQL 93

SEQ  1 AHMVWANLAV FVICFLPLHV VLTVQVSLNL NTCAARDTFS RALSITGKLS
    51 DTNCCLDAIC YYYMAREFQE ASKPATSSNT PHKSQDSQIL SLT

CAS Registry Today

World’s Largest, Most Diverse Substance Collection

>32,400,000 records

"Organic": 45%
Sequences: 43%
Coordination compounds: 5%
Polymers: 3%
Alloys: 2%
"Inorganics": 2%

CAS Registry Today

Known for the Quality and Integrity of Its Structural Information

<Pictures of CAS Registry staff>

CAS Registry Today

An International Resource for Substance Identification

• CAS Registry Numbers are used throughout the world to identify substances
  – Databases
  – Handbooks
  – Government regulatory agencies
  – Consumer products

CAS Registry Today

Largest publicly available structure-searchable compound collection

Connection Tables: >18.3M
Biosequences: >13.9M

Substructure Searching in the CAS Registry

“Substructure” Searching Can Mean Different Things

• Compounds with structures
  – answers contain specified structural characteristics
• Alloys and non-stoichiometric inorganics
  – answers are compounds of various composition ranges
• Protein and nucleic acid sequences
  – answers are sequences containing the same string of building-block residues
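For sequences, a "substructure" match reduces to finding a run of residues inside a longer string of one-letter codes. The following is a minimal sketch using the 93-residue G protein-coupled receptor 35 fragment shown earlier (RN 349518-49-6); the function name and the query strings are hypothetical, and a production system would not rely on a plain string scan.

```python
# One-letter residue codes from the REGISTRY record for RN 349518-49-6.
sequence = (
    "AHMVWANLAV FVICFLPLHV VLTVQVSLNL NTCAARDTFS RALSITGKLS"
    "DTNCCLDAIC YYYMAREFQE ASKPATSSNT PHKSQDSQIL SLT"
).replace(" ", "")


def find_subsequence(seq, query):
    """Return the 1-based start position of every occurrence of query in seq."""
    positions = []
    start = seq.find(query)
    while start != -1:
        positions.append(start + 1)
        # Step forward by one so overlapping occurrences are also reported.
        start = seq.find(query, start + 1)
    return positions


print(len(sequence))                       # 93, matching the SQL field
print(find_subsequence(sequence, "CCLD"))  # [54]
print(find_subsequence(sequence, "YY"))    # [61, 62]
```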

“Substructure” Searching Can Mean Different Things

• Compounds with structures
  – answers contain specified structural characteristics

What Should a Good Substructure Search System Offer?

• Query Input
  – Easy-to-use query input mechanism
  – Flexible query definition options: -C; -CH3; R=Me, Et, or n-Pr; Ak
• Retrieval
  – Quick, comprehensive retrieval of all matching compounds
  – Tools for dealing with registration “idiosyncrasies”

Query Input Methods Evolved Over Time

• Screens
• “Commands”
• Drawing

Query Input via Screens

=> SCREEN 1867 AND 42 1199 AND 745 AND 1033 AND 1139 AND 1142 AND 1707 AND 1831

1867 TR DDDDDD
42 AS C-C*C*C-C-O
1199 AA C -1C -1O -2O
745 BS A *2A *1A *1A *1A *1A *1A

Query Input via Commands

=> STR
:GRAPH R6, 1 C1, 3 C1, 4 C3, 5 C1, 6 C1, 9 C1
:NODE 8 OH, 7 14 10 O, 11 12 13 AK
:BOND ALL S, 1-7 2-3 9-14 D
:RSP I, CON 11 12 13 E1, DIS

<Structure diagram of the resulting query: nodes numbered 1-14, with C, O, OH, and Ak atoms>

Query Input Offline via Drawing

What Should a Good Substructure Search System Offer?

• Query Input
  – Easy-to-use query input mechanism
  – Flexible query definition options: -C; -CH3; R=-CH3, Et, or n-Pr; Ak
• Retrieval
  – Quick, comprehensive retrieval of all matching compounds
  – Tools for dealing with registration “idiosyncrasies”

The CAS Substructure Search System Evolved Over Time

1960s: Development of Prototype SSS system
1980: Substructure searching via screens
1981: Substructure searching via structure diagrams

Substructure Searching Is a Two-step Process

Structure Compilation → Screen Generation → Screen Search → Candidate Answers → “Iterative” Search
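The two steps can be sketched as a cheap screen filter followed by an atom-by-atom check of the surviving candidates. The code below is a minimal illustration, not the CAS system: the screen set (elements present and bonded element pairs), the molecule representation, and the three-record database are all assumptions, and bond orders and hydrogens are ignored.

```python
def make_molecule(atoms, bonds):
    """atoms: {id: element}; bonds: iterable of (id, id). Returns (atoms, adjacency)."""
    adjacency = {a: set() for a in atoms}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return atoms, adjacency


def generate_screens(molecule):
    """Screen generation: derive a set of coarse features (hypothetical screens)."""
    atoms, adjacency = molecule
    screens = set(atoms.values())  # element present
    for a, nbrs in adjacency.items():
        for b in nbrs:
            screens.add(tuple(sorted((atoms[a], atoms[b]))))  # bonded element pair
    return screens


def atom_by_atom_match(query, target):
    """Iterative search: backtracking test that query is a substructure of target."""
    q_atoms, q_adj = query
    t_atoms, t_adj = target
    order = list(q_atoms)

    def extend(mapping):
        if len(mapping) == len(order):
            return True
        q = order[len(mapping)]
        for t in t_atoms:
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Every already-mapped query neighbor must also be bonded in the target.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]
        return False

    return extend({})


def substructure_search(query, database):
    query_screens = generate_screens(query)
    # Step 1, screen search: keep records whose screens include all query screens.
    candidates = [name for name, mol in database.items()
                  if query_screens <= generate_screens(mol)]
    # Step 2, iterative search: verify each candidate atom by atom.
    answers = [name for name in candidates
               if atom_by_atom_match(query, database[name])]
    return candidates, answers


# Hypothetical heavy-atom skeletons.
database = {
    "acetic acid": make_molecule({1: "C", 2: "C", 3: "O", 4: "O"},
                                 [(1, 2), (2, 3), (2, 4)]),
    "dimethyl ether": make_molecule({1: "C", 2: "O", 3: "C"}, [(1, 2), (2, 3)]),
    "propane": make_molecule({1: "C", 2: "C", 3: "C"}, [(1, 2), (2, 3)]),
}
query = make_molecule({1: "O", 2: "C", 3: "O"}, [(1, 2), (2, 3)])  # O-C-O
candidates, answers = substructure_search(query, database)
print(candidates)  # ['acetic acid', 'dimethyl ether']
print(answers)     # ['acetic acid']
```

Dimethyl ether passes the screens because it contains a C-O pair, but fails the atom-by-atom check because no carbon carries two oxygens. That is why the screen output is only a list of candidate answers.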

The CAS Substructure Search System Evolved Over Time

1960s: Development of Prototype SSS system
1980: Substructure searching via screens
1981: Substructure searching via structure diagrams
1990: Substructure searching extended to Markush

Markush specification:
R = unsaturated alkyl of 1-4 carbon atoms
R = methyl, ethyl, or n-propyl

Query structure:
<Structure diagrams with C and Ak nodes>

What Should a Good Substructure Search System Offer?

• Query Input
  – Easy-to-use query input mechanism
  – Flexible query definition options: -C; -CH3; R=-CH3, Et, or n-Pr; Ak
• Retrieval
  – Quick, comprehensive retrieval of all matching compounds
  – Tools for dealing with registration “idiosyncrasies”

Structural Representations Present Search Challenges

• Salts

• Enol-keto tautomers

• Pyrazoles

CAS Designed Search Tools/Systems to Address Structuring Conventions

• Query input tools to allow for all relevant structural characteristics
• Algorithms that handle structural representation conventions

STN Searchers Have a Variety of Tools To Allow for Different Structural Representations

Answers:

Unspecified bonds

Connectivity

SciFinder Search Algorithms Handle Many Structure Conventions

What Should a Good Substructure Search System Offer?

• Query Input
  – Easy-to-use query input mechanism
  – Flexible query definition options: -C; -CH3; R=-CH3, Et, or n-Pr; Ak
• Retrieval
  – Quick, comprehensive retrieval of all matching compounds
  – Tools for dealing with registration “idiosyncrasies”

What Does the Future Hold?

Customers Want More!

• Chemical substance retrieval
  – Similar structures
  – “Shape” matching
• Converting information into knowledge
  – Tools for discovering “structural relationships”
  – Tools for mining the diversity in the CAS Registry for relevant substances

What Does the Future Hold?

"To remain relevant to the work of scientists in the twenty-first century…., CAS information technology must keep pace with the evolution of the chemical sciences, including related biological sciences, and remain adaptive enough to accommodate the unexpected and exciting developments that undoubtedly lie ahead."

From: "Chemical Abstracts Service Information System", Encyclopedia of Computational Chemistry, John Wiley & Sons, November 1998.

Acknowledgements

Weisgerber, D. W. (1997). Chemical Abstracts Service Chemical Registry System: History, Scope, and Impacts. Journal of the American Society for Information Science, 48(4), 349-360.

Fisanick, W., Amaral, N. J., Metanomski, W. V., Shively, E. R., Soukup, K. M., Stobaugh, R. E. (1998). Chemical Abstracts Service Information System. Encyclopedia of Computational Chemistry, 277-315.
