Structure Searching in New STN - STN International
Agenda
• Structure searchable databases on new STN
• Structure drawing tips
• Templates
• CAS Registry Number® query modeling
• EXACT and FAMILY searches in CAS REGISTRY℠
• Options to block substitution
• Specify ring fusion with Non-Hydrogen attachments
• Tautomers and bonding patterns
• Fragments with two points of attachment in R-groups
• Search multiple fragments
• Metal-containing substances: coordination compounds
Structure searchable databases on new STN
Database | Coverage | Time coverage
CAS REGISTRY | Organics, inorganics, biochemicals and sequences | 1800s - present
MARPAT® | Markush structures from patents | 1961 - present
Derwent Chemistry Resource (DCR) | Selected structures from Sections B, C and E of Derwent World Patents Index | 1999 - present
Derwent Markush Resource (DWPIM) | Markush structures from Sections B, C and E of Derwent World Patents Index | 1961 - present
REAXYSFILESub | Chemical structures from the literature, including organic, inorganic and metal-organic compounds as well as polymers and biological molecules | 1771 - present
Tips for broadening a structure query
1. Choose the broadest applicable search
Exact (EXA) search retrieves stereoisomers, charged, and isotopically labeled structures. Exact searches of multiple fragments find exact matches with exactly the same number of components.
Family (FAM) search retrieves salts and mixtures, as well as all answers from an EXA search. Family searches of multiple fragments find exact matches to the fragments, but additional components can be present.
Closed Substructure Search (CSS) retrieves the answers from a FAM search plus substances with no substitution at open nodes, unless the query specifies substitution.
Substructure (SSS) search retrieves analogs and derivatives of a core structure, as well as the answers from a FAM search.
2. Draw appropriate bonds
Use unspecified bonds to match any type of bond: single, double, or triple. This is useful when searching tautomeric structures in different databases.
Tips for broadening a substructure query
3. Use Variable Points of Attachment (VPA) on rings
VPA can connect atoms, shortcuts, variables, and R-groups to rings.
4. Replace atoms with variables
A - any atom except H
Ak - chain with at least 1 carbon
Cb - carbocyclic ring system
Cy - any ring system
Hy - heterocyclic ring system (at least one heteroatom)
M - metal
Q - heteroatom (any atom except C or H)
X - halogen
5. Use R-groups
R-groups can contain atoms, shortcuts, variables, and fragments.
6. Add repeating groups
Repeating groups can be used in rings or chains. Repeating groups must have nodes on either side.
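The atom-level variables in tip 4 amount to simple membership tests on element symbols, and the ring-level variables test the atoms of a ring system. The Python sketch below is a hypothetical illustration, not STN code. The halogen and metal sets are assumptions, and the metal set is deliberately abbreviated. Ring systems are approximated as a flat list of ring-atom symbols. Ak is left out because it depends on chain topology rather than a single atom.

```python
"""Toy membership tests for STN query variables (hypothetical, for illustration)."""

# Assumed halogen set for X.
HALOGENS = {"F", "Cl", "Br", "I"}
# Assumed, deliberately abbreviated metal set for M; the real variable covers many more metals.
METALS = {"Li", "Na", "K", "Mg", "Ca", "Fe", "Co", "Ni", "Cu", "Zn", "Pd", "Pt"}


def is_A(symbol: str) -> bool:
    """A: any atom except H."""
    return symbol != "H"


def is_Q(symbol: str) -> bool:
    """Q: heteroatom, any atom except C or H."""
    return symbol not in {"C", "H"}


def is_X(symbol: str) -> bool:
    """X: halogen (from the assumed HALOGENS set)."""
    return symbol in HALOGENS


def is_M(symbol: str) -> bool:
    """M: metal (limited to the assumed METALS set)."""
    return symbol in METALS


def is_Cb(ring_atoms: list[str]) -> bool:
    """Cb: carbocyclic ring system, all ring atoms are carbon."""
    return bool(ring_atoms) and all(a == "C" for a in ring_atoms)


def is_Hy(ring_atoms: list[str]) -> bool:
    """Hy: heterocyclic ring system, at least one ring heteroatom."""
    return any(is_Q(a) for a in ring_atoms)


def is_Cy(ring_atoms: list[str]) -> bool:
    """Cy: any ring system, carbocyclic or heterocyclic."""
    return is_Cb(ring_atoms) or is_Hy(ring_atoms)


pyridine = ["N", "C", "C", "C", "C", "C"]
print(is_Hy(pyridine), is_Cb(pyridine), is_Cy(pyridine))  # True False True
```

Every Cb or Hy ring system is also a Cy ring system. This nesting is why replacing a specific ring with Cy broadens a query the most.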
Tips for narrowing a substructure query
1. Block substitution
Use the Lock Atoms tool to block any further substitution on an atom or generic node. You can also use H-attachments or drawn H atoms, except in MARPAT and DWPIM. Remember, shortcuts are already blocked!
2. Add generic definitions to variables
Draw variables with generic definitions as fragments before including them in R-groups, e.g., monocyclic Hy.
3. Add element counts to variables
Draw variables with element counts as fragments before including them in R-groups, e.g., Ak with 1-6 C.
4. Change default attributes
Use the Lock Ring tool to prevent retrieval of fused, bridged, and spiro rings. Right-click to change node or bond characteristics.
The search feature lets you find templates quickly and easily
The search string can match the beginning of a template name or text embedded anywhere in it.
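A search string that may sit at the start of a template name or anywhere inside it behaves like a substring test. The Python sketch below illustrates that matching rule. The template names are hypothetical examples, not the Structure Editor's actual template list. Case-insensitive matching is also an assumption.

```python
def find_templates(query: str, names: list[str]) -> list[str]:
    """Return names containing `query` anywhere, ignoring case (assumed behavior)."""
    q = query.casefold()
    return [n for n in names if q in n.casefold()]


# Hypothetical template names for illustration.
TEMPLATES = ["Pyridine", "Pyrimidine", "Piperazine", "Tetrahydropyran", "Benzopyran"]
print(find_templates("pyr", TEMPLATES))
# ['Pyridine', 'Pyrimidine', 'Tetrahydropyran', 'Benzopyran']
```

"Pyridine" and "Pyrimidine" match at the beginning of the name, while "Tetrahydropyran" and "Benzopyran" match on embedded text.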
Structures can be modeled directly from CAS Registry Numbers via the Add to Editor tool
SMILES and InChI strings also work.
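A CAS Registry Number has three parts: two to seven digits, two digits, and a single check digit, separated by hyphens. A quick way to catch a mistyped RN before pasting it into Add to Editor is to verify the check digit. To compute it, weight each digit by its position counted from the right, starting at 1 next to the check digit, sum the products, and take the result modulo 10. The Python sketch below applies that rule. The function name is hypothetical, and the function is not part of STN.

```python
import re

# Two to seven digits, two digits, one check digit.
CAS_RN_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)", flags=re.ASCII)


def is_valid_cas_rn(rn: str) -> bool:
    """Check the format and check digit of a CAS Registry Number."""
    match = CAS_RN_PATTERN.fullmatch(rn.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)  # every digit except the check digit
    check = int(match.group(3))
    # Weight the rightmost body digit by 1, the next by 2, and so on.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == check


print(is_valid_cas_rn("7732-18-5"))  # water: True
print(is_valid_cas_rn("7732-18-4"))  # wrong check digit: False
```

For water, the weighted sum is 8·1 + 1·2 + 2·3 + 3·4 + 7·5 + 7·6 = 105, so the check digit is 5.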
EXACT and FAMILY structure search scope in CAS REGISTRY
• Use EXACT to find multicomponent substances with exactly the number of components drawn, when all components are drawn in the same window
• Use FAMILY to find multicomponent substances with additional components
• Both types of searches find structures as drawn, with all open positions blocked from substitution
EXACT and FAMILY are not currently working in DCR.
Tips for drawing EXACT and FAMILY queries
• EXACT and FAMILY structure queries can be built without drawing explicit H atoms
• Only real atom nodes are allowed, no variables
• No variable attachment points on rings
• Unspecified bonds are allowed
EXACT searches find CAS REGISTRY substances containing exactly the components drawn, with no additional components
FAMILY searches find REGISTRY substances containing the components drawn, plus additional components if present
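For multicomponent substances, the two scopes differ in how the drawn components are compared with the substance's components. EXACT requires the same components with nothing extra. FAMILY only requires that every drawn component is present. The Python sketch below is a toy model of that comparison. Components are represented by hypothetical labels. The sketch ignores the stereo, isotope and charge variants that an EXA search also retrieves.

```python
from collections import Counter


def exact_match(query: list[str], substance: list[str]) -> bool:
    """EXACT: the same components, the same number of each, nothing extra."""
    return Counter(query) == Counter(substance)


def family_match(query: list[str], substance: list[str]) -> bool:
    """FAMILY: every drawn component is present; extra components are allowed."""
    have = Counter(substance)
    return all(have[c] >= n for c, n in Counter(query).items())


# Hypothetical component labels for illustration.
query = ["morphine", "HCl"]
print(exact_match(query, ["HCl", "morphine"]))            # True
print(exact_match(query, ["morphine", "HCl", "H2O"]))     # False
print(family_match(query, ["morphine", "HCl", "H2O"]))    # True
```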
Options to block substitution
Search Example: Compare retrieval between drawing in H atoms, using Hydrogen Count, or the Lock Atoms tool in REGISTRY, DCR, and REAXYSFILESub.
[Figure: the same query with H atoms drawn in, with Lock Atoms, and with Hydrogen Count]
Use Lock Atoms to block substitution for the most comprehensive retrieval in all structure databases
Lock Atoms prevents further substitution on an atom or a generic variable.
Why does Lock Atoms retrieve slightly more answers?
The 46 additional REGISTRY answers are all radical ions or structural repeating polymers.
DCR and REAXYSFILESub differences are due to isotopes of hydrogen
Drawing an H atom or using Hydrogen Count excludes D and T in DCR and REAXYSFILESub, but not in REGISTRY. The Lock Atoms tool retrieves these answers.
Specify ring fusion with Non-Hydrogen attachments
Search Example:
Case A – Allow ring fusion at the dashed site
Case B – Force ring fusion at the dashed site
Non-hydrogen attachments allow precise control of ring fusion
• Lock Rings prevents further ring fusion for entire ring systems or chain bonds
‒ Chain bonds are blocked from ring fusion by default, but this can be changed
• Open positions to, or lock them against, further substitution: ring, chain, or ring/chain
• Chain substitution can still occur on nodes with an exact number of ring non-hydrogen attachments
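The Non-H Attachments setting can be read as a count of a node's non-hydrogen neighbors, split into ring bonds and chain bonds. In an ortho-fused bicycle such as naphthalene, an ordinary ring carbon has exactly 2 ring attachments and a fusion carbon has exactly 3. That is why "Exactly 2 Ring" blocks fusion at a node and "Exactly 3 Ring" requires it. The Python sketch below is a toy model, not STN's matching code. The `Node` class and the "ring"/"chain" bond tags are hypothetical. Only ring attachments are constrained, so chain substituents stay free.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node with its non-hydrogen neighbors, each tagged "ring" or "chain"."""
    element: str
    bonds: list[str] = field(default_factory=list)

    def ring_attachments(self) -> int:
        return self.bonds.count("ring")


def satisfies_exact_ring(node: Node, exactly: int) -> bool:
    """Check an 'Exactly n Ring' setting; chain attachments are not counted."""
    return node.ring_attachments() == exactly


# Hypothetical nodes from a naphthalene-like ring system.
plain_ring_c = Node("C", ["ring", "ring"])             # ordinary ring CH
substituted_c = Node("C", ["ring", "ring", "chain"])   # ring C with a chain substituent
fusion_c = Node("C", ["ring", "ring", "ring"])         # ring-fusion carbon

print(satisfies_exact_ring(substituted_c, 2))  # True: chain substituent allowed
print(satisfies_exact_ring(fusion_c, 2))       # False: fusion blocked
print(satisfies_exact_ring(fusion_c, 3))       # True: fusion required
```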
Right-click a node to change the Node Attributes
• Use the Marquee tool and Ctrl+Click to select multiple nodes.
• Right-click a single node to open the Node Attributes window.
• Changes apply to all selected nodes.
Allow Ring Fusion at selected sites
• To allow ring fusion, set Non-H Attachments to Exactly 2 Ring (gold box) and Exactly 3 Ring (red oval)
• Additional ring fusion is blocked except at the open positions
Chain substituents are still allowed at any position with this technique.
Force Ring Fusion at selected sites
• To force ring fusion, set Non-H Attachments to Exactly 2 Ring (gold box) and Exactly 3 Ring (red oval)
• Additional ring fusion is forced at ring carbons
Chain substituents are still allowed.
Normalized bonds are used for tautomeric bonds in CAS REGISTRY
• The CAS REGISTRY definition of a tautomer:
‒ Node 2 = C, N, P, As, Sb, S, Se, Te, Cl, Br, I
‒ Heteroatoms 1 and 3 = N, O, S, Se, Te
‒ One heteroatom must have a hydrogen (or D, T, or a negative charge)
• REGISTRY generally indexes a preferred tautomer format
• Other databases may have different policies
[Figure: the tautomer unit 1-2-3, with H on atom 1 or on atom 3]
Keto-enol tautomers cannot have Normalized bonds in CAS REGISTRY
[Figure: Normalized bonds in chains and rings (acids, amines, amides) compared with a keto-enol tautomer, which does not have Normalized bonds]
Keto-enol tautomers do not fit the REGISTRY Normalized bond rule.
Use unspecified bonds in a query to find both variations.
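The REGISTRY tautomer definition can be written as a predicate over a three-atom unit: heteroatom 1, node 2 and heteroatom 3. The Python sketch below encodes the element lists from the definition slide. The `Atom` class and the field names are hypothetical, for illustration only. Applied to an enol unit (O, C, C), the predicate fails because atom 3 is carbon. This is consistent with keto-enol tautomers not receiving Normalized bonds.

```python
from dataclasses import dataclass

NODE_2 = {"C", "N", "P", "As", "Sb", "S", "Se", "Te", "Cl", "Br", "I"}
HETERO_1_3 = {"N", "O", "S", "Se", "Te"}


@dataclass
class Atom:
    element: str
    h_count: int = 0  # attached H, D, or T
    charge: int = 0


def _mobile(atom: Atom) -> bool:
    """A heteroatom qualifies if it carries H (or D/T) or a negative charge."""
    return atom.h_count > 0 or atom.charge < 0


def is_tautomeric_unit(a1: Atom, a2: Atom, a3: Atom) -> bool:
    """Apply the 1-2-3 tautomer rule to a three-atom unit."""
    return (
        a1.element in HETERO_1_3
        and a3.element in HETERO_1_3
        and a2.element in NODE_2
        and (_mobile(a1) or _mobile(a3))
    )


amide = (Atom("O"), Atom("C"), Atom("N", h_count=2))
enol = (Atom("O", h_count=1), Atom("C"), Atom("C", h_count=1))
print(is_tautomeric_unit(*amide))  # True
print(is_tautomeric_unit(*enol))   # False
```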
Other databases may represent tautomers with exact bonds
Both structures retrieve largely overlapping answers in REGISTRY, but DCR and REAXYSFILESub index the bonds to the oxygen as exact bonds.
Use unspecified bonds to retrieve tautomeric structures in all databases
Use Lock Atoms to prevent further substitution.
Fragments with 2 points of attachment in R-groups
Search Example: A composition comprising a compound of structure A, having heterocyclic Ring B as shown wherein R1 is an oxygen-containing fragment which is connected to either the carbon or the nitrogen atom of Ring B through an oxygen atom, wherein Ring B is a ring of 3-8 atoms with 2 heteroatoms, and wherein R2 is phenyl or aryl.
[Figure: Structure A]
Various fragment lengths result in different ring sizes
[Figure: table pairing each R1 fragment with the resulting ring]
Fragment orientation determines where the oxygen connects in the ring
[Figure: O connected to N | O connected to C]
Repeating groups in R-groups will be a future enhancement
A simple repeating fragment was tried first, but repeating groups are not currently allowed in R-groups.
Look for enhancements in a future release.
R-groups can contain up to 20 items
Fragment orientation is denoted symbolically. Touch a node using the Fn tool to flip the orientation.
Set fragment bonds to Ring
Fragments that form part of a ring need ring bond attributes. Use Lock Rings to lock the ring containing R1.
Sample answers from DWPIM, DCR and REAXYSFILESub
[Figure: sample answers from DCR, DWPIM and REAXYSFILESub]
Additional fusion is possible on the heterocyclic ring.
What about a generic structure?
The Hy node is set to:
• Monocyclic
• 2 or more heteroatoms
• Fewer than 7 carbon atoms
The Element Count is set to:
• Exactly 1 N
• Exactly 1 O
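The generic Hy settings act as a filter on a ring's atom counts. The Python sketch below is a hypothetical illustration, not STN behavior. It assumes each ring is given as a list of element symbols plus a count of rings in the system. Elements other than C, N and O are left unconstrained. When N and O are the only heteroatoms, "fewer than 7 carbons" caps the ring at 8 atoms, in line with the 3-8 membered Ring B of the search example.

```python
from collections import Counter


def matches_hy_query(ring_atoms: list[str], ring_count: int = 1) -> bool:
    """Toy check: monocyclic, >= 2 heteroatoms, < 7 C, exactly 1 N and exactly 1 O."""
    counts = Counter(ring_atoms)
    heteroatoms = sum(n for el, n in counts.items() if el not in {"C", "H"})
    return (
        ring_count == 1
        and heteroatoms >= 2
        and counts["C"] < 7
        and counts["N"] == 1
        and counts["O"] == 1
    )


morpholine = ["O", "C", "C", "N", "C", "C"]
isoxazole = ["O", "N", "C", "C", "C"]
pyrimidine = ["N", "C", "N", "C", "C", "C"]
print(matches_hy_query(morpholine), matches_hy_query(isoxazole), matches_hy_query(pyrimidine))
# True True False
```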
The combined approach gives a good answer set in DCR, DWPIM and REAXYSFILESub
Combining the fragment and generic searches gives a better answer set in DCR, DWPIM and REAXYSFILESub than either strategy alone.
Multiple fragments are searched across multiple components in a substructure search (SSS)
• Unlike classic STN, multiple fragments drawn in a single window are not restricted to a single component in an SSS search
• Where the fragments appear in a single component, they do not overlap in the answers
• Q-lists can refine structure queries to ensure all fragments are in a single component in CAS REGISTRY and REAXYSFILESub
‒ Not available for DCR
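The two behaviors differ in where the required fragments are looked for. New STN SSS searches across all components combined. The single-component requirement of classic STN, which the Q-list step recovers, looks within one component. The Python sketch below is a toy model of that difference, not how STN performs substructure matching. Each hypothetical component is reduced to a count of the ring systems it contains.

```python
from collections import Counter


def contains(have: Counter, need: Counter) -> bool:
    """True if `have` holds at least as many of each ring system as `need`."""
    return all(have[k] >= v for k, v in need.items())


def match_across_components(substance: list[Counter], need: Counter) -> bool:
    """Fragments may be spread over different components (new STN SSS)."""
    total = Counter()
    for component in substance:
        total += component
    return contains(total, need)


def match_single_component(substance: list[Counter], need: Counter) -> bool:
    """All fragments must occur together in at least one component."""
    return any(contains(component, need) for component in substance)


need = Counter({"piperazine": 1, "pyrimidine": 1, "phenyl": 2})
split = [Counter({"piperazine": 1, "phenyl": 1}), Counter({"pyrimidine": 1, "phenyl": 1})]
print(match_across_components(split, need), match_single_component(split, need))  # True False
```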
Search example with multiple fragments
Search Example: Retrieve substances that contain at least the following 4 ring systems: a piperazine, a pyrimidine, and 2 phenyl rings.
Place all four rings in the Structure Editor drawing stage
Block ring fusion with the Lock Rings tool. Rings that are blocked display bold bonds.
Use Q-lists to find fragments in single components in REGISTRY or REAXYSFILESub
[Figure: step 1]
Step 2: REGISTRY: create a Q-list of CAS RNs. REAXYSFILESub: create a Q-list of ANs.
Search the REGISTRY Q-list in the RN and CRN fields to retrieve substances that have all fragments in the same component
[Figure: a multicomponent answer]
Step 3: In this REGISTRY answer set, all four rings are in one component, for both single-component and multicomponent answers. CRN = Component Registry Number.
Create a Q-List of ANs in REAXYSFILESub, and then search in the AN and CMAN fields
This answer set has all four rings in single components in REAXYSFILESub. AN = Accession Number; CMAN = Component Molecular Accession Number.
Metal containing substances: Coordination Compounds
CAS REGISTRY indexes metals in rings for coordination compounds.
DCR indexes coordination compounds with the neutral ligand, the metal and the counter ions all as separate components. The single metal is a “chain node.”
[Figure: the same coordination compound as indexed in DCR and in REGISTRY]
REAXYSFILESub can have metal containing substances indexed in a number of ways
Counter ions can be shown with bonds or as separate components. The metals are shown here as part of the ring system.
[Figure: two REAXYSFILESub examples]
Best practice for coordination complexes in CAS REGISTRY, DCR and REAXYSFILESub
• Use unspecified bonds to find any bonding pattern.
• Set any chain bonds to Ring/Chain.
• Set the metal atom to Ring/Chain.
MARPAT file structures contain the M variable node
If a single specific metal atom is shown in the patent, it is shown separately.
MARPAT file structures contain the M variable node
More often, the metal centers are shown as part of a G-group.
Best practice for MARPAT: Set metal node to Match level CLASS in your query
Ring nodes are ATOM match by default. Set the M node to CLASS match to find file structures with the M in the ring.
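The effect of the match level can be sketched as a toy rule, based on this slide. With ATOM match, a query M node matches only specific metal atoms in a file structure. With CLASS match, it also matches the generic M node used in MARPAT file structures. The Python sketch below is a simplified, hypothetical model of that behavior, not MARPAT's actual matching algorithm. The metal set is an abbreviated assumption.

```python
# Assumption: abbreviated metal list for illustration only.
METALS = {"Fe", "Co", "Ni", "Cu", "Zn", "Ru", "Rh", "Pd", "Pt"}


def query_m_matches(file_node: str, level: str) -> bool:
    """Toy rule for a query M node against one file-structure node.

    file_node is an element symbol or the generic symbol "M";
    level is "ATOM" or "CLASS".
    """
    if level not in {"ATOM", "CLASS"}:
        raise ValueError(f"unknown match level: {level}")
    if file_node in METALS:
        return True                 # specific metals match at either level (assumed)
    if file_node == "M":
        return level == "CLASS"     # the generic M node needs CLASS match
    return False


print(query_m_matches("M", "ATOM"), query_m_matches("M", "CLASS"))  # False True
```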
Search specific metals by drawing a separate query with an R-group, then combine the queries with AND
[Figure: queries STR1 and STR2]
Search Query: STR1/SSS,FULL AND STR2/SSS,FULL
Resources - recorded events
• http://www.stn-international.com/recorded_events.html
‒ Substance and Chemical Structure Searching in CAS Registry and DCR on new STN
‒ Unified Markush Search on new STN
‒ MARPAT on new STN
‒ Derwent Markush Resource (DWPIM) now available on STN!
Click on View Event Recordings to see a list of previously recorded sessions.
For more information …
CAS Support and Training: www.cas.org
FIZ Karlsruhe Support and Training: www.stn-international.de