Structure Searching on New STN®

Upload: hathuan

Post on 02-Apr-2018

Agenda

• Structure searchable databases on new STN

• Structure drawing tips

• Templates

• CAS Registry Number® query modeling

• EXACT and FAMILY searches in CAS REGISTRY℠

• Options to block substitution

• Specify ring fusion with Non-Hydrogen attachments

• Tautomers and bonding patterns

• Fragments with two points of attachment in R-groups

• Search multiple fragments

• Metal containing substances: Coordination compounds

Structure searchable databases on new STN

Database | Coverage | Time coverage
CAS REGISTRY | Includes organics, inorganics, biochemicals and sequences | 1800s - present
MARPAT® | Markush structures from patents | 1961 - present
Derwent Chemistry Resource - DCR | Selected structures from Sections B, C, and E of Derwent World Patents Index | 1999 - present
Derwent Markush Resource - DWPIM | Markush structures from Sections B, C and E of Derwent World Patents Index | 1961 - present
REAXYSFILESub | Chemical structures from the literature. Includes organic, inorganic and metal-organic compounds as well as polymers and biological molecules | 1771 - present

Tips for broadening a structure query

1. Choose the broadest applicable search

Exact (EXA) search retrieves stereoisomers, charged, and isotopically labeled structures. Exact searches of multiple fragments find exact matches with exactly the same number of components.

Family (FAM) search retrieves salts, mixtures, and answers from an EXA search. Family searches of multiple fragments find exact matches to the fragments, but additional components can be present.

Closed Substructure Search (CSS) retrieves substances with no substitution at open nodes, unless the query specifies substitution there, as well as answers from a FAM search.

Substructure (SSS) search retrieves analogs and derivatives of a core structure and answers from a FAM search.

2. Draw appropriate bonds

Use unspecified bonds to find any type of bond: single, double, or triple. This is useful when searching tautomeric structures in different databases.

Tips for broadening a substructure query

3. Use Variable Points of Attachment (VPA) on rings

VPA can connect atoms, shortcuts, variables, and R-groups to rings.

4. Replace atoms with variables

A - any atom except H
Ak - at least 1 carbon in a chain
Cb - carbocyclic ring systems
Cy - any ring system
Hy - heterocyclic ring system (at least one heteroatom)
M - metal
Q - heteroatom, any atom except C or H
X - halogens

5. Use R-groups

R-groups can contain atoms, shortcuts, variables, and fragments.

6. Add repeating groups

Repeating groups can be used in rings or chains. Repeating groups must have nodes on either side.
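
The single-atom variables in tip 4 can be read as membership tests on element symbols. The sketch below is illustrative only and does not reproduce how STN evaluates a query: the function name `variable_matches` is hypothetical, the halogen set is assumed to be F, Cl, Br and I, and the metal set is a small assumed sample. The ring and chain variables (Ak, Cb, Cy, Hy) describe groups of atoms rather than single elements, so they are left out.

```python
# Illustrative sketch only; not STN's matching logic.
# Assumptions: X covers F, Cl, Br, I; the metal set is a small sample.
HALOGENS = {"F", "Cl", "Br", "I"}
SAMPLE_METALS = {"Li", "Na", "Mg", "Fe", "Co", "Ni", "Cu", "Zn", "Pd", "Pt"}


def variable_matches(variable: str, element: str) -> bool:
    """Return True if `element` satisfies the single-atom query variable."""
    if variable == "A":   # any atom except H
        return element != "H"
    if variable == "Q":   # heteroatom: any atom except C or H
        return element not in {"C", "H"}
    if variable == "X":   # halogens
        return element in HALOGENS
    if variable == "M":   # metal (sample set only)
        return element in SAMPLE_METALS
    raise ValueError(f"unsupported variable: {variable}")


# Show which of a few test elements each variable accepts.
for var in ("A", "Q", "X", "M"):
    hits = [el for el in ("H", "C", "N", "Cl", "Fe") if variable_matches(var, el)]
    print(var, hits)
```

Read this way, A is the broadest single-atom variable, and Q, X and M each accept a subset of what A accepts.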

Tips for narrowing a substructure query

1. Block substitution

Use the Lock Atoms tool to block any further substitution on an atom or generic node. You can use H-attachments, or H, except in MARPAT and DWPIM. Remember, shortcuts are already blocked!

2. Add generic definitions to variables

Draw variables with generic definitions as fragments before including them in R-groups, e.g., monocyclic Hy.

3. Add element counts to variables

Draw variables with element counts as fragments before including in R-groups, e.g., Ak with 1-6 C.

4. Change default attributes

Use the Lock Ring tool to prevent retrieval of fused, bridged, and spiro rings. Right-click to change node or bond characteristics.

New STN offers numerous structure templates

Use the Grid or List view to scroll quickly through the templates

The Search feature lets you find templates quickly and easily

The search string can be at the beginning of the template name or embedded in it.

Structures can be modeled directly from CAS Registry Numbers via the Add to Editor tool

SMILES and InChI strings also work.

EXACT and FAMILY structure search scope in CAS REGISTRY

• Use EXACT to find multicomponent substances with exactly the number of components drawn, when all components are drawn in the same window

• Use FAMILY to find multicomponent substances with additional components

• Both types of searches find structures as drawn, with all open positions blocked from substitution

EXACT and FAMILY are not currently working in DCR.

Tips for drawing EXACT and FAMILY queries

• EXACT and FAMILY structure queries can be built without drawing explicit H’s

• Only real atom nodes are allowed, no variables

• No variable attachment points on rings

• Unspecified bonds are allowed

Polymer structure search with EXACT or FAMILY scope

Draw components in the same window.

EXACT searches in CAS REGISTRY find substances containing exactly the components drawn, with no additional components

FAMILY searches find REGISTRY substances containing the components drawn, plus additional components if present
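
The difference between the two scopes for multicomponent substances can be sketched as a comparison of component lists. This is a simplified model, not the REGISTRY search algorithm: each component is reduced to a hypothetical text label, and the function names `exact_match` and `family_match` are invented for illustration.

```python
from collections import Counter


def exact_match(query, substance):
    """EXACT: the substance has exactly the components drawn, nothing more."""
    return Counter(query) == Counter(substance)


def family_match(query, substance):
    """FAMILY: every drawn component is present; extra components are allowed."""
    have = Counter(substance)
    return all(have[name] >= count for name, count in Counter(query).items())


query = ["base", "HCl"]              # two components drawn in the same window
salt = ["base", "HCl"]               # hypothetical two-component substance
hydrate = ["base", "HCl", "water"]   # same components plus one extra

print(exact_match(query, salt), family_match(query, salt))
print(exact_match(query, hydrate), family_match(query, hydrate))
```

In this model every EXACT answer is also a FAMILY answer, which is consistent with FAMILY retrieving the answers from an EXA search.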

Options to block substitution

Search Example: Compare retrieval between drawing in H atoms, using Hydrogen Count, or the Lock Atoms tool in REGISTRY, DCR, and REAXYSFILESub.

[Figure: the same query drawn three ways: drawn-in H atoms, Lock Atoms, and Hydrogen Count]

Drawn H’s are equivalent to Hydrogen Count

Use Lock Atoms to block substitution for the most comprehensive retrieval in all structure databases

Lock Atoms prevents further substitution on an atom or a generic variable.

Lock Atoms has slightly higher retrieval - why?

The 46 REGISTRY answers are all radical ions or structural repeating polymers.

DCR and REAXYSFILESub differences are due to isotopes of hydrogen

Drawing a H atom or using the Hydrogen Count eliminates D and T in DCR and REAXYSFILESub, but not in REGISTRY. These answers are retrieved using the Lock Atoms tool.

Specify ring fusion with Non-Hydrogen attachments

Search Example:

Case A – Allow Ring Fusion at dashed site

Case B – Force Ring Fusion at dashed site

[Figure: query structures for Case A and Case B]

Non-hydrogen attachments allow precise control of ring fusion

• Lock Rings prevents further ring fusion for entire ring systems or chain bonds
‒ Chain bonds are blocked from ring fusion by default, but this can be changed

• Either open positions to further substitution or lock them out: ring, chain, or ring/chain

• Chain substitutions can occur on nodes with exact ring non-hydrogen attachments

Right-click a node to change the Node Attributes

• Use the Marquee tool and Ctrl+Click to select multiple nodes.

• Right-click a single node to open the Node Attributes window.

• Changes apply to all selected nodes.

Allow Ring Fusion at selected sites

• To allow ring fusion, set Non-H Attachments to Exactly 2 Ring (gold box) and Exactly 3 Ring (red oval)

• Additional ring fusion is blocked except at the open positions

Chain substituents are still allowed at any position with this technique.

Force Ring Fusion at selected sites

• To force ring fusion, set Non-H Attachments to Exactly 2 Ring (gold box) and Exactly 3 Ring (red oval)

• Additional ring fusion is forced at ring carbons

Chain substituents are still allowed.

Normalized bonds are used for tautomeric bonds in CAS REGISTRY

• The CAS REGISTRY definition for a tautomer is
‒ Node 2 = C, N, P, As, Sb, S, Se, Te, Cl, Br, I
‒ Hetero atoms 1 and 3 = N, O, S, Se, Te
‒ One hetero atom must have a hydrogen (or D, T, or a negative charge)

• REGISTRY generally uses a preferred tautomer format to index

• Other databases may have different policies

[Figure: tautomeric system with nodes numbered 1, 2 and 3]
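
The three conditions in the tautomer definition can be written as a small check on a 1-2-3 atom triad. This is a sketch of the stated element rules only: the function name `fits_tautomer_rule` and its flags are hypothetical, and the bond pattern between the atoms is not modeled.

```python
# Element sets copied from the tautomer definition above.
NODE_2 = {"C", "N", "P", "As", "Sb", "S", "Se", "Te", "Cl", "Br", "I"}
HETERO_1_3 = {"N", "O", "S", "Se", "Te"}


def fits_tautomer_rule(atom1, atom2, atom3, mobile_on_1=False, mobile_on_3=False):
    """Check a 1-2-3 atom triad against the element rules of the definition.

    mobile_on_1 / mobile_on_3: that end atom carries H, D, T or a negative charge.
    """
    return (
        atom2 in NODE_2
        and atom1 in HETERO_1_3
        and atom3 in HETERO_1_3
        and (mobile_on_1 or mobile_on_3)
    )


print(fits_tautomer_rule("O", "C", "O", mobile_on_3=True))  # acid-type triad O=C-OH
print(fits_tautomer_rule("O", "C", "N", mobile_on_3=True))  # amide-type triad O=C-NH
print(fits_tautomer_rule("O", "C", "C", mobile_on_3=True))  # keto-enol triad O=C-CH
```

The keto-enol triad fails because atom 3 is a carbon, which is not in the hetero atom list.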

Keto-enol tautomers cannot have Normalized bonds in CAS REGISTRY

[Figure: Normalized bonds in chains and rings (acids, amines, amides); the keto-enol tautomer does not have Normalized bonds]

Keto-enol tautomers do not fit the REGISTRY Normalized bond rule.

Use unspecified bonds in a query to find both variations.

Exact/Normalized will find bonds exactly as drawn, or as normalized bonds in file structures

Tautomers may be represented in other databases in exact formats

Both structures retrieve largely overlapping answers in REGISTRY, but DCR and REAXYSFILESub index the bonds to the oxygen as exact bonds.

Use Unspecified bonds to pick up tautomer structures in all databases

Use Lock Atoms to prevent further substitution.

Fragments with 2 points of attachment in R-groups

Search Example: A composition comprising a compound of structure A, having heterocyclic Ring B as shown, wherein R1 is an oxygen-containing fragment which is connected to either the carbon or the nitrogen atom of Ring B through an oxygen atom, wherein Ring B is a ring of 3-8 atoms with 2 heteroatoms, and wherein R2 is phenyl or aryl.

[Figure: Structure A]

Various fragment lengths result in different ring sizes

[Figure: R1 fragments of different lengths paired with the resulting rings]

Fragment orientation determines where the oxygen connects in the ring

[Figure: two fragment orientations, one with O connected to N and one with O connected to C]

Repeating groups in R-groups will be a future enhancement

The query initially used a simple repeating fragment, but repeating motifs are not currently allowed in R-groups.

Look for enhancements in a future release.

R-groups can contain up to 20 items

Fragment orientation is denoted symbolically. Touch a node using the Fn tool to flip the orientation.

R-groups can contain up to 20 items.

Set fragment bonds to Ring

Fragments which occur in rings need to have ring bond attributes applied. Use Lock Rings to lock the ring containing R1.

Run a multi-file search in REGISTRY, MARPAT, DCR, DWPIM and REAXYSFILESub

Sample answers from REGISTRY and MARPAT

MARPAT result

Sample answers from DWPIM, DCR and REAXYSFILESub

DCR

DWPIM REAXYSFILESub

Additional fusion is possible on the heterocyclic ring.

What about a generic structure?

The Cb node is set to unsaturated.

What about a generic structure?

The Hy node is set to:
• Monocyclic
• 2 or more heteroatoms
• Less than 7 carbon atoms

The Element Count is set to:
• Exactly 1 N
• Exactly 1 O
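
Taken together, the Hy node settings and the element counts act as a filter on the composition of the ring. A minimal sketch, assuming a ring is described only by the element symbols of its non-hydrogen ring atoms and by the number of rings in its ring system; the function name `matches_hy_settings` and this input format are hypothetical, not part of STN.

```python
def matches_hy_settings(ring_atoms, ring_count=1):
    """ring_atoms: element symbols of the non-hydrogen ring atoms (assumed input)."""
    carbons = ring_atoms.count("C")
    heteroatoms = len(ring_atoms) - carbons
    return (
        ring_count == 1                 # Monocyclic
        and heteroatoms >= 2            # 2 or more heteroatoms
        and carbons < 7                 # Less than 7 carbon atoms
        and ring_atoms.count("N") == 1  # Exactly 1 N
        and ring_atoms.count("O") == 1  # Exactly 1 O
    )


print(matches_hy_settings(["C", "C", "C", "C", "N", "O"]))           # six-membered ring, 1 N and 1 O
print(matches_hy_settings(["C", "C", "C", "C", "N", "N"]))           # two N, no O
print(matches_hy_settings(["C", "C", "C", "N", "O"], ring_count=2))  # ring in a two-ring system
```

Only the first ring passes: the second fails the element counts and the third fails the monocyclic setting.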

All ring types are retrieved in all files

[Figure: sample answers from REGISTRY, DCR and REAXYSFILESub]

The combined approach gives a good answer set in DCR, DWPIM and REAXYSFILESub

Combining the fragment and generic searches gives a better answer set than either strategy alone in DCR, DWPIM and REAXYSFILESub.

Multiple fragments are searched across multiple components in a substructure search (SSS)

• Unlike classic STN, multiple fragments drawn in a single window are not restricted to a single component in an SSS search

• Fragments do not overlap in answers where they appear in a single component

• Q-Lists can refine structure queries to ensure all fragments are in a single component in CAS REGISTRY and REAXYSFILESub
‒ Not available for DCR

Search example with multiple fragments

Search Example: Retrieve substances that contain at least the following 4 ring systems: a piperazine, a pyrimidine, and 2 phenyl rings.

Place all four rings in the Structure Editor drawing stage

Block ring fusion with the Lock Rings tool. Rings that are blocked display bold bonds.

Fragments can be found in single components…

CAS REGISTRY REAXYSFILESub

DCR

…or fragments can be found in multiple components

DCR-2982165

CAS RN 1346676-85-0

Use Q-lists to find fragments in single components in REGISTRY or REAXYSFILESub

REGISTRY: Create a Q-list of CAS RNs. REAXYSFILESub: Create a Q-list of ANs.

Search the REGISTRY Q-List in the RN and CRN fields to retrieve substances that have all fragments in the same component

Multicomponent answer

Every answer in this set, whether single-component or multicomponent, has all four rings in one component in REGISTRY. CRN = Component Registry Number.

Create a Q-List of ANs in REAXYSFILESub, and then search in the AN and CMAN fields

This answer set has all four rings in single components in REAXYSFILESub.

AN - Accession Number
CMAN - Component Molecular Accession Number

Metal containing substances: Coordination Compounds

CAS REGISTRY indexes metals in rings for coordination compounds.

DCR indexes coordination compounds with neutral ligand, metal and counter ions, all as separate components. The single metal is a “chain node.”

DCR

REGISTRY

REAXYSFILESub can have metal containing substances indexed in a number of ways

Counter ions can be shown with bonds or as separate components. The metals are shown here as part of the ring system.

REAXYSFILESub

REAXYSFILESub

Best practice for coordination complexes in CAS REGISTRY, DCR and REAXYSFILESub

• Use unspecified bonds to find any bonding pattern.

• Set any chain bonds to Ring/Chain.

• Set the metal atom to Ring/Chain.

MARPAT file structures contain the M variable node

If a single specific atom is shown in the patent it is shown separately.

MARPAT file structures contain the M variable node

More often, the metal centers are shown as part of a G-group.

Best practice for MARPAT: Set metal node to Match level CLASS in your query

Ring nodes are ATOM match by default. Set the M node to CLASS match to find file structures with the M in the ring.

Search specific metals by drawing a separate query with an R-group, then combine the two queries with AND

STR1 STR2

Search Query: STR1/SSS,FULL AND STR2/SSS,FULL
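
The AND in the search query keeps only the answers that both structure queries retrieve, which is a set intersection. A minimal sketch with made-up answer identifiers; real answer sets come from running STR1 and STR2 on STN.

```python
# Hypothetical answer identifiers, for illustration only.
str1_answers = {"answer-1", "answer-2", "answer-3"}   # STR1/SSS,FULL
str2_answers = {"answer-2", "answer-3", "answer-4"}   # STR2/SSS,FULL

# AND keeps only the answers common to both sets.
combined = str1_answers & str2_answers
print(sorted(combined))
```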

Resources - recorded events

• http://www.stn-international.com/recorded_events.html

‒ Substance and Chemical Structure Searching in CAS Registry and DCR on new STN

‒ Unified Markush Search on new STN

‒ MARPAT on new STN

‒ Derwent Markush Resource (DWPIM) now available on STN!

Click on View Event Recordings to see a list of previously recorded sessions.

For more information …

CAS Support and Training: www.cas.org

FIZ Karlsruhe Support and Training: www.stn-international.de