CDK Workshop 2009 Intro Course Material


Upload: egon-willighagen

Post on 10-Apr-2015

DESCRIPTION

Course material for my part of the CDK Workshop 2009.

Contents

1 Preface
    Bibliography
2 Installation
    2.1 Binaries
    2.2 Source
        2.2.1 Git
        2.2.2 Compiling
    2.3 Debian GNU/Linux & Ubuntu
3 Writing CDK Applications
    3.1 A (Very) Basic Java Application
    3.2 BeanShell
    3.3 Groovy
        3.3.1 Closures
    3.4 Other Languages
        3.4.1 Bioclipse
        3.4.2 Cinfony
        3.4.3 R
4 Documentation
    4.1 JavaDoc
    4.2 Other Sources
    Bibliography
5 Atoms, Bonds and Molecules
    5.1 Atoms
        5.1.1 IElement
        5.1.2 IIsotope
        5.1.3 IAtomType
    5.2 Bonds
    5.3 Molecules
        5.3.1 Iterating of atoms and bonds
6 Graph Properties
    6.1 Partitioning
    6.2 Spanning Tree
7 Missing Information
    7.1 Reconnecting Atoms
    7.2 Missing Hydrogens
        7.2.1 Implicit Hydrogens
8 Input/Output
    8.1 File Format Detection
    8.2 Example: Downloading Domoic Acid from PubChem
    Bibliography
9 Keyword List

Chapter 1

Preface

This book is written to help people start developing cheminformatics software using the Chemistry Development Kit, an open source cheminformatics toolkit written in Java [1, 2]. This book is written for version 1.2.1, and all code snippets in this book are compiled against this version when the book is built.

Each snippet is actually a small Groovy script, BeanShell script, or Java application, and all code is compiled and run. Some sections show the output of the code snippets. Each code snippet has an orange bar that points to the name of the script or application from which the snippet was derived.

Bibliography

[1] C. Steinbeck, Y. Han, S. Kuhn, O. Horlacher, E. Luttmann, and E. Willighagen. The Chemistry Development Kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci, 43(2):493–500, 2003.

[2] Christoph Steinbeck, Christian Hoppe, Stefan Kuhn, Matteo Floris, Rajarshi Guha, and Egon L. Willighagen. Recent developments of the Chemistry Development Kit (CDK) - an open-source Java library for chemo- and bioinformatics. Current Pharmaceutical Design, pages 2111–2120, June 2006.

Chapter 2

Installation

2.1 Binaries

Like most Java software, the CDK can be downloaded in binary form as a .jar file. The CDK 1.2.1 version can be downloaded from http://dl.sf.net/sourceforge/cdk/cdk-1.2.1.jar. This Java Archive file includes all third-party dependencies and requires only a Java Virtual Machine to be used.

2.2 Source

There are two primary methods to get the source code for the CDK: you can download the source package, or you can download the source code from the Git repository. The source package has the advantage that you run exactly the version of the CDK for which you downloaded the source code; using Git has the advantage that you can run any version you like, but requires a bit more effort to get going.

The source distribution with all required third-party libraries (except a Java Virtual Machine) can be downloaded as tar.gz from http://dl.sf.net/sourceforge/cdk/cdk-src+libs-1.2.1.tar.gz or as ZIP file from http://dl.sf.net/sourceforge/cdk/cdk-src+libs-1.2.1.zip.

2.2.1 Git

The CDK source code is hosted in a Git repository at SourceForge: http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk;a=summary. There you will find the complete history of the source code of the CDK library. Git (http://git-scm.com/) is a version control system that allows us to develop the CDK in a distributed manner. Anyone can write patches, even publish them on their own web server, and the CDK release managers can include them in the main distribution once these patches are approved.

In contrast to the source distribution, a Git clone downloads the full CDK history, yet it is still only about 100 MB in Git format. The command to make a local copy of the Git repository on SourceForge looks like:

$ git clone git://cdk.git.sourceforge.net/gitroot/cdk

This will get you a copy of trunk, but since we discuss CDK 1.2.1, you will need the cdk-1.2.x branch, which you can get by making a local branch:

$ git checkout -b cdk1.2.1 origin/cdk-1.2.x

2.2.2 Compiling

Compiling requires that you have Ant 1.7.1 installed, which can be downloaded from http://ant.apache.org/.

The source code can be compiled by issuing:

$ ant clean dist-large

2.3 Debian GNU/Linux & Ubuntu

Debian GNU/Linux and Ubuntu users can install an older version of the CDK with aptitude:

$ sudo aptitude install libcdk-java

If you wish to compile CDK 1.2.1, you can take advantage of the build dependencies for the above package. With the following command you can download most of the required dependencies:

$ sudo apt-get build-dep libcdk-java

Chapter 3

Writing CDK Applications

3.1 A (Very) Basic Java Application

Using the CDK library, smaller and larger Java programs can be written that take advantage of the cheminformatics functionality in the library. Experienced programmers know how to use external libraries, and there is enough tutorial material on Java programming on the Internet. This chapter only gives the very basic pointers to get you started from scratch. Given that you already downloaded the CDK jar file, or compiled it from source, consider the following piece of Java source code:

import org.openscience.cdk.interfaces.IAtom;
import org.openscience.cdk.Atom;

public class BasicProgram {
    public static void main(String args[]) throws Exception {
        IAtom atom = new Atom("C");
        System.out.println(atom);
    }
}

This code can then be compiled with javac to byte code, creating a BasicProgram.class:

$ javac -classpath cdk-1.2.1.jar BasicProgram.java

And then run with:

$ java -classpath .:cdk-1.2.1.jar BasicProgram

3.2 BeanShell

BeanShell (http://www.beanshell.org/) is a simple interactive environment in which one can experiment with Java libraries. For example, consider this simple script:

code/BeanShell.bsh

import org.openscience.cdk.Atom;
Atom atom = new Atom("C");
print(atom);

Figure 3.1 shows the effect of running this script in the graphical frontend xbsh. BeanShell needs to be made aware of the CDK, which is done with the common CLASSPATH approach:

$ CLASSPATH=cdk-1.2.1.jar bsh

Figure 3.1: Screenshot of xbsh showing a simple BeanShell script.

3.3 Groovy

Groovy (http://groovy.codehaus.org/) is a programming language that advertises itself as an agile and dynamic language for the Java Virtual Machine. Indeed, like BeanShell it provides an environment to quickly try Java code. However, unlike BeanShell, it makes more linguistic changes and adds quite interesting syntactic sugar to the Java language.

A simple script may look like:

code/IterateAtoms.groovy

for (IAtom atom : molecule.atoms()) {
    System.out.println(atom.getSymbol());
}

But in Groovy it can also look like:

code/IterateAtomsGroovy.groovy

for (atom in molecule.atoms()) {
    println atom.getSymbol()
}

Groovy needs to be made aware of the CDK, which is done with the common CLASSPATH approach. To start the GUI console shown in Figure 3.2:

$ CLASSPATH=cdk-1.2.1.jar groovyConsole

Figure 3.2: Screenshot of groovyConsole showing a simple Groovy script.

3.3.1 Closures

However, one of the more interesting features of Groovy is closures. I have known this programming pattern from R and happily used it for a long time, but only recently learned that it is called a closure. Closures allow you to pass a piece of code as a parameter, which has many applications, and I will show one situation here.

Consider the calculation of molecular properties which happen to be a mere summation over atomic properties, such as the total charge or the molecular weight. Both these calculations require an iteration over all atoms. If we need those properties at the same time, we can combine the calculations into one iteration. However, let's generalize the situation a bit and assume we do not. We then have two slices of code that have much in common:

code/CalculateTotalCharge.groovy

totalCharge = 0.0
for (atom in molecule.atoms()) {
    totalCharge += atom.getCharge()
}

and

code/CalculateMolecularWeight.groovy

molWeight = 0.0
for (atom in molecule.atoms()) {
    molWeight += isotopeInfo.getNaturalMass(atom)
}

In both cases we want to apply a custom bit of code to all atoms. Groovy allows us to share the common code:

code/GroovyClosureForAllAtoms.groovy

def forAllAtoms(molecule, block) {
    for (atom in molecule.atoms()) {
        block(atom)
    }
}
totalCharge = 0.0
forAllAtoms(molecule, { totalCharge += it.getCharge() } )
totalCharge = String.format('%.2f', totalCharge)
println "Total charge: ${totalCharge}"
molWeight = 0.0
forAllAtoms(molecule, { molWeight += isotopeInfo.getNaturalMass(it) } )
molWeight = String.format('%.2f', molWeight)
println "Molecular weight: ${molWeight}"

which gives the output:

Total charge: -0.00
Molecular weight: 16.04
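
For readers who stay in plain Java, the closure pattern can be approximated with a lambda expression. The following is a minimal sketch under stated assumptions, not CDK code: the SketchAtom class, the sumOverAtoms and methane helpers, and the rounded atomic weights are made up for illustration, and no CDK classes are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

class ClosureSketch {
    // Hypothetical stand-in for an atom; this is not a CDK class.
    static class SketchAtom {
        final String symbol;
        final double charge;
        final double mass;

        SketchAtom(String symbol, double charge, double mass) {
            this.symbol = symbol;
            this.charge = charge;
            this.mass = mass;
        }
    }

    // The shared loop: applies the passed-in function to every atom
    // and sums the results.
    static double sumOverAtoms(List<SketchAtom> atoms,
                               ToDoubleFunction<SketchAtom> block) {
        double total = 0.0;
        for (SketchAtom atom : atoms) {
            total += block.applyAsDouble(atom);
        }
        return total;
    }

    // Methane with zero charges and rounded atomic weights (assumed values).
    static List<SketchAtom> methane() {
        List<SketchAtom> atoms = new ArrayList<>();
        atoms.add(new SketchAtom("C", 0.0, 12.011));
        for (int i = 0; i < 4; i++) {
            atoms.add(new SketchAtom("H", 0.0, 1.008));
        }
        return atoms;
    }

    public static void main(String[] args) {
        List<SketchAtom> atoms = methane();
        // Only the lambda differs between the two calculations.
        double charge = sumOverAtoms(atoms, a -> a.charge);
        double weight = sumOverAtoms(atoms, a -> a.mass);
        System.out.println(String.format("Total charge: %.2f", charge));
        System.out.println(String.format("Molecular weight: %.2f", weight));
    }
}
```

Unlike the Groovy closure, a Java lambda cannot modify a local variable of the calling code, which is why this sketch lets the shared loop do the summation and return the total.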

3.4 Other Languages

Besides Java and BeanShell, there are other languages at your disposal when using the CDK library. This book only uses Java code snippets, either as BeanShell scripts or as Java applications, but this section points to alternatives. These alternatives do not always provide access to the full CDK API, but at the same time they often offer a custom API which hides certain more technical details.

3.4.1 Bioclipse

Bioclipse has a custom scripting language with a JavaScript interface [1]. Functionality is provided using managers, and CDK functionality is provided using two such managers. Bioclipse can be downloaded from http://www.bioclipse.net/ and example scripts are available from the following bookmark lists: http://delicious.com/tag/bioclipse+gist+manager:cdk and http://delicious.com/tag/bioclipse+gist+manager:cdx.

3.4.2 Cinfony

Cinfony is a Python module that integrates with the CDK as well as two other cheminformatics toolkits [2]. Cinfony can be downloaded from http://code.google.com/p/cinfony/.

3.4.3 R

The statistical software R (http://www.r-project.org/) also provides access to the CDK functionality via the rcdk package [3]. This package can be downloaded from CRAN at http://cran.r-project.org/web/packages/rcdk/.

Chapter 4

Documentation

4.1 JavaDoc

Besides this book, and in particular the keyword index at the end, you will find the Java API documentation (JavaDoc) valuable. If you have downloaded the source distribution, you can generate the documentation with Ant in the doc/cdk-javadoc-1.2.1 folder, using:

$ ant -f javadoc.xml html

Alternatively, you can download the documentation from http://dl.sf.net/sourceforge/cdk/cdk-javadoc-1.2.1.tar.gz.

4.2 Other Sources

More information can be found in the following resources:

• CDK News: http://www.cdknews.org/

• CDK Wiki: https://apps.sourceforge.net/mediawiki/cdk/index.php?title=Documentation

Bibliography

[1] Ola Spjuth, Tobias Helmus, Egon Willighagen, Stefan Kuhn, Martin Eklund, Johannes Wagener, Peter M. Rust, Christoph Steinbeck, and Jarl Wikberg. Bioclipse: An open source workbench for chemo- and bioinformatics. BMC Bioinformatics, 8(1), 2007.

[2] N. M. O'Boyle and G. R. Hutchison. Cinfony - combining open source cheminformatics toolkits behind a common interface. Chemistry Central Journal, 2, 2008.

[3] Rajarshi Guha. Chemical informatics functionality in R. Journal of Statistical Software, 18(6), 2007.

Chapter 5

Atoms, Bonds and Molecules

The basic objects in the CDK are the IAtom, IBond and IAtomContainer. The name of the latter is somewhat misleading, as it contains not just IAtoms but also IBonds. The primary use of the model is the graph-based representation of molecules, where atoms are the nodes and bonds are the edges between two atoms.

Before we start, it is important to note that CDK 1.2 has an important convention around object properties: when a property is unset, the object's field is set to null. This introduces a source of NullPointerExceptions, but also allows us to distinguish between, for example, a zero and an unset formal charge.
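
To make the consequence of this convention concrete, here is a small plain-Java sketch. It is an illustration under assumptions, not the CDK implementation: the ChargedAtom class and the describe method are hypothetical, and the only point is that an object-typed field can tell "unset" apart from zero, provided the caller checks for null before using the value.

```java
class UnsetPropertySketch {
    // Hypothetical atom-like holder; this is not a CDK class.
    static class ChargedAtom {
        // An Integer object rather than an int, so that null can mean "unset".
        private Integer formalCharge;

        Integer getFormalCharge() {
            return formalCharge;
        }

        void setFormalCharge(Integer formalCharge) {
            this.formalCharge = formalCharge;
        }
    }

    // Checks for null first; unboxing a null Integer would throw a
    // NullPointerException.
    static String describe(ChargedAtom atom) {
        Integer charge = atom.getFormalCharge();
        if (charge == null) {
            return "unset";
        }
        return charge == 0 ? "neutral" : "charged (" + charge + ")";
    }

    public static void main(String[] args) {
        ChargedAtom atom = new ChargedAtom();
        System.out.println(describe(atom));
        atom.setFormalCharge(0);
        System.out.println(describe(atom));
        atom.setFormalCharge(-1);
        System.out.println(describe(atom));
    }
}
```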

5.1 Atoms

The CDK interface IAtom is the underlying data model of atoms. Creating a new atom is fairly easy:

code/CreateAtom1.java

IAtom atom = new Atom("C");

Or, alternatively, using the Elements class:

code/CreateAtom2.java

IAtom atom = new Atom(Elements.CARBON);

A CDK atom has many properties, many of them inherited from IElement, IIsotope and IAtomType (see Figure 5.1).

5.1.1 IElement

The most common property of IElement is the symbol, which is set in the constructors in the above code. The other property for elements found in the CDK is the atomic number:

code/ElementProperties.groovy

atom.setSymbol("N")
atom.setAtomicNumber(7)

Figure 5.1: The IAtom interface extends the IAtomType interface, which extends the IIsotope interface, which, in turn, extends the IElement interface.

5.1.2 IIsotope

The IIsotope information consists of the mass number, exact mass and natural abundance:

code/IsotopeProperties.groovy

IAtom atom = new Atom("C");
atom.setMassNumber(13)
atom.setNaturalAbundance(1.07)
atom.setExactMass(13.00335484)

5.1.3 IAtomType

The IAtomType interface contains fields that relate to the model we use to describe the properties of atoms, for example, as used in force fields. These properties include formal charge, neighbor count, maximum bond order and atom type name:

code/AtomTypeProperties.groovy

atom.setAtomTypeName("C.3")
atom.setFormalCharge(-1)
atom.setMaxBondOrder(IBond.Order.SINGLE)

5.2 Bonds

The IBond interface of the CDK is an interaction between two or more IAtoms, extending the IElectronContainer interface. While the most common application in the CDK originates from graph theory, it is not restricted to that. That said, many algorithms implemented in the CDK expect a graph theory based model, where each bond connects two, and not more, atoms.

For example, to create ethanol we write:

code/Ethanol.groovy

IAtom atom1 = new Atom("C")
IAtom atom2 = new Atom("C")
IAtom atom3 = new Atom("O")
IBond bond1 = new Bond(atom1, atom2, IBond.Order.SINGLE);
IBond bond2 = new Bond(atom2, atom3, IBond.Order.SINGLE);

The CDK has a few bond orders, which we can list with this groovy code:

code/BondOrders.groovy

IBond.Order.each {
    println it
}

which outputs:

SINGLE
DOUBLE
TRIPLE
QUADRUPLE

As you might notice, there is no AROMATIC bond order defined. This is deliberate: the CDK allows single-double bond order patterns to be defined at the same time as aromaticity information. For example, a Kekulé structure of benzene with bonds marked as aromatic can be constructed with:

code/AromaticBond.groovy

IAtom atom1 = new Atom("C")
IAtom atom2 = new Atom("C")
IAtom atom3 = new Atom("C")
IAtom atom4 = new Atom("C")
IAtom atom5 = new Atom("C")
IAtom atom6 = new Atom("C")
IBond bond1 = new Bond(atom1, atom2, IBond.Order.SINGLE)
IBond bond2 = new Bond(atom2, atom3, IBond.Order.DOUBLE)
IBond bond3 = new Bond(atom3, atom4, IBond.Order.SINGLE)
IBond bond4 = new Bond(atom4, atom5, IBond.Order.DOUBLE)
IBond bond5 = new Bond(atom5, atom6, IBond.Order.SINGLE)
IBond bond6 = new Bond(atom6, atom1, IBond.Order.DOUBLE)
bond1.setFlag(CDKConstants.ISAROMATIC, true);
bond2.setFlag(CDKConstants.ISAROMATIC, true);
bond3.setFlag(CDKConstants.ISAROMATIC, true);
bond4.setFlag(CDKConstants.ISAROMATIC, true);
bond5.setFlag(CDKConstants.ISAROMATIC, true);
bond6.setFlag(CDKConstants.ISAROMATIC, true);

5.3 Molecules

We already saw in the previous pieces of code how the CDK can be used to create molecules, and while the above is strictly enough to find all atoms in the molecule starting with only one of the atoms in the molecule, it often is more convenient to store all atoms and bonds in a container.

The CDK has two containers, which are identical in functionality, but which have different semantics: the IAtomContainer and the IMolecule. The first is a general container to hold atoms and bonds, while the IMolecule has the added implication that the container holds a single molecule, of which all atoms are connected to each other via one or more covalent bonds. It is important to note, however, that the latter is not enforced.

Adding atoms and bonds is done by the methods addAtom(IAtom) and addBond(IBond):

code/AtomContainerAddAtomsAndBonds.groovy

mol = new AtomContainer();
mol.addAtom(new Atom("C"));
mol.addAtom(new Atom("H"));
mol.addAtom(new Atom("H"));
mol.addAtom(new Atom("H"));
mol.addAtom(new Atom("H"));
mol.addBond(new Bond(mol.getAtom(0), mol.getAtom(1)));
mol.addBond(new Bond(mol.getAtom(0), mol.getAtom(2)));
mol.addBond(new Bond(mol.getAtom(0), mol.getAtom(3)));
mol.addBond(new Bond(mol.getAtom(0), mol.getAtom(4)));

The addBond() method has an alternative which takes three parameters: the index of the first atom, the index of the second atom, and the bond order. This shortens the previous version a bit. Note that atom indices follow programmers' habits and start at 0, as you can observe in the previous example too:

code/AtomContainerAddAtomsAndBonds2.groovy

mol = new AtomContainer();
mol.addAtom(new Atom("C"));
mol.addAtom(new Atom("H"));
mol.addAtom(new Atom("H"));
mol.addAtom(new Atom("H"));
mol.addAtom(new Atom("H"));
mol.addBond(0,1,IBond.Order.SINGLE);
mol.addBond(0,2,IBond.Order.SINGLE);
mol.addBond(0,3,IBond.Order.SINGLE);
mol.addBond(0,4,IBond.Order.SINGLE);

5.3.1 Iterating of atoms and bonds

The IAtomContainer comes with convenience methods to iterate over atoms and bonds. Both methods use the Iterable interface, and for atoms we do:

code/CountHydrogens.groovy

int hydrogenCount = 0
for (IAtom atom : mol.atoms()) {
    if ("H".equals(atom.getSymbol())) hydrogenCount++
}
println "Number of hydrogens: $hydrogenCount"

which returns

Number of hydrogens: 4

And for bonds the equivalent:

code/CountDoubleBonds.groovy

int doubleBondCount = 0
for (IBond bond : mol.bonds()) {
    if (IBond.Order.DOUBLE == bond.getOrder())
        doubleBondCount++
}
println "Number of double bonds: $doubleBondCount"

giving

Number of double bonds: 1
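
Why the for-each loop works on the result of atoms() and bonds() can be shown without the CDK. The sketch below is only an illustration under assumptions: SymbolContainer is a hypothetical stand-in that stores element symbols, not the CDK's AtomContainer, and countSymbol is a made-up helper. The point is that any method returning a java.lang.Iterable can be used directly in a for-each loop.

```java
import java.util.ArrayList;
import java.util.List;

class IterableSketch {
    // Hypothetical minimal container; it is not the CDK's AtomContainer.
    static class SymbolContainer {
        private final List<String> symbols = new ArrayList<>();

        void addAtom(String symbol) {
            symbols.add(symbol);
        }

        // Returning an Iterable is what makes the for-each loop below possible.
        Iterable<String> atoms() {
            return symbols;
        }
    }

    // Counts how many stored symbols equal the wanted one.
    static int countSymbol(SymbolContainer container, String wanted) {
        int count = 0;
        for (String symbol : container.atoms()) {
            if (wanted.equals(symbol)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        SymbolContainer methane = new SymbolContainer();
        methane.addAtom("C");
        for (int i = 0; i < 4; i++) {
            methane.addAtom("H");
        }
        System.out.println("Number of hydrogens: " + countSymbol(methane, "H"));
    }
}
```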

Chapter 6

Graph Properties

Graph theory provides the most common representation in cheminformatics and, together with quantum mechanics, rules the informatics side of chemistry. The molecular graph follows graph theory and defines atoms as nodes and bonds as edges between two atoms. This is by no means the only option, and the IBond allows for more complex representations, but we will focus on the molecular graph in this chapter.

6.1 Partitioning

If one is going to calculate graph properties, the first thing one often has to do is ensure that one is looking at a fully connected graph. The ConnectivityChecker is a welcome tool for this. It allows partitioning of the atoms and bonds in an IAtomContainer into molecules, organized into an IMoleculeSet:

code/ConnectivityCheckerDemo.groovy

atomCon = new AtomContainer();
atom1 = new Atom("C");
atom2 = new Atom("C");
atomCon.addAtom(atom1);
atomCon.addAtom(atom2);
moleculeSet = ConnectivityChecker.partitionIntoMolecules(atomCon);
println "Number of isolated graphs: $moleculeSet.moleculeCount"

Which gives:

Number of isolated graphs: 2
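
The idea behind such a partitioning can be sketched in plain Java with a breadth-first search over atom indices. This is a toolkit-independent illustration and makes no claim about how the ConnectivityChecker is implemented: the class and method names are hypothetical, and a molecule is reduced to an atom count plus a list of index pairs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class PartitionSketch {
    // Labels every atom with the number of the connected fragment it belongs to.
    // bonds[i] = {a, b} holds the indices of the two atoms joined by bond i.
    static int[] labelFragments(int atomCount, int[][] bonds) {
        List<List<Integer>> neighbors = new ArrayList<>();
        for (int i = 0; i < atomCount; i++) {
            neighbors.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            neighbors.get(bond[0]).add(bond[1]);
            neighbors.get(bond[1]).add(bond[0]);
        }
        int[] label = new int[atomCount];
        Arrays.fill(label, -1); // -1 marks atoms that were not visited yet
        int fragments = 0;
        for (int start = 0; start < atomCount; start++) {
            if (label[start] != -1) {
                continue;
            }
            // Breadth-first search: everything reachable from 'start'
            // gets the same fragment number.
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            label[start] = fragments;
            while (!queue.isEmpty()) {
                int atom = queue.remove();
                for (int next : neighbors.get(atom)) {
                    if (label[next] == -1) {
                        label[next] = fragments;
                        queue.add(next);
                    }
                }
            }
            fragments++;
        }
        return label;
    }

    // The number of fragments is the highest label plus one.
    static int countFragments(int atomCount, int[][] bonds) {
        int max = -1;
        for (int l : labelFragments(atomCount, bonds)) {
            max = Math.max(max, l);
        }
        return max + 1;
    }

    public static void main(String[] args) {
        // Two carbon atoms without a bond, as in the script above.
        System.out.println("Number of isolated graphs: "
            + countFragments(2, new int[0][]));
    }
}
```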

6.2 Spanning Tree

The spanning tree of a graph is a subgraph with no cycles that still connects all atoms into one fully connected graph:

code/SpanningTreeBondCount.groovy

println "Number of azulene bonds: $azulene.bondCount"
treeBuilder = new SpanningTree(azulene)
azuleneTree = treeBuilder.getSpanningTree();
println "Number of tree bonds: $azuleneTree.bondCount"

which returns:

Number of azulene bonds: 11
Number of tree bonds: 9
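
These numbers follow from graph theory: a spanning tree of a connected graph with n atoms has n - 1 bonds, so the ten carbons of azulene leave nine tree bonds. The plain-Java sketch below reproduces that count with a union-find structure. It is an illustration under assumptions, not the CDK's SpanningTree class; the names and the atom numbering of the azulene skeleton are chosen freely.

```java
class SpanningTreeSketch {
    // Follows parent links to the representative of the fragment containing atom i.
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving keeps the trees shallow
            i = parent[i];
        }
        return i;
    }

    // Counts the bonds of a spanning forest: a bond is kept only if it joins two
    // fragments that were not connected yet, so it can never close a ring.
    static int countTreeBonds(int atomCount, int[][] bonds) {
        int[] parent = new int[atomCount];
        for (int i = 0; i < atomCount; i++) {
            parent[i] = i;
        }
        int treeBonds = 0;
        for (int[] bond : bonds) {
            int a = find(parent, bond[0]);
            int b = find(parent, bond[1]);
            if (a != b) {
                parent[a] = b;
                treeBonds++;
            }
        }
        return treeBonds;
    }

    // Azulene skeleton: ten carbons on a perimeter plus the bond shared by the
    // five- and seven-membered rings (arbitrary atom numbering).
    static int[][] azuleneBonds() {
        return new int[][] {
            {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6},
            {6, 7}, {7, 8}, {8, 9}, {9, 0}, {0, 4}
        };
    }

    public static void main(String[] args) {
        int[][] bonds = azuleneBonds();
        System.out.println("Number of azulene bonds: " + bonds.length);
        System.out.println("Number of tree bonds: " + countTreeBonds(10, bonds));
    }
}
```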

As a side effect, it also determines which bonds are ring bonds, and which are not:

code/SpanningTreeRingBonds.groovy

ethaneTree = new SpanningTree(ethane)
println "[ethane]"
println "Number of cyclic bonds: $ethaneTree.bondsCyclicCount"
println "Number of acyclic bonds: $ethaneTree.bondsAcyclicCount"
azuleneTree = new SpanningTree(azulene)
println "[azulene]"
println "Number of cyclic bonds: $azuleneTree.bondsCyclicCount"
println "Number of acyclic bonds: $azuleneTree.bondsAcyclicCount"

giving

[ethane]
Number of cyclic bonds: 0
Number of acyclic bonds: 1
[azulene]
Number of cyclic bonds: 11
Number of acyclic bonds: 0
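
One simple way to see what makes a bond cyclic: a bond is part of a ring exactly when its two atoms remain connected after the bond itself is taken away. The sketch below checks this bond by bond in plain Java. It is a naive illustration under assumptions, not the approach of the CDK's SpanningTree class, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class RingBondSketch {
    // True if atoms 'from' and 'to' are still connected when the bond with
    // index 'skipped' is ignored (depth-first search over the other bonds).
    static boolean connectedWithout(int atomCount, int[][] bonds,
                                    int skipped, int from, int to) {
        boolean[] seen = new boolean[atomCount];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(from);
        seen[from] = true;
        while (!stack.isEmpty()) {
            int atom = stack.pop();
            if (atom == to) {
                return true;
            }
            for (int i = 0; i < bonds.length; i++) {
                if (i == skipped) {
                    continue;
                }
                int next;
                if (bonds[i][0] == atom) {
                    next = bonds[i][1];
                } else if (bonds[i][1] == atom) {
                    next = bonds[i][0];
                } else {
                    continue;
                }
                if (!seen[next]) {
                    seen[next] = true;
                    stack.push(next);
                }
            }
        }
        return false;
    }

    // Counts the bonds whose end atoms stay connected without the bond itself.
    static int countCyclicBonds(int atomCount, int[][] bonds) {
        int cyclic = 0;
        for (int i = 0; i < bonds.length; i++) {
            if (connectedWithout(atomCount, bonds, i, bonds[i][0], bonds[i][1])) {
                cyclic++;
            }
        }
        return cyclic;
    }

    public static void main(String[] args) {
        // Ethane reduced to its two carbons and one bond.
        int[][] ethane = {{0, 1}};
        int cyclic = countCyclicBonds(2, ethane);
        System.out.println("Number of cyclic bonds: " + cyclic);
        System.out.println("Number of acyclic bonds: " + (ethane.length - cyclic));
    }
}
```

Checking every bond with its own search is quadratic in the number of bonds, which is fine for a sketch but not for large structures.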

Chapter 7

Missing Information

Missing information is commonplace in chemical file formats and line notations. In many cases this information is implicit to the representation, but recovering it is not always easy, requiring assumptions which may not be true. Examples of missing information are the lack of bonds in XYZ files, and the removed double bond location information for aromatic ring systems.

7.1 Reconnecting Atoms

XYZ files do not have bond information, and may look like:

5
methane
C 0.25700 -0.36300 0.00000
H 0.25700 0.72700 0.00000
H 0.77100 -0.72700 0.89000
H 0.77100 -0.72700 -0.89000
H -0.77100 -0.72700 0.00000

Fortunately, we can reasonably assume bonds to have a certain length, and reasonably estimate how many connections an atom can have at most. Then, using the 3D coordinate information available from the XYZ file, an algorithm can deduce how the atoms must be bonded. The RebondTool does exactly that. It does so efficiently too, using a binary space partitioning tree, which allows it to scale to protein-sized molecules.

Now, the algorithm does need to know what reasonable bond lengths are, and for this we can use the Jmol list of covalent radii, and we configure the atoms accordingly:

code/CovalentRadii.groovy

methane = new Molecule();
methane.addAtom(new Atom("C", new Point3d(0.0, 0.0, 0.0)));
methane.addAtom(new Atom("H", new Point3d(0.6, 0.6, 0.6)));
methane.addAtom(new Atom("H", new Point3d(-0.6, -0.6, 0.6)));
methane.addAtom(new Atom("H", new Point3d(0.6, -0.6, -0.6)));
methane.addAtom(new Atom("H", new Point3d(-0.6, 0.6, -0.6)));
factory = AtomTypeFactory.getInstance(
    "org/openscience/cdk/config/data/jmol_atomtypes.txt",
    methane.getBuilder()
);
for (IAtom atom : methane.atoms()) {
    factory.configure(atom);
    println "$atom.symbol -> $atom.covalentRadius"
}

which configures and prints the atoms' radii:

C -> 0.77
H -> 0.32
H -> 0.32
H -> 0.32
H -> 0.32

Then the RebondTool can be used to rebond the atoms:

code/RebondToolDemo.groovy

RebondTool rebonder = new RebondTool(2.0, 0.5, 0.5);
rebonder.rebond(methane);
println "Bond count: $methane.bondCount"

The number of bonds it found is reported in the last line:

Bond count: 4
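
The distance reasoning behind rebonding can be sketched in a few lines of plain Java. The criterion used here, that two atoms are bonded when their distance is at most the sum of their covalent radii plus a tolerance, is an assumption made for illustration; the exact rule applied by the RebondTool may differ, and this sketch uses a naive all-pairs loop instead of a space partitioning tree. The coordinates and radii are those of the methane example above, and the tolerance of 0.5 is an assumed value.

```java
class RebondSketch {
    // Counts atom pairs whose distance is at most the sum of their covalent
    // radii plus a tolerance. The all-pairs loop is the naive O(n^2) version.
    static int countBonds(double[][] xyz, double[] radii, double tolerance) {
        int bonds = 0;
        for (int i = 0; i < xyz.length; i++) {
            for (int j = i + 1; j < xyz.length; j++) {
                double dx = xyz[i][0] - xyz[j][0];
                double dy = xyz[i][1] - xyz[j][1];
                double dz = xyz[i][2] - xyz[j][2];
                double distance = Math.sqrt(dx * dx + dy * dy + dz * dz);
                if (distance <= radii[i] + radii[j] + tolerance) {
                    bonds++;
                }
            }
        }
        return bonds;
    }

    // Methane coordinates from the example above: carbon first, then four hydrogens.
    static double[][] methaneCoordinates() {
        return new double[][] {
            {0.0, 0.0, 0.0},
            {0.6, 0.6, 0.6},
            {-0.6, -0.6, 0.6},
            {0.6, -0.6, -0.6},
            {-0.6, 0.6, -0.6}
        };
    }

    // Covalent radii as printed above: 0.77 for carbon, 0.32 for hydrogen.
    static double[] methaneRadii() {
        return new double[] {0.77, 0.32, 0.32, 0.32, 0.32};
    }

    public static void main(String[] args) {
        int bonds = countBonds(methaneCoordinates(), methaneRadii(), 0.5);
        System.out.println("Bond count: " + bonds);
    }
}
```

With these numbers each C-H distance is about 1.04, below the limit of 0.77 + 0.32 + 0.5 = 1.59, while each H-H distance is about 1.70, above the limit of 1.14, so only the four C-H pairs are counted.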

7.2 Missing Hydrogens

Missing hydrogens can be added using the CDKHydrogenAdder. This class, however, expects CDK atom types to be perceived, for which we can use the CDKAtomTypeMatcher:

code/AtomTypePerception.groovy

molecule = new Molecule();
atom = new Atom(Elements.CARBON);
molecule.addAtom(atom);
matcher = CDKAtomTypeMatcher.getInstance(
    DefaultChemObjectBuilder.getInstance()
);
type = matcher.findMatchingAtomType(molecule, atom);
AtomTypeManipulator.configure(atom, type);
println "Atom type: $type.atomTypeName"

This reports the perceived atom type for the carbon:

Atom type: C.sp3

7.2.1 Implicit Hydrogens

Implicit hydrogens are hydrogens that are not vertices in the molecular graph, but merely a property of the atom. They can be calculated with:

code/MissingHydrogens.groovy

adder = CDKHydrogenAdder.getInstance(
    DefaultChemObjectBuilder.getInstance()
);
adder.addImplicitHydrogens(molecule);
println "Atom count: $molecule.atomCount"
println "Implicit hydrogens: $newAtom.hydrogenCount"

which reports:

Atom count: 1
Implicit hydrogens: 4
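
A strongly simplified model shows why the atom count stays at one: the hydrogens are stored as a number, not as extra atoms. The sketch below is an assumption-laden illustration and not the CDKHydrogenAdder algorithm, which works from perceived atom types. Here a hypothetical table of typical valences for neutral C, N and O is used, and the implicit hydrogen count is that valence minus the sum of the bond orders already present.

```java
import java.util.Map;

class ImplicitHydrogenSketch {
    // Typical valences of a few neutral atoms (a hypothetical, minimal table).
    static final Map<String, Integer> VALENCE = Map.of("C", 4, "N", 3, "O", 2);

    // Implicit hydrogens = typical valence minus the bond orders already used.
    static int implicitHydrogens(String symbol, int bondOrderSum) {
        Integer valence = VALENCE.get(symbol);
        if (valence == null) {
            throw new IllegalArgumentException("No valence known for " + symbol);
        }
        return Math.max(0, valence - bondOrderSum);
    }

    public static void main(String[] args) {
        // A lone carbon, as in the script above: no bonds yet.
        System.out.println("Implicit hydrogens: " + implicitHydrogens("C", 0));
    }
}
```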

Chapter 8

Input/Output

The CDK has functionality for extracting information from files in many different file formats. Unfortunately, the full format specification is hardly ever supported, but generally the chemical graph and 2D or 3D coordinates are extracted, often complemented with formal or partial charges.

8.1 File Format Detection

Typically, one is fairly aware of the format of a file. Computer programs in general are not, however, but the CDK has a fairly accurate class for detecting the chemical format of a file. To detect the format of a file, the FormatFactory can be used:

code/GuessFormat.groovy

Reader stringReader = new StringReader(
    "<molecule xmlns='http://www.xml-cml.org/schema'/>"
);
FormatFactory factory = new FormatFactory();
IChemFormat format = factory.guessFormat(stringReader);
System.out.println("Format: " + format.getFormatName());

This script properly recognizes the content as Chemical Markup Language [1]:

Format: Chemical Markup Language
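
Format detection of this kind usually comes down to looking for telltale content. The plain-Java sketch below illustrates the idea with two made-up rules: the CML namespace anywhere in the text, and a first line holding only an atom count as in XYZ files. These rules, the class name and the returned labels are assumptions for illustration and do not describe how the FormatFactory decides.

```java
class FormatSniffSketch {
    // Guesses a format name from the content using two simple, hypothetical rules.
    static String guessFormat(String content) {
        // Rule 1: the CML namespace anywhere in the content.
        if (content.contains("http://www.xml-cml.org/schema")) {
            return "Chemical Markup Language";
        }
        // Rule 2: XYZ files start with a line that holds only the atom count.
        String firstLine = content.trim().split("\\R", 2)[0].trim();
        if (firstLine.matches("\\d+")) {
            return "XYZ";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        String cml = "<molecule xmlns='http://www.xml-cml.org/schema'/>";
        System.out.println("Format: " + guessFormat(cml));
    }
}
```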

To learn if the CDK has an IChemObjectReader or IChemObjectWriter for a format, one can use the methods getReaderClassName() and getWriterClassName() respectively:

code/HasReaderOrWriter.groovy

Reader stringReader = new StringReader(
    "<molecule xmlns='http://www.xml-cml.org/schema'/>"
);
IChemFormat format = factory.guessFormat(stringReader);
String readerClass = format.getReaderClassName();
String writerClass = format.getWriterClassName();
System.out.println("Reader: " + readerClass);
System.out.println("Writer: " + writerClass);

It reports:

Reader: org.openscience.cdk.io.CMLReader
Writer: org.openscience.cdk.io.CMLWriter

8.2 Example: Downloading Domoic Acid from PubChem

As an example, the small script below takes a PubChem compound identifier (CID), downloads the corresponding ASN.1 XML file, parses it and counts the number of atoms:

code/PubChemDownload.groovy

cid = 5282253
reader = new PCCompoundXMLReader(
    new URL(
        "http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=$cid&disopt=SaveXML"
    ).newInputStream()
)
mol = reader.read(new NNMolecule())
println "Atom count: $mol.atomCount"

It reports:

Atom count: 43

Bibliography

[1] P. Murray-Rust and H. S. Rzepa. Chemical markup, XML, and the Worldwide Web. 1. Basic principles. J. Chem. Inf. Model., 39(6):928–942, November 1999.

Chapter 9

Keyword List

The keyword list is an overview of CDK functionality organized by keyword. For each keyword a reference is made to the CDK class that implements that functionality. To simplify the list, the common prefix org.openscience.cdk has been removed. For example, the class org.openscience.cdk.layout.StructureDiagramGenerator has to do with 2D coordinates.
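
To turn an entry of the list back into a class name that can be used in an import statement, the prefix has to be put back. The helper below is a trivial, hypothetical sketch of that step in plain Java; it is not part of the CDK.

```java
class KeywordListSketch {
    // The common prefix that was removed from the keyword list.
    static final String PREFIX = "org.openscience.cdk.";

    // Restores the full name for an entry of the keyword list; entries that
    // already carry the prefix are returned unchanged.
    static String fullName(String entry) {
        return entry.startsWith(PREFIX) ? entry : PREFIX + entry;
    }

    public static void main(String[] args) {
        System.out.println(fullName("layout.StructureDiagramGenerator"));
        System.out.println(fullName("Atom"));
    }
}
```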

2D layout.StructureDiagramGenerator

2D coordinates layout.StructureDiagramGenerator

2D-coordinateslayout.TemplateHandler

layout.OverlapResolver

3Dmodeling.builder3d.ModelBuilder3D

similarity.DistanceMoment

3D coordinates modeling.builder3d.ModelBuilder3D

3D isomorphismpharmacophore.PharmacophoreQueryAtom

pharmacophore.PharmacophoreBond

pharmacophore.PharmacophoreUtils

pharmacophore.PharmacophoreMatcher

pharmacophore.PharmacophoreAtom

pharmacophore.PharmacophoreAngleBond

pharmacophore.PharmacophoreQueryAngleBond

pharmacophore.PharmacophoreQueryBond

3D modelgeometry.AtomTools.add3DCoordinates1()

modeling.builder3d.AtomTetrahedralLigandPlacer3D.add3DCoordinatesForSinglyBondedLigands()

3D-coordinates modeling.forcefield.GeometricMinimizer

addingtools.CDKHydrogenAdder.addImplicitHydrogens()

tools.manipulator.AtomContainerManipulator.convertImplicitToExplicitHydrogens()

adjacency matrix graph.matrix.AdjacencyMatrix

amino acidAminoAcid

24

Page 25: CDK Workshop 2009 Intro Course Material

interfaces.IAminoAcid

amino acidsstuctures templates.AminoAcids

animationChemSequence

interfaces.IChemSequence

aromatic ring

bond order adjustment tools.DeAromatizationTool

aromaticity detector aromaticity.AromaticityCalculator

ASN io.iterator.IteratingPCSubstancesXMLReader

io.iterator.IteratingPCCompoundASNReader

io.iterator.IteratingPCCompoundXMLReader

association Association

atom Atom

Bond

interfaces.IAtom

interfaces.IBond

type AtomType

config.TXTBasedAtomTypeConfigurator

config.AtomTypeFactory

interfaces.IAtomType

valency tools.LonePairElectronChecker

tools.SaturationChecker

tools.SmilesValencyChecker

chemical validation validate.ValidationTest

atom coloring

partial charges renderer.color.PartialAtomicChargeColors

CPK renderer.color.CPKAtomColors

atom mapping Mapping

atom parity AtomParity

interfaces.IAtomParity

atom type

E-state atomtype.EStateAtomTypeMatcher

mmff94 modeling.builder3d.MMFF94BasedParameterSetReader

MM2 modeling.builder3d.MM2BasedParameterSetReader

atom types

Sybyl atomtype.SybylAtomTypeMatcher

atomic interfaces.IElement

qsar.descriptors.molecular.APolDescriptor

atomic number interfaces.IElement

AtomPlacer3D modeling.builder3d.AtomPlacer3D

BCUT qsar.descriptors.molecular.BCUTDescriptor

Binary Space Partitioning Tree graph.rebond.Bspt

biopolymer BioPolymer

interfaces.IBioPolymer

interfaces.IPDBPolymer

bond ElectronContainer

Association

Bond

LonePair

interfaces.ILonePair

interfaces.IBond

recalculation graph.rebond.RebondTool

bond count

rotatable qsar.descriptors.molecular.RotatableBondsCountDescriptor

bond creation geometry.BondTools.closeEnoughToBond()

bond order CDKConstants

smiles.DeduceBondSystemTool

calculation tools.SaturationChecker.newSaturate()

tools.SmilesValencyChecker.saturate()

bond order adjustment tools.DeAromatizationTool

calculation tools.SaturationChecker.newSaturate()

tools.SmilesValencyChecker.saturate()

canonicalization graph.invariant.CanonicalLabeler

CAS number index.CASNumber

index.CASNumber.isValid()

CDK source code io.CDKSourceCodeWriter

center of mass geometry.GeometryTools.get2DCentreOfMass()

geometry.GeometryTools.get3DCentreOfMass()

charge distribution charges.InductivePartialCharges

charges.GasteigerMarsiliPartialCharges

charges.GasteigerPEPEPartialCharges

chemical identifier io.INChIPlainTextReader

io.INChIReader

chemical validation validate.ValidationTest

chi chain index qsar.descriptors.molecular.ChiChainDescriptor

chi cluster index qsar.descriptors.molecular.ChiClusterDescriptor

chi path cluster index qsar.descriptors.molecular.ChiPathClusterDescriptor

chi path index qsar.descriptors.molecular.ChiPathDescriptor

CIF io.CIFReader

class convertor libio.cml.Convertor

classification qsar.model.R.CNNClassificationModel

CML io.CMLWriter

io.CMLReader

io.iterator.event.EventCMLReader

libio.cml.Convertor

conformer conformation io.iterator.IteratingMDLConformerReader

connection matrix graph.matrix.ConnectionMatrix

connectivity graph.ConnectivityChecker

coordinate calculation geometry.AtomTools.add3DCoordinates1()

modeling.builder3d.AtomTetrahedralLigandPlacer3D.add3DCoordinatesForSinglyBondedLigands()

coordinate generation geometry.AtomTools.calculate3DCoordinatesForLigands()

modeling.builder3d.AtomTetrahedralLigandPlacer3D.get3DCoordinatesForLigands()

3D modeling.builder3d.ModelBuilder3D

CPK renderer.color.CPKAtomColors

creation tools.IDCreator

crystal Crystal

geometry.CrystalGeometryTools

interfaces.ICrystal

DBE tools.manipulator.MolecularFormulaManipulator.getDBE()

descriptor qsar.descriptors.molecular.RuleOfFiveDescriptor

qsar.descriptors.molecular.BCUTDescriptor

qsar.descriptors.molecular.WHIMDescriptor

qsar.descriptors.molecular.ChiChainDescriptor

qsar.descriptors.molecular.GravitationalIndexDescriptor

qsar.descriptors.molecular.XLogPDescriptor

qsar.descriptors.molecular.KappaShapeIndicesDescriptor

qsar.descriptors.molecular.RotatableBondsCountDescriptor

qsar.descriptors.molecular.ChiPathDescriptor

qsar.descriptors.molecular.ChiPathClusterDescriptor

qsar.descriptors.molecular.ChiClusterDescriptor

qsar.descriptors.molecular.ZagrebIndexDescriptor

qsar.descriptors.molecular.CarbonTypesDescriptor

qsar.descriptors.molecular.TPSADescriptor

diagonalization math.Matrix.diagonalize()

dictionary dict.OWLFile

dict.Dictionary

dict.DictionaryDatabase

dict.Entry

dict.OWLReact

dict.EntryReact

implicit CDK references dict.CDKDictionaryReferences

double bond equivalent tools.manipulator.MolecularFormulaManipulator.getDBE()

E-state atomtype.EStateAtomTypeMatcher

EAID number graph.invariant.HuLuIndexTool.getEAIDNumber()

electron Bond

interfaces.IElectronContainer

interfaces.IBond

unpaired SingleElectron

interfaces.ISingleElectron

electronegativities

partial equalization of orbital

charges.GasteigerMarsiliPartialCharges

charges.GasteigerPEPEPartialCharges

electronegativity charges.InductivePartialCharges

charges.PiElectronegativity

charges.Electronegativity

tools.PeriodicTable

element PeriodicTableElement

Element

config.IsotopeFactory

interfaces.IElement

tools.PeriodicTable

sorting tools.ElementComparator

estate config.fragments.EStateFragments

fingerprint.EStateFingerprinter

file format io.SMILESWriter

io.GamessReader

INChI io.INChIPlainTextReader

io.INChIReader

XYZ io.XYZReader

Polymorph Predictor (tm) io.PMPReader

CML io.CMLWriter

io.CMLReader

io.iterator.event.EventCMLReader

Mol2 io.Mol2Reader

Z-matrix io.ZMatrixReader

ShelX io.ShelXWriter

HIN io.HINReader

MDL SD file io.SDFWriter

ShelXL io.ShelXReader

PDB io.PDBReader

PubChem Compound ASN io.PCSubstanceXMLReader

io.PCCompoundASNReader

CIF io.CIFReader

mmCIF io.CIFReader

MDL RXN io.MDLRXNReader

io.MDLRXNV2000Reader

CDK source code io.CDKSourceCodeWriter

MDL RXN file io.MDLRXNWriter

MDL molfile io.MDLWriter

io.MDLV2000Reader

io.MDLReader

io.iterator.IteratingMDLReader

PubChem Compound XML io.PCCompoundXMLReader

SDF io.MDLV2000Reader

io.MDLReader

io.iterator.IteratingMDLReader

SMILES io.SMILESReader

io.iterator.IteratingSMILESReader

ASN io.iterator.IteratingPCSubstancesXMLReader

io.iterator.IteratingPCCompoundASNReader

io.iterator.IteratingPCCompoundXMLReader

file format SDF io.iterator.IteratingMDLConformerReader

fingerprint fingerprint.FingerprinterTool

fingerprint.MACCSFingerprinter

fingerprint.Fingerprinter

fingerprint.IFingerprinter

fingerprint.SubstructureFingerprinter

fingerprint.EStateFingerprinter

fingerprint.ExtendedFingerprinter

fingerprint.GraphOnlyFingerprinter

force field modeling.forcefield.MMFF94EnergyFunction

fractional coordinates

crystal geometry.CrystalGeometryTools

fragment config.fragments.EStateFragments

Gamess io.GamessReader

Gauss elimination math.Matrix.elimination()

Gaussian (tm)

input file io.program.GaussianInputWriter

Gaussian basis set math.qm.GaussiansBasis

generator smiles.SmilesGenerator

geometry modeling.forcefield.GeometricMinimizer

Gram-Schmidt algorithm math.Matrix.orthonormalize()

graph

path ringsearch.Path

graph matrix

molecular graph.matrix.IGraphMatrix

gravitational index qsar.descriptors.molecular.GravitationalIndexDescriptor

HIN io.HINReader

HOSE code tools.BremserOneSphereHOSECodePredictor

spherical atom search tools.HOSECodeGenerator

hydrogen

removal

tools.manipulator.AtomContainerManipulator.removeHydrogens()

tools.manipulator.AtomContainerManipulator.removeHydrogensPreserveMultiplyBonded()

tools.manipulator.AtomContainerManipulator.getHeavyAtoms()

tools.manipulator.MolecularFormulaManipulator.getHeavyElements()

hydrogens

adding

tools.CDKHydrogenAdder.addImplicitHydrogens()

tools.manipulator.AtomContainerManipulator.convertImplicitToExplicitHydrogens()

id

creation tools.IDCreator

implicit CDK references dict.CDKDictionaryReferences

INChI io.INChIPlainTextReader

io.INChIReader

input file io.program.GaussianInputWriter

ionization potential qsar.descriptors.molecular.IPMolecularLearningDescriptor

isomorphism isomorphism.IsomorphismTester

isotopeIsotope

config.IsotopeFactory

interfaces.IIsotope

isotope pattern formula.IsotopePatternGenerator

IUPAC name iupac.parser.NomParser

jaccard similarity.Tanimoto

Jacobi algorithm math.Matrix.diagonalize()

join-the-dots geometry.BondTools.closeEnoughToBond()

graph.rebond.Bspt

JRI qsar.model.R2.RModel

Kappa shape index qsar.descriptors.molecular.KappaShapeIndicesDescriptor

Layout layout.StructureDiagramGenerator

layout.TemplateHandler

layout.OverlapResolver

line search modeling.forcefield.LineSearch

linear qsar.model.R.LinearRegressionModel

linear regression qsar.model.R2.LinearRegressionModel

Lipinski qsar.descriptors.molecular.RuleOfFiveDescriptor

lipophilicity qsar.descriptors.molecular.ALOGPDescriptor

log file io.GamessReader

logP qsar.descriptors.molecular.ALOGPDescriptor

lone-pair ElectronContainer

LonePair

interfaces.ILonePair

mass interfaces.IIsotope

molecular tools.manipulator.AtomContainerManipulator.getNaturalExactMass()

mass number interfaces.IIsotope

MDL molfile io.MDLWriter

io.MDLV2000Reader

io.MDLReader

io.iterator.IteratingMDLReader

MDL molfile V3000 io.MDLV3000Reader

MDL RXN io.MDLRXNReader

io.MDLRXNV2000Reader

MDL RXN file io.MDLRXNWriter

MDL SD file io.SDFWriter

MDL V3000 io.MDLRXNV3000Reader

MM2 modeling.builder3d.MM2BasedParameterSetReader

mmCIF io.CIFReader

mmff94 modeling.builder3d.MMFF94BasedParameterSetReader

modeling.forcefield.MMFF94EnergyFunction

Mol2 io.Mol2Reader

molecular graph.matrix.IGraphMatrix

tools.manipulator.AtomContainerManipulator.getNaturalExactMass()

molecular formula formula.MolecularFormula

formula.AdductFormula

formula.MolecularFormulaChecker

formula.MolecularFormulaRange

formula.MolecularFormulaSet

interfaces.IMolecularFormulaSet

interfaces.IAdductFormula

interfaces.IMolecularFormula

molecule

Molecule

MoleculeSet

molecular formula formula.MolecularFormulaChecker

moment of inertia qsar.descriptors.molecular.MomentOfInertiaDescriptor

monomer Monomer

interfaces.IMonomer

Morgan number graph.invariant.MorganNumbersTools

Murcko fragments tools.GenerateFragments

neural network qsar.model.R.CNNClassificationModel

qsar.model.R.CNNRegressionModel

qsar.model.R2.CNNRegressionModel

Newton-Raphson modeling.forcefield.NewtonRaphsonMethod

notional coordinates geometry.CrystalGeometryTools.notionalToCartesian()

number qsar.descriptors.molecular.PetitjeanNumberDescriptor

mass interfaces.IIsotope

atomic interfaces.IElement

orbital ElectronContainer

Association

LonePair

interfaces.ILonePair

orthonormalization math.Matrix.orthonormalize()

output io.GamessReader

parser smiles.SmilesParser

partial atomic charges charges.InductivePartialCharges

charges.GasteigerMarsiliPartialCharges

charges.GasteigerPEPEPartialCharges

partial charges renderer.color.PartialAtomicChargeColors

partial equalization of orbital charges.GasteigerMarsiliPartialCharges

charges.GasteigerPEPEPartialCharges

partial least squares qsar.model.R.PLSRegressionModel

path ringsearch.Path

PDB io.PDBReader

pdbpolymer interfaces.IPDBMonomer

interfaces.IPDBAtom

interfaces.IPDBStructure

interfaces.IPDBPolymer

PEOE charges.GasteigerMarsiliPartialCharges

PEPE charges.GasteigerPEPEPartialCharges

periodic table tools.PeriodicTable

permutation graph.AtomContainerPermutor

graph.AtomContainerBondPermutor

graph.AtomContainerAtomPermutor

Petit-Jean

shape index qsar.descriptors.molecular.PetitjeanShapeIndexDescriptor

number qsar.descriptors.molecular.PetitjeanNumberDescriptor

pharmacophore pharmacophore.PharmacophoreQueryAtom

pharmacophore.PharmacophoreBond

pharmacophore.PharmacophoreUtils

pharmacophore.PharmacophoreMatcher

pharmacophore.PharmacophoreAtom

pharmacophore.PharmacophoreAngleBond

pharmacophore.PharmacophoreQueryAngleBond

pharmacophore.PharmacophoreQueryBond

physical properties PhysicalConstants

PLS qsar.model.R.PLSRegressionModel

pocket protein.ProteinPocketFinder

polarizability charges.Polarizability

atomic qsar.descriptors.molecular.APolDescriptor

polymer BioPolymer

Polymer

interfaces.IBioPolymer

interfaces.IPolymer

interfaces.IPDBPolymer

protein.data.PDBStrand

protein.data.PDBPolymer

Polymorph Predictor (tm) io.PMPReader

prime numbers math.Primes

projection in 2D geometry.Projector

protein protein.ProteinPocketFinder

PubChem io.iterator.IteratingPCSubstancesXMLReader

io.iterator.IteratingPCCompoundASNReader

io.iterator.IteratingPCCompoundXMLReader

PubChem Compound ASN io.PCSubstanceXMLReader

io.PCCompoundASNReader

PubChem Compound XML io.PCCompoundXMLReader

R qsar.model.R2.CNNRegressionModel

qsar.model.R2.RModel

qsar.model.R2.LinearRegressionModel

radial distribution function geometry.RDFCalculator

radical SingleElectron

interfaces.ISingleElectron

radius

vanderwaals tools.PeriodicTable

RDF geometry.RDFCalculator

reaction ReactionSet

ChemSequence

Reaction

ReactionScheme

MoleculeSet

interfaces.IReaction

interfaces.IChemSequence

interfaces.IReactionSet

atom mapping Mapping

rebonding graph.rebond.RebondTool

graph.rebond.Bspt

recalculation graph.rebond.RebondTool

refractivity qsar.descriptors.molecular.ALOGPDescriptor

regression qsar.model.R.CNNRegressionModel

qsar.model.R.PLSRegressionModel

linear qsar.model.R.LinearRegressionModel

removal tools.manipulator.AtomContainerManipulator.removeHydrogens()

tools.manipulator.AtomContainerManipulator.removeHydrogensPreserveMultiplyBonded()

tools.manipulator.AtomContainerManipulator.getHeavyAtoms()

tools.manipulator.MolecularFormulaManipulator.getHeavyElements()

ring Ring

interfaces.IRing

set of RingSet

interfaces.IRingSet

ring finding graph.SpanningTree

ring search ringsearch.FiguerasSSSRFinder

ringsearch.SSSRFinder

ringsearch.cyclebasis.SimpleCycle

rotatable qsar.descriptors.molecular.RotatableBondsCountDescriptor

RSS io.RssWriter

rule-of-five qsar.descriptors.molecular.RuleOfFiveDescriptor

saturation tools.LonePairElectronChecker

tools.SaturationChecker

SDF io.MDLV2000Reader

io.MDLReader

io.iterator.IteratingMDLReader

set of RingSet

interfaces.IRingSet

shape index qsar.descriptors.molecular.PetitjeanShapeIndexDescriptor

ShelX io.ShelXWriter

ShelXL io.ShelXReader

similarity fingerprint.MACCSFingerprinter

fingerprint.Fingerprinter

fingerprint.SubstructureFingerprinter

fingerprint.EStateFingerprinter

fingerprint.ExtendedFingerprinter

fingerprint.GraphOnlyFingerprinter

3D similarity.DistanceMoment

tanimoto similarity.Tanimoto

smallest-set-of-rings ringsearch.FiguerasSSSRFinder

ringsearch.SSSRFinder

ringsearch.cyclebasis.SimpleCycle

SMARTS isomorphism.matchers.smarts.RingMembershipAtom

isomorphism.matchers.smarts.HybridizationNumberAtom

isomorphism.matchers.smarts.AromaticOrSingleQueryBond

isomorphism.matchers.smarts.TotalValencyAtom

isomorphism.matchers.smarts.ChiralityAtom

isomorphism.matchers.smarts.SMARTSBond

isomorphism.matchers.smarts.AnyAtom

isomorphism.matchers.smarts.ExplicitConnectionAtom

isomorphism.matchers.smarts.AromaticAtom

isomorphism.matchers.smarts.AtomicNumberAtom

isomorphism.matchers.smarts.AnyOrderQueryBond

isomorphism.matchers.smarts.OrderQueryBond

isomorphism.matchers.smarts.RingIdentifierAtom

isomorphism.matchers.smarts.RingBond

isomorphism.matchers.smarts.TotalConnectionAtom

isomorphism.matchers.smarts.FormalChargeAtom

isomorphism.matchers.smarts.DegreeAtom

isomorphism.matchers.smarts.RecursiveSmartsAtom

isomorphism.matchers.smarts.AliphaticAtom

isomorphism.matchers.smarts.AromaticQueryBond

isomorphism.matchers.smarts.PeriodicGroupNumberAtom

isomorphism.matchers.smarts.NonCHHeavyAtom

isomorphism.matchers.smarts.TotalRingConnectionAtom

isomorphism.matchers.smarts.TotalHCountAtom

isomorphism.matchers.smarts.LogicalOperatorAtom

isomorphism.matchers.smarts.AromaticSymbolAtom

isomorphism.matchers.smarts.SmallestRingAtom

isomorphism.matchers.smarts.MassAtom

isomorphism.matchers.smarts.LogicalOperatorBond

isomorphism.matchers.smarts.AliphaticSymbolAtom

isomorphism.matchers.smarts.ImplicitHCountAtom

isomorphism.matchers.smarts.HydrogenAtom

isomorphism.matchers.smarts.SMARTSAtom

isomorphism.matchers.smarts.ConnectionCountAtom

isomorphism.matchers.smarts.StereoBond

isomorphism.matchers.smarts.RingAtom

smiles.smarts.SMARTSQueryTool

smiles.smarts.parser.SMARTSParser

smiles.smarts.parser.ASTOrExpression

smiles.smarts.parser.ASTElement

smiles.smarts.parser.ASTChirality

smiles.smarts.parser.ASTPrimitiveAtomExpression

smiles.smarts.parser.ASTAtomicMass

smiles.smarts.parser.ASTNotBond

smiles.smarts.parser.ASTCharge

smiles.smarts.parser.ASTLowAndBond

smiles.smarts.parser.ASTAtom

smiles.smarts.parser.ASTLowAndExpression

SMARTS AST smiles.smarts.parser.ASTPeriodicGroupNumber

smiles.smarts.parser.ASTExplicitHighAndExpression

smiles.smarts.parser.ASTAtomicNumber

smiles.smarts.parser.ASTStart

smiles.smarts.parser.ASTOrBond

smiles.smarts.parser.ASTSmarts

smiles.smarts.parser.ASTImplicitHCount

smiles.smarts.parser.SimpleNode

smiles.smarts.parser.ASTAliphatic

smiles.smarts.parser.ASTHybrdizationNumber

smiles.smarts.parser.Node

smiles.smarts.parser.ASTSmallestRingSize

smiles.smarts.parser.ASTRingIdentifier

smiles.smarts.parser.ASTExplicitAtom

smiles.smarts.parser.ASTRingConnectivity

smiles.smarts.parser.ASTAromatic

smiles.smarts.parser.ASTReaction

smiles.smarts.parser.ASTTotalHCount

smiles.smarts.parser.ASTExplicitConnectivity

smiles.smarts.parser.ASTTotalConnectivity

smiles.smarts.parser.ASTSimpleBond

smiles.smarts.parser.ASTValence

smiles.smarts.parser.ASTImplicitHighAndExpression

smiles.smarts.parser.ASTGroup

smiles.smarts.parser.ASTImplicitHighAndBond

smiles.smarts.parser.ASTRecursiveSmartsExpression

smiles.smarts.parser.ASTNotExpression

smiles.smarts.parser.ASTRingMembership

smiles.smarts.parser.ASTAnyAtom

smiles.smarts.parser.ASTNonCHHeavyAtom

smiles.smarts.parser.ASTExplicitHighAndBond

smiles.smarts.parser.visitor.Smarts2MQLVisitor

smiles.smarts.parser.visitor.SmartsQueryVisitor

smiles.smarts.parser.visitor.SmartsDumpVisitor

SMILES io.SMILESReader

io.iterator.IteratingSMILESReader

generator smiles.SmilesGenerator

parser smiles.SmilesParser

sorting tools.ElementComparator

spanning tree graph.SpanningTree

spherical atom search tools.HOSECodeGenerator

stabilization charge charges.StabilizationCharges

stack io.cml.CMLStack

steepest descent modeling.forcefield.SteepestDescentsMethod

stereochemistry CDKConstants

AtomParity

interfaces.IAtomParity

structure diagram generation layout.TemplateHandler

Structure Diagram Generation (SDG) layout.StructureDiagramGenerator

structure generator structgen.RandomGenerator

structgen.VicinitySampler

structures templates.AminoAcids

substructure search fingerprint.FingerprinterTool.isSubset()

smiles.smarts.SMARTSQueryTool

smiles.smarts.parser.SMARTSParser

Sybyl atomtype.SybylAtomTypeMatcher

tanimoto similarity.Tanimoto

templates templates.AminoAcids

templates.MoleculeFactory

templates.saturatedhydrocarbons.IsoAlkanes

topological bond order ctypes qsar.descriptors.molecular.CarbonTypesDescriptor

total polar surface area qsar.descriptors.molecular.TPSADescriptor

TPSA qsar.descriptors.molecular.TPSADescriptor

type AtomType

config.TXTBasedAtomTypeConfigurator

config.AtomTypeFactory

interfaces.IAtomType

unpaired SingleElectron

interfaces.ISingleElectron

valency tools.LonePairElectronChecker

tools.SaturationChecker

tools.SmilesValencyChecker

validation index.CASNumber.isValid()

vanderwaals tools.PeriodicTable

WHIM qsar.descriptors.molecular.WHIMDescriptor

Wiener number qsar.descriptors.molecular.WienerNumbersDescriptor

XLogP qsar.descriptors.molecular.XLogPDescriptor

XYZ io.XYZReader

Z Matrix geometry.ZMatrixTools

Z-matrix io.ZMatrixReader

Zagreb index qsar.descriptors.molecular.ZagrebIndexDescriptor
