
BRAMA training for technicians – Module I, Rome

SESSION III – DATABASES

“Databases and Identification”

Prof. Jacques Vervoort


To be a master of spectra you need to be a master of structures in the first place.

[Figure: MS/MS spectrum of vincristine from the NIST MS/MS library (nist_msms), m/z 260-810, with its structure; labeled peaks at m/z 265, 353, 395, 455, 513, 538, 604, 636, 676, 705, 723, 747, 765 and 807]

• Complex MS data interpretation is only possible with software.
• MS data are obtained by hyphenated techniques (GC-MS, LC-MS).
• Mass spectral database searches and structure searches are used routinely.
• Mass spectrometers deliver multidimensional data.

For more information see http://fiehnlab.ucdavis.edu

Try Marvin Space via WebStart

Be prepared – visualize your structures


Organic Chemistry Reminder

Molecular Formula

C3H7F

[Figure: EI mass spectrum of propane, 2-fluoro- (NIST mainlib), m/z 10-70, with its structure]


Be prepared – Stereoisomers

How many stereoisomers can you expect from glucose (KEGG)?

Example calculated with MarvinView (via Java WebStart)

[Structure: glucose]
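As a worked count, assuming no meso forms (none are possible for glucose, because its two chain ends differ), a molecule with n stereocenters has at most 2^n stereoisomers. Open-chain glucose, an aldohexose, has four stereocenters (C2 to C5); the cyclic pyranose form adds the anomeric carbon C1 as a fifth:

$$N_{\text{stereo}} \le 2^{n}, \qquad N_{\text{open chain}} = 2^{4} = 16, \qquad N_{\text{pyranose}} = 2^{5} = 32$$

The answer therefore depends on whether the open-chain or the ring form is drawn.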


Be prepared – Tautomers

How many tautomers can you expect? This is important for mass spectral interpretation.

[Structure: methyl acetate]

Example calculated with MarvinView (start via WebStart)


Be prepared – Resonance (electron shifts)

What are the possible resonance structures? This is important for mass spectral interpretation (electron impact, electrospray).

[Structure: phenol]

Example calculated with MarvinView (start via WebStart)


Structure search – know what could be possible

How many compounds (isomer structures) are found in public databases?

http://www.chemspider.com/


Chemical Structure Handling

Most common structure formats you need to know:

• SMILES/SMARTS - Simplified Molecular Input Line Entry Specification
• SDF/MOL - Structure Data File
• InChI/InChIKey - IUPAC International Chemical Identifier
• PDB - Protein Data Bank
• CML - Chemical Markup Language

Some problems:

• The data format should be based on an open standard (a problem with SMILES; OK with CML).
• Stereo and aromatic bond information must be saved (OK with SDF).
• The format must be compact enough to store millions of compounds (OK with SMILES).
• The SMILES notation must be unique (a problem with SMILES).
• The structure representation should be portable and based on an open standard (OK with CML).

[Structure: moronic acid]

Moronic acid - CID: 489941


Chemical Structure Identifiers

Structure identifiers are needed to identify structures uniquely. They are important for searching chemical structures in text and databases.

Structure Name – IUPAC name or common name

CAS RN – Chemical Abstracts identifier

PubChem ID – PubChem Compound ID

InChIKey – Short representation of InChI

InChI – IUPAC International Chemical Identifier

[Structure: caffeine]

Structure name: 1,3,7-trimethylpurine-2,6-dione
CAS RN: 58-08-2
PubChem CID: 2519
InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
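An InChI is a sequence of slash-separated layers, which makes it easy to inspect programmatically. The following Python sketch splits the caffeine InChI into its layers. The function name `parse_inchi` is hypothetical, and it handles only the simple layout used here (no repeated layers, no validation beyond the prefix).

```python
def parse_inchi(inchi: str) -> dict:
    """Split an InChI string into its slash-separated layers (minimal sketch).

    Assumes "InChI=<version>/<formula>/<layer>/<layer>...", where each layer
    after the formula starts with a one-letter prefix such as c (atom
    connections) or h (hydrogen atoms). A prefix that occurs twice (for
    example in a fixed-H layer) overwrites the earlier entry.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        if layer:
            layers[layer[0]] = layer[1:]  # prefix letter -> layer content
    return layers


caffeine = "InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3"
for name, value in parse_inchi(caffeine).items():
    print(f"{name:8} {value}")
```

The connection layer (`c`) and the hydrogen layer (`h`) together encode the constitution; the formula layer alone only gives the composition.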


SMILES structure format

Positive: good for storing structures in a single line; fast text-based search possible; human readable

Negative: many different SMILES variants exist, and the SMILES for the same structure can differ (canonical or unique SMILES are needed)

Simple SMILES examples: C (methane), CC (ethane), CCC (propane), CCCC (butane), CCCCO (1-butanol), CCCCN (1-butylamine)

All of the following SMILES strings represent caffeine:

[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]
CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12
Cn1cnc2n(C)c(=O)n(C)c(=O)c12
Cn1cnc2c1c(=O)n(C)c(=O)n2C
N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2
O=C1C2=C(N=CN2C)N(C(=O)N1C)C
CN1C=NC2=C1C(=O)N(C)C(=O)N2C

[Structure: caffeine]

InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3

Source of the caffeine SMILES: InChI FAQ
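Because the caffeine SMILES strings differ, a plain text comparison cannot tell that they describe the same molecule. As a weak sanity check, the following Python sketch (hypothetical helper `atom_counts`, simplified tokenizer) counts the explicit atoms in each string and confirms that all seven give C8N4O2; hydrogens are implicit or inside brackets and are not counted. Matching counts do not prove the structures are identical. That requires canonical SMILES or an InChI generated by a cheminformatics toolkit such as RDKit or Open Babel.

```python
import re
from collections import Counter

# Atom tokens: a bracket atom, or a bare organic-subset atom (two-letter Cl/Br first).
ATOM_TOKEN = re.compile(r"\[([^\]]+)\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Inside brackets: optional isotope digits, then the element symbol
# (uppercase plus optional lowercase letter, or an aromatic lowercase symbol).
BRACKET_ELEMENT = re.compile(r"\d*([A-Z][a-z]?|[a-z]+)")


def atom_counts(smiles: str) -> Counter:
    """Count explicit atoms per element; implicit hydrogens are ignored."""
    counts = Counter()
    for match in ATOM_TOKEN.finditer(smiles):
        if match.group(1) is not None:
            symbol = BRACKET_ELEMENT.match(match.group(1)).group(1)
        else:
            symbol = match.group(0)
        counts[symbol.capitalize()] += 1  # aromatic 'c' counts as 'C'
    return counts


CAFFEINE_SMILES = [
    "[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]",
    "CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12",
    "Cn1cnc2n(C)c(=O)n(C)c(=O)c12",
    "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2",
    "O=C1C2=C(N=CN2C)N(C(=O)N1C)C",
    "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
]

for smiles in CAFFEINE_SMILES:
    print(dict(atom_counts(smiles)), smiles)
```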


SDF/MOL structure format

Positive: established standard format; good for storing structures safely; can store 3D structures; can store metadata (boiling points, toxicity, mass spectra)

Negative: large file size; compression is needed

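SDF records for C, CC and CCC (methane, ethane, propane) as written by Open Babel. In each record the first line (molecule name) and the third line (comment) are empty:

 OpenBabel02240823422D

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0
M  END
$$$$

 OpenBabel02240823422D

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0
  1  2  1  0  0  0
M  END
$$$$

 OpenBabel02240823422D

  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0
  1  2  1  0  0  0
  2  3  1  0  0  0
M  END
$$$$

Creator: the program line identifying the software that wrote the record (OpenBabel)
Coordinates for 3D: the atom block, one line per atom with x, y, z coordinates (all zero here) and the element symbol
Connection of atoms: the bond block, one line per bond (first atom, second atom, bond order)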
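The fixed-column layout of a V2000 molfile can be read with simple string slicing. The following Python sketch (hypothetical function `parse_v2000`) parses the propane record. It assumes a single V2000 record and ignores property lines and SD data fields after `M  END`.

```python
PROPANE_MOLFILE = (
    "\n"                                              # line 1: molecule name (empty)
    " OpenBabel02240823422D\n"                        # line 2: program line
    "\n"                                              # line 3: comment (empty)
    "  3  2  0  0  0  0  0  0  0  0999 V2000\n"       # counts line
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0\n"
    "  1  2  1  0  0  0\n"
    "  2  3  1  0  0  0\n"
    "M  END\n"
    "$$$$\n"
)


def parse_v2000(record: str) -> dict:
    """Read header, atom block and bond block of one V2000 record."""
    lines = record.split("$$$$")[0].splitlines()
    name, program, comment, counts = lines[0], lines[1], lines[2], lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])  # fixed 3-character fields
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))  # element symbol, coordinates
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))  # atom1, atom2, order
    return {"name": name, "program": program, "comment": comment,
            "atoms": atoms, "bonds": bonds}


molecule = parse_v2000(PROPANE_MOLFILE)
print(len(molecule["atoms"]), "atoms;", molecule["bonds"])
```

Fixed columns are used instead of `split()` because adjacent fields run together once a count reaches three digits.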


Molecules and mass spectra

Close relationship between molecular structure and mass spectra

Molecular structure is reflected in mass spectral features (peaks, peak heights and peak combinations)

Mass spectra reflect gas-phase ion physics and chemistry (rearrangements, fragmentations, bond cleavages).

[Figure: EI mass spectra (m/z 20-160) with structures of three silylated amines: tert-butylaminotrimethylsilane (mainlib; labeled peaks at m/z 29, 45, 58, 73, 84, 100, 114, 130, 145), N,N-diethyl-1,1,1-trimethylsilylamine (mainlib; m/z 29, 45, 59, 73, 86, 100, 114, 130, 145) and Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)]- (replib; m/z 46, 59, 73, 91, 105, 130, 147, 160)]

Electron impact (70 eV) mass spectra; Source: NIST05


Molecules and mass spectra

Similar structures may or may not have similar mass spectra.

[Figure: two-panel comparison of the EI mass spectra (m/z 40-320) of Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)]- and N-Methylphenylethanolamine, bis(trimethylsilyl)-, with their structures]

Electron impact (70 eV) mass spectra; Source: NIST05; Created using structure similarity search in NIST MS Search program
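Structure similarity searches usually compare molecular fingerprints, and a widely used score is the Tanimoto coefficient: the number of shared features divided by the number of features present in either molecule. The slide does not say which measure NIST MS Search uses, so the following Python sketch only illustrates the idea. The fingerprints are hypothetical sets of substructure labels, not the output of any real fingerprinting method.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: size of the intersection divided by size of the union."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(fp_a & fp_b) / len(union)


# Hypothetical fingerprints (sets of substructure labels), for illustration only.
fp_query = {"Si(CH3)3", "N-Si", "O-Si", "phenyl", "N-CH3"}
fp_hit = {"Si(CH3)3", "N-Si", "O-Si", "phenyl", "N-H"}
print(round(tanimoto(fp_query, fp_hit), 3))
```

Real fingerprints are usually fixed-length bit vectors, but the score is the same set ratio applied to the bits that are switched on.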


Molecules and mass spectra

Similar mass spectra may or may not come from similar structures.

[Figure: two-panel comparison of the EI mass spectra (m/z 10-210) of 1-tetradecene and cyclotetradecane; labeled peaks in both panels include m/z 27, 29, 55, 83, 97, 111, 125, 168 and 196]

Electron impact (70 eV) mass spectra; Source: NIST05; Created using spectral similarity search in NIST MS Search program
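Spectral similarity search ranks library spectra by a match score. One common basis is the cosine (normalized dot product) of the two peak lists, often with m/z and intensity weighting. The Python sketch below implements that generic score. It is not the NIST MS Search algorithm, the weighting exponents are free parameters, and the two spectra are hypothetical. As the tetradecene example shows, a high score alone does not prove that two compounds are structurally similar.

```python
import math


def weighted(spectrum: dict, mz_power: float, intensity_power: float) -> dict:
    """Apply an optional m/z and intensity weighting to a stick spectrum."""
    return {mz: (mz ** mz_power) * (intensity ** intensity_power)
            for mz, intensity in spectrum.items()}


def cosine_similarity(spec_a: dict, spec_b: dict,
                      mz_power: float = 0.0, intensity_power: float = 1.0) -> float:
    """Normalized dot product of two stick spectra given as {m/z: intensity}."""
    a = weighted(spec_a, mz_power, intensity_power)
    b = weighted(spec_b, mz_power, intensity_power)
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())  # only shared m/z values
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stick spectra for illustration only (not read from the slides).
spectrum_1 = {41: 80.0, 55: 100.0, 69: 60.0, 83: 40.0}
spectrum_2 = {41: 70.0, 55: 100.0, 69: 65.0, 83: 35.0, 196: 5.0}
print(round(cosine_similarity(spectrum_1, spectrum_2), 3))
print(round(cosine_similarity(spectrum_1, spectrum_2, mz_power=1.0, intensity_power=0.5), 3))
```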


Mass spectral databases I

Name            Spectra count   Type
NIST05          200,000         electron impact spectra (EI 70 eV)
Wiley 8         400,000         electron impact spectra (EI 70 eV)
Palisade 600K   600,000         electron impact spectra (EI 70 eV)
NIST MS/MS      5,200           MS/MS (ESI, +/-, 30-100 V CID)
MassFrontier    7,000           MSn, ESI (Spectral Tree Library)

• Data quality is important.
• Spectra should be annotated with CAS number, structure and formula.
• A link to the literature or the original publication is useful.
• Currently no large ESI, APPI or APCI libraries are available (free or commercial).


Mass spectral databases II

Smaller specialized libraries

• Pfleger Maurer Weber (drugs): MS + RI, 70 eV
• MassFinder (volatiles): MS + RI, 70 eV
• RIZA DB (toxicants): MS + RI, 70 eV
• Golm DB (primary metabolites): MS + RI, 70 eV
• Fiehnlib (primary metabolites): MS + RI, 70 eV
• MassBank (metabolites): ESI, MSn, accurate masses
• AAFS (drugs, forensics, toxicology): MS + RI, 70 eV
• ChemicalSoft (drugs): MS/MS, MSE

_____________________________________________________________

For electron impact (EI) spectra, the same type of GC column (DB-5, RTX-5, DB-1, OV-1) and the same temperature program must be used to match retention indices.
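Retention indices put retention times on a scale defined by n-alkanes. For temperature-programmed GC, one common definition is the linear retention index of van den Dool and Kratz, RI = 100 (n + (t_x - t_n) / (t_(n+1) - t_n)), where n is the carbon number of the alkane eluting just before the analyte. A minimal Python sketch follows; the function name and the retention times are hypothetical.

```python
def linear_retention_index(t_x: float, n: int, t_n: float, t_n1: float) -> float:
    """Linear (van den Dool and Kratz) retention index for temperature-programmed GC.

    t_x  : retention time of the analyte
    n    : carbon number of the n-alkane eluting just before the analyte
    t_n  : retention time of that n-alkane
    t_n1 : retention time of the n-alkane with n + 1 carbons
    """
    if not t_n <= t_x <= t_n1:
        raise ValueError("analyte must elute between the two bracketing alkanes")
    return 100.0 * (n + (t_x - t_n) / (t_n1 - t_n))


# Hypothetical retention times in minutes: decane 10.0, undecane 12.0, analyte 11.0.
print(linear_retention_index(11.0, 10, 10.0, 12.0))
```

Because the index depends on how retention times interpolate between alkanes, it transfers between laboratories only when the column phase and temperature program match.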

For ESI and APPI spectra (LC-MS), the same mass spectrometer design (triple quadrupole, ion trap, TOF, Q-TOF), the same setup and the same collision energy should be used.

[Figure: RIZA DB (riza_web) entry for Mirex (RI 2583, KEY 1596, CAS 2385-85-5): EI mass spectrum (m/z 230-450) with labeled peaks at m/z 237, 272, 332 and 404, and the structure]


Searching Molecules on PubChem

Go to PubChem Structure Search

Database of 18 million+ compounds
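PubChem can also be queried programmatically through its PUG REST web service. The following Python sketch only builds a property-lookup URL for caffeine (CID 2519). The helper name `property_url` is hypothetical, the URL pattern is assumed to follow the current PUG REST documentation (check it before use), and running the commented-out request requires network access.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(namespace: str, identifier: str, properties: list) -> str:
    """Build a PUG REST compound-property URL; namespace is e.g. 'cid' or 'name'."""
    return "{}/compound/{}/{}/property/{}/JSON".format(
        PUG_REST, namespace, quote(identifier, safe=""), ",".join(properties))


url = property_url("cid", "2519", ["MolecularFormula", "MonoisotopicMass", "InChIKey"])
print(url)
# To actually query PubChem (requires network access):
#   import json, urllib.request
#   with urllib.request.urlopen(url) as response:
#       print(json.load(response))
```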


CAS SciFinder

• 33 million molecules and 60 million peptides/proteins

• largest reaction DB (14 million reactions) and literature DB

• substructure and similarity search of structures

• a must for chemists and biochemists/biologists

• no bulk download, poor import/export, no link-outs


Structure search in SciFinder

Retrieved 4000 papers

(search refined to MS and MALDI only)


Atomic Mass

Hexachlorobenzene (C6Cl6)

average mass - 284.7804 u

integer mass - 282.0 u

monoisotopic mass - 281.81312 u

The correct unit is u (unified atomic mass unit) or Da (dalton); see SI units. 1 u = 1 Da = 1/12 of the mass of a 12C atom = 1.66053886 × 10^-27 kg.

[Figure: simulated isotopic pattern of C6Cl6 (resolving power 1000), relative abundance vs. m/z 282-296; labeled peaks at m/z 281.81, 282.82, 283.81, 284.81, 285.81, 286.81, 287.80, 288.81, 289.80, 291.80, 292.80, 294.80 and 295.80]

Always (always!) check molecular masses obtained from databases or publications. For mass spectrometry, the monoisotopic mass is used.

[Structure: hexachlorobenzene]

InChIKey: CKAPSXZOOQJIBF-UHFFFAOYAV
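The three masses of hexachlorobenzene can be recomputed from element data. The Python sketch below uses the 12C and 35Cl isotope masses and standard atomic weights (C 12.0107, Cl 35.453); the helper `mass` is hypothetical. With Cl = 35.453 the average mass comes out at 284.782 u, whereas the value 284.7804 u corresponds to Cl = 35.4527, which shows how the tabulated atomic weight changes the last digits.

```python
MONOISOTOPIC = {"C": 12.0, "Cl": 34.96885268}  # u, masses of 12C and 35Cl
AVERAGE = {"C": 12.0107, "Cl": 35.453}        # u, standard atomic weights
NOMINAL = {"C": 12, "Cl": 35}                 # integer mass of the most abundant isotope


def mass(formula: dict, table: dict) -> float:
    """Sum element masses from the given table over a formula {element: count}."""
    return sum(table[element] * count for element, count in formula.items())


hexachlorobenzene = {"C": 6, "Cl": 6}
print("average     ", round(mass(hexachlorobenzene, AVERAGE), 4))
print("integer     ", mass(hexachlorobenzene, NOMINAL))
print("monoisotopic", round(mass(hexachlorobenzene, MONOISOTOPIC), 5))
```

The monoisotopic value (281.81312 u) is the one to compare with an accurate-mass measurement; the average mass is close to 3 u higher for this chlorine-rich compound.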


Mass Accuracy

Type              Mass accuracy
Linear ion trap   50-200 ppm (10 ppm in Ultra-Zoom)
Triple quad       3-5 ppm
Q-TOF             3-5 ppm
TOF-MS            3-5 ppm
Magnetic sector   1-2 ppm
Orbitrap          0.5-1 ppm
FT-ICR-MS         0.1-1 ppm

Instruments must be calibrated to obtain high mass accuracy. For FT-ICR-MS, the mass calibration can be stable over weeks. Mass calibration can also be applied after acquisition if a calibrant was run with the samples. The mass of the electron becomes important at around 500 Da.

$$\mathrm{ppm} = \frac{m_{\mathrm{exp}} - m_{\mathrm{calc}}}{m_{\mathrm{exp}}} \times 10^{6}$$

m(e-) = 0.00054858026 u = mass of electron

m(1H) = 1.0078250 u = mass of the hydrogen atom

m(p) = 1.0072765 u = mass of the proton
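The ppm error, (m_exp - m_calc) / m_exp × 10^6, and the m/z of a protonated ion can be computed as in the following Python sketch. The helper names are hypothetical; the solanine mass (867.49799) is taken from the solanine example later in this session, and the measured value 868.5 is invented for illustration. Adding a hydrogen atom instead of a proton for [M+H]+ shifts the result by one electron mass, about 0.6 ppm at m/z 868.

```python
ELECTRON_MASS = 0.00054858026                     # u, mass of the electron
HYDROGEN_ATOM_MASS = 1.00782503                   # u, mass of the 1H atom
PROTON_MASS = HYDROGEN_ATOM_MASS - ELECTRON_MASS  # about 1.0072764 u


def ppm_error(m_exp: float, m_calc: float) -> float:
    """Mass error in ppm, with m_exp in the denominator."""
    return (m_exp - m_calc) / m_exp * 1e6


def protonated_mz(m_neutral: float) -> float:
    """m/z of [M+H]+: the neutral monoisotopic mass plus one proton."""
    return m_neutral + PROTON_MASS


solanine = 867.49799  # neutral monoisotopic mass of C45H73NO15
measured = 868.5      # hypothetical measured m/z
expected = protonated_mz(solanine)
print(round(expected, 5), round(ppm_error(measured, expected), 2))
# Error made by adding a hydrogen atom instead of a proton:
print(round(ELECTRON_MASS / expected * 1e6, 2), "ppm")
```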


Resolving Power

High resolving power is helpful for separating species with almost the same mass (isobars).

High resolving power cannot be used to distinguish between structural isomers.

Example: C8H10N2O has 100,082,479 isomers.

[Figure: spectra of solanine (CID 30185) at resolving power RP = 1700 and RP = 48,250]
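Using the FWHM definition R = m / Δm, the resolving power needed to separate two isobars follows directly from their mass difference. This Python sketch (hypothetical helper name) uses CO and N2, both of nominal mass 28, as an example. Structural isomers have identical formulas and therefore identical masses, so no resolving power separates them, and the function raises an error in that case.

```python
def required_resolving_power(m1: float, m2: float) -> float:
    """Resolving power R = m / delta_m needed to separate two peaks (FWHM definition)."""
    delta = abs(m1 - m2)
    if delta == 0.0:
        raise ValueError("identical masses (e.g. structural isomers) cannot be resolved")
    return ((m1 + m2) / 2.0) / delta


CO = 12.0 + 15.99491462  # monoisotopic mass of 12C16O
N2 = 2 * 14.00307401     # monoisotopic mass of 14N2
print(round(required_resolving_power(CO, N2)))
```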


Isotopic Pattern Generators

Elements can be (a) monoisotopic (F, Na, P, I) or (b) polyisotopic (H, C, N, O, S, Cl, Br).

Isotopic pattern generators calculate the isotopic abundances for a given molecular formula. The calculation can be very time-consuming for large molecules; efficient generators use, for example, Fast Fourier Transform algorithms.
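A small isotope pattern generator can be written as repeated convolution of per-element isotope distributions. The Python sketch below does this at nominal (integer) mass, so it gives a unit-resolution pattern without isotopic fine structure. It is a direct polynomial expansion, not an FFT method, and it only knows the elements listed in its table. For hexachlorobenzene it reproduces the chlorine cluster, with the M+2 peak (m/z 284) as the most intense.

```python
from collections import defaultdict

# Nominal isotope masses and natural abundances (IUPAC representative values).
ISOTOPES = {
    "H": [(1, 0.999885), (2, 0.000115)],
    "C": [(12, 0.9893), (13, 0.0107)],
    "N": [(14, 0.99636), (15, 0.00364)],
    "O": [(16, 0.99757), (17, 0.00038), (18, 0.00205)],
    "Cl": [(35, 0.7576), (37, 0.2424)],
}


def convolve(dist_a: dict, dist_b: dict) -> dict:
    """Combine two {nominal mass: probability} distributions."""
    out = defaultdict(float)
    for mass_a, p_a in dist_a.items():
        for mass_b, p_b in dist_b.items():
            out[mass_a + mass_b] += p_a * p_b
    return dict(out)


def isotope_pattern(formula: dict) -> dict:
    """Unit-resolution isotope pattern, normalized so the most intense peak is 1."""
    pattern = {0: 1.0}
    for element, count in formula.items():
        for _ in range(count):  # add one atom at a time
            pattern = convolve(pattern, dict(ISOTOPES[element]))
    top = max(pattern.values())
    return {mass: p / top for mass, p in sorted(pattern.items())}


for nominal_mass, rel in isotope_pattern({"C": 6, "Cl": 6}).items():
    if rel >= 0.01:
        print(nominal_mass, round(100 * rel, 1))
```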


Charge states

CID: 3081765; MW = 1125.50082; C50H72N13O15P

[Figure: isotopic patterns for charge state 2 and charge state 1]
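For an ion carrying z charges, m/z = (M + z·m_adduct) / z, and neighbouring 13C isotope peaks are 1.00336 / z apart. The Python sketch below (hypothetical helper names) checks the values for this peptide: dividing M = 1125.50082 by z = 2 without adding protons gives m/z 562.75, the first peak of the simulated charge-state-2 pattern, while [M+2H]2+ would appear near m/z 563.76.

```python
PROTON_MASS = 1.007276467         # u
C13_C12_DIFFERENCE = 1.003354835  # u, mass difference between 13C and 12C


def mz(neutral_mass: float, z: int, adduct_mass: float = PROTON_MASS) -> float:
    """m/z of an ion [M + z*adduct]^z+; adduct_mass=0.0 simply divides M by z."""
    return (neutral_mass + z * adduct_mass) / z


def isotope_spacing(z: int) -> float:
    """Distance in m/z between neighbouring 13C isotope peaks at charge z."""
    return C13_C12_DIFFERENCE / z


M = 1125.50082  # C50H72N13O15P (CID 3081765)
for z in (1, 2):
    print(f"z={z}: M/z={mz(M, z, 0.0):.2f}  [M+{z}H]={mz(M, z):.4f}  "
          f"spacing={isotope_spacing(z):.3f}")
```

Reading the isotope spacing is the usual way to determine the charge state: peaks 0.5 apart indicate z = 2.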


Different charge states and peak resolutions

[Figure: simulated isotopic patterns of C50H72N13O15P in four panels: charge state 2 at resolving power 2000 and 200,000 (m/z 562-568, peaks starting at m/z 562.75) and charge state 1 at resolving power 2000 and 200,000 (m/z 1125-1135, peaks starting at m/z 1125.50). Isotope peaks are spaced by about 0.5 for charge state 2 and by about 1.0 for charge state 1.]

Example: isotopic pattern of phosphorylated angiotensin without the [M+H]+ adduct, simulated with Thermo Xcalibur.


Molecular Formula Generators

Formula generators are used to create molecular formulae from accurate masses. The input requires (1) an accurate monoisotopic mass (with or without adduct) and (2) a mass error tolerance in ppm or mDa (millidalton).

[Screenshot: MWTWIN, with input fields for the accurate mass and the mass error]
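The core of a formula generator can be sketched as a brute-force search. The following Python example (hypothetical function names, arbitrary element limits, CHNO only) enumerates formulas whose monoisotopic mass lies within a ppm tolerance and keeps only those with a non-negative, integer ring-plus-double-bond equivalent (RDBE). It is not the algorithm of MWTWIN or any other program, and an [M+H]+ value must first be converted to a neutral mass.

```python
from itertools import product

MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def hill_formula(c: int, h: int, n: int, o: int) -> str:
    """Write a CHNO formula in Hill order, omitting zero counts and count 1."""
    parts = []
    for symbol, count in (("C", c), ("H", h), ("N", n), ("O", o)):
        if count == 1:
            parts.append(symbol)
        elif count > 1:
            parts.append(f"{symbol}{count}")
    return "".join(parts)


def candidate_formulas(target_mass, ppm_tol, max_c=12, max_h=24, max_n=6, max_o=6):
    """Brute-force CHNO formulas whose monoisotopic mass is within ppm_tol."""
    hits = []
    for c, h, n, o in product(range(max_c + 1), range(max_h + 1),
                              range(max_n + 1), range(max_o + 1)):
        m = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
        if abs(target_mass - m) / target_mass * 1e6 > ppm_tol:
            continue
        rdbe = c - h / 2 + n / 2 + 1  # rings plus double bonds
        if rdbe < 0 or rdbe != int(rdbe):  # keep neutral, even-electron formulas only
            continue
        hits.append((hill_formula(c, h, n, o), round(m, 5)))
    return hits


print(candidate_formulas(194.08038, 5))  # neutral monoisotopic mass of caffeine
```

Widening the tolerance or the element limits quickly increases the number of hits, which is why mass accuracy matters so much for formula assignment.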


The molecular formula space of small molecules

calculated by the Seven Golden Rules

Each molecular formula can expand to billions of structural isomers. Molecular formula ≠ molecular isomer.

http://fiehnlab.ucdavis.edu/projects/Seven_Golden_Rules/
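Part of the Seven Golden Rules filters formulas by their element ratios to carbon. The Python sketch below applies H/C, N/C and O/C limits only. The numeric ranges are assumed from the published rules and should be checked against the Seven Golden Rules project page; the other rules (for example the valence checks and the isotope pattern check) are not implemented, and the function name is hypothetical.

```python
# Element-to-carbon ratio limits ASSUMED from the published Seven Golden Rules
# ("common range"); verify against the project page before relying on them.
RATIO_LIMITS = {"H": (0.2, 3.1), "N": (0.0, 1.3), "O": (0.0, 1.2)}


def passes_ratio_rules(formula: dict) -> bool:
    """Check the element/carbon ratios of a formula {element: count}."""
    carbons = formula.get("C", 0)
    if carbons == 0:
        return False  # ratios to carbon are undefined
    for element, (low, high) in RATIO_LIMITS.items():
        ratio = formula.get(element, 0) / carbons
        if not low <= ratio <= high:
            return False
    return True


print(passes_ratio_rules({"C": 8, "H": 10, "N": 4, "O": 2}))  # caffeine
print(passes_ratio_rules({"C": 2, "H": 2, "N": 4}))           # N/C = 2
```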


Frequency distribution of molecular formulas


Impact of mass accuracy on number of formulas


Mass accuracy and isotopic pattern

Example: ESI-MS (+) of solanine on an LTQ

Resolving power: 1700

Mass accuracy: 46 ppm

Isotopic Abundance Error: ±1.46%

C45H73NO15

MW = 867.49799

[M+H]+


Isotopic abundances as orthogonal filter
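Isotopic abundances give a filter that is independent of the accurate mass: the predicted M+1 abundance of each candidate formula is compared with the measured one. In the Python sketch below, the measured M+1 value of 50.0% and the candidate list are hypothetical, and the ±1.46% tolerance from the solanine example is assumed to mean absolute percentage points of the M peak. The formulas are neutral; the extra hydrogen of [M+H]+ would add only about 0.01 percentage points.

```python
# Ratio of the +1 isotope to the lightest isotope (from IUPAC abundances).
PLUS_ONE_RATIO = {
    "C": 0.0107 / 0.9893,
    "H": 0.000115 / 0.999885,
    "N": 0.00364 / 0.99636,
    "O": 0.00038 / 0.99757,
}


def m_plus_1_percent(formula: dict) -> float:
    """Predicted M+1 abundance in percent of the M peak (nominal-mass level)."""
    return 100.0 * sum(PLUS_ONE_RATIO[el] * n for el, n in formula.items())


def filter_by_m_plus_1(candidates: dict, measured_percent: float, tolerance: float) -> list:
    """Keep candidates whose predicted M+1 lies within +/- tolerance percentage points."""
    return [name for name, formula in candidates.items()
            if abs(m_plus_1_percent(formula) - measured_percent) <= tolerance]


# Hypothetical candidate list and measurement, for illustration only.
CANDIDATES = {
    "C45H73NO15": {"C": 45, "H": 73, "N": 1, "O": 15},
    "C8H10N4O2": {"C": 8, "H": 10, "N": 4, "O": 2},
}
MEASURED_M_PLUS_1 = 50.0
TOLERANCE = 1.46

for name, formula in CANDIDATES.items():
    print(name, round(m_plus_1_percent(formula), 2))
print(filter_by_m_plus_1(CANDIDATES, MEASURED_M_PLUS_1, TOLERANCE))
```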


Tasks

(1) Calculate the number of isomers for C12H12.

(2) Generate the isotopic patterns for chlorophyll a and hexachlorobenzene.

(3) Find the molecular formula for the mass spectrum on the next page using http://www.ch.ic.ac.uk/java/applets/FormToM.html. Use H=24, C=24, O=8 and others=4 in the settings. Include S!

Use the isotope generator to check which of the possible formulas best fits the observed pattern.

(4) Find the possible molecule(s) in SciFinder and in the National Library of Medicine database. Which one is the most likely?

http://chem.sis.nlm.nih.gov/chemidplus/ (note: use the formula with hyphen and use the letters alphabetically)

https://scifinder.cas.org/scifinder/login.jsf


[Figure: mass spectrum for task (3), with relative peak intensities Int=100, Int=20, Int=9, Int=2 and Int=2]


Web applications

• Isotope calculator: http://yanjunhua.tripod.com/pattern.htm
• Mass to formula and formula to mass: http://www.ch.ic.ac.uk/java/applets/FormToM.html
• GC-MS tutorial: http://eu.shimadzu.de/products/chromato/gcms/TutorialGCMS/default.aspx
• Databases:
• Dictionary of Natural Products (access is limited because of a lack of license): http://dnp.chemnetbase.com/dictionary-search.do?method=view&id=2885722&si
• Chemical lookup service: http://cactus.nci.nih.gov/
• SciFinder: this needs to be activated through the university library link.
• Good website for mass spectrometry background: “The expanding role of Mass spectrometry in Biotechnology”, http://masspec.scripps.edu/book_toc.php