databases and identification - la nuvola tv · brama training for technicians – module i, rome...
TRANSCRIPT
BRAMA training for technicians – Module I, Rome
SESSION III – DATABASES
“Databases and Identification”
Prof. Jacques Vervoort
To be a master of spectra you need to be a master of structures in the first place.
[Figure: MS/MS spectrum of vincristine (nist_msms), m/z 260-810, relative abundance 0-100, with structure drawing]
• Complex MS data interpretation is only possible with software
• MS data are obtained by hyphenated techniques (GC-MS, LC-MS)
• Mass spectral database search and structure search are used routinely
• Mass spectrometers deliver multidimensional data
For more information see http://fiehnlab.ucdavis.edu
Be prepared – visualize your structures
Try Marvin Space via WebStart
Organic Chemistry Reminder
Molecular Formula
C3H7F
[Figure: EI mass spectrum of propane, 2-fluoro- (mainlib), m/z 10-70, relative abundance 0-100, with structure drawing]
Be prepared – Stereoisomers
How many stereoisomers can you expect from glucose (KEGG)?
Example calculated with MarvinView (via JAVA Webstart)
[Structure: glucose]
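As a rough check on the question above: the maximum number of stereoisomers is 2^n for n independent stereocentres. The sketch below assumes four stereocentres for open-chain glucose (an aldohexose) and five for the cyclic pyranose form. The function name is illustrative, and symmetry (meso forms), which can lower the real count, is ignored.

```python
def max_stereoisomers(n_stereocentres: int) -> int:
    """Upper bound 2**n; symmetry (meso forms) can lower the real count."""
    if n_stereocentres < 0:
        raise ValueError("number of stereocentres must be non-negative")
    return 2 ** n_stereocentres

open_chain = max_stereoisomers(4)  # aldohexose, open-chain form (assumed n = 4)
pyranose = max_stereoisomers(5)    # ring closure adds the anomeric centre (assumed n = 5)
print(open_chain, pyranose)
```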
Be prepared – Tautomers
How many tautomers can you expect?
Important for mass spectral interpretation.
[Structure: methyl acetate]
Example calculated with MarvinView (start via WebStart)
Be prepared – Resonance (electron shifts)
What are possible resonance structures?
Important for mass spectral interpretation (electron impact, electrospray).
[Structure: phenol]
Example calculated with MarvinView (start via WebStart)
Structure search – know what could be possible
How many compounds (isomer structures) are found in public databases?
http://www.chemspider.com/
Chemical Structure Handling
Most common structure formats you need to know:
SMILES/SMARTS - Simplified Molecular Input Line Entry System
SDF/MOL - Structure Data File
InChI/InChIKey - IUPAC International Chemical Identifier
PDB - Protein Data Bank
CML - Chemical Markup Language
Some problems:
• Data format needs to be based on an Open Standard (problem with SMILES, ok with CML)
• Stereo and aromatic bond information needs to be saved (ok with SDF)
• Format needs to be small in space for millions of compounds (ok with SMILES)
• SMILES notation needs to be unique (problem with SMILES)
• Structure representation should be portable and based on an Open Standard (ok with CML)
[Structure: moronic acid]
Moronic Acid - CID: 489941
Chemical Structure Identifiers
Structure identifiers are needed for uniquely identifying structures.
Important for searching chemical structures in text and databases.
Structure Name – IUPAC name or common name
CAS RN – Chemical Abstracts identifier
PubChem ID – PubChem Compound ID
InChIKey – Short representation of InChI
InChI – IUPAC International Chemical Identifier
[Structure: caffeine]
Structure name: 1,3,7-trimethylpurine-2,6-dione
CAS RN: 58-08-2
PubChem CID: 2519
InChIKey=RYYVLZVUVIJVGH-UHFFFAOYAW
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
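The InChI above is a sequence of layers separated by "/": version, formula, then layers with a one-letter prefix (here "c" for connectivity and "h" for hydrogens). A minimal sketch of splitting it with plain string handling follows; the function name is illustrative, and no validation of the chemistry is attempted.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into its version, formula and prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for layer in parts[2:]:
        # Each further layer starts with a one-letter prefix, e.g. 'c' or 'h'.
        layers[layer[0]] = layer[1:]
    return layers

# Caffeine InChI as given on the slide.
caffeine = ("InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2"
            "/h4H,1-3H3")
layers = split_inchi(caffeine)
print(layers["formula"])  # formula layer
print(layers["c"])        # connectivity layer
print(layers["h"])        # hydrogen layer
```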
SMILES structure format
Positive: Good for storing structures in a single line; fast text-based search possible; human readable
Negative: Many different SMILES codes exist; SMILES for the same structure can differ (canonical or unique SMILES needed)
C
CC
CCC
CCCC
CCCCO
CCCCN
All these SMILES codes represent caffeine:
[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]
CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12
Cn1cnc2n(C)c(=O)n(C)c(=O)c12
Cn1cnc2c1c(=O)n(C)c(=O)n2C
N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2
O=C1C2=C(N=CN2C)N(C(=O)N1C)C
CN1C=NC2=C1C(=O)N(C)C(=O)N2C
[Structure: caffeine]
InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
Caffeine SMILES source: InChI FAQ
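A quick sanity check that the different SMILES strings above describe the same heavy-atom composition can be done without a chemistry toolkit. The sketch below only counts non-hydrogen atoms with a regular expression. It is not a canonicalisation and does not compare connectivity; the pattern is an assumption that covers the organic subset and simple bracket atoms only.

```python
import re
from collections import Counter

# Bracket atoms such as [CH3] or [n+], otherwise organic-subset atoms.
ATOM = re.compile(r"\[([^\]]+)\]|(Cl|Br|[BCNOPSFI]|[bcnops])")

def heavy_atoms(smiles: str) -> Counter:
    """Count non-hydrogen atoms per element in a SMILES string."""
    counts = Counter()
    for match in ATOM.finditer(smiles):
        if match.group(1) is not None:
            # Inside brackets: optional isotope number, then the element symbol.
            inner = re.match(r"\d*([A-Z][a-z]?|[a-z]{1,2})", match.group(1))
            symbol = inner.group(1)
        else:
            symbol = match.group(2)
        symbol = symbol.capitalize()  # aromatic 'c' counts as carbon 'C'
        if symbol != "H":
            counts[symbol] += 1
    return counts

CAFFEINE_SMILES = [
    "[c]1([n+]([CH3])[c]([c]2([c]([n+]1[CH3])[n][cH][n+]2[CH3]))[O-])[O-]",
    "CN1C(=O)N(C)C(=O)C(N(C)C=N2)=C12",
    "Cn1cnc2n(C)c(=O)n(C)c(=O)c12",
    "Cn1cnc2c1c(=O)n(C)c(=O)n2C",
    "N1(C)C(=O)N(C)C2=C(C1=O)N(C)C=N2",
    "O=C1C2=C(N=CN2C)N(C(=O)N1C)C",
    "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
]
compositions = {tuple(sorted(heavy_atoms(s).items())) for s in CAFFEINE_SMILES}
print(compositions)
```

For real work a toolkit that generates canonical SMILES or InChI is the safer route; equal composition is necessary but not sufficient for identity.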
SDF/MOL structure format
Positive: established standard format; good for storing structures safely; can store 3D structure; can store metadata (boiling points, toxicity, mass spectra)
Negative: large file size, need compression
Example records (methane, ethane, propane; column spacing lost in extraction):

OpenBabel02240823422D
1 0 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0
M END
$$$$

OpenBabel02240823422D
2 1 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0
1 2 1 0 0 0
M END
$$$$

OpenBabel02240823422D
3 2 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0
1 2 1 0 0 0
2 3 1 0 0 0
M END
$$$$

Callouts: creator (header line); coordinates for 3D (atom block); connection of atoms (bond block)
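The three parts named in the callouts (creator line, atom block with coordinates, bond block with connections) can be read with a few fixed-width slices. The sketch below is a minimal V2000 reader, not a full SDF parser. The title "propane" and the column spacing are assumptions, because the extracted record above lost its whitespace.

```python
# A V2000 molblock rebuilt with fixed-width columns (title line is an assumption).
PROPANE_MOL = "\n".join([
    "propane",                                  # line 1: title
    " OpenBabel02240823422D",                   # line 2: creator and time stamp
    "",                                         # line 3: comment
    "  3  2  0  0  0  0  0  0  0  0999 V2000",  # counts line: atoms, bonds, ...
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0",
    "  1  2  1  0  0  0",
    "  2  3  1  0  0  0",
    "M  END",
])

def parse_molblock(molblock: str):
    """Return (atoms, bonds) from a V2000 molblock using fixed-width fields."""
    lines = molblock.splitlines()
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        xyz = (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        atoms.append((line[31:34].strip(), xyz))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # first atom index, second atom index, bond order
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds

atoms, bonds = parse_molblock(PROPANE_MOL)
print(len(atoms), "atoms;", bonds)
```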
Molecules and mass spectra
Close relationship between molecular structure and mass spectra
Molecular structure is reflected in mass spectral features (peaks, peak heights and peak combinations)
Mass spectra reflect a state of gas phase ion physics and chemistry (rearrangements, fragmentations, bond cleavages)
[Figure: three EI mass spectra with structures, m/z 20-160: tert-Butylaminotrimethylsilane (mainlib); N,N-Diethyl-1,1,1-trimethylsilylamine (mainlib); Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)]- (replib)]
Electron impact (70 eV) mass spectra; Source: NIST05
Molecules and mass spectra
Similar structures may or may not have similar mass spectra
[Figure: head-to-tail comparison of two EI mass spectra with structures, m/z 40-320: Silanamine, N,1,1,1-tetramethyl-N-[1-methyl-2-phenyl-2-[(trimethylsilyl)oxy]ethyl]-, [S-(R*,R*)]- versus N-Methylphenylethanolamine, bis(trimethylsilyl)-]
Electron impact (70 eV) mass spectra; Source: NIST05; Created using structure similarity search in NIST MS Search program
Molecules and mass spectra
Similar mass spectra may or may not have similar structures
[Figure: head-to-tail comparison of two EI mass spectra, m/z 10-210: 1-Tetradecene versus Cyclotetradecane]
Electron impact (70 eV) mass spectra; Source: NIST05; Created using spectral similarity search in NIST MS Search program
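Spectral similarity searches of this kind score how well two peak lists overlap. A minimal sketch with a plain cosine score follows. This is not the algorithm of the NIST MS Search program, which uses its own weighted scoring. The m/z values are typical hydrocarbon fragments, and the intensities are made-up illustration values, not read from the figure.

```python
import math

def cosine_similarity(spec_a: dict, spec_b: dict) -> float:
    """Cosine of the angle between two spectra given as {m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(i * i for i in spec_a.values()))
    norm_b = math.sqrt(sum(i * i for i in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical intensities for two spectra sharing the same fragment ions.
spec_a = {41: 100.0, 55: 90.0, 69: 60.0, 83: 55.0, 97: 40.0}
spec_b = {41: 95.0, 55: 100.0, 69: 70.0, 83: 50.0, 97: 35.0}
print(round(cosine_similarity(spec_a, spec_b), 3))
```

A score near 1 says only that the peak lists agree; as the slide shows, it does not prove that the structures are the same.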
Mass spectral databases I
Name            Spectra count   Type
NIST05          200,000         electron impact spectra (EI 70 eV)
Wiley 8         400,000         electron impact spectra (EI 70 eV)
Palisade 600K   600,000         electron impact spectra (EI 70 eV)
NIST MS/MS      5,200           MS/MS (ESI, +/-, 30-100 V CID)
MassFrontier    7,000           MSn, ESI (Spectral Tree Library)
Data quality is important.
Annotation with CAS number, structure and formula.
A link to literature or publication is useful.
Currently no large ESI, APPI or APCI libraries are available (free or commercial).
Mass spectral databases II
Smaller specialized libraries
Pfleger Maurer Weber (Drugs) MS+RI, 70eV
MassFinder (Volatiles) MS+RI, 70eV
RIZA DB (Toxicants) MS+RI, 70eV
Golm DB (primary Metabolites) MS+RI, 70eV
Fiehnlib (primary Metabolites) MS+RI, 70eV
MassBank (Metabolites) ESI, MSn, accurate masses
AAFS (Drugs, Forensic, Toxicology) MS+RI, 70eV
ChemicalSoft (Drugs) MS/MS, MSE
_____________________________________________________________
In the case of electron impact (EI), the same GC column (DB-5, RTX-5, DB-1, OV-1) and temperature program must be used for matching retention indices.
In the case of ESI or APPI spectra (LC-MS), the same mass spectrometer design and setup (triple quad, ion trap, TOF, Q-TOF) and collision energy should be used.
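For the retention index (RI) matching mentioned for GC-MS, a common choice with temperature programs is the linear retention index of van den Dool and Kratz, which interpolates between the n-alkanes eluting before and after the compound. The sketch below assumes that formula; the alkane retention times are hypothetical and the function name is illustrative.

```python
from bisect import bisect_right

def linear_retention_index(rt: float, alkane_rts: dict) -> float:
    """Linear RI from {carbon number: retention time} of an n-alkane ladder."""
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time outside the alkane ladder")
    # Index of the alkane eluting at or just before the compound.
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    fraction = (rt - times[i]) / (times[i + 1] - times[i])
    return 100 * (n + (n_next - n) * fraction)

# Hypothetical retention times in minutes for C10, C11 and C12.
ladder = {10: 8.0, 11: 10.0, 12: 12.5}
print(linear_retention_index(9.0, ladder))
```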
[Figure: EI mass spectrum of Mirex (riza_web; RI 2583, KEY 1596, CAS 2385-85-5), m/z 230-450, labelled peaks at 237, 272, 332 and 404, with structure drawing]
Searching Molecules on PubChem
Go to PubChem Structure Search
18 million compound DB (++)
CAS SciFinder
• 33 million molecules and 60 million peptides/proteins
• largest reaction DB (14 million reactions) and literature DB
• substructure and similarity search of structures
• a must for chemists and biochemists/biologists
• no bulk download, no good import/export, no link-outs
Structure search in SciFinder
Retrieved 4000 papers
(refine search only MS and MALDI)
Atomic Mass
Hexachlorobenzene (C6Cl6)
average mass - 284.7804 u
integer mass - 282.0 u
monoisotopic mass - 281.81312 u
Correct unit is [u] (unified atomic mass unit) or [Da] (dalton); see SI units.
1 u = 1 Da = 1/12 of the mass of carbon 12C = 1.66053886 × 10^-27 kg
[Figure: simulated isotopic pattern of C6Cl6 (charge 0, resolving power 1000), m/z 282-296, relative abundance 0-100; labelled peaks: 281.81, 282.82, 283.81, 284.81, 285.81, 286.81, 287.80, 288.81, 289.80, 291.80, 292.80, 294.80, 295.80]
Always (always) check molecular masses obtained from databases or publications.
For mass spectrometry the monoisotopic mass is used.
[Structure: hexachlorobenzene]
InChIKey:CKAPSXZOOQJIBF-UHFFFAOYAV
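The three mass values for hexachlorobenzene differ only in which atomic masses are summed. A minimal sketch follows, assuming rounded standard atomic data. Average atomic weights vary slightly between sources, so the average mass may differ from the slide in the last digits.

```python
# Atomic data (assumed, rounded): lightest isotopes, average weights, integer masses.
MONOISOTOPIC = {"C": 12.0, "Cl": 34.96885268}  # 12C and 35Cl
AVERAGE = {"C": 12.0107, "Cl": 35.453}
NOMINAL = {"C": 12, "Cl": 35}

def formula_mass(formula: dict, table: dict) -> float:
    """Sum atomic masses from the chosen table over an {element: count} formula."""
    return sum(table[element] * count for element, count in formula.items())

hcb = {"C": 6, "Cl": 6}  # hexachlorobenzene, C6Cl6
print("monoisotopic:", round(formula_mass(hcb, MONOISOTOPIC), 5))
print("average:     ", round(formula_mass(hcb, AVERAGE), 2))
print("integer:     ", formula_mass(hcb, NOMINAL))
```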
Mass Accuracy
Type              Mass accuracy
FT-ICR-MS         0.1 - 1 ppm
Orbitrap          0.5 - 1 ppm
Magnetic Sector   1 - 2 ppm
TOF-MS            3 - 5 ppm
Q-TOF             3 - 5 ppm
Triple Quad       3 - 5 ppm
Linear IonTrap    50 - 200 ppm (10 ppm in Ultra-Zoom)
Instruments must be calibrated to obtain high mass accuracy.
In the case of FT-ICR-MS, mass calibration can be stable over weeks.
Post-acquisition mass calibration can be performed if a calibrant was run with the samples.
The mass of the electron becomes important at around 500 Da.
$$\mathrm{ppm} = \frac{m_{\mathrm{exp}} - m_{\mathrm{calc}}}{m_{\mathrm{exp}}} \times 10^{6}$$
m(e-) = 0.00054858026 u = mass of electron
m(1H) = 1.0078250 u = mass of the hydrogen atom (1H); the bare proton is lighter by one electron mass (about 1.0072765 u)
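The ppm definition above, and the remark that the electron mass matters at around 500 Da, can be checked with a few lines. This is a minimal sketch with an illustrative function name; the error is normalised to the experimental mass, as in the slide formula.

```python
ELECTRON_MASS = 0.00054858026  # u, value quoted on the slide

def ppm_error(m_exp: float, m_calc: float) -> float:
    """Relative mass error in ppm, normalised to the experimental mass."""
    return (m_exp - m_calc) / m_exp * 1e6

# Neglecting one electron at m/z 500 shifts the result by roughly 1 ppm.
shift = ppm_error(500.0, 500.0 - ELECTRON_MASS)
print(round(shift, 2), "ppm")
```

On an instrument with 1 ppm accuracy this shift is as large as the error budget itself, which is why the electron mass has to be included when ion masses are calculated.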
Resolving Power
High resolving power is helpful for separation of species with almost the same mass (isobars).
High resolving power cannot be used to distinguish between structural isomers.
Example: C8H10N2O has 100,082,479 isomers.
RP = 1700
RP = 48,250
Example Solanine (CID=30185)
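Resolving power is commonly expressed as RP = m / Δm, with Δm the peak width at half maximum (FWHM). A rough sketch of the RP needed to separate two neighbouring masses follows, using the nominal isobars CO and N2 as an illustration that is not taken from the slides. Identical masses, as for structural isomers, cannot be separated at any resolving power.

```python
def required_resolving_power(m1: float, m2: float) -> float:
    """Approximate RP = m / delta_m needed to separate two neighbouring peaks."""
    delta = abs(m1 - m2)
    if delta == 0:
        raise ValueError("identical masses cannot be separated by resolving power")
    return (m1 + m2) / 2 / delta

co = 12.0 + 15.9949146  # CO, monoisotopic mass in u
n2 = 2 * 14.0030740     # N2, monoisotopic mass in u
print(round(required_resolving_power(co, n2)))
```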
Isotopic Pattern Generators
Elements can be
a) monoisotopic (F, Na, P, I)
b) polyisotopic (H, C, N, O, S, Cl, Br)
Isotopic pattern generators calculate the isotopic abundances for a given molecular formula.
The calculation can be very time-consuming; fast implementations are based on Fast Fourier Transform algorithms.
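A small isotopic pattern generator can be sketched by convolving the isotope distributions atom by atom. This is a direct convolution on nominal masses, not the Fast Fourier approach mentioned above. The abundances are rounded natural values and only C and Cl are included, both assumptions made for this example.

```python
from collections import defaultdict

# Nominal isotope masses and approximate natural abundances (rounded values).
ISOTOPES = {
    "C": {12: 0.9893, 13: 0.0107},
    "Cl": {35: 0.7576, 37: 0.2424},
}

def isotope_pattern(formula: dict) -> dict:
    """Return {nominal mass: relative abundance in %} by repeated convolution."""
    pattern = {0: 1.0}
    for element, count in formula.items():
        for _ in range(count):
            updated = defaultdict(float)
            for mass, prob in pattern.items():
                for iso_mass, abundance in ISOTOPES[element].items():
                    updated[mass + iso_mass] += prob * abundance
            pattern = dict(updated)
    top = max(pattern.values())
    return {mass: 100 * prob / top for mass, prob in sorted(pattern.items())}

hcb = isotope_pattern({"C": 6, "Cl": 6})  # hexachlorobenzene
for mass, rel in hcb.items():
    if rel >= 1.0:  # print only peaks of at least 1 % relative abundance
        print(mass, round(rel, 1))
```

For hexachlorobenzene the base peak is not the monoisotopic peak at nominal mass 282 but the peak two mass units higher, because one 37Cl among six chlorine atoms is the most probable combination.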
Charge states
CID: 3081765
MW = 1125.50082
C50H72N13O15P
charge state 2
charge state 1
Different charge states and peak resolutions
[Figure: simulated isotopic patterns of C50H72N13O15P at resolving power 2000 and 200,000 (FWHM), relative abundance 0-100. Charge state 2: m/z 562-568, peaks at 562.75, 563.25, 563.75, 564.25, ... (spacing 0.5). Charge state 1: m/z 1125-1135, peaks at 1125.50, 1126.50, 1127.51, 1128.51, ... (spacing 1.0).]
Example of phosphorylated angiotensin isotopic pattern without adduct [M+H]+, simulated by Thermo XCalibur
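The peak spacings in the patterns above (0.5 for charge state 2, 1.0 for charge state 1) follow from isotope peaks being about 1 u apart in mass, so about 1/z apart in m/z. A minimal sketch that estimates the charge state from labelled peak positions; the function name is illustrative.

```python
def charge_from_spacing(peaks_mz: list) -> int:
    """Estimate |z| from the mean spacing of neighbouring isotope peaks (about 1/z)."""
    peaks = sorted(peaks_mz)
    if len(peaks) < 2:
        raise ValueError("need at least two isotope peaks")
    spacings = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return round(1.0 / mean_spacing)

# Peak positions taken from the high-resolution panels of the slide.
doubly_charged = charge_from_spacing([562.75, 563.25, 563.75, 564.25])
singly_charged = charge_from_spacing([1125.50, 1126.50, 1127.51, 1128.51])
print(doubly_charged, singly_charged)
```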
Molecular Formula Generators
Formula generators are used to create molecular formulae from accurate masses.
Input requires
1) accurate isotopic mass (with or without adduct) and
2) error in ppm or mDa (millidalton)
Accurate mass
Mass error
Example MWTWIN
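The two inputs named above (accurate mass and mass error) are enough for a brute-force formula generator. The sketch below is not how MWTWIN works internally. It assumes a neutral monoisotopic mass without adduct, CHNO only, and user-chosen element limits, and it applies no chemical plausibility rules. The caffeine mass is used as an illustration.

```python
from itertools import product

# Monoisotopic masses in u (rounded standard values).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

def formulas_for_mass(target: float, ppm: float, max_counts: dict) -> list:
    """Brute-force formulas whose monoisotopic mass lies within +/- ppm of target."""
    tolerance = target * ppm / 1e6
    elements = list(max_counts)
    hits = []
    for counts in product(*(range(max_counts[e] + 1) for e in elements)):
        mass = sum(MONO[e] * n for e, n in zip(elements, counts))
        if abs(mass - target) <= tolerance:
            formula = "".join(f"{e}{n}" for e, n in zip(elements, counts) if n)
            # Error as (experimental - calculated) / experimental, in ppm.
            hits.append((formula, round((target - mass) / target * 1e6, 2)))
    return hits

hits = formulas_for_mass(194.08038, ppm=5,
                         max_counts={"C": 12, "H": 24, "N": 6, "O": 6})
for formula, error_ppm in hits:
    print(formula, error_ppm)
```

Every hit still has to be checked against the isotopic pattern and chemical rules, since a mass window alone usually admits several formulae.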
The molecular formula space of small molecules
calculated by the Seven Golden Rules
Each molecular formula can expand to billions of structural isomers.
Molecular Formula ≠ Molecular Isomer
http://fiehnlab.ucdavis.edu/projects/Seven_Golden_Rules/
Mass accuracy and isotopic pattern
Example:
ESI-MS (+) of Solanine on an LTQ
Resolving Power: 1700
Mass Accuracy: 46 ppm
Isotopic Abundance Error: ±1.46%
C45H73NO15
MW = 867.49799
[M+H]+
Tasks
(1) Calculate the number of isomers for C12H12.
(2) Generate the isotopic pattern for Chlorophyll a and Hexachlorobenzene.
(3) Find the molecular formula for the mass spectrum on the next page.
http://www.ch.ic.ac.uk/java/applets/FormToM.html
Use H=24, C=24, O=8 and others=4 in the settings. Include S!
Use the isotope generator to check which of the possible formulae best fits the observed pattern.
(4) Find the possible molecule(s) in SciFinder and in the National Library of Medicine. Which one is the most likely?
http://chem.sis.nlm.nih.gov/chemidplus/ (note: use the formula with hyphen and use the letters alphabetically)
https://scifinder.cas.org/scifinder/login.jsf
Web applications
• Isotope calculator: http://yanjunhua.tripod.com/pattern.htm
• Mass to Formula and Formula to Mass: http://www.ch.ic.ac.uk/java/applets/FormToM.html
• Tutorial GC-MS: http://eu.shimadzu.de/products/chromato/gcms/TutorialGCMS/default.aspx
• Databases:
• Dictionary of Natural Products (access is limited because of lack of license): http://dnp.chemnetbase.com/dictionary-search.do?method=view&id=2885722&si
• Chemical lookup service: http://cactus.nci.nih.gov/
• SciFinder: this needs to be activated through the university library link.
• Good website for mass spectrometry background: “The expanding role of Mass spectrometry in Biotechnology”, http://masspec.scripps.edu/book_toc.php