Post on 10-Aug-2020

Xiuxia Du, Ph.D. Department of Bioinformatics and Genomics

University of North Carolina at Charlotte

Introduction to Preprocessing of Untargeted Metabolomics Data

Outline

• Raw untargeted LC/MS and GC/MS metabolomics data

- Profile and centroid data

- Mass vs. retention time map

- TIC

- EIC

• Principles of LC/MS and GC/MS data preprocessing

• Feature identification

- Identification of known compounds

- Identification of unknown compounds

Raw Untargeted LC/MS and GC/MS Metabolomics Data

List of scans in raw files

• MS scans in blue

• MS/MS scans in red

• # sequential number

• @ retention time

• MS level

• type of spectrum

- p = profile

- c = centroid

- t = thresholded

• polarity of ionization

- + = positive

- - = negative

- ? = unknown

List of mass spectra

One mass spectrum

Zoom in one mass spectrum

profile mode

Mass spectra in centroid mode

Spectrum in centroid mode

• Data files are much smaller than files in profile mode.

• We will use the centroid data for practicing data pre-processing using XCMS and MZmine 2.

LC-MS raw data in 3D

3D to 2D

• Direct processing of the 3D data is NOT trivial

• Instead, we examine 2D views of the data:

- Mass vs. retention time

- Total ion current vs. retention time: TIC

- Ion current vs. retention time for a particular mass: EIC (Extracted Ion Chromatogram)
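The two chromatogram views above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how XCMS or MZmine 2 implement it: the `Scan` structure, the function names `tic` and `eic`, the 0.01 m/z tolerance, and the three toy scans are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Scan(NamedTuple):
    """One centroided mass spectrum (hypothetical structure for this sketch)."""
    rt: float               # retention time in seconds
    mz: List[float]         # centroid m/z values
    intensity: List[float]  # intensities, same order as mz


def tic(scans: List[Scan]) -> List[Tuple[float, float]]:
    """Total ion current: the sum of all intensities in each scan."""
    return [(s.rt, sum(s.intensity)) for s in scans]


def eic(scans: List[Scan], target_mz: float, tol: float = 0.01) -> List[Tuple[float, float]]:
    """Extracted ion chromatogram: per-scan intensity within target_mz +/- tol."""
    trace = []
    for s in scans:
        total = sum(i for m, i in zip(s.mz, s.intensity) if abs(m - target_mz) <= tol)
        trace.append((s.rt, total))
    return trace


# Toy data, invented for illustration only.
scans = [
    Scan(1.0, [100.000, 200.000], [10.0, 5.0]),
    Scan(2.0, [100.005, 200.000], [30.0, 5.0]),
    Scan(3.0, [150.000, 200.000], [7.0, 5.0]),
]

print(tic(scans))         # one (rt, summed intensity) pair per scan
print(eic(scans, 100.0))  # intensity near m/z 100 in each scan
```

Both functions collapse the mass axis, which is why the resulting traces are 2D: the TIC sums over every mass, while the EIC keeps only a narrow m/z window.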

Mass vs. retention time map

TIC

EIC

Principles of LC/MS and GC/MS Data Preprocessing

Data preprocessing workflow

GC-MS:
1. detect masses from mass spectra
2. construct EICs
3. detect chromatographic peaks
4. deconvolution
5. alignment / correspondence
6. database search

LC-MS:
1. detect masses from mass spectra
2. construct EICs
3. detect chromatographic peaks
4. annotation
5. alignment / correspondence
6. database search

Construct EICs

Select one EIC

One EIC

Detect EIC peaks

• Use wavelet transform

• Implemented in XCMS as the centWave method
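To make the wavelet idea concrete, here is a small standard-library Python sketch that slides a Mexican hat wavelet over an EIC at several scales and reports local maxima of the response. It only illustrates the principle and is not the centWave algorithm itself; the function names, the scale set, the kernel half-width of five scale units, and the `min_coef` threshold are assumptions chosen for this example.

```python
import math


def mexican_hat(t: float, scale: float) -> float:
    """Mexican hat (Ricker) wavelet evaluated at offset t for a given scale."""
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)


def wavelet_response(y, scale):
    """Correlate the signal with the wavelet; samples outside the signal count as zero."""
    half = 5 * scale  # truncate the kernel at +/- 5 scale units (assumption)
    offsets = range(-half, half + 1)
    kernel = [mexican_hat(k, scale) for k in offsets]
    out = []
    for i in range(len(y)):
        acc = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(y):
                acc += y[j] * w
        out.append(acc)
    return out


def detect_peaks(y, scales=(2, 4, 8), min_coef=1.0):
    """Indices where the best response over all scales is a local maximum above min_coef."""
    responses = [wavelet_response(y, s) for s in scales]
    best = [max(column) for column in zip(*responses)]
    return [
        i for i in range(1, len(best) - 1)
        if best[i] > min_coef and best[i] > best[i - 1] and best[i] >= best[i + 1]
    ]


# Synthetic EIC: one Gaussian peak centered at index 100 (made-up data).
signal = [100.0 * math.exp(-((i - 100) ** 2) / (2 * 5.0 ** 2)) for i in range(200)]
peaks = detect_peaks(signal)
print(peaks)
```

The wavelet has zero mean, so a flat or slowly varying baseline contributes almost nothing to the response, and trying several scales lets peaks of different widths each find a matching wavelet.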

[Figure: Mexican hat wavelet ψ_{s,τ}(t) plotted for scales s = 1, 2, 8; wavelet coefficients by scale; chromatogram (intensity × 10³ vs. seconds) with Gaussian fit.]

Tautenhahn, R.; Böttcher, C.; Neumann, S. Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics 2008, 9, 504.

Detected EIC peaks


LC/MS-specific Data Preprocessing

Find isotopes
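One common way to find isotopes is to look for co-eluting peaks whose m/z values differ by the 13C/12C mass difference divided by the charge. The sketch below shows that idea in Python. It is an illustration under assumptions, not the procedure used by any particular tool: the function name, the tolerances, the requirement that the isotope peak be less intense than the monoisotopic peak (reasonable for small molecules), and the peak list are all hypothetical.

```python
C13_SPACING = 1.003355  # mass difference between 13C and 12C, in Da


def find_isotope_pairs(peaks, charge=1, mz_tol=0.005, rt_tol=2.0):
    """Return (monoisotopic_index, isotope_index) pairs.

    peaks: list of (mz, rt_seconds, intensity) tuples.
    """
    pairs = []
    for i, (mz_i, rt_i, int_i) in enumerate(peaks):
        for j, (mz_j, rt_j, int_j) in enumerate(peaks):
            if i == j:
                continue
            same_rt = abs(rt_j - rt_i) <= rt_tol
            right_gap = abs((mz_j - mz_i) - C13_SPACING / charge) <= mz_tol
            # Assumption: the M+1 peak is weaker than the monoisotopic peak.
            if same_rt and right_gap and int_j < int_i:
                pairs.append((i, j))
    return pairs


# Made-up peak list: (m/z, retention time in s, intensity).
peaks = [
    (181.0707, 120.0, 1000.0),  # candidate monoisotopic peak
    (182.0740, 120.3, 66.0),    # about 1.0033 higher, co-eluting
    (182.0740, 300.0, 50.0),    # same m/z but elutes much later
    (250.1000, 120.1, 500.0),   # co-eluting but unrelated mass
]
print(find_isotope_pairs(peaks))
```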


Alignment
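The correspondence part of alignment can be illustrated as grouping peaks from different samples that agree in m/z and retention time, which yields one table row per feature. The Python sketch below is a deliberately simple greedy version with hypothetical names, tolerances, and data. It performs no retention time correction, which tools such as XCMS and MZmine 2 also offer, so treat it as an illustration of the output format and not as a production method.

```python
def align(samples, mz_tol=0.01, rt_tol=5.0):
    """Group peaks across samples into feature rows.

    samples: dict mapping sample name -> list of (mz, rt_seconds, intensity).
    Returns a list of rows: {"mz", "rt", "intensity": {sample: value}}.
    """
    rows = []
    for name, peaks in samples.items():
        for mz, rt, inten in peaks:
            for row in rows:
                close = abs(row["mz"] - mz) <= mz_tol and abs(row["rt"] - rt) <= rt_tol
                # Each row holds at most one peak per sample.
                if close and name not in row["intensity"]:
                    row["intensity"][name] = inten
                    break
            else:
                # No matching row found: start a new feature.
                rows.append({"mz": mz, "rt": rt, "intensity": {name: inten}})
    return rows


# Made-up peak lists for two samples.
samples = {
    "A": [(181.070, 120.0, 1000.0), (250.100, 200.0, 500.0)],
    "B": [(181.072, 122.5, 900.0), (300.200, 50.0, 100.0)],
}
table = align(samples)
for row in table:
    print(row["mz"], row["rt"], row["intensity"])
```

A feature missing from a sample simply has no entry for that sample in its row, which corresponds to an empty cell in the peak table.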

Peaks table after alignment

GC/MS-specific Data Preprocessing

GC-EI-MS

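EI fragmentation

• Example: EI fragmentation of methanol

$[\mathrm{CH_3OH}]^{\bullet+} \rightarrow \mathrm{CH_3O^+} + \mathrm{H^\bullet}$

$[\mathrm{CH_3OH}]^{\bullet+} \rightarrow \mathrm{CH_2O^{\bullet+}} + \mathrm{H_2}$

$[\mathrm{CH_3OH}]^{\bullet+} \rightarrow \mathrm{CH_3^+} + {}^{\bullet}\mathrm{OH}$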
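As a worked example, the m/z values at which the molecular ion and the three fragment ions of methanol would appear can be computed from monoisotopic atomic masses. The Python sketch below does this; the helper name `ion_mz` and the dictionary layout are assumptions made for illustration, and each ion is treated as singly charged.

```python
# Monoisotopic atomic masses (Da) and the electron mass.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
ELECTRON = 0.00054858


def ion_mz(composition, charge=1):
    """m/z of a positive ion: neutral mass minus the removed electrons, divided by charge."""
    neutral = sum(MONO[element] * count for element, count in composition.items())
    return (neutral - charge * ELECTRON) / charge


# Molecular ion and fragment ions from the methanol example.
ions = {
    "[CH3OH]•+": {"C": 1, "H": 4, "O": 1},
    "CH3O+": {"C": 1, "H": 3, "O": 1},
    "CH2O•+": {"C": 1, "H": 2, "O": 1},
    "CH3+": {"C": 1, "H": 3},
}

for label, composition in ions.items():
    print(f"{label:10s} m/z = {ion_mz(composition):.4f}")
```

The nominal masses come out as 32, 31, 30, and 15, which is where these ions would appear in a unit-resolution EI spectrum of methanol.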

Deconvolution


ADAP-GC 2.0

ADAP-GC 2.0: Deconvolution of Coeluting Metabolites from GC/TOF-MS Data for Metabolomics Studies. Analytical Chemistry 2012, 84 (15), 6619-6629.

Feature identification

• Apply statistics and machine learning to detect discriminating peaks

• Identify discriminating peaks

Identification of known compounds

• Screening search for compound ID based on LC-MS data

- Searching monoisotopic mass and isotopic distribution against compound databases

• Library match for compound identification from both LC-MS/MS and GC-MS spectra
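The screening search by monoisotopic mass can be sketched as a tolerance lookup. The Python example below assumes the observed ion is [M+H]+ and uses a tiny in-memory dictionary as a stand-in for a compound database. The dictionary contents, the function names, and the 10 ppm tolerance are assumptions for illustration; a real search would query a resource such as HMDB and consider other adducts as well.

```python
# Monoisotopic atomic masses (Da) and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646

# Hypothetical mini-database: compound name -> elemental composition.
DATABASE = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "alanine": {"C": 3, "H": 7, "N": 1, "O": 2},
    "citric acid": {"C": 6, "H": 8, "O": 7},
}


def neutral_mass(composition):
    """Monoisotopic mass of the neutral molecule."""
    return sum(MONO[element] * count for element, count in composition.items())


def search(observed_mz, ppm=10.0):
    """Return (name, mass error in ppm) for entries whose [M+H]+ matches observed_mz."""
    hits = []
    for name, composition in DATABASE.items():
        theoretical = neutral_mass(composition) + PROTON
        error_ppm = (observed_mz - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= ppm:
            hits.append((name, round(error_ppm, 2)))
    return hits


print(search(181.0710))
```

A mass match alone cannot separate isomers (glucose and fructose share the same formula), which is why the isotopic distribution and MS/MS spectra are used as additional evidence.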

HMDB

MS/MS or GC-MS spectra matching

• Library match for compound identification from both LC-MS/MS and GC-MS spectra
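A library match is often scored with a cosine (normalized dot product) similarity between the experimental spectrum and each library spectrum. The Python sketch below shows a plain, unweighted version on binned spectra. The function names, the 1 Da bin width, and the spectra are invented for illustration; actual library search tools typically add intensity or m/z weighting and finer peak matching.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (mz, intensity) peaks into bins of the given width."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(spectrum_a, spectrum_b, bin_width=1.0):
    """Cosine similarity between two spectra: 1 is identical shape, 0 is no shared bins."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up spectra: (m/z, intensity).
experimental = [(15.0, 20.0), (29.0, 45.0), (31.0, 100.0), (32.0, 70.0)]
library_hit = [(15.0, 18.0), (29.0, 50.0), (31.0, 100.0), (32.0, 65.0)]
library_miss = [(43.0, 100.0), (58.0, 30.0)]

print(cosine_score(experimental, library_hit))
print(cosine_score(experimental, library_miss))
```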

Identification of unknown compounds

• MS-FINDER

• CSI:FingerID

• CFM-ID

• MetFrag

• MIDAS

• MAGMA

MetFrag

More on identification

• Information we have for identification of compounds based on MS/MS

- M+H

- Experimental isotopic identification

- MS/MS


• Compare isotopic distributions (theoretical vs. experimental)
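The comparison of a theoretical and an experimental isotopic distribution can be sketched as follows. The Python example uses a simplified carbon-only model: a binomial distribution over the number of 13C atoms that ignores the isotopes of H, N, O, and S. The function names, the error measure, and the experimental values are assumptions made for illustration.

```python
from math import comb

P13C = 0.0107  # natural abundance of 13C


def carbon_isotope_pattern(n_carbons, n_peaks=3):
    """Relative intensities of M, M+1, M+2, ... from 13C only, scaled so the largest is 1."""
    probs = [
        comb(n_carbons, k) * P13C ** k * (1.0 - P13C) ** (n_carbons - k)
        for k in range(n_peaks)
    ]
    top = max(probs)
    return [p / top for p in probs]


def pattern_error(theoretical, experimental):
    """Sum of absolute differences between two relative-intensity patterns."""
    return sum(abs(t - e) for t, e in zip(theoretical, experimental))


# Candidate formula with 6 carbons vs. a made-up measured pattern.
theoretical = carbon_isotope_pattern(6)
experimental = [1.0, 0.066, 0.003]

print(theoretical)
print(pattern_error(theoretical, experimental))
```

A small error supports the candidate formula, while a candidate with a very different carbon count would predict a clearly different M+1 intensity.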

• Compare MS/MS spectra (library vs. experimental)

Thank you!