
Xiuxia Du, Ph.D. Department of Bioinformatics and Genomics

University of North Carolina at Charlotte

Introduction to Preprocessing of Untargeted Metabolomics Data


Outline

• Raw untargeted LC/MS and GC/MS metabolomics data

- Profile and centroid data

- Mass vs. retention time map

- TIC

- EIC

• Principles of LC/MS and GC/MS data preprocessing

• Feature identification

- Identification of known compounds

- Identification of unknown compounds


Raw Untargeted LC/MS and GC/MS Metabolomics data


List of mass spectra

List of scans in raw files
• MS scans in blue
• MS/MS scans in red
• # sequential number
• @ retention time
• MS level
• type of spectrum
- p = profile
- c = centroid
- t = thresholded
• polarity of ionization
- + = positive
- - = negative
- ? = unknown


One mass spectrum


Zoom in on one mass spectrum (profile mode)


Mass spectra in centroid mode


Spectrum in centroid mode
• Data files are much smaller than files in profile mode.
• We will use the centroid data for practicing data pre-processing using XCMS and MZmine 2.
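To make the profile/centroid distinction concrete, here is a minimal sketch of centroiding, assuming each contiguous run of above-threshold points is a single peak. The function name `centroid_profile` and the `min_intensity` parameter are illustrative only; instrument vendors and tools such as XCMS or MZmine 2 use more elaborate peak models.

```python
import numpy as np

def centroid_profile(mz, intensity, min_intensity=0.0):
    """Collapse each profile-mode peak into one (m/z, intensity) pair.

    Assumption: a peak is a contiguous run of points above min_intensity.
    The centroid m/z is the intensity-weighted mean of the run and the
    centroid intensity is the summed intensity of the run.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    centroids = []
    i, n = 0, len(mz)
    while i < n:
        if intensity[i] <= min_intensity:
            i += 1
            continue
        j = i
        while j < n and intensity[j] > min_intensity:
            j += 1
        seg_mz, seg_int = mz[i:j], intensity[i:j]
        centroids.append(
            (float(np.average(seg_mz, weights=seg_int)), float(seg_int.sum()))
        )
        i = j
    return centroids

# Toy profile peak: five sampled points collapse to a single centroid.
toy_mz = [100.00, 100.01, 100.02, 100.03, 100.04, 101.00]
toy_intensity = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
toy_centroids = centroid_profile(toy_mz, toy_intensity)
```

Storing one pair per peak instead of every sampled point is why centroid files are so much smaller.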


LC-MS raw data in 3D


3D to 2D
• Direct processing of the 3D data is NOT trivial
• Instead, we examine 2D views
- Mass vs. retention time
- Total ion current vs. retention time: TIC
- Ion current vs. retention time for a particular mass: EIC (Extracted Ion Chromatogram)
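The two chromatogram views above can be sketched in a few lines. This is an illustration only: it assumes each scan is a `(retention_time, mz_array, intensity_array)` tuple, and the function names and the `tol_ppm` parameter are hypothetical rather than taken from XCMS or MZmine 2.

```python
import numpy as np

def tic(scans):
    """Total ion current: sum of all intensities in each scan."""
    rts = np.array([rt for rt, _, _ in scans], dtype=float)
    totals = np.array([np.sum(inten) for _, _, inten in scans], dtype=float)
    return rts, totals

def eic(scans, target_mz, tol_ppm=10.0):
    """Extracted ion chromatogram: per-scan intensity near target_mz."""
    tol = target_mz * tol_ppm * 1e-6  # ppm window converted to Da
    rts, values = [], []
    for rt, mz, inten in scans:
        mz = np.asarray(mz, dtype=float)
        inten = np.asarray(inten, dtype=float)
        mask = np.abs(mz - target_mz) <= tol
        rts.append(rt)
        values.append(inten[mask].sum())  # 0.0 when no point falls in the window
    return np.array(rts, dtype=float), np.array(values, dtype=float)

# Three toy centroid scans (made-up numbers).
toy_scans = [
    (1.0, [100.0000, 200.0], [10.0, 5.0]),
    (2.0, [100.0005, 200.0], [20.0, 5.0]),
    (3.0, [150.0000], [7.0]),
]
tic_rt, tic_values = tic(toy_scans)
eic_rt, eic_values = eic(toy_scans, target_mz=100.0, tol_ppm=10.0)
```

The TIC collapses the m/z axis entirely, while the EIC keeps only a narrow m/z window, which is what makes individual chromatographic peaks visible.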


Mass vs. retention time map


TIC


EIC


Principles of LC/MS and GC/MS Data Preprocessing


Data preprocessing workflow

GC-MS
1. detect masses from mass spectra
2. construct EICs
3. detect chromatographic peaks
4. deconvolution
5. alignment / correspondence
6. database search

LC-MS
1. detect masses from mass spectra
2. construct EICs
3. detect chromatographic peaks
4. annotation
5. alignment / correspondence
6. database search


Construct EICs


Select one EIC


One EIC


Detect EIC peaks


• Use wavelet transform
• Implemented in XCMS as the centWave method

[Figure: Mexican hat wavelet ψ_{s,τ}(t) plotted against t for scales s = 1, 2, 8; wavelet coefficients by scale; chromatogram (intensity × 10^3 vs. seconds) with Gaussian fit]

Tautenhahn, R.; Bottcher, C.; Neumann, S., Highly sensitive feature detection for high resolution LC/MS. BMC bioinformatics 2008, 9, 504.
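The wavelet idea can be illustrated with a small NumPy sketch. It is not the centWave implementation from XCMS: it only convolves one EIC with Mexican hat wavelets at a few scales and reports local maxima of the best coefficient. The function names, the scale set, the kernel width of `10 * scale + 1` points, and the `min_coef` threshold are all assumptions made for this example.

```python
import numpy as np

def mexican_hat(n_points, scale):
    """Mexican hat (Ricker) wavelet sampled on n_points, centered at zero."""
    t = np.arange(n_points) - (n_points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - (t / scale) ** 2) * np.exp(-0.5 * (t / scale) ** 2)

def cwt_peaks(intensity, scales=(2, 4, 8), min_coef=0.0):
    """Return indices of local maxima of the per-point best wavelet coefficient."""
    x = np.asarray(intensity, dtype=float)
    widest = 10 * max(scales) + 1  # odd length keeps the kernel centered
    if len(x) < widest:
        raise ValueError("trace is shorter than the widest wavelet")
    coefs = np.array([
        np.convolve(x, mexican_hat(10 * s + 1, s), mode="same")
        for s in scales
    ])
    best = coefs.max(axis=0)  # strongest response across scales
    peaks = [
        i for i in range(1, len(x) - 1)
        if best[i] > min_coef and best[i] >= best[i - 1] and best[i] > best[i + 1]
    ]
    return peaks, coefs

# Synthetic EIC: one Gaussian peak at index 50 on a flat baseline.
index = np.arange(200)
toy_eic = 1000.0 * np.exp(-0.5 * ((index - 50) / 4.0) ** 2)
toy_peaks, toy_coefs = cwt_peaks(toy_eic)
```

Because the wavelet integrates to roughly zero, a constant or slowly varying baseline contributes little to the coefficients, which is one reason wavelet-based detection copes well with noisy chromatograms.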


Detected EIC peaks


LC/MS-specific Data Preprocessing


Find isotopes
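One simple way to find isotope peaks, sketched here under stated assumptions, is to look for co-eluting peak pairs whose m/z values differ by the 13C/12C mass difference (about 1.003355 Da) divided by the charge. The function name, the tolerances, and the rule that the M+1 peak must be less intense than the monoisotopic peak are illustrative choices, not the method used by any particular tool.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C, in Da

def find_isotope_pairs(peaks, charge=1, mz_tol=0.005, rt_tol=2.0):
    """peaks: list of (mz, rt, intensity) tuples.

    Returns (monoisotopic_index, isotope_index) pairs. Assumption: the
    isotope peak co-elutes with and is weaker than the monoisotopic peak,
    which is typical for small molecules.
    """
    spacing = C13_C12_DELTA / charge
    pairs = []
    for i, (mz_i, rt_i, int_i) in enumerate(peaks):
        for j, (mz_j, rt_j, int_j) in enumerate(peaks):
            if i == j:
                continue
            if (abs((mz_j - mz_i) - spacing) <= mz_tol
                    and abs(rt_j - rt_i) <= rt_tol
                    and int_j < int_i):
                pairs.append((i, j))
    return pairs

# Toy peak list (made-up intensities): peak 1 is the M+1 isotope of peak 0.
toy_peak_list = [
    (180.0634, 120.0, 1000.0),
    (181.0668, 120.3, 70.0),
    (250.0000, 120.1, 500.0),
]
toy_pairs = find_isotope_pairs(toy_peak_list)
```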


Alignment

zoom in……


Peaks table after alignment


GC/MS-specific Data Preprocessing


GC-EI-MS


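EI fragmentation
• Example: EI fragmentation of methanol

$[\mathrm{CH_3OH}]^{\bullet+} \rightarrow \mathrm{CH_3O^+} + \mathrm{H^\bullet}$

$[\mathrm{CH_3OH}]^{\bullet+} \rightarrow \mathrm{CH_2O^+} + \mathrm{H_2}$

$[\mathrm{CH_3OH}]^{\bullet+} \rightarrow \mathrm{CH_3^+} + {}^{\bullet}\mathrm{OH}$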
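As a quick worked example, the nominal m/z of each singly charged ion in the methanol reactions can be computed from integer atomic masses (C = 12, H = 1, O = 16). The dictionary layout and the helper name `nominal_mass` are only for illustration.

```python
# Nominal (integer) atomic masses for the elements in methanol.
NOMINAL_MASS = {"C": 12, "H": 1, "O": 16}

def nominal_mass(formula):
    """formula: dict mapping element symbol to atom count."""
    return sum(NOMINAL_MASS[element] * count for element, count in formula.items())

# Ions from the three reactions; all carry a single charge, so m/z equals mass.
methanol_ions = {
    "[CH3OH]+.": {"C": 1, "H": 4, "O": 1},  # molecular ion
    "CH3O+": {"C": 1, "H": 3, "O": 1},      # loss of H
    "CH2O+": {"C": 1, "H": 2, "O": 1},      # loss of H2
    "CH3+": {"C": 1, "H": 3},               # loss of OH
}
methanol_mz = {name: nominal_mass(f) for name, f in methanol_ions.items()}
```

So a single compound contributes peaks at m/z 32, 31, 30 and 15 to the same spectrum, which is why GC-EI-MS data need deconvolution when compounds co-elute.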


Deconvolution


ADAP-GC 2.0

ADAP-GC 2.0: Deconvolution of Coeluting Metabolites from GC/TOF-MS Data for Metabolomics Studies. Analytical chemistry 2012, 84 (15), 6619-29.


Feature identification


• Apply statistics and machine learning to detect discriminating peaks
• Identify discriminating peaks


Identification of known compounds
• Screening search for compound ID based on LC-MS data
- Searching monoisotopic mass and isotopic distribution against compound databases
• Library match for compound identification from both LC-MS/MS and GC-MS spectra
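The monoisotopic mass search can be sketched as a tolerance lookup in parts per million (ppm). The three-entry dictionary below is a toy stand-in for a real compound database such as HMDB, and the function names and the 5 ppm default are assumptions for this example.

```python
def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def search_by_mass(neutral_mass, database, tol_ppm=5.0):
    """Return (name, mass, ppm_error) hits within tol_ppm, best match first."""
    hits = [
        (name, mass, ppm_error(neutral_mass, mass))
        for name, mass in database.items()
        if abs(ppm_error(neutral_mass, mass)) <= tol_ppm
    ]
    return sorted(hits, key=lambda hit: abs(hit[2]))

# Toy database of monoisotopic neutral masses in Da.
toy_database = {
    "glucose": 180.06339,   # C6H12O6
    "caffeine": 194.08038,  # C8H10N4O2
    "alanine": 89.04768,    # C3H7NO2
}
toy_hits = search_by_mass(180.0630, toy_database)
```

A mass match alone is only a screening result: isomers share the same monoisotopic mass, so isotopic distributions and MS/MS spectra are needed to narrow the candidates.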


HMDB


MS/MS or GC-MS spectra matching
• Library match for compound identification from both LC-MS/MS and GC-MS spectra
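A library match is commonly scored with a cosine (dot product) similarity between the experimental and library spectra. The sketch below assumes centroided spectra given as `(mz, intensity)` lists and pairs peaks greedily within an m/z tolerance; the function name, the tolerance, and the unweighted score are illustrative, and real library search tools typically apply additional intensity and m/z weighting.

```python
import math

def cosine_match(query, library, mz_tol=0.01):
    """Cosine similarity between two spectra given as (mz, intensity) lists.

    Each library peak is matched to at most one query peak: the closest
    unused library peak within mz_tol.
    """
    used = set()
    dot = 0.0
    for q_mz, q_int in query:
        best = None
        for k, (l_mz, _) in enumerate(library):
            if k in used or abs(l_mz - q_mz) > mz_tol:
                continue
            if best is None or abs(l_mz - q_mz) < abs(library[best][0] - q_mz):
                best = k
        if best is not None:
            used.add(best)
            dot += q_int * library[best][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_l = math.sqrt(sum(i * i for _, i in library))
    if norm_q == 0.0 or norm_l == 0.0:
        return 0.0
    return dot / (norm_q * norm_l)

# Toy spectra with made-up intensities.
toy_query = [(15.0, 20.0), (31.0, 100.0), (32.0, 70.0)]
toy_same = [(15.0, 20.0), (31.0, 100.0), (32.0, 70.0)]
toy_other = [(50.0, 100.0)]
score_same = cosine_match(toy_query, toy_same)
score_other = cosine_match(toy_query, toy_other)
```

Scores near 1 indicate that the two spectra share peaks with similar relative intensities; a score of 0 means no peaks matched.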


Identification of unknown compounds
• MS-FINDER

• CSI:FingerID

• CFM-ID

• MetFrag

• MIDAS

• MAGMA


MetFrag


More on identification


• Information we have for identification of compounds based on MS/MS
- M+H
- Experimental isotopic distribution
- MS/MS
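The first item, M+H, is the m/z of the protonated molecule, so the neutral monoisotopic mass used in a database search follows by subtracting the proton mass (about 1.007276 Da). A minimal sketch, assuming a purely protonated ion of charge z; the function name is hypothetical and other adducts such as sodium are not handled.

```python
PROTON_MASS = 1.007276  # Da

def neutral_mass_from_protonated(mz, charge=1):
    """Neutral mass M from the m/z of an [M + zH]z+ ion.

    Since m/z = (M + z * proton) / z, it follows that M = z * (m/z - proton).
    """
    return charge * (mz - PROTON_MASS)

# Glucose (monoisotopic mass 180.06339 Da) observed as [M+H]+.
glucose_mh = 180.06339 + PROTON_MASS
glucose_neutral = neutral_mass_from_protonated(glucose_mh)
```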


• Compare isotopic distributions: theoretical vs. experimental


• Compare MS/MS spectra: library vs. experimental


Thank you!