workflows in sieve and lipidsearch - thermo omics portal · workflows in sieve and lipidsearch i...

1

The world leader in serving science

Workflows in SIEVE and LipidSearch

I Orbi 5 - 2014

2


SIEVE v2.1

Statistical

Iterative

Exploratory

Visualization

Environment

3

SIEVE Overview

• SIEVE is label-free differential software

• Aids in discovering molecular changes between states

• Provides semi-quantitative measurement of differentially expressed proteins, metabolites and other compounds correlating with a disease state, drug response or other perturbation

• Proteomics

• Biomarker discovery in plasma, tissue, cell culture, urine, saliva

• Disulfide bond validation in purified protein

• Quantify samples that contain their own precursor label, similar to SILAC

• Small Molecules

• Biomarker discovery in plasma, tissue, cell culture, urine, saliva

• Monitoring drug degradation due to environmental stresses

• Metabolomic and Lipidomic profiling

• Ingredient screening in Food and Safety

• Water purification monitoring

• Improving agriculture (tomatoes, whiskey, wine, corn)

• Direct infusion of olive oil using DART, looking for impurities

• Screening purified therapeutics for environmental and/or product modifications

Scanning electron microscope image of a cancerous (left) and normal cell, showing differences in cell “brush”. Image courtesy Igor Sokolov.

4

Getting Started

• Minimal system requirements

• Microsoft Windows

• Windows 7 32/64 Professional SP1

• Windows XP 32 Professional SP1

• Microsoft .Net 4.0 (Extended)

• 2 GHz dual core processor

• 8 GB RAM or higher

• 500 GB hard drive

• Recommended system requirements

• 3.3 GHz processor

• 32 GB RAM

• 1TB RAID performance

• Installation requires administrator rights

• Free upgrade from v2.0 to v2.1 via the Thermo Omics Portal

• https://portal.thermo-brims.com/

• Not necessary to uninstall v2.0




5

SIEVE Analysis Platform

Statistically rigorous, automated label-free LC/MS differential analysis platform

Applied to: any experiment that compares one group to another

Input: one or more raw files for each state (State 1, State 2, State …)

Workflow: Align → Detect → Identify

Reports:

• Components

• Relative quantification

• Statistical analysis

• Trend information

• Identification

6

Creating a SIEVE Experiment

Initiate the wizard

7

Define Processing Method

Select the domain: proteomics or small molecules

8

Define Processing Method and Experiment Type

Select the detection algorithm: which one?

9

Two Signal Detection Algorithms – Which One to Use?

Classic Recursive Base Peak Framing

•Application: all sample types (proteomics, small molecules)

•Instrumentation: high resolution or low resolution instrument

•Experiment Type: trend analysis, Control Treated, ROC analysis, single class analysis

•Advanced Processing: perfect pairs, targeted detection, direct infusion

•Limitations: NO background subtraction, NO data reduction by isotope and adduct grouping

Component Extraction

•Application: small molecules (charge state ≤2)

•Instrumentation: high resolution instrument

•Experiment Type: trend analysis

•Advanced Processing: background subtraction, data reduction by isotope and adduct grouping

•Limitations: requires charge state ≤2, requires high resolution data

10

SIEVE Workflow – Component Extraction

[M+H]+

[M+Na]+

[M+K]+

Automatically interpret spectra, reduce signal peaks into components

A total of 9 different ions are observed for the same molecule as isotopes and adducts
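The grouping above can be illustrated with a minimal sketch. It assumes that the 9 ions are the three adducts shown, each observed with three isotope peaks; the slide does not state the breakdown, so this is only one plausible reading. The adduct mass shifts and the 13C spacing are standard reference values, not values taken from SIEVE, and the function name is hypothetical.

```python
# Monoisotopic mass shifts (Da) for singly charged positive adducts.
# These are reference constants chosen for this sketch, not SIEVE settings.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
}

# Mass difference between 13C and 12C (Da), used as the isotope peak spacing.
C13_SPACING = 1.003355


def component_ions(neutral_mass, n_isotopes=3):
    """List (adduct, isotope index, m/z) for one molecule, charge +1 assumed."""
    ions = []
    for adduct, shift in ADDUCT_SHIFTS.items():
        for k in range(n_isotopes):
            ions.append((adduct, k, neutral_mass + shift + k * C13_SPACING))
    return ions


# Example: glucose, C6H12O6, monoisotopic neutral mass 180.063388 Da.
ions = component_ions(180.063388)
for adduct, k, mz in ions:
    print(f"{adduct:8s} M+{k}  {mz:.4f}")
print(len(ions), "ions reduce to 1 component")
```

All of these signals share one neutral mass, which is why they can be reported as a single component.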

11

Experiment Name

Identify experiment name

12

Select Raw Files

Drag and drop the raw files from the file explorer

13

Sort Raw Files

Click on the file name to sort the files

14

Multiple Solvent Blanks

If more than one solvent blank is present then “blank” files are averaged

15

Component Extraction Background Subtraction

Sample - Solvent blank = Analyte signals

Distinguish analyte signals from noise

~98% of lower intensity signals are eliminated

Background subtraction is automatically performed when solvent blanks are acquired.

Irrelevant solvent peaks are removed from the data, which eliminates a significant amount of low-level noise.

This is a significant step in data reduction and a critical part of the new component detection algorithm.
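The idea of "Sample - Solvent blank = Analyte signals", with multiple blanks averaged, can be sketched as follows. This is an illustration only: the data layout, the function name, and the `min_ratio` threshold are assumptions and do not describe SIEVE's internal algorithm.

```python
from statistics import mean


def subtract_blank(sample, blanks, min_ratio=3.0):
    """Keep sample signals that clearly exceed the averaged solvent blank.

    sample: dict mapping a signal key (here an m/z value) to intensity.
    blanks: list of dicts in the same form, one per solvent blank file.
    min_ratio: hypothetical sample/blank ratio a signal must reach.
    """
    kept = {}
    for key, intensity in sample.items():
        # Average the blank intensity over all blank files (0 if absent).
        blank_level = mean(b.get(key, 0.0) for b in blanks) if blanks else 0.0
        if blank_level == 0.0 or intensity / blank_level >= min_ratio:
            kept[key] = intensity - blank_level
    return kept


sample = {301.1410: 5.0e6, 149.0233: 2.0e6, 415.2110: 8.0e5}
blanks = [{149.0233: 1.9e6}, {149.0233: 2.1e6, 415.2110: 1.0e5}]
result = subtract_blank(sample, blanks)
print(result)
```

In this made-up example the signal at m/z 149.0233 is as intense in the blanks as in the sample, so it is treated as solvent background and dropped.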

16

Scan Raw Files for Data Quality

Raw files show no errors

17

Define Analysis Groups

Identify group names by separating groups with a space

• Select the alignment reference file

• The ratio group is the control group (Fed/Fast)

• The word “blank” will enable background subtraction

18

Define Search Parameters

Define retention time, mass range, and m/z width

• Frame width is “automatic” for component extraction

• m/z width: 10 ppm is +/- 5 ppm

• Define the frame width for a framing experiment
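Because the m/z width is a total width, a 10 ppm setting corresponds to +/- 5 ppm around the target. A short sketch of the arithmetic (the function name is hypothetical):

```python
def xic_window(mz, width_ppm=10.0):
    """Return the (low, high) m/z limits for a total window of width_ppm."""
    # Half of the total width is applied on each side of the target m/z.
    half_width = mz * (width_ppm / 2.0) * 1e-6
    return mz - half_width, mz + half_width


low, high = xic_window(500.0)
print(f"{low:.4f} to {high:.4f}")
```

For m/z 500 the half width is 500 × 5 × 10⁻⁶ = 0.0025, so the window runs from 499.9975 to 500.0025.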

19

Select Scan Filter

SIEVE automatically selects the full MS scan type

• Data with both positive and negative filters needs to be processed separately

• Used a lock mass? The filter string needs to be modified to remove the lock mass text

• Example: FTMS + p ESI Full MS Lock Mass

20

Define Main Component Extraction Parameters

• Review raw data in Qual Browser first

• Each data set is different and requires different settings

• The intensity threshold is initially set from the mean intensity of the reference file

21

Define Identification Parameters

Three search types available: ChemSpider, Database Lookup or Defer

22

Identification Parameters

• ChemSpider

• Free chemical structure database

• Over 470 data sources, e.g. KEGG, Human Metabolome Database, etc.

• More than one data source can be used in the identification search when separated by a comma

• DB Lookup

• Post peak detection lookup (different from seed file)

• DB Library Files in csv format

• Requirement: first column must have neutral exact mass

• All other columns are optional

• Defer

• Can defer this setting in the wizard

• Identification can be enabled later within the parameters table
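A DB Lookup library of the kind described above (neutral exact mass in the first column, optional columns after it) could be read and searched as in the sketch below. The file content, the absence of a header row, the helper names, and the 5 ppm tolerance are all assumptions for illustration; check the SIEVE documentation for the exact file layout it expects.

```python
import csv
import io

# Hypothetical library file: first column is the neutral exact mass,
# the remaining columns (name, formula) are optional annotations.
LIBRARY_CSV = """180.063388,Glucose,C6H12O6
194.080376,Caffeine,C8H10N4O2
"""


def load_library(text):
    """Parse rows into (neutral_mass, other_columns) tuples."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if row:
            rows.append((float(row[0]), row[1:]))
    return rows


def lookup(neutral_mass, library, tol_ppm=5.0):
    """Return library entries whose mass lies within tol_ppm of the query."""
    return [
        (mass, extra)
        for mass, extra in library
        if abs(mass - neutral_mass) / mass * 1e6 <= tol_ppm
    ]


library = load_library(LIBRARY_CSV)
print(lookup(194.0804, library))
```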

23

Complete Wizard Setup

Save file as .sdb

24

Review SIEVE Parameters Before Processing

• Reference File

• Check if reference file is displayed

• If not, enable through raw file collection

• Scan Filter

• Exclude lock mass in text string

• Update

• If modifications made in the parameters table, UPDATE

• Then run processing task

• Align

• Always align

• Even if bypassing align step

• SIEVE is reading in files

• Multiple instances of software allowed

25

Component Extraction - Use of Integration Parameters

• Peak Detection

• ICIS

• Genesis

• PPD (parameter-less)

• Peak Integration

• ICIS

• Genesis

• PPD

• None

• Peak areas generated

• Integration reflects entire window

• Why use? Time

[Figure: Peak Integration set to None compared with ICIS]

26

Set Integration Parameters

Optimize parameters for chromatographic peaks

27

Unaligned Small Molecule Data

28

Aligned Small Molecule Data

Zoom in by placing a box over area with cursor

Zoom out by removing scroll bars

29

Frame Report View

30

Data Review Options

XIC

Peaks

Trend

Intensities

31

Frame Report

Right click on any column title to access the field chooser

32

Results Review Options

Gel View

PCA

CVs displayed by group

33

Frames Table Filter

• Use filter table to reduce the number of components

• Filter on column headings

• Filter follows Boolean logic (and, or, not)

• Example 1: CV_E <20 and CV_H <20

• Example 2: Ratio_E < 0.5 or Ratio_E >1.5

• Example 3: Pvalue_E <0.05

• Example 4: Pick >0
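The Boolean logic of the filter examples can be mirrored in a few lines of Python. This is not the SIEVE filter syntax itself; the frame values are invented, and the column names are taken from the examples above (the suffixes _E and _H are presumably group labels).

```python
# Invented frame rows using the column names from the filter examples.
frames = [
    {"MZ": 301.1410, "CV_E": 12.0, "CV_H": 8.0, "Ratio_E": 2.1, "Pvalue_E": 0.01},
    {"MZ": 415.2110, "CV_E": 35.0, "CV_H": 9.0, "Ratio_E": 0.4, "Pvalue_E": 0.20},
    {"MZ": 523.3630, "CV_E": 10.0, "CV_H": 15.0, "Ratio_E": 1.1, "Pvalue_E": 0.60},
]


def example_1(f):
    # CV_E <20 and CV_H <20
    return f["CV_E"] < 20 and f["CV_H"] < 20


def example_2(f):
    # Ratio_E < 0.5 or Ratio_E >1.5
    return f["Ratio_E"] < 0.5 or f["Ratio_E"] > 1.5


def example_3(f):
    # Pvalue_E <0.05
    return f["Pvalue_E"] < 0.05


def apply_filter(rule):
    """Return the m/z of every frame that passes the rule."""
    return [f["MZ"] for f in frames if rule(f)]


for rule in (example_1, example_2, example_3):
    print(rule.__name__, apply_filter(rule))
```

Combining a reproducibility filter (Example 1) with a ratio or p-value filter is a common way to shorten the component list before manual review.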

34

Additional Tips

• Each dataset may be different

• Visually confirm alignment (may need to bypass alignment)

• Multiple iterations of peak detection may be necessary to optimize peak detection parameters

• Start with higher threshold and no peak integration for faster review

• Supplemental information provided

• Questions? Refer to the Thermo Omics Software Portal

• http://portal.thermo-brims.com/

35


LipidSearch v4.0.20 Quick Fix release

36

LipidSearch Features

• Automated identification of lipids from biological samples

• Identification, relative quantitation, alignment

• Comprehensive database of >1.5 million lipid ions and predicted fragment ions

• Identification algorithms for product ion, precursor ion, and neutral loss scans

• Identification ranked by mass tolerance, then matched to predicted fragments and predicted retention time

• Suitable for multiple approaches for lipid analysis

• LC or nano-infusion (Shotgun)

• Untargeted and targeted profiling

• Compatible with data from various MS systems

• Thermo Q Exactive, hybrid Orbitrap, and TSQ instruments

37

Getting Started

• Recommended system requirements

• 64-bit operating system, Microsoft Windows 7/8

• Quad- or multi-core CPU, 3 GHz or higher

• 16 GB RAM or higher

• 500 GB hard drive or larger (SSD optional)

• Required programs

• Thermo Scientific MSFileReader 64-bit (need to uninstall if currently installed)

• Java runtime environment (JRE 1.6+)

• Microsoft Visual C++ 2010 runtime

• Microsoft Internet Explorer or Google Chrome

• Web-based graphical interface

• Installation requires administrator rights

38

Getting Started

• Tomcat Server

• Adjustable maximum memory allocated to server

• Installation

• Edit after installation

• Documentation

• User manual, installation instructions, tutorial files (C drive)

C:\lipidserach\lipidserach4.0\LipidSearchLauncher\LipidSearchLauncher.ini

39

Launcher

• Initiate the software via the desktop icon

• Tomcat server

• Open to launch LipidSearch

• Minimize Tomcat server to the taskbar

• Re-open server by clicking on icon

• http://localhost:8090/lipidsearch040/

Stop and start server here

40

• Must request license key to register software

• Send information to ThermoMSLicensing.com

• Register key to activate software

License

41

Configuration

• Modify configuration to improve performance

• Increase buffer size to 70 – 80%

• If using > 3 GHz processor, increase the number of processes for peak detection, identification, and quantification to 4

42

Step 1

Step 2

LipidSearch Workflow

43

Batch Creation for Identification and Quantitation

Select raw files to be processed

44

Identification Parameters - LC-MS/MS

Recalc Isotope: ON for general search, OFF for low abundant ions

M-Score is based on the number of matches with product ion peaks in the spectrum

General: Triple Quadrupole

Q Exactive: QE or Fusion (HCD)

Orbitrap: Fusion (CID, MS2/MS3)

45

Quantitation Parameters

46

Filter Criteria for Displaying Raw File Results

1) Toprank: displays lipids with top score among identified spectra

2) Main node: main isomer peak; displays the largest isomer based on intensity, m-score and t-score

3) FA priority: shows the most likely fatty acid chain combination if lipid isomers have the same score

4) ID Quality:

A: lipid class & FA were completely identified

B: lipid class & some FA were identified

C: lipid class or FA were identified

D: identification by other fragment ions (H2O loss)

47

Select Lipid Class

48

Select Adducts

49

Submit Batch

Successful submission

Unsuccessful submission

50

Data Processing – Search Job List Window

Export: Extracts the summary data in the results list

Download: Extracts the entire results file

51


Number of lipid groups

Number of lipid ions

Identification parameters window

Identification results window

52

Identification Results Summary

• The parameters applied in the filter can be modified and then resubmitted with the change filter function

• This operation can be performed in the job list window or in the identification results window

53


P = Peak picking

I = Identification

Q = Quantitation

54


In queue

Active

Canceled

Successful completion

Ended in failure

55


Number of lipid groups (sum composition)

Number of lipid ions (isomers)

56

Review Identification Results

57


Data sorted by LipidGroup, CalcMz, TopPos

58


•t-score: the difference between the theoretical LC-MS retention time (RP) calculated from the lipid computational formula and the actual retention time [lower value increases reliability]

•m-score: based on the number of matches with product ion peaks in the MS2 spectrum [higher value]

•Occupancy rate: the ratio of MS2 spectrum peaks assigned to the lipid among all peaks [higher value]

•Grade: identification quality filter assigned A – D based on lipid class or fatty acid identification
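Two of the definitions above translate directly into simple arithmetic. The sketch below follows the wording of the slide only; LipidSearch's actual scoring formulas are not given here (for example, the occupancy rate may be intensity-weighted rather than a peak count), so the functions are hypothetical illustrations.

```python
def t_score(predicted_rt, observed_rt):
    """Absolute gap in minutes between predicted and observed retention time.

    A lower value means the observed peak is closer to the prediction.
    """
    return abs(predicted_rt - observed_rt)


def occupancy_rate(assigned_peaks, total_peaks):
    """Fraction of MS2 peaks assigned to the lipid (count-based assumption)."""
    if total_peaks == 0:
        return 0.0
    return assigned_peaks / total_peaks


print(round(t_score(12.4, 12.1), 3))
print(occupancy_rate(6, 8))
```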

59


Chromatographic peak Mass spectrum

60

Spectrum Details Screen – Data ID

Black = unassigned ions

Red = MS2 matched ions

Green = MS3 matched ions

Precursor ion

Other spectra where this lipid is identified

61

Chromatogram Chart

Area score: 0.96

De-noising, smoothing, and separating partially overlapped peaks

Yellow = integrated area

62

Alignment Parameters

Set alignment parameters

Max peak area is the default

Select Mean to obtain group average peak areas

63

Retention Time Tolerance

LC Experiment Types: the retention time tolerance is the threshold on the difference between peak tops for peaks deemed to be the same lipid during alignment

|r.t.1 – r.t.2| > R.T. tolerance

If the above is true, the peaks will not be aligned; instead they are listed as two separate records in the results list

Example results: LPC (18:1)
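The tolerance rule can be sketched as a simple grouping of peak tops. The retention times and file names below are invented, and the grouping strategy (compare each peak with the first peak of the current record) is an assumption for illustration, not LipidSearch's documented algorithm.

```python
def align_by_rt(peaks, rt_tolerance):
    """Group (file, rt) peak tops of one lipid ion into aligned records.

    Peaks are sorted by retention time. A peak joins the current record
    when it lies within rt_tolerance of that record's first peak;
    otherwise it starts a new record.
    """
    records = []
    for peak in sorted(peaks, key=lambda p: p[1]):
        if records and abs(peak[1] - records[-1][0][1]) <= rt_tolerance:
            records[-1].append(peak)
        else:
            records.append([peak])
    return records


# Invented peak tops (minutes) for one lipid ion seen twice in each file.
peaks = [("WT_1", 5.02), ("KO_1", 5.06), ("WT_1", 5.61), ("KO_1", 5.58)]
records = align_by_rt(peaks, rt_tolerance=0.1)
for record in records:
    print(record)
```

With a tolerance of 0.1 min the four peaks form two records, which is how two chromatographically resolved isomers would end up as separate rows in the results list.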

64

Select Raw Files for Alignment

Click select to open the job list tab

65

Select Raw Files for Alignment

Check box to include raw files to be aligned

Click add

66

Group Allocation

•Selected files will appear in alignment setup

•Define control and sample groups

•In this example, wild type is control and knock out is sample group 1

67

Submit Alignment

Successful submission

68

Alignment Jobs List

M Merge (Alignment)

69

Alignment Results

Group information can be expanded and collapsed

Alignment layout is formatted similarly to the identification results

70

Normalization Option

Normalization by either internal standard or by individual lipid class

71

Lipid Specific Alignment Results

72

Acknowledgements

• Thermo Fisher Scientific

• Jennifer Sutton

• David Peake

• Ralf Tautenhahn

• Josef Ruzicka

73


Supplemental Material for SIEVE

74

Define Experiment Type

Experiment Types:

•Two Sample Differential Analysis – A simple comparison between two states such as healthy and diseased. A ratio and p-value are calculated.

•Control Compare Trend – This experiment is used for time course analysis or trend type experiments. One of the groups is defined as the control group and the others are compared to this control group. For each trend point, a ratio and p-value are calculated against the control group.

•Differential Case Study with ROC Analysis – This experiment type is used to measure a candidate marker’s capability of distinguishing between two classes. A large subject group (≥10) is recommended. Technical replicates are also recommended.

•Non-differential Single Class Analysis – Allows for a quick assessment of the data to determine reproducibility and overall quality by using the CV processor. This analysis can also be used with SIEVE’s Perfect Pairs tool to find precursor pairs in a single raw file. This algorithm tags pairs of frames that are consistent with a designated mass difference. Applications include PTMs, ion and ion + adduct combinations, SILAC, and other precursor labeling methods.
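The Perfect Pairs idea, tagging pairs of frames consistent with a designated mass difference, can be sketched as below. This is a generic pair search written for illustration, not SIEVE's implementation; the function name and the 10 ppm tolerance are assumptions, and the m/z difference equals the mass difference only for singly charged ions.

```python
def find_pairs(mz_values, mass_difference, tol_ppm=10.0):
    """Return (light, heavy) m/z pairs separated by mass_difference.

    Assumes singly charged ions, so the m/z spacing equals the mass spacing.
    """
    ordered = sorted(mz_values)
    pairs = []
    for i, light in enumerate(ordered):
        target = light + mass_difference
        tol = target * tol_ppm * 1e-6
        for heavy in ordered[i + 1:]:
            if heavy > target + tol:
                break  # values are sorted, so no later match is possible
            if abs(heavy - target) <= tol:
                pairs.append((light, heavy))
    return pairs


# Example: an ion + adduct combination. [M+H]+ and [M+Na]+ of the same
# molecule differ by 22.989218 - 1.007276 = 21.981942 Da.
frames_mz = [181.070664, 203.052606, 250.123400]
pairs = find_pairs(frames_mz, 21.981942)
print(pairs)
```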

75

Parameter Settings – Global

Change to force calculation if you want to see the PCA plot for large experiments.

Rawfile collection is where you can add or remove RAW files and change the alignment reference file, groups, and colors.

Scanfilter is the Full scan type that is used for the analysis. If you have lock mass turned on and you see two full scans in the wizard, you need to change this setting before running the analysis to include all of the full scans. Change it so that the string includes FTMS + p ESI Full, in this example. Remove all of the letters after the word Full.

Bold line parameters are the ones that are changed most often.

Maximum number of threads for processing. Lowering this value on 32-bit computers can bypass memory issues.

Retention start and stop can be used to eliminate unwanted data.

After completing the wizard, the user must check to see if the reference file parameter is populated. If a file is not assigned, the user must click on the rawfiles (collection) tab and check the reference file in the box that appears. Then select OK, and the reference file parameter should display the selected reference file.
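The scan filter edit described on this slide (keep the string up to the word Full and remove everything after it) amounts to a one-line string operation. The helper below is hypothetical and only illustrates the edit; in SIEVE the string is changed by hand in the parameters table.

```python
def strip_after_full(scan_filter):
    """Truncate a scan filter string right after the word 'Full'.

    Strings without 'Full' are returned unchanged.
    """
    head, sep, _ = scan_filter.partition("Full")
    return (head + sep).strip() if sep else scan_filter


print(strip_after_full("FTMS + p ESI Full MS Lock Mass"))
```

The shortened string "FTMS + p ESI Full" then matches both full scan variants, with and without the lock mass text.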

76

Parameter Settings – Alignment

Check alignment to see if alignment is needed or not. You have the option to bypass alignment if needed.

Minimum intensity threshold for alignment.

Bold line parameters are the ones that are changed most often.

Alignment correlation bin size.

Max retention time shift for alignment step in mins.

The initial size of a tile that correlates basepeak alignment.

77

Parameter Settings – Basic Component

Signal to background noise threshold for background correction (subtraction).

Suppress components that do not meet the Background Signal to Noise criteria.

Base peak minimum intensity required for a signal to be considered as a component.

Minimum number of scans across a chromatographic peak to be considered.

Mass window for XIC in ppm (NOTE: 10 ppm = +/- 5 ppm, not +/- 10 ppm).

Algorithm used to determine peaks. ICIS is used here because its parameters can be checked and optimized by looking at the raw data in Qual Browser.

Time in mins from a peak apex to restrict seeking another peak.

Base peak minimum intensity required for a signal to be considered as a component from a targetMZlist experiment.

List of component MZ’s to force find.

78

Parameter Settings – Frame

The condition or trend point that was designated as the control. The control group serves as the denominator for ratio calculations (treatment/control).

Algorithm used for second pass peak integration. The default is NONE, but if ICIS is used for peak detection then ICIS should also be used for peak integration. Parameters for the different integration methods can be found under the Workspace tab at the top of the software. It is recommended that the user look at their data in Qual Browser first, optimize the peak integration parameters (ICIS), and then apply these settings to SIEVE. These parameters are very sensitive to the chromatographic peak shape of the data.
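The role of the control group as the denominator can be shown with a small calculation. The intensities are invented, and the use of group means and a sample standard deviation for the CV is an assumption; the slides do not specify how SIEVE computes these statistics.

```python
from statistics import mean, stdev


def group_ratio(treatment, control):
    """Ratio of mean intensities, with the control group as denominator."""
    return mean(treatment) / mean(control)


def cv_percent(values):
    """Coefficient of variation of replicate intensities, in percent."""
    return stdev(values) / mean(values) * 100.0


# Invented replicate intensities for one frame.
treated = [2.0e6, 2.2e6, 1.8e6]
control = [1.0e6, 1.1e6, 0.9e6]

print(f"ratio (treatment/control): {group_ratio(treated, control):.2f}")
print(f"control CV: {cv_percent(control):.1f}%")
```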

79

Parameter Settings – Advanced Component

Skirt minimum basepeak intensity. The default should be set to 5 million but varies from instrument to instrument: 5 million for QE and 2 million for Exactive.

Number of points for data smoothing.

Minimum scans should always remain 2. Do not change this setting; it is not base peak minimum scans.

80

Parameter Settings – Global Identification

The charge used for ChemSpider and DBLookup if a charge could not be determined.

The maximum number of frames/components to identify.

Multiple formulas may be assigned to each component. MinFormulaScore is the minimum number of formulas sent to ChemSpider for identification or sent for pathway analysis. The maximum number of formulas reported per component is 10.

Select the search type: ChemSpider, DBLookup, None

81

Parameter Settings – Accurate Mass Identification

ChemSpider identification only: Provide the adduct for mass calculation (+nH, -nH, +K, +Na, +NH4).

ChemSpider databases used for search. More than one database can be searched at a time; separate them with commas.

Browse for the accurate mass library file (csv).

Use either COMPMW to find accurate mass IDs based upon component molecular weight, FRAMEMZ to find accurate mass IDs based upon frame MZ, or FORMULA to ID by formula. For the COMPMW search type, an adduct is not necessary.

Accurate mass MZ tolerance for DBLookup and ChemSpider searches (ppm).

82

Parameter Settings – Optimized

Time in mins from a peak apex to restrict seeking another peak. The default setting is 0.2, which is too large to separate isomers.

Max retention time shift for alignment step in mins. The default setting is 0.2, which is too large in this case to separate isomers.

Peak Integration is changed from NONE to ICIS.

Smoothing set to Points 3

BPMinimumCounts changed to 1 million

Background SN changed to 10

SkirtBPInty set to 5 million

83

Frame Target Wizard – Seed File

•Incorporates a search for target ions of interest

•Identify .csv file in the setup wizard

•Alternatively identify .csv file in the parameters table as frame seed file

84

Frame Target Wizard

•Create the .csv file

•Minimally required columns include MZ, RTStart, and RTStop

•Additional information can be listed in columns, for example compound name

•The annotated column can be filtered using the frames table filter
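A seed file with the minimally required columns can be written with any spreadsheet or script. The sketch below uses Python's csv module; the target values and the extra Name column are invented, and whether SIEVE expects exactly these header spellings should be confirmed when assigning columns in the wizard.

```python
import csv
import io

# Invented targets: required columns MZ, RTStart, RTStop plus an
# optional annotation column (Name).
targets = [
    {"MZ": 181.070664, "RTStart": 1.0, "RTStop": 2.0, "Name": "Glucose [M+H]+"},
    {"MZ": 195.087652, "RTStart": 4.5, "RTStop": 5.5, "Name": "Caffeine [M+H]+"},
]


def write_seed_csv(rows, handle):
    """Write target rows as CSV with a header line."""
    fields = ["MZ", "RTStart", "RTStop", "Name"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_seed_csv(targets, buffer)
print(buffer.getvalue())
```

To create a file on disk, pass a handle from `open("seed.csv", "w", newline="")` (a hypothetical file name) instead of the in-memory buffer.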

85

Frame Target Wizard

•Assign columns within the .csv file

•Check the number of entries in the frame parameters table to confirm that the file is successfully identified

86

Normalization

• If normalizing to a selected frame, make sure the desired m/z frame is highlighted

• The desired frame is then displayed as the frame to normalize to in the normalize tab

87

Alignment and Framing Experiment

•Associate data by isotopic cluster

•Select PRElement, PRRoot, and PRSize from the field chooser

•Sort by PRRoot

•Tip: filter on PRSize

PRElement: position in the isotopic cluster (0 = 12C, 1 = 13C, 2 = two 13C)

PRRoot: 12C peak

PRSize: number of frames per cluster

88

Elemental Composition

• SIEVE can generate up to 10 possible elemental compositions for a given component

• Each composition is scored and ranked

• The top ranked composition is listed in the main frame report table

• Other possible compositions can be viewed in the Flex View tab
