TRANSCRIPT
1
The world leader in serving science
Workflows in SIEVE and LipidSearch
I Orbi 5 - 2014
2
SIEVE v2.1
Statistical
Iterative
Exploratory
Visualization
Environment
3
SIEVE Overview
• SIEVE is label-free differential software
• Aids in discovering molecular changes between states
• Provides semi-quantitative measurement of differentially expressed proteins, metabolites and other compounds correlating with a disease state, drug response or other perturbation
• Proteomics
• Biomarker discovery in plasma, tissue, cell culture, urine, saliva
• Disulfide bond validation in purified protein
• Quantify samples that contain their own precursor label, similar to SILAC
• Small Molecules
• Biomarker discovery in plasma, tissue, cell culture, urine, saliva
• Monitoring drug degradation due to environmental stresses
• Metabolomic and Lipidomic profiling
• Ingredient screening in Food and Safety
• Water purification monitoring
• Improving agriculture (tomatoes, whiskey, wine, corn)
• Direct infusion of olive oil using DART, looking for impurities
• Screening purified therapeutics for environmental and/or product modifications
Scanning electron microscope image of a cancerous (left) and normal cell, showing differences in cell “brush”. Image courtesy Igor Sokolov.
4
Getting Started
• Minimal system requirements
• Microsoft Windows
• Windows 7 32/64 Professional SP1
• Windows XP 32 Professional SP1
• Microsoft .Net 4.0 (Extended)
• 2 GHz dual core processor
• 8 GB RAM or higher
• 500 GB hard drive
• Recommended system requirements
• 3.3 GHz processor
• 32 GB RAM
• 1TB RAID performance
• Installation requires administrator rights
• Free upgrade from v2.0 to v2.1 via the Thermo Omics Portal
• https://portal.thermo-brims.com/
• Not necessary to uninstall v2.0
5
SIEVE Analysis Platform
Statistically rigorous automated label-free LC/MS differential analysis platform
Applied to: any experiment that compares one group to another
Input: raw files for State 1, State 2, State …
Workflow: Align, Detect, Identify
Reports:
• Components
• Relative Quantification
• Statistical Analysis
• Trend information
• Identification
6
Creating a SIEVE Experiment
Initiate the wizard
7
Define Processing Method
Select the domain: proteomics or small molecules
8
Define Processing Method and Experiment Type
Select the detection algorithm: which one?
9
Two Signal Detection Algorithms – Which One to Use?
Classic Recursive Base Peak Framing vs. Component Extraction
Application
• All sample types: proteomics, small molecules
• Small molecules (charge state ≤2)
Instrumentation
• High resolution instrument
• Low resolution instrument
• High resolution instrument
Experiment Type
• Trend analysis
• Control Treated
• ROC analysis
• Single class analysis
• Trend analysis
Advanced Processing
• Perfect pairs
• Targeted detection
• Direct infusion
• Background subtraction
• Data reduction by isotope and adduct grouping
Limitations
• NO background subtraction
• NO data reduction by isotope and adduct grouping
• Requires charge state ≤2
• Requires high resolution data
10
SIEVE Workflow – Component Extraction
[M+H]+
[M+Na]+
[M+K]+
Automatically interpret spectra, reduce signal peaks into components
A total of 9 different ions are observed for the same molecule as isotopes and adducts
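To make the idea of reducing signal peaks into one component concrete, here is a minimal sketch. It is not SIEVE's algorithm. It assumes, purely for illustration, that the 9 ions are 3 adducts times 3 isotope peaks; the slide does not say how the 9 ions break down. The function names are hypothetical.

```python
# Illustrative sketch only. The 3 adducts x 3 isotope peaks split is an
# assumption made for this example, not something stated on the slide.
PROTON = 1.007276        # mass shift for [M+H]+
SODIUM = 22.989218       # mass shift for [M+Na]+
POTASSIUM = 38.963158    # mass shift for [M+K]+
C13_SPACING = 1.003355   # 13C minus 12C mass difference

ADDUCTS = {"[M+H]+": PROTON, "[M+Na]+": SODIUM, "[M+K]+": POTASSIUM}


def expected_ions(neutral_mass, n_isotopes=3):
    """List (adduct, isotope index, m/z) for singly charged ions of one molecule."""
    ions = []
    for name, shift in ADDUCTS.items():
        for k in range(n_isotopes):
            ions.append((name, k, neutral_mass + shift + k * C13_SPACING))
    return ions


def group_into_component(ions):
    """Map every ion back to its neutral mass; one molecule yields one value."""
    return {round(mz - ADDUCTS[name] - k * C13_SPACING, 4) for name, k, mz in ions}


ions = expected_ions(194.08038)  # example neutral monoisotopic mass (caffeine)
print(len(ions), group_into_component(ions))
```

All nine m/z values collapse to a single neutral mass, which is the data reduction the slide describes.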
11
Experiment Name
Identify experiment name
12
Select Raw Files
Drag and drop the raw files from the file explorer
13
Sort Raw Files
Click on the file name to sort the files
14
Multiple Solvent Blanks
If more than one solvent blank is present then “blank” files are averaged
15
Component Extraction Background Subtraction
Sample - Solvent blank = Analyte signals
Distinguish analyte signals from noise
~98% of lower intensity signals are eliminated
Background subtraction is automatically performed when solvent blanks are acquired.
Irrelevant solvent peaks are removed from the data, which eliminates a significant amount of low-level noise.
A significant step in data reduction and a critical part of the new component detection algorithm.
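The "Sample - Solvent blank = Analyte signals" relation, together with the averaging of multiple blanks, can be sketched as follows. This is a simplified illustration, not SIEVE's implementation. It assumes each run is already reduced to a dictionary of feature key to intensity; the function names are hypothetical.

```python
from statistics import mean


def average_blanks(blanks):
    """Average the intensity of each feature across several solvent blank runs.

    A feature missing from a blank is counted as intensity 0 (an assumption).
    """
    keys = set().union(*blanks)
    return {k: mean(b.get(k, 0.0) for b in blanks) for k in keys}


def subtract_background(sample, blank_avg):
    """Keep only sample signals that remain above the averaged blank."""
    analyte = {}
    for key, intensity in sample.items():
        remaining = intensity - blank_avg.get(key, 0.0)
        if remaining > 0:
            analyte[key] = remaining
    return analyte


blanks = [{"a": 100.0, "b": 50.0}, {"a": 300.0}]
sample = {"a": 1000.0, "b": 20.0, "c": 500.0}
print(subtract_background(sample, average_blanks(blanks)))
```

Feature "b" drops out because the sample signal does not exceed the averaged blank, which is how solvent peaks are removed.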
16
Scan Raw Files for Data Quality
Raw files show no errors
17
Define Analysis Groups
Identify group names by separating groups with a space
Select alignment reference file
Ratio group is the control group (Fed/Fast)
The word “blank” will enable background subtraction
18
Define Search Parameters
Define retention time, mass range, and m/z width
Frame width is “automatic” for component extraction
m/z width: 10 ppm is +/- 5 ppm
Define frame width for framing experiment
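The m/z width convention (a 10 ppm width means +/- 5 ppm around the target) works out as in this short sketch. The function name is hypothetical.

```python
def mz_window(mz, width_ppm=10.0):
    """Return (low, high) m/z bounds for a total window of width_ppm.

    The width is split evenly, so 10 ppm gives +/- 5 ppm around mz.
    """
    half_width = mz * (width_ppm / 2.0) / 1e6
    return mz - half_width, mz + half_width


low, high = mz_window(500.0, 10.0)
print(f"{low:.4f} to {high:.4f}")
```

At m/z 500 a 10 ppm width spans 0.005 m/z in total, 0.0025 on each side.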
19
Select Scan Filter
SIEVE automatically selects the full MS scan type
Data with both positive and negative filters needs to be processed separately
Used a lock mass? Modify the filter string to remove the lock mass text
Example: FTMS + p ESI Full MS Lock Mass
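A small sketch of the filter-string edit. It assumes the rule given in the supplemental material of this deck (keep the text up to and including "Full", drop everything after it). The helper name is hypothetical and the edit is normally made by hand in the parameters table.

```python
def strip_after_full(scan_filter):
    """Drop everything after the word 'Full' in a scan filter string.

    Strings without 'Full' are returned unchanged.
    """
    head, sep, _ = scan_filter.partition("Full")
    return head + sep if sep else scan_filter


print(strip_after_full("FTMS + p ESI Full MS Lock Mass"))
```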
20
Define Main Component Extraction Parameters
• Review raw data in Qual Browser first
• Each data set is different, requiring different settings
Intensity threshold is initially set from the mean intensity of the reference file
21
Define Identification Parameters
Three search types available: ChemSpider, Database Lookup or Defer
22
Identification Parameters
• ChemSpider
• Free chemical structure database
• Over 470 data sources, e.g. KEGG and the Human Metabolome Database
• More than one data source can be used in the identification search when separated by a comma
• DB Lookup
• Post peak detection lookup (different from seed file)
• DB Library Files in csv format
• Requirement: first column must have neutral exact mass
• All other columns are optional
• Defer
• Can defer this setting in the wizard
• Identification can be later enabled within the parameters table
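The DB Lookup file requirement (neutral exact mass in the first column, everything else optional) can be illustrated with a short sketch. This is not SIEVE's matching code. The ppm matching rule, the function names, and the two example rows are assumptions for illustration.

```python
import csv
import io


def load_library(csv_text):
    """Parse a lookup library: column 1 is neutral exact mass, the rest is free-form."""
    library = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        try:
            mass = float(row[0])
        except ValueError:
            continue  # skip a header row or any non-numeric first cell
        library.append((mass, row[1:]))
    return library


def lookup(library, neutral_mass, tol_ppm=5.0):
    """Return entries whose mass is within tol_ppm of neutral_mass (assumed rule)."""
    return [
        (mass, extra)
        for mass, extra in library
        if abs(mass - neutral_mass) / mass * 1e6 <= tol_ppm
    ]


example = "NeutralMass,Name\n194.0804,Caffeine\n180.0634,Glucose\n"
print(lookup(load_library(example), 194.0805))
```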
23
Complete Wizard Setup
Save file as .sdb
24
Review SIEVE Parameters Before Processing
• Reference File
• Check if reference file is displayed
• If not, enable through raw file collection
• Scan Filter
• Exclude lock mass in text string
• Update
• If modifications made in the parameters table, UPDATE
• Then run processing task
• Align
• Always align
• Even if bypassing align step
• SIEVE is reading in files
• Multiple instances of software allowed
25
Component Extraction - Use of Integration Parameters
• Peak Detection
• ICIS
• Genesis
• PPD (parameter-less)
• Peak Integration
• ICIS
• Genesis
• PPD
• None
• Peak areas generated
• Integration reflects entire window
• Why use? Time
Figure: peak integration with None vs. ICIS
26
Set Integration Parameters
Optimize parameters for chromatographic peaks
27
Unaligned Small Molecule Data
28
Aligned Small Molecule Data
Zoom in by placing a box over area with cursor
Zoom out by removing scroll bars
29
Frame Report View
30
Data Review Options
XIC
Peaks
Trend
Intensities
31
Frame Report
Right click on any column title to access field chooser
32
Results Review Options
Gel View
PCA
CVs displayed by group
33
Frames Table Filter
• Use filter table to reduce the number of components
• Filter on column headings
• Filter follows Boolean logic (and, or, not)
• Example 1: CV_E <20 and CV_H <20
• Example 2: Ratio_E < 0.5 or Ratio_E >1.5
• Example 3: Pvalue_E <0.05
• Example 4: Pick >0
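The Boolean filter examples above behave like ordinary and/or/not predicates over the table columns. The sketch below mimics that on a tiny in-memory table. The rows are invented for illustration, and this is not the SIEVE filter syntax itself.

```python
# Hypothetical frame rows; column names follow the filter examples on the slide.
frames = [
    {"ID": 1, "CV_E": 12.0, "CV_H": 8.0, "Ratio_E": 0.4, "Pvalue_E": 0.01},
    {"ID": 2, "CV_E": 35.0, "CV_H": 10.0, "Ratio_E": 1.0, "Pvalue_E": 0.50},
    {"ID": 3, "CV_E": 15.0, "CV_H": 18.0, "Ratio_E": 2.1, "Pvalue_E": 0.20},
]


def apply_filter(rows, predicate):
    """Return the IDs of rows for which the predicate is true."""
    return [row["ID"] for row in rows if predicate(row)]


# Example 1: CV_E < 20 and CV_H < 20
print(apply_filter(frames, lambda f: f["CV_E"] < 20 and f["CV_H"] < 20))
# Example 2: Ratio_E < 0.5 or Ratio_E > 1.5
print(apply_filter(frames, lambda f: f["Ratio_E"] < 0.5 or f["Ratio_E"] > 1.5))
# Example 3: Pvalue_E < 0.05
print(apply_filter(frames, lambda f: f["Pvalue_E"] < 0.05))
```

Combining the three conditions with "and" would keep only frames that are reproducible, changed, and significant.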
34
Additional Tips
• Each dataset may be different
• Visually confirm alignment (may need to bypass alignment)
• Multiple iterations of peak detection may be necessary to optimize peak detection parameters
• Start with higher threshold and no peak integration for faster review
• Supplemental information provided
• Questions? Refer to the Thermo Omics Software Portal
• http://portal.thermo-brims.com/
35
LipidSearch v4.0.20 Quick Fix release
36
LipidSearch Features
• Automated identification of lipids from biological samples
• Identification, relative quantitation, alignment
• Comprehensive database of >1.5 million lipid ions and predicted fragment ions
• Identification algorithms for product ion, precursor ion, and neutral loss scans
• Identification ranked by mass tolerance, then matched to predicted fragments and predicted retention time
• Suitable for multiple approaches for lipid analysis
• LC or nano-infusion (Shotgun)
• Untargeted and targeted profiling
• Compatible with data from various MS systems
• Thermo Q Exactive, hybrid Orbitrap, and TSQ instruments
37
Getting Started
• Recommended system requirements
• 64-bit operating system, Microsoft Windows 7/8
• Quad- or multi-core CPU, 3 GHz or higher
• 16 GB RAM or higher
• 500 GB hard drive or larger (SSD optional)
• Required programs
• Thermo Scientific MSFileReader 64-bit (need to uninstall if currently installed)
• Java runtime environment (JRE 1.6+)
• Microsoft Visual C++ 2010 runtime
• Microsoft Internet Explorer or Google Chrome
• Web-based graphical interface
• Installation requires administrator rights
38
Getting Started
• Tomcat Server
• Adjustable maximum memory allocated to server
• Installation
• Edit after installation
• Documentation
• User manual, installation instructions, tutorial files (C drive)
C:\lipidserach\lipidserach4.0\LipidSearchLauncher\LipidSearchLauncher.ini
39
Launcher
• Initiate the software via the desktop icon
• Tomcat server
• Open to launch LipidSearch
• Minimize Tomcat server to the taskbar
• Re-open server by clicking on icon
• http://localhost:8090/lipidsearch040/
Stop and start server here
40
License
• Must request license key to register software
• Send information to ThermoMSLicensing.com
• Register key to activate software
41
Configuration
• Modify configuration to improve performance
• Increase buffer size to 70 – 80%
• If using > 3 GHz processor, increase the number of processes for peak detection, identification, and quantification to 4
42
Step 1
Step 2
LipidSearch Workflow
43
Batch Creation for Identification and Quantitation
Select raw files to be processed
44
Identification Parameters - LC-MS/MS
Recalc Isotope: ON for general search, OFF for low abundant ions
M-Score is based on the number of matches with product ion peaks in the spectrum
General: Triple Quadrupole
Q Exactive: QE or Fusion (HCD)
Orbitrap: Fusion (CID, MS2/MS3)
45
Quantitation Parameters
46
Filter Criteria for Displaying Raw File Results
1) Toprank: displays lipids with top score among identified spectra
2) Main node: main isomer peak; displays the largest isomer based on intensity, m-score and t-score
3) FA priority: shows the most likely fatty acid chain combination if lipid isomers have the same score
4) ID Quality:
A: lipid class & FA were completely identified
B: lipid class & some FA were identified
C: lipid class or FA were identified
D: identification by other fragment ions (H2O loss)
47
Select Lipid Class
48
Select Adducts
49
Submit Batch
Successful submission
Unsuccessful submission
50
Data Processing – Search Job List Window
Export: Extracts the summary data in the results list
Download: Extracts the entire results file
51
Data Processing – Search Job List Window
Number of lipid groups
Number of lipid ions
Identification parameters window
Identification results window
52
Identification Results Summary
• The parameters applied in the filter can be modified then resubmitted with the change filter function
• This operation can be performed in the job list window or in the identification results window
53
Data Processing – Search Job List Window
P Peak picking
I Identification
Q Quantitation
54
Data Processing – Search Job List Window
In queue
Active
Canceled
Successful completion
Ended in failure
55
Data Processing – Search Job List Window
Number of lipid groups (sum composition)
Number of lipid ions (isomers)
56
Review Identification Results
57
Review Identification Results
Data sorted by LipidGroup, CalcMz, TopPos
58
Review Identification Results
•t-score: the difference between the theoretical LC-MS retention time (RP) calculated from the lipid computational formula and the actual retention time [lower value increases reliability]
•m-score: based on the number of matches with product ion peaks in the MS2 spectrum [higher value]
•Occupancy rate: the ratio of MS2 spectrum peaks assigned to the lipid among all peaks [higher value]
•Grade: identification quality filter assigned A – D based on lipid class or fatty acid identification
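As a rough illustration of two of these scores, the sketch below computes an occupancy rate and a retention time difference. The exact formulas LipidSearch uses are not given in these slides. Counting peaks (rather than summing intensities) and taking an absolute difference are assumptions, and the function names are hypothetical.

```python
def occupancy_rate(n_assigned_peaks, n_total_peaks):
    """Fraction of MS2 peaks assigned to the lipid (peak counts are an assumption)."""
    if n_total_peaks == 0:
        return 0.0
    return n_assigned_peaks / n_total_peaks


def t_score(predicted_rt, observed_rt):
    """Gap between predicted and observed retention time; lower is more reliable."""
    return abs(predicted_rt - observed_rt)


print(occupancy_rate(6, 8))
print(round(t_score(5.2, 5.0), 3))
```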
59
Review Identification Results
Chromatographic peak Mass spectrum
60
Spectrum Details Screen – Data ID
Black = unassigned ions
Red = MS2 matched ions
Green = MS3 matched ions
Precursor ion
Other spectra where this lipid is identified
61
Chromatogram Chart
Area score: 0.96
De-noising, smoothing, separating partially overlapped peaks
Yellow = integrated area
62
Alignment Parameters
Set alignment parameters
Max peak area is the default
Select Mean to obtain group average peak areas
63
Retention Time Tolerance
LC Experiment Types: Retention time tolerance threshold for the peak tops of peaks deemed to be the same lipid during alignment
|r.t.1 – r.t.2| > R.T. tolerance
If the above is true then the peaks will not be aligned
Instead they are two separate records in the results list
Example results: LPC (18:1)
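The retention time tolerance rule can be sketched as a simple grouping step. This is not the LipidSearch alignment code. Comparing each peak with the first peak of the current group is an assumption made to keep the example short, and the function name is hypothetical.

```python
def group_by_rt(peak_rts, tolerance):
    """Group peak-top retention times that fall within the R.T. tolerance.

    Each group would become one aligned record; peaks further apart than
    the tolerance stay as separate records.
    """
    groups = []
    for rt in sorted(peak_rts):
        if groups and abs(rt - groups[-1][0]) <= tolerance:
            groups[-1].append(rt)   # same lipid record
        else:
            groups.append([rt])     # new record
    return groups


print(group_by_rt([5.10, 5.60, 5.15], tolerance=0.1))
```

With a tolerance of 0.1 min, the peaks at 5.10 and 5.15 are aligned and the peak at 5.60 is reported separately.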
64
Select Raw Files for Alignment
Click select to open the job list tab
65
Select Raw Files for Alignment
Check box to include raw files to be aligned
Click add
66
Group Allocation
•Selected files will appear in alignment setup
•Define control and sample groups
•In this example, wild type is control and knock out is sample group 1
67
Submit Alignment
Successful submission
68
Alignment Jobs List
M Merge (Alignment)
69
Alignment Results
Group information can be expanded and collapsed
Alignment layout is formatted similarly to the identification results
70
Normalization Option
Normalization by either internal standard or by individual lipid class
71
Lipid Specific Alignment Results
72
Acknowledgements
• Thermo Fisher Scientific
• Jennifer Sutton
• David Peake
• Ralf Tautenhahn
• Josef Ruzicka
73
Supplemental Material for SIEVE
74
Define Experiment Type
Experiment Types:
•Two Sample Differential Analysis – A simple comparison between two states such as healthy and diseased. A ratio and p-value are calculated.
•Control Compare Trend – This experiment is used for time course analysis or trend type experiments. One of the groups is defined as the control group and the others are compared to this control group. For each trend point, a ratio and p-value are calculated against a control group.
•Differential Case Study with ROC Analysis – This experiment type is used to measure candidate marker’s capability of distinguishing between two classes. A large subject group (≥10) is recommended. Technical replicates are also recommended.
•Non-differential Single Class Analysis – Allows for a quick assessment of the data to determine reproducibility and overall quality by using the CV processor. This analysis can also be used with SIEVE’s Perfect Pairs tool to find precursor pairs in a single raw file. This algorithm tags pairs of frames that are consistent with a designated mass difference. Applications include PTMs, ion and ion + adduct combinations, SILAC, and other precursor labeling methods.
75
Parameter Settings – Global
Change to force calculation if you want to see the PCA plot for large experiments.
Rawfile collection is where you can add or remove raw files and change the alignment reference file, groups, and colors.
Scanfilter is the full scan type that is used for the analysis. If you have lock mass turned on and you see two full scans in the wizard, you need to change this setting before running the analysis to include all of the full scans. Change it so that the string reads FTMS + p ESI Full, in this example; remove all of the letters after "Full".
Bold-line parameters are the ones that are changed most often.
Maximum number of threads for processing. Lowering this value on 32-bit computers can bypass memory issues.
Retention start and stop can be used to eliminate unwanted data.
After completing the wizard, the user must check whether the reference file parameter is populated. If a file is not assigned, the user must click on the rawfiles (collection) tab and check the reference file in the box that appears, then select OK. The reference file parameter should then display the selected reference file.
76
Parameter Settings – Alignment
Check alignment to see whether alignment is needed. You have the option to bypass alignment if needed.
Minimum intensity threshold for alignment
Bold-line parameters are the ones that are changed most often
Alignment correlation bin size
Max retention time shift for alignment step in mins
The initial size of a tile that correlates basepeak alignment.
77
Parameter Settings – Basic Component
Signal to background noise threshold for background correction (subtraction).
Suppress components that do not meet the Background Signal to Noise criteria.
Base peak minimum intensity required for a signal to be considered as a component.
Minimum number of scans across a chromatographic peak to be considered.
Mass window for XIC in ppm (NOTE: 10 ppm = +/- 5 ppm, not +/- 10 ppm).
Algorithm used to determine peaks. ICIS is suggested because its parameters can be checked and optimized by looking at the raw data in Qual Browser.
Time in mins from a peak apex to restrict seeking another peak.
Base peak minimum intensity required for a signal to be considered as a component from a targetMZlist experiment.
List of component MZs to force find.
Bold-line parameters are the ones that are changed most often
78
Parameter Settings – Frame
The condition or trend point that was designated as the control. The control group serves as the denominator for ratio calculations (treatment/control).
Algorithm used for second pass peak integration. Default is NONE, but if ICIS is used for peak detection then ICIS should be used for peak integration. Parameters for the different integration methods can be found under the Workspace tab at the top of the software. It is recommended that users look at their data in Qual Browser first, optimize the peak integration parameters (ICIS), and then apply these settings to SIEVE. These parameters are very sensitive to the chromatographic peak shape of the data.
Bold-line parameters are the ones that are changed most often
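The treatment/control ratio with the control group as denominator can be sketched as below. Using the mean intensity of each group is an assumption, since the slides do not state how SIEVE summarizes replicates. The function name and the intensities are hypothetical.

```python
from statistics import mean


def ratios_vs_control(group_intensities, control):
    """Ratio of each group's mean intensity to the control group's mean.

    The control group is the denominator (treatment/control).
    """
    control_mean = mean(group_intensities[control])
    return {
        name: mean(values) / control_mean
        for name, values in group_intensities.items()
        if name != control
    }


# Hypothetical intensities for one component, with "Fast" as the control group.
print(ratios_vs_control({"Fast": [100.0, 200.0], "Fed": [300.0, 600.0]}, control="Fast"))
```

In a trend experiment every non-control group gets its own ratio against the same control.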
79
Parameter Settings – Advanced Component
Skirt minimum basepeak intensity. The default should be set to 5 million but varies from instrument to instrument: 5 million for QE and 2 million for Exactive.
Number of points for data smoothing.
Minimum scans should always remain 2. Do not change this setting; it is not base peak minimum scans.
Bold-line parameters are the ones that are changed most often
80
Parameter Settings – Global Identification
The charge used for ChemSpider and DBLookup if a charge could not be determined.
The maximum number of frames/components to identify.
Multiple formulas may be assigned to each component. MinFormulaScore is the minimum number of formulas sent to ChemSpider for identification or sent for pathway analysis. The maximum number of formulas reported per component is 10.
Select the search type: ChemSpider, DBLookup, None
Bold-line parameters are the ones that are changed most often
81
Parameter Settings – Accurate Mass Identification
ChemSpider identification only: provide the adduct for mass calculation (+nH, -nH, +K, +Na, +NH4).
ChemSpider databases used for search. More than one database can be searched at a time; separate them with commas.
Browse for the accurate mass library file (csv).
Use either COMPMW to find accurate mass IDs based upon component molecular weight, FRAMEMZ to find accurate mass IDs based upon frame MZ, or FORMULA to ID by formula. For the COMPMW search type, an adduct is not necessary.
Accurate mass MZ tolerance for DBLookup and ChemSpider searches (ppm).
Bold-line parameters are the ones that are changed most often
82
Parameter Settings – Optimized
Time in mins from a peak apex to restrict seeking another peak. Default setting is 0.2, which is too large to separate isomers.
Max retention time shift for alignment step in mins. Default setting is 0.2, which is too large in this case to separate isomers.
Peak Integration is changed from NONE to ICIS.
Smoothing set to Points 3
BPMinimumCounts changed to 1 million
Background SN changed to 10
SkirtBPInty set to 5 million
83
Frame Target Wizard – Seed File
•Incorporates a search for target ions of interest
•Identify .csv file in the setup wizard
•Alternatively identify .csv file in the parameters table as frame seed file
84
Frame Target Wizard
•Create the .csv file
•Minimally required columns include MZ, RTStart, and RTStop
•Additional information can be listed in columns, for example compound name
•The annotated column can be filtered using the frames table filter
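A seed file with the minimally required columns could be generated with a few lines of Python, as sketched below. The header names follow the slide (MZ, RTStart, RTStop). The extra Name column, the target values, and the function name are illustrative assumptions, and columns are still assigned in the wizard afterwards.

```python
import csv
import io


def build_seed_csv(targets):
    """Return CSV text with MZ, RTStart, RTStop and an optional Name column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["MZ", "RTStart", "RTStop", "Name"])
    for t in targets:
        writer.writerow([f"{t['mz']:.4f}", t["rt_start"], t["rt_stop"], t["name"]])
    return buffer.getvalue()


# Hypothetical target: an m/z with a retention time window in minutes.
targets = [{"mz": 195.0877, "rt_start": 2.5, "rt_stop": 3.5, "name": "Caffeine"}]
print(build_seed_csv(targets))
```

Writing the returned text to a .csv file gives a file that can be selected in the setup wizard or set as the frame seed file in the parameters table.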
85
Frame Target Wizard
•Assign columns within the .csv file
•Check the number of entries in the frame parameters table to confirm file is successfully identified
86
Normalization
• If normalizing to a selected frame, make sure the desired m/z frame is highlighted
• The desired frame is then displayed as the frame to normalize to in the normalize tab
87
Alignment and Framing Experiment
•Associate data by isotopic cluster
•Select PRElement, PRRoot, and PRSize from the field chooser
•Sort by PRRoot
•Tip: filter on PRSize
PRElement: 0 = 12C, 1 = 13C, 2 = 14C
PRRoot: 12C peak
PRSize: number of frames per cluster
88
Elemental Composition
• SIEVE can generate up to 10 possible elemental compositions for a given component
• Each composition is scored and ranked
• The top ranked composition is listed in the main frame report table
• Other possible compositions can be viewed in the Flex View tab