openms in the compound discoverertm platform …...large-scale analysis of non-targeted lc-ms...

1
Large-scale analysis of non-targeted LC-MS metabolomics data with OpenMS in the Compound Discoverer TM platform Fabian Aicheler 1 , Timo Sachsenberg 1 , Erhan Kenar 1 , Sebastian Kusch 2 , Hans Grensemann 2 , Oliver Kohlbacher 1 1 Center for Bioinformatics, Tübingen, Germany; 2 Thermo Fisher Scientific GmbH, Germany OVERVIEW Purpose: Integration of a pipeline for feature identi- fication and quantification into Thermo Fisher Com- pound Discoverer TM (CD). Implementation: Performance of a feature detec- tion algorithm was demonstrated on dilution series. An OpenMS [1] pipeline was constructed around this algorithm and wrapped into a CD community node. Pipeline output was incorporated into the CD report- ing format. Results: We announce the first integration of an au- tomated workflow for metabolite quantification into the novel CD platform, providing a community extension that enables the differential analysis of multiple LC-MS runs. I NTRODUCTION Label-free quantification of small molecules using LC-MS has become a standard analytical technol- ogy. Complex LC-MS datasets require automated proc- essing such as mass trace detection and assembly of isotopic traces to features, followed by quantifica- tion and identification of compounds. Recently, Kenar et al. [2] presented a sensi- tive feature detection algorithm which results in reproducible metabolite quantification for small molecules. CD is a new mass spectrometry analysis platform scheduled for a release later this year. Analogous to Proteome Discoverer, which is tailored to proteins, CD is adapted for small molecule analysis. CD has been designed to allow integration of external tools and algorithms as so-called community nodes. Our aim was the integration of a metabolic feature identification and quantification pipeline into CD. Besides creating the necessary interfaces, consis- tent presentation of CD results and their export for straightforward downstream analysis outside of CD were declared goals. I MPLEMENTATION CC Sources: tinyurl.com/lm85xsu, tinyurl.com/ldtvazo. Figure 1: The OpenMS pipeline is implemented as commu- nity node in CD. Result export as tables allows downstream analysis outside CD, for example in Knime. The open-source software library OpenMS allows for rapid development of mass spectrometry algorithms and tools. It includes methods for retention time align- ment and feature linking [3]. We expanded this toolset with a novel algorithm for feature detection and non- targeted quantification of small molecule LC-MS data. Our OpenMS metabolite quantification workflow was encapsulated in a single CD node. Evaluations of the method by Kenar et al. included human plasma sam- ples with spiked-in metabolites. RT m/z RT m/z Intensity RT mass trace detection peak separation RT m/z RT m/z +1/z +2/z ΔRT RT m/z feature assembly hypothesis testing (a) (b) (d) (c) T 0 T 1 T 2 T 0,1 T 1,1 T 2,1 T 0,2 T 1,2 T 2,2 Figure 2: Methodology overview of the feature detection algorithm. R ESULTS Figure 3: OpenMS community node in a CD example work- flow. OpenMS results are incorporated into the data pro- cessing and visualization capabilities of CD, offering tightly integrated presentation and downstream analy- sis in the CD software. Restriction to Thermo Fisher instruments allowed optimized parameter choices for the OpenMS algorithms. Figure 4: Result view of CD with integrated OpenMS results. In the evaluation of the feature detection algorithm, correlations above 0.98 between feature intensities and corresponding compound concentrations were re- ported. Figure 5: Correlations for chosen metabolites in dilution experiments. To assess the quality of our small molecule detection pipeline, we investigated reproducibility in terms of fea- ture recurrence over multiple measurements. Our inte- grated feature detection algorithm was compared with XCMS/CAMERA in a dilution series (33 MS runs). A time series of Prazosin metabolism in rats was used to compare our method with the feature detection algo- rithm provided by CD (6 MS runs). Found in # OpenMS XCMS samples Camera 1-5 5590 1837 6-9 591 242 10-13 258 115 14-17 183 78 18-21 124 82 22-25 124 52 26-29 128 52 30-33 744 341 Found in # OpenMS Component samples Elucidator 1 5369 5777 2 2480 2684 3 1719 1434 4 2288 1543 5 1491 903 6 1446 1173 Table 1: Left: Reproducible features for OpenMS and XCMS/CAMERA (dilution series, 33 MS runs). Right: Re- producible features for OpenMS and Component Elucidator (Prazosin time series, 6 MS runs). 19032 11385 21786 OpenMS Component Elucidator Figure 6: Overlap of detected features measured in a time series of Prazosin metabolism. CD results can be exported to tabular file formats, allowing downstream analysis outside of CD. Here we use the KoNstanz Information MinEr (KNIME) [4]. KNIME supports a multitude of processing modules for cheminformatics, machine learning and statistics. A downstream analysis workflow in KNIME (See Figure 7) allows elaborate analysis of CD results. Figure 7: Example analysis in KNIME. C ONCLUSION We successfully integrated a robust, sensitive fea- ture quantification method into CD, enabling joint analysis of multiple runs. Reduction of parameters and integration into CD significantly improved accessibility of this state of the art metabolite quantification workflow. Source code (C#) of our community node will be freely available under an open-source license par- allel to the release of CD. R EFERENCES [1] Sturm et al. OpenMS - an open-source software frame- work for mass spectrometry. BMC Bioinformatics, 9:163, 2008. [2] Kenar et al. Automated label-free quantification of metabolites from liquid chromatography-mass spectrom- etry data. Mol. Cell. Proteomics, 13(1):348–59, 2014. [3] Weisser et al. An automated pipeline for high-throughput label-free quantitative proteomics. J. Proteome Res., 12(4):1628–1644, 2013. [4] Berthold et al. KNIME: The Konstanz Information Miner. In Studies in Classification, Data Analysis, and Knowl- edge Organization (GfKL 2007). Springer, 2007.

Upload: others

Post on 27-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: OpenMS in the Compound DiscovererTM platform …...Large-scale analysis of non-targeted LC-MS metabolomics data with OpenMS in the Compound Discoverer TM platform Fabian Aicheler 1

Large-scale analysis of non-targeted LC-MS metabolomics data withOpenMS in the Compound DiscovererTM platform

Fabian Aicheler1, Timo Sachsenberg1, Erhan Kenar1, Sebastian Kusch2, Hans Grensemann2, Oliver Kohlbacher11Center for Bioinformatics, Tübingen, Germany; 2Thermo Fisher Scientific GmbH, Germany

OVERVIEWPurpose: Integration of a pipeline for feature identi-fication and quantification into Thermo Fisher Com-pound DiscovererTM (CD).Implementation: Performance of a feature detec-tion algorithm was demonstrated on dilution series.An OpenMS [1] pipeline was constructed around thisalgorithm and wrapped into a CD community node.Pipeline output was incorporated into the CD report-ing format.Results: We announce the first integration of an au-tomated workflow for metabolite quantification into thenovel CD platform, providing a community extensionthat enables the differential analysis of multiple LC-MSruns.

INTRODUCTION• Label-free quantification of small molecules using

LC-MS has become a standard analytical technol-ogy.

• Complex LC-MS datasets require automated proc-essing such as mass trace detection and assemblyof isotopic traces to features, followed by quantifica-tion and identification of compounds.

• Recently, Kenar et al. [2] presented a sensi-tive feature detection algorithm which results inreproducible metabolite quantification for smallmolecules.

• CD is a new mass spectrometry analysis platformscheduled for a release later this year. Analogous toProteome Discoverer, which is tailored to proteins,CD is adapted for small molecule analysis. CD hasbeen designed to allow integration of external toolsand algorithms as so-called community nodes.

• Our aim was the integration of a metabolic featureidentification and quantification pipeline into CD.Besides creating the necessary interfaces, consis-tent presentation of CD results and their export forstraightforward downstream analysis outside of CDwere declared goals.

IMPLEMENTATION

CC© Sources: tinyurl.com/lm85xsu, tinyurl.com/ldtvazo.

Figure 1: The OpenMS pipeline is implemented as commu-nity node in CD. Result export as tables allows downstreamanalysis outside CD, for example in Knime.

The open-source software library OpenMS allows forrapid development of mass spectrometry algorithmsand tools. It includes methods for retention time align-ment and feature linking [3]. We expanded this toolsetwith a novel algorithm for feature detection and non-targeted quantification of small molecule LC-MS data.Our OpenMS metabolite quantification workflow wasencapsulated in a single CD node. Evaluations of themethod by Kenar et al. included human plasma sam-ples with spiked-in metabolites.

RT

m/z

RT

m/z

Intensity

RT

mass trace detection peak separation

RT

m/z

RT

m/z

+1/z+2/z

ΔRT

RT

m/z

feature assembly hypothesis testing

(a) (b)

(d) (c)

T0 T1 T2

T0,1 T1,1 T2,1

T0,2 T1,2 T2,2

Figure 2: Methodology overview of the feature detectionalgorithm.

RESULTS

Figure 3: OpenMS community node in a CD example work-flow.

OpenMS results are incorporated into the data pro-cessing and visualization capabilities of CD, offeringtightly integrated presentation and downstream analy-sis in the CD software. Restriction to Thermo Fisherinstruments allowed optimized parameter choices forthe OpenMS algorithms.

Figure 4: Result view of CD with integrated OpenMS results.

In the evaluation of the feature detection algorithm,correlations above 0.98 between feature intensitiesand corresponding compound concentrations were re-ported.

Figure 5: Correlations for chosen metabolites in dilutionexperiments.

To assess the quality of our small molecule detectionpipeline, we investigated reproducibility in terms of fea-ture recurrence over multiple measurements. Our inte-grated feature detection algorithm was compared withXCMS/CAMERA in a dilution series (33 MS runs). Atime series of Prazosin metabolism in rats was used tocompare our method with the feature detection algo-rithm provided by CD (6 MS runs).

Found in # OpenMS XCMSsamples Camera

1-5 5590 18376-9 591 24210-13 258 11514-17 183 7818-21 124 8222-25 124 5226-29 128 5230-33 744 341

Found in # OpenMS Componentsamples Elucidator

1 5369 57772 2480 26843 1719 14344 2288 15435 1491 9036 1446 1173

Table 1: Left: Reproducible features for OpenMS andXCMS/CAMERA (dilution series, 33 MS runs). Right: Re-producible features for OpenMS and Component Elucidator(Prazosin time series, 6 MS runs).

19032 1138521786

OpenMS Component Elucidator

Figure 6: Overlap of detected features measured in a timeseries of Prazosin metabolism.

CD results can be exported to tabular file formats,allowing downstream analysis outside of CD. Herewe use the KoNstanz Information MinEr (KNIME) [4].KNIME supports a multitude of processing modules forcheminformatics, machine learning and statistics. Adownstream analysis workflow in KNIME (See Figure7) allows elaborate analysis of CD results.

Figure 7: Example analysis in KNIME.

CONCLUSION• We successfully integrated a robust, sensitive fea-

ture quantification method into CD, enabling jointanalysis of multiple runs.

• Reduction of parameters and integration into CDsignificantly improved accessibility of this state ofthe art metabolite quantification workflow.

• Source code (C#) of our community node will befreely available under an open-source license par-allel to the release of CD.

REFERENCES

[1] Sturm et al. OpenMS - an open-source software frame-work for mass spectrometry. BMC Bioinformatics, 9:163,2008.

[2] Kenar et al. Automated label-free quantification ofmetabolites from liquid chromatography-mass spectrom-etry data. Mol. Cell. Proteomics, 13(1):348–59, 2014.

[3] Weisser et al. An automated pipeline for high-throughputlabel-free quantitative proteomics. J. Proteome Res.,12(4):1628–1644, 2013.

[4] Berthold et al. KNIME: The Konstanz Information Miner.In Studies in Classification, Data Analysis, and Knowl-edge Organization (GfKL 2007). Springer, 2007.