Hindawi
Journal of Spectroscopy
Volume 2018, Article ID 5037572, 14 pages
https://doi.org/10.1155/2018/5037572

Research Article

Dynamic Localized SNV, Peak SNV, and Partial Peak SNV: Novel Standardization Methods for Preprocessing of Spectroscopic Data Used in Predictive Modeling

Emily Grisanti,1,2 Maria Totska,2 Stefan Huber,2 Christina Krick Calderon,2 Monika Hohmann,2 Dominic Lingenfelser,2 and Matthias Otto1

1 Institute of Analytical Chemistry, TU Bergakademie Freiberg, Leipziger Str. 29, 09599 Freiberg, Germany
2 Robert Bosch GmbH, Renningen, 70465 Stuttgart, Germany

Correspondence should be addressed to Emily Grisanti; emily.grisanti@de.bosch.com

Received 15 March 2018; Accepted 10 July 2018; Published 28 October 2018

Academic Editor: K. S. V. Krishna Rao

Copyright © 2018 Emily Grisanti et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

An essential part of multivariate analysis in spectroscopic context is preprocessing. The aim of preprocessing is to remove scattering phenomena or disturbances in the spectra due to measurement geometry in order to improve subsequent predictive models. Especially in vibrational spectroscopy, the Standard Normal Variate (SNV) transformation has become very popular and is widely used in many practical applications, but standardization is not always ideal when performed across the full spectrum. Herein, three different new standardization techniques are presented that apply SNV to defined regions rather than to the full spectrum: Dynamic Localized SNV (DLSNV), Peak SNV (PSNV), and Partial Peak SNV (PPSNV). DLSNV is an extension of the Localized SNV (LSNV), which allows a dynamic starting point of the localized windows on which the SNV is executed individually. Peak and Partial Peak SNV are based on picking regions from the spectra with a high correlation to the target value and perform SNV on these essential regions to ensure optimal scatter correction. All proposed methods are able to significantly improve the model performance in cross validation and robustness tests compared to SNV. The prediction errors could be reduced by up to 16% and 29% compared with LSNV for two regression models.

1. Introduction

Chemometric approaches are becoming increasingly popular as they enable more comprehensive extraction of relevant information out of complex data provided by modern instrumental analytics. At the same time, advances in data analysis make it possible to reduce the size of the instrument hardware by compensating for the missing measurement quality of miniaturized instruments. In combination with multivariate calibration, low-cost analytics such as vibrational spectroscopy allow the development of models that predict parameters usually determined with cost-intensive measuring instruments or complex methods. Monitoring the alcoholic fermentation [1] and determining the viscosity of engine oil [2, 3] or proteins in milk [4] by spectroscopic means thus become feasible. It has also been possible to determine specific viscosity modifiers and pour point depressant additive compounds in engine oils [5] by FTIR, because, according to the Lambert–Beer law, the light absorbance of the medium depends linearly on the concentration of a component [6, 7].

Preprocessing methods play a decisive role for the performance of these models, as spectra can be influenced by various disturbing factors that interfere with the significance of the measurement [8–11]. The main influence comes from the measuring geometry, which includes the sample thickness, the distance from the detector to the sample, the contact pressure, and the angle from the light source to the sample [12, 13]. The elimination of scattering effects by particles of different size and distribution also plays a major role in preprocessing.

Different spectroscopic measurement techniques suffer from different major disturbing factors. In near-infrared spectroscopy, it is usually a constant or linear baseline
offset due to scattering light. Raman spectra often show a polynomial fluorescence background, and for mid-infrared spectra the sample thickness, and thus the spectroscopic response, plays a crucial role [14, 15]. The information about the sample is present in the shape of the spectrum and is independent of the offset (additive effect) and the scaling of the absolute signal intensity (multiplicative effect). The task of preprocessing is to remove these interfering factors from the informative part of the spectrum, and there are different approaches for this.

A method for eliminating constant offset terms is to calculate the first derivative [9]. This procedure can be extended to higher-order derivatives, which also eliminate linear or quadratic baseline curves. The disadvantage of calculating the derivative of a spectrum is that noise effects are amplified.
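The offset removal and the noise amplification can be illustrated with a small numerical sketch. It uses a hypothetical Gaussian band as a toy spectrum and a plain first difference as a stand-in for the derivative; neither is taken from the study, and smoothed derivative filters would behave somewhat differently.

```python
import numpy as np

# Hypothetical toy spectrum: one Gaussian band, once without and once with
# a constant additive offset (assumption for illustration only).
x = np.linspace(0.0, 1.0, 201)
band = np.exp(-((x - 0.5) / 0.05) ** 2)
shifted = band + 0.7

# The first difference approximates the first derivative up to the step size.
d_band = np.diff(band)
d_shifted = np.diff(shifted)

# The constant offset cancels when neighbouring points are subtracted.
offset_removed = np.allclose(d_band, d_shifted)

# For independent noise with standard deviation sigma, the difference of two
# neighbouring points has standard deviation sqrt(2) * sigma.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, size=100_000)
noise_ratio = np.diff(noise).std() / noise.std()
```

Because the differenced signal of a smooth band is small while the noise grows, the signal-to-noise ratio of the derivative spectrum is typically lower than that of the original spectrum.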

Multiplicative signal correction (MSC) is another tool that can deal with both major effects. A reference spectrum, in most cases the mean spectrum of the calibration data set, is defined, and the spectra are corrected for the baseline and the multiplicative amplification effects [16, 17]. The approach is associated with the Kubelka–Munk theory, which takes optical phenomena caused by light scattering into account [18, 19]. For each spectrum, the two correction parameters are estimated via a least squares regression calculation.
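A minimal sketch of this correction is given below, assuming row-wise spectra in a NumPy array. The function name `msc` and the use of `np.polyfit` for the per-spectrum least squares fit are illustrative choices, not the implementation referenced in the literature.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative signal correction of row-wise spectra (sketch)."""
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        # Default reference: mean spectrum of the given data set.
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        # Least squares fit x ~ a + b * reference; polyfit returns (b, a).
        b, a = np.polyfit(reference, x, deg=1)
        # Remove the additive term a and the multiplicative term b.
        corrected[i] = (x - a) / b
    return corrected
```

Applied to spectra that differ from the reference only by an offset and a scale factor, the correction maps all of them back onto the reference.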

Standard normal variate (SNV) removes a constant offset term by subtracting the mean value of the full spectrum and brings all spectra to the same scale by subsequent division by the standard deviation of the full spectrum [20]. Due to its simplicity, SNV is a popular preprocessing method [21]. SNV and MSC usually yield similar results and are often regarded as exchangeable [22]. Since no extra regression step is needed for the SNV transformation to estimate the correction parameters, the focus in the following lies on SNV, as the models should be kept as simple as possible.

Some efforts have been made to optimize standardization techniques. A piecewise MSC (PMSC) method has been proposed by Isaksson and Kowalski [23], which significantly improved the predictive power of several regression models based on near-infrared transmittance spectra. A Localized SNV (LSNV) approach has been introduced by Bi et al., performing the SNV not on the full spectrum but on subsequent sequences [24]. This strategy also yielded very promising results in several regression cases based on benchmark NIR data sets. In the following, a dynamic version of the LSNV algorithm, called DLSNV, is presented. By allowing for a dynamic starting point of the first and subsequent SNV windows, it is more flexible in aligning the SNV to important vibrational bands in the spectra. PSNV and PPSNV are based on the idea that the standardization can be optimized when performed on distinct wavenumber windows across highly specific regions of the spectrum.

2. Experimental

As a sample set, data originating from an investigation of aging and interaction phenomena in Automatic Transmission Fluids (ATF) were used. Many ATF samples were stored for different periods at several temperatures to produce artificially aged samples.

The aim of the presented study was to transfer information coming from a highly specific, costly, and complex measurement method (High-Performance Liquid Chromatography coupled with Quadrupole Time-of-Flight Mass Spectrometry (HPLC-QToF-MS)) to data measured with a low-cost, flexible tabletop instrument (Fourier-Transform Infrared (FTIR) spectrometer). This was achieved by analyzing each sample coming from the storage experiment and determining the additive response signals in these samples by HPLC-QToF-MS. By using these additive responses as reference values, a calibration model was created in order to be able to predict the concentration of the additive compounds in the samples by evaluating the FTIR spectra. The new standardization techniques proposed here are tested on these regression models.

2.1. Additive Compounds. Two additive compounds from two different ATF oils were analyzed:

(i) Within ATF A: an unsaturated ethoxylated amine, known as a friction modifier
(ii) Within ATF B: a bis-tert-butyl-hydroxytoluene (BHT) derivative, known as a phenolic antioxidant

2.2. Samples and Experiments. For the investigation of degradation phenomena in ATFs, a comprehensive storage experiment was set up. The effects of different materials on ATFs and the impact of temperature on oil aging were to be analyzed. Therefore, the ATFs were stored under various conditions in an oven. Three parameters were varied: the storage temperature, the storage time, and the added materials. The storage times were adjusted to the temperatures so that a comparable load according to the Arrhenius law could be expected. The parameters are listed in Table 1.

For all time/temperature combinations, three interaction experiments were conducted:

(i) storage with pure oil
(ii) storage with oil plus copper alloy chips
(iii) storage with oil plus chips from copper alloy, iron, and PA66

The samples were prepared by storing 100 ml of fresh oil in a glass jar with a screw cap. The lid had been modified with a central hole that allowed air exchange.

2.3. Sample Measurements

2.3.1. FTIR. The FTIR spectra were collected in transmission with a Bruker Alpha instrument in combination with the QuickSnap™ transmission sample compartment in the wavenumber region ranging from 4000 to 600 cm⁻¹ with a spectral resolution of 4 cm⁻¹.

The samples were measured without any special sample preparation with two different setups: (1) a droplet of ATF between two potassium bromide (KBr) discs separated by a Teflon spacer with a thickness of about 50 μm and (2) a fixed KBr cuvette of 100 μm thickness filled with ATF.

After each sample measurement, the KBr discs and the cuvette were rinsed several times with petroleum ether in order to prevent cross contamination. The cuvette was dried with N2 gas after rinsing, and the KBr discs were dried under ambient air. For measurement type (1), four spectra per sample were recorded, and for type (2), one spectrum per sample was recorded.

Due to the sample layer thickness, the hydrocarbon bands are saturated, and therefore the spectra had to be cut in the wavenumber regions between 3000 and 2815 cm⁻¹ (C-H stretching mode) and between 1491 and 1424 cm⁻¹ (C-H bending and rocking mode). Additionally, the CO2 bands were eliminated by cutting out the region from 2387 to 2285 cm⁻¹. The spectra of ATF A are shown in Figure 1(a) in transmission without any preprocessing, as measured; in Figure 1(b) after truncation and SNV transformation; and in Figure 1(c) SNV transformed after calculating the absorbance spectra by using A = −log(T). In Figure 2, the same diagrams are shown for ATF B. In both cases, two series of curves can be discriminated in the raw spectra by eye. The blue series comes from measurement type (1), and the red set comes from the cuvette measurements (2). Combining the two data sets from measurement setups (1) and (2) is a challenging task for a predictive model, as the main variance is due to the thickness variation. The data set demonstrates the importance of suitable and sophisticated preprocessing methods in order to eliminate the difference in the spectra induced by the varying sample thickness. The standardization techniques presented here are able to meet this need.
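The truncation and the absorbance conversion can be sketched as follows. The function names, the inclusive treatment of the region limits, and the use of the base-10 logarithm for A = −log(T) are assumptions made for illustration; the text does not specify them.

```python
import numpy as np

# Regions cut out in the text (cm^-1): saturated C-H stretching band,
# C-H bending/rocking band, and the CO2 band.
CUT_REGIONS = [(2815.0, 3000.0), (1424.0, 1491.0), (2285.0, 2387.0)]

def truncate(wavenumbers, spectra, regions=CUT_REGIONS):
    """Drop all data points whose wavenumber lies inside one of `regions`."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    keep = np.ones(wavenumbers.shape, dtype=bool)
    for low, high in regions:
        # Assumption: region limits are treated as inclusive.
        keep &= ~((wavenumbers >= low) & (wavenumbers <= high))
    return wavenumbers[keep], spectra[..., keep]

def absorbance(transmission):
    """A = -log10(T); the base-10 logarithm is an assumption."""
    return -np.log10(np.asarray(transmission, dtype=float))
```

The boolean mask keeps the remaining data points in their original order, so the truncated spectrum is simply the concatenation of the retained regions.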

2.3.2. Liquid Chromatography Coupled with Mass Spectrometry. The measurements for the determination of the additive compound signals were performed with an Agilent 1260 liquid chromatograph coupled with a high-resolution QToF 6540 mass spectrometer, with methanol/water/ammonium acetate and isopropanol as eluent. Ionization was carried out by means of electrospray ionization (ESI). The final compound peak area data set was created using the Agilent MassHunter Qualitative Analysis B.06.00 analysis software.

The response signals of the additive compounds were standardized by subtracting the mean and dividing by the standard deviation in order to bring all signal values to the same scale. The standardized signals are depicted in Figure 3.

3. Methods

3.1. Implementation. The proposed novel standardization methods and the respective optimization processes were implemented via Python scripts.

3.2. Regression Algorithm: Ridge. For the prediction, the ridge regression estimator implemented in the Python scikit-learn framework for machine learning applications was used [25]. It is a linear model which solves a regression task via the least squares loss function J(w) with L2 regularization [26]. Regularization is an approach to reduce overfitting, which is particularly important for high-dimensional data such as FTIR spectra, by controlling the sum of squares of the model coefficients w. This is done by adding the L2 penalty term, weighted by the hyperparameter λ:

$$\lambda \lVert w \rVert^2 = \lambda \sum_{j=1}^{m} w_j^2 \tag{1}$$

Thus, the loss function is defined as

$$J(w) = \sum_{i=1}^{n} \left( y_i - y_{i,\mathrm{pred}} \right)^2 + \lambda \lVert w \rVert^2 \tag{2}$$

where $y_i$ stands for the reference value of the ith sample and $y_{i,\mathrm{pred}}$ for the prediction of this sample. Since the performance of the preprocessing methods has to be assessed independently of the regression model actually used, the same regression model with identical hyperparameter λ was applied to the various preprocessed data sets. For the regression of the friction modifier compound of ATF A, λ = 5 was used, and for the antioxidant of ATF B, λ = 3. These parameters turned out to be the best choices regarding cross validation and robustness for the SNV-transformed data set in a previously conducted internal study.
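To make the effect of λ concrete, the following sketch minimizes the loss of Equation (2) in closed form with NumPy. It is a stand-in for, not a copy of, the scikit-learn estimator used in the study (where the regularization strength is passed as `alpha`); the function names, the unpenalized intercept, and the random toy data are assumptions.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit with an unpenalized intercept (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Minimizes sum((yc - Xc @ w)**2) + lam * sum(w**2), cf. Equation (2).
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ w
    return w, intercept

def predict(X, w, intercept):
    return np.asarray(X, dtype=float) @ w + intercept

# Hypothetical toy data: 40 "spectra" with 200 data points each.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = X[:, 10] - 0.5 * X[:, 50] + rng.normal(scale=0.1, size=40)

w_weak, b_weak = fit_ridge(X, y, lam=0.1)
w_strong, b_strong = fit_ridge(X, y, lam=5.0)
```

A larger λ shrinks the coefficient vector, which trades a higher calibration error for less sensitivity to perturbations of the input.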

3.3. Model Performance Evaluation. To assess the performance of our models, two different approaches were chosen, namely, the predictive power under cross validation and under noise addition.

3.3.1. Cross Validation. For cross validation, the mean of the different measurements of one sample was calculated. The sample set was randomly divided 50 times into a calibration and a validation set by taking 70% of the data as training samples and 30% as test samples in each validation iteration, with different combinations. Each separation run was provided with a unique random seed to ensure that the data set was split into the same training and test sets for each model, enabling better comparability of results between the different models.
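The seeded splitting scheme can be sketched as below. The helper name `repeated_splits` and the choice of using the run index as the seed are assumptions; the study may equally have relied on a library routine such as scikit-learn's `train_test_split` with a fixed `random_state` per run.

```python
import numpy as np

def repeated_splits(n_samples, n_runs=50, train_fraction=0.7):
    """Return n_runs (train_idx, test_idx) pairs, one fixed seed per run."""
    n_train = int(round(train_fraction * n_samples))
    splits = []
    for seed in range(n_runs):
        # Assumption: the run index serves as the unique seed of the run.
        rng = np.random.default_rng(seed)
        order = rng.permutation(n_samples)
        splits.append((order[:n_train], order[n_train:]))
    return splits
```

Because the splits depend only on the seeds, every preprocessing variant is evaluated on exactly the same 50 calibration and validation partitions.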

3.3.2. Robustness against Noise. In order to assess the model performance under noisy input spectra, the model was calibrated with the full original data set. Random Gaussian-distributed white noise was added to each data point. These perturbed samples were predicted by the model, and the prediction error was monitored. This was done for different noise levels. The random numbers added to each data point were generated by a standard normal distributed (mean μ = 0 and standard deviation σ = 1) random number generator. The noise levels were defined by the factors 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, and 0.45, which were multiplied with the output of the random number generator. For each noise level, 50 simulated noisy data sets were generated and predicted by the pretrained model in order to be able to make well-founded statements about the model performance under noise perturbation.

Table 1

Temperature (°C)    Storage time (h)
120                 500, 1000, 2000, 3000
140                 105, 210, 415, 625
160                 25, 50, 105, 165

The noise robustness workflow is a very helpful tool to investigate whether a good calibration error is a real advantage or whether the model ran into overfitting. When the same regression algorithm is used twice with different regularization parameters λ, the less regularized model will generate a lower initial calibration error than the more stringently regularized model. But if the models are tested for robustness, the latter tends to have a lower error slope when the noise level increases.
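The robustness test described above can be sketched as follows. The function name `noise_robustness`, the use of the RMSE as the monitored error, and the generic `predict` callable standing in for the pretrained model are assumptions for illustration.

```python
import numpy as np

NOISE_LEVELS = (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45)

def noise_robustness(predict, X, y, levels=NOISE_LEVELS, n_repeats=50, seed=0):
    """Mean RMSE of `predict` on noisy copies of X for each noise level."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    result = {}
    for level in levels:
        rmse_values = []
        for _ in range(n_repeats):
            # Standard normal random numbers scaled by the noise level.
            noisy = X + level * rng.standard_normal(X.shape)
            residual = y - predict(noisy)
            rmse_values.append(np.sqrt(np.mean(residual ** 2)))
        result[level] = float(np.mean(rmse_values))
    return result
```

Plotting the returned errors against the noise level gives the error slope; for a linear model the slope grows with the norm of the coefficient vector, which is why stronger regularization tends to flatten it.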

3.4. Evaluation Metrics. The built-in functions R² score and mean squared error (MSE) of the scikit-learn framework were used as performance metrics.

Figure 1: FTIR spectra of ATF A. (a) Raw full transmission spectra without any preprocessing. The two data sets with different measurement setups can be discriminated by eye. The blue spectra originate from measurement type (1) with two KBr discs separated by a Teflon spacer, and the red set of curves originates from the cuvette measurement (2). (b) SNV-transformed transmission spectra after truncation of the saturated C-H vibrational regions and CO2 areas. (c) SNV-transformed absorbance spectra after truncation. Axes: wavenumbers (cm⁻¹) versus transmission (a.u.) in (a) and (b) and absorbance (a.u.) in (c).

Figure 2: FTIR spectra of ATF B. (a) Raw full transmission spectra without any preprocessing. (b) SNV-transformed transmission spectra after truncation of the saturated C-H vibrational regions and CO2 areas. (c) SNV-transformed absorbance spectra after truncation. Axes: wavenumbers (cm⁻¹) versus transmission (a.u.) in (a) and (b) and absorbance (a.u.) in (c).

3.4.1. Mean Squared Error (MSE). The mean squared error (MSE) of a prediction is calculated from the squared differences between the predicted value $y_{i,\mathrm{pred}}$ and the reference value $y_i$ of the ith sample. For a given data set with n samples, the MSE is the average value over all samples. It is given by the following formula [27]:

$$\mathrm{MSE}(y, y_{\mathrm{pred}}) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - y_{i,\mathrm{pred}} \right)^2 \tag{3}$$

The best possible MSE value is 0, and small values are desirable as the deviation from the correct prediction is low. From the MSE, the root-mean-squared error (RMSE) was calculated by taking the square root. The RMSE value has the same dimension as the original reference target values.

3.4.2. R² Coefficient of Determination. R² describes the portion of the variance in the target values (dependent variables) that can be predicted from the spectra (independent variables) by the model [28]. The best possible score for R² is 1.0. R² is 0.0 for a constant model, which predicts a constant value regardless of the input features. For linear regression modeling with intercept, R² is equal to the square of the Pearson correlation coefficient between predicted and reference target values [29]. For a data set comprising n samples, the R² score is given as

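$$R^2(y, y_{\mathrm{pred}}) = 1 - \frac{\sum_{i=1}^{n} \left( y_i - y_{i,\mathrm{pred}} \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y}_{\mathrm{pred}} \right)^2} \tag{4}$$

where $y_{i,\mathrm{pred}}$ is the model prediction of the ith sample, which has a reference value $y_i$, and $\bar{y}_{\mathrm{pred}}$ is the mean value of all predictions:

$$\bar{y}_{\mathrm{pred}} = \frac{1}{n} \sum_{i=1}^{n} y_{i,\mathrm{pred}} \tag{5}$$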
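The three metrics can be sketched with NumPy alone; the function names are illustrative and do not reproduce the scikit-learn functions named in Section 3.4. One point deserves care: the `r2` function below follows the convention of scikit-learn's `r2_score`, which centers the denominator on the mean of the reference values, whereas Equation (4) as printed centers it on the mean of the predictions. The two agree whenever both means coincide.

```python
import numpy as np

def mse(y, y_pred):
    """Mean squared error, cf. Equation (3)."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean((y - y_pred) ** 2))

def rmse(y, y_pred):
    """Root-mean-squared error: same unit as the reference values."""
    return float(np.sqrt(mse(y, y_pred)))

def r2(y, y_pred):
    """Coefficient of determination, denominator centered on mean(y)."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Since the target signals in this study are standardized, the RMSE is expressed in units of the standard deviation of the reference signals.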

3.5. Standard Normal Variate. Each spectrum $x = (x_1, x_2, \ldots, x_k)$ with k measured data points is transformed to the standardized form $z = (z_1, z_2, \ldots, z_k)$ by bringing the spectrum to zero mean and unit variance. For this purpose, the mean value $\bar{x}$ of the spectrum is subtracted from each data point $x_i$, and the result is divided by the standard deviation:

$$z_i = \frac{x_i - \bar{x}}{\sqrt{\sum_{j=1}^{k} \left( x_j - \bar{x} \right)^2 / k}} \tag{6}$$

with

$$\bar{x} = \frac{1}{k} \sum_{j=1}^{k} x_j \tag{7}$$
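Equations (6) and (7) translate into a few lines of NumPy. The sketch below assumes row-wise spectra; the function name `snv` is an illustrative choice.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: each spectrum to zero mean, unit variance."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=-1, keepdims=True)
    # NumPy's default ddof=0 divides by k, matching Equation (6).
    std = spectra.std(axis=-1, keepdims=True)
    return (spectra - mean) / std
```

Two spectra that differ only by a constant offset and a positive scale factor are mapped onto the same standardized spectrum, which is the property exploited for thickness correction.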

3.5.1. Dynamic Localized SNV (DLSNV). The DLSNV workflow is based on the SNV-transformed spectra data set (Figure 4(a)). To calculate the DLSNV data, the spectra are divided into multiple regions. On each of these regions, standardization is performed. To adjust the windows to important areas in the spectrum, a starting point can be defined. In Figure 4(b), the DLSNV spectra are shown with a starting point of 100 and a window size of 300 pixels.

DLSNV algorithm:

(i) Perform SNV on a window of the spectrum ranging from the first data point to the sth one.
(ii) Subdivide the spectrum from the sth data point onward into windows of equal size ws and perform SNV on each of these windows.
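The two steps of the algorithm can be sketched as follows. The function names, the handling of a shorter last window, and the guard against zero-variance windows are assumptions; the text does not state how these edge cases were treated.

```python
import numpy as np

def snv_segment(segment):
    """SNV along the last axis; zero-variance segments are only centered."""
    mean = segment.mean(axis=-1, keepdims=True)
    std = segment.std(axis=-1, keepdims=True)
    return (segment - mean) / np.where(std == 0, 1.0, std)

def dlsnv(spectra, start, window_size):
    """SNV on [0, start) and on consecutive windows of `window_size` after it."""
    spectra = np.asarray(spectra, dtype=float)
    k = spectra.shape[-1]
    # Window edges: leading window up to `start`, then equal-sized windows.
    # Assumption: the last window may be shorter than `window_size`.
    edges = ([0] if start > 0 else []) + list(range(start, k, window_size)) + [k]
    out = np.empty_like(spectra)
    for lo, hi in zip(edges[:-1], edges[1:]):
        out[..., lo:hi] = snv_segment(spectra[..., lo:hi])
    return out
```

With `start=0`, the function reduces to plain LSNV with a fixed window size; a starting point of 100 and a window size of 300 corresponds to the setting used for visualization in Figure 4(b).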

Figure 3: Standardized additive responses used as target values for the FTIR regression model for (a) the friction modifier compound and (b) the antioxidant, plotted against time for storage temperature 140°C for all three storage experiments (pure oil, copper chips, and material mix), measured by HPLC-QToF. Axes: storage time (h) versus standardized molecule signals (a.u.).

To optimize the two parameters, window size ws and starting point s, a three-step approach is performed. In each step, the predictive power of the model is assessed via the coefficient of determination R². The prediction performance of the chosen window size in combination with the regression model is benchmarked by fitting the same model to the single SNV data, indicated by a red line in Figure 4(c). The optimization steps can be summarized as follows:

(1) Perform LSNV with window sizes from 50 to 500 pixels and determine R² for all window sizes. Find the optimal window size $ws_{\mathrm{opt},1}$.
(2) Perform LSNV with the optimal window size of step 1, $ws_{\mathrm{opt},1}$, vary the starting point from 0 to $2 \cdot ws_{\mathrm{opt},1}$, and select the optimal starting point $s_{\mathrm{opt}}$.
(3) Perform LSNV with the optimal starting point $s_{\mathrm{opt}}$ with window sizes from 50 to $2 \cdot ws_{\mathrm{opt},1}$ in order to find the best combination of window size $ws_{\mathrm{opt},2}$ and starting point $s_{\mathrm{opt}}$.
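The three optimization steps can be sketched as a sequence of one-dimensional grid searches. Everything beyond the search ranges is an assumption here: the function names, the grid spacing of 10 pixels, and the hypothetical `score` callable, which is expected to fit the regression model to the preprocessed spectra and return R².

```python
import numpy as np

def windowed_snv(spectra, start, window_size):
    """SNV on [0, start) and on consecutive windows of `window_size` after it."""
    k = spectra.shape[-1]
    edges = ([0] if start > 0 else []) + list(range(start, k, window_size)) + [k]
    out = np.empty_like(spectra, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = spectra[..., lo:hi]
        std = seg.std(axis=-1, keepdims=True)
        out[..., lo:hi] = (seg - seg.mean(axis=-1, keepdims=True)) / np.where(std == 0, 1.0, std)
    return out

def optimize_dlsnv(spectra, y, score, sizes=range(50, 501, 10), start_step=10):
    """Three-step search; `score(Z, y)` is a hypothetical callable returning R2."""
    spectra = np.asarray(spectra, dtype=float)

    def best(candidates):
        # Each candidate is a (start, window_size) pair.
        return max(candidates, key=lambda c: score(windowed_snv(spectra, *c), y))

    # Step 1: window size with the starting point fixed at 0 (plain LSNV).
    _, ws1 = best([(0, ws) for ws in sizes])
    # Step 2: starting point from 0 to 2 * ws1 at fixed window size ws1.
    s_opt, _ = best([(s, ws1) for s in range(0, 2 * ws1 + 1, start_step)])
    # Step 3: window size from 50 to 2 * ws1 at fixed starting point s_opt.
    _, ws2 = best([(s_opt, ws) for ws in range(50, 2 * ws1 + 1, 10)])
    return s_opt, ws2
```

Searching the two parameters one after the other keeps the number of model fits linear in the grid sizes, at the price of possibly missing the joint optimum of a full two-dimensional search.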

In Figure 4(d), the final DLSNV spectra after optimization are shown. Note that jumps can occur between the individual standardization windows, since for each window the mean value of that window is subtracted. However, this does not affect the regression model.

In Figure 5(a), the ATF A samples are shown with SNV performed on the entire spectral region, and in Figure 5(b), the same spectra are depicted after DLSNV optimization. Figure 5(c) shows a zoom-in view of the highlighted region of Figure 5(a), and in Figure 5(d), the same region is depicted after DLSNV optimization. The baseline is removed for the exact spectral sequence, and thus peaks are aligned in such a way that the different aging levels of the samples can already be recognized by eye. The spectral section shown is the phenolic antioxidant region. Thus, the decrease of this band can be associated with the aging level. Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level.

3.5.2. Peak SNV. The idea behind the Peak SNV method is to standardize the important areas of the spectrum independently of each other. The optimization workflow for PSNV is shown in Figure 6, starting from the single SNV-transformed data set. Data points with a high correlation with the target values (points of interest, POI) are selected (Figure 6(a)), and the SNV transformation is performed on windows around the centroids. Once the POIs are identified, the PSNV transformation is conducted as follows:

PSNV algorithm:

(i) Subdivide the spectrum into sequences, each ranging from half the distance to the previous POI to half the distance to the next one (Figure 6(b)). SNV is performed across these windows.

To find the POIs, an initial regression model is fitted to the data. In order to identify important regions of the spectra, the model coefficients are assessed. The normalized absolute values of the coefficient vector are fed into a peak-picking algorithm. Since POIs may lie in close proximity, an agglomeration of the POIs is conducted in order to prevent very narrow standardization windows. Peak centroids are calculated as the mean value of the combined POIs. The task of the optimization process is to find the best window for POI agglomeration, $agg_{\mathrm{opt}}$, which is done by analyzing the calibration R² for each agglomeration window and picking the window size with maximal correlation between the predicted and reference targets (Figure 6(c)). The steps are summarized as follows:

(1) Fit the data set to the target values (only calibration).
(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1.
(3) Combine peaks which are within a certain window agg and calculate the centroid of the agglomerated POIs.
(4) Perform PSNV across the centroid of the POIs.
(5) Evaluate performance via R² for agg between 10 and 50 data points and choose $agg_{\mathrm{opt}}$ according to maximal R².
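Steps (2) to (4) can be sketched as below, given the coefficient vector of an already fitted model. The source does not name the peak-picking algorithm, so a simple local-maximum search is used as a stand-in; the function names and the rule of merging a peak with its predecessor when their distance is at most `agg` are likewise assumptions.

```python
import numpy as np

def pick_peaks(coef, threshold=0.1):
    """Indices of local maxima of |w| / max(|w|) above `threshold`."""
    a = np.abs(np.asarray(coef, dtype=float))
    a = a / a.max()
    inner = np.arange(1, a.size - 1)
    is_peak = (a[inner] > a[inner - 1]) & (a[inner] >= a[inner + 1]) & (a[inner] > threshold)
    return inner[is_peak]

def agglomerate(peaks, agg):
    """Merge peaks lying within `agg` points of the previous one; return centroids.

    Expects a non-empty, sorted sequence of peak indices.
    """
    groups = [[int(peaks[0])]]
    for p in peaks[1:]:
        if p - groups[-1][-1] <= agg:
            groups[-1].append(int(p))
        else:
            groups.append([int(p)])
    return [int(round(float(np.mean(g)))) for g in groups]

def psnv(spectra, centroids):
    """SNV on windows bounded by the midpoints between neighbouring centroids."""
    spectra = np.asarray(spectra, dtype=float)
    k = spectra.shape[-1]
    mids = [(a + b) // 2 for a, b in zip(centroids[:-1], centroids[1:])]
    edges = [0] + mids + [k]
    out = np.empty_like(spectra)
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = spectra[..., lo:hi]
        std = seg.std(axis=-1, keepdims=True)
        out[..., lo:hi] = (seg - seg.mean(axis=-1, keepdims=True)) / np.where(std == 0, 1.0, std)
    return out
```

Step (5) would wrap these functions in a loop over `agg` from 10 to 50, refit the regression model on each PSNV data set, and keep the value with the highest calibration R².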

After optimization, each window has an individual window size and ranges over the peak centroid of important signals in the spectrum. On these windows, SNV transformation provides an optimal baseline and scatter effect removal. The optimized spectrum is shown in Figure 6(d).

Figure 4: Demonstration of the workflow and optimization process for Dynamic Localized SNV. (a) Single SNV; (b) Dynamic Localized SNV with starting point 100 and window 300 for visualization; (c) three-stage optimization process for window, starting point, and final window optimization (R² versus window size in the first run, starting point, and window size in the second run); and (d) optimized DLSNV. Spectra are plotted against wavenumbers (cm⁻¹).


3.5.3. Partial Peak SNV. The idea behind Partial Peak SNV is similar to PSNV: picking the regions of the spectrum which show a high correlation with the target values, agglomerating POIs in close proximity, and standardizing these important spectral features (Figure 7(a)). But unlike for PSNV, not the whole spectrum is finally taken into account but only a small window around each POI. It may occur that the same data point appears several times in different standardizations (see overlapping regions in Figures 7(b) and 7(d)). Due to this workflow, the PPSNV spectrum may have more data points (due to overlapping) or fewer (because not the entire spectrum is taken into account) than the original spectrum. A PPSNV spectrum is calculated as follows:

PPSNV algorithm:

(i) Perform SNV across the POIs with a left and right margin of pw.
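A sketch of this step is given below, assuming that the POI centroids are already known and that the standardized windows are concatenated in order of the centroids. The function name `ppsnv` and the clipping of windows at the spectrum borders are assumptions.

```python
import numpy as np

def ppsnv(spectra, centroids, pw):
    """SNV on a window of +/- pw points around each centroid, concatenated."""
    spectra = np.asarray(spectra, dtype=float)
    k = spectra.shape[-1]
    pieces = []
    for c in centroids:
        # Assumption: windows are clipped at the borders of the spectrum.
        lo, hi = max(0, c - pw), min(k, c + pw + 1)
        seg = spectra[..., lo:hi]
        std = seg.std(axis=-1, keepdims=True)
        pieces.append((seg - seg.mean(axis=-1, keepdims=True)) / np.where(std == 0, 1.0, std))
    # Overlapping windows contribute the same data point more than once.
    return np.concatenate(pieces, axis=-1)
```

The length of the result depends on the number of centroids and on pw, which is why a PPSNV spectrum can be longer or shorter than the original one.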

The optimization focuses on the adjustment of the window size pw around the POIs in which the SNV is applied, for maximal predictive power in calibration (Figure 7(c)). The optimization process is divided into the following steps:

(1) Fit the data set to the target values (only calibration).
(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1.

Figure 5: Demonstration of the improvement of peak alignment for DLSNV. (a) SNV-transformed transmission spectra with marked zoom area of (c). (b) Optimized DLSNV spectra with marked zoom area of (d). Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level. Spectra are plotted against wavenumbers (cm⁻¹); the zoom areas (c) and (d) cover 3500 to 3350 cm⁻¹.

Figure 6: Demonstration of the workflow and optimization process for Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum; (b) spectrum separation according to agglomerated peaks; (c) optimization process in order to find the optimal agglomeration window size (R² versus agglomeration window size); and (d) optimized PSNV. Spectra are plotted against wavenumbers (cm⁻¹).


Figure 8: Cross validation results (predicted versus reference friction modifier peak volume) for (a) SNV, (b) Dynamic Localized SNV, (c) Peak SNV, and (d) Partial Peak SNV preprocessed spectra in the regression case of the friction modifier additive. The reference values were measured by HPLC-QToF. Red dots indicate prediction of calibration samples and blue dots represent prediction of the hold-out validation samples. The dashed arrow in (a) indicates the increasing degradation of the samples with increasing time as the friction modifier additive concentration gets lower.

Figure 7: Demonstration of the workflow and optimization process for Partial Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process (R² versus window size) in order to find the optimal window size around the POIs, and (d) optimized PPSNV (x-axis: data points).


(3) Perform PPSNV across the peaks with the window size pw.
(4) Evaluate the performance via R² for pw between 1 and 200 data points and choose pw_opt according to maximal R².
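The PPSNV step and the window search described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the clipping of windows at the spectrum borders, the use of the sample standard deviation, and the assumption that the POI indices are already agglomerated are all choices made for this sketch.

```python
from statistics import mean, stdev


def snv(values):
    """Standardize one window: subtract its mean, divide by its std."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (assumption of this sketch)
    return [(v - m) / s for v in values]  # fails for a constant window


def ppsnv(spectrum, pois, pw):
    """Concatenate SNV-transformed windows of +/- pw points around each POI.

    Windows are clipped at the spectrum borders. Overlapping windows are
    kept as they are, so one data point may appear several times and the
    result can be longer or shorter than the input spectrum.
    """
    out = []
    for poi in sorted(pois):
        lo = max(0, poi - pw)
        hi = min(len(spectrum), poi + pw + 1)
        out.extend(snv(spectrum[lo:hi]))
    return out


def best_window(score, candidates=range(1, 201)):
    """Return the candidate pw with the highest score (e.g. calibration R^2)."""
    return max(candidates, key=score)


# Hypothetical usage with made-up data: three POIs, margin of 3 points.
spectrum = [float(i % 7) for i in range(50)]
transformed = ppsnv(spectrum, pois=[10, 12, 40], pw=3)
```

Here `score` stands for any callable that fits the regression model on the PPSNV-transformed calibration data for a given pw and returns R².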

4. Results and Discussion

4.1. Cross Validation. In Figure 8(a), the cross validation recovery function for predictions of the SNV preprocessed spectra of the regression on the friction modifier compound is shown. A 50-fold cross validation strategy with a calibration/validation splitting of 70/30 was used. Red dots represent the prediction of calibration samples and blue dots represent validation samples. The linear model clearly struggles to predict the high and low compound intensity regions correctly. The nonlinearity is visualized by an arrow and a dashed line to guide the eye.

Figure 8 also shows the cross validation recovery functions for predictions after Dynamic Localized SNV (Figure 8(b)), Peak SNV (Figure 8(c)), and Partial Peak SNV (Figure 8(d)) optimization. The saturation effect in the low intensity area of the compound response is almost completely removed in the latter three cases. It is also notable that the scattering around the green bisecting line is significantly reduced. Thus, the confidence interval for the predictions is improved.

The RMSEP values during the cross validation of the regression of the friction modifier component are summarized in Figure 9 in a box-and-whisker plot representation. The red line indicates the median; within the boxes, the interquartile range (IQR), which contains 50% of the data, is depicted; and the margins of the whiskers represent Q1 − 1.5 · IQR and Q3 + 1.5 · IQR for the lower and upper bound, respectively (Q1 means that 25% of the data set are smaller than this value, and Q3 means that 75% are smaller than this value). Figure 9(a) refers to the transmission spectra and Figure 9(b) refers to the absorbance spectra. The labels are associated with (1) without standardization, (2) single SNV transformation on the full spectral range, (3) Localized SNV, (4) Dynamic Localized SNV, (5) Peak SNV, and (6) Partial Peak SNV.
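The whisker margins defined above can be computed with a few lines of Python. This is only a sketch: the quartile interpolation method ("inclusive") is an assumption, since plotting libraries differ slightly in how they estimate Q1 and Q3, and the numbers are made up rather than taken from the study.

```python
from statistics import quantiles


def whisker_bounds(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) for a list of values."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Made-up values for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
lower, upper = whisker_bounds(values)  # Q1 = 3, Q3 = 7, IQR = 4
```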

It is noticeable that the RMSEP is very poor for the crude transmission spectra and that SNV improves them considerably, whereas the improvement after SNV is low for absorbance spectra.

For all sophisticated optimized standardization approaches (DLSNV, PSNV, and PPSNV), the median and the scattering around the median of RMSEP decrease drastically with respect to the SNV-transformed full spectra, but LSNV also seems to be a reasonable choice. DLSNV on absorbance spectra is characterized by the lowest median and the smallest scattering, as confirmed by Table 2, which summarizes the mean values and standard deviations of RMSEP.

Figure 9: Box-and-whisker plot representation of the root-mean-squared error of prediction (RMSEP) of the cross validation strategy (50 folds, random train/test split of 70/30 of the data) for the friction modifier compound of ATF A. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV. Boxplots (4) to (6) are the proposed methods.


The summarized RMSEPs of the regression to the antioxidant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation, where Figure 10(a) refers to the transmission spectra and Figure 10(b) to the absorbance spectra. In this case, DLSNV, PSNV, and PPSNV reduce both the median and the scattering around the median enormously when compared with SNV on full spectra. The best performance is achieved by PPSNV conducted on the transmission spectra, as confirmed by Table 2. On transmission spectra, DLSNV and PPSNV perform better than LSNV, but PSNV only has a positive effect when compared to SNV; in relation to LSNV, the predictive power is reduced when using PSNV. The fact that PPSNV is the best choice for this regression use case suggests that it is beneficial to only use spectral regions with high correlation with the target value and to drop regions with low or no correlation.

4.1.1. Noise Robustness. In Figure 11, the performance of the regression model for the prediction of noisy spectra is shown for the friction modifier. In Figure 11(a), the curves for all preprocessings are depicted, and in Figure 11(b), a zoomed view is shown. Without any preprocessing, the initial calibration error for both transmission and absorbance spectra is very poor and rises very fast with the increasing noise level factor. Although the sophisticated preprocessing methods LSNV, DLSNV, PSNV, and PPSNV show a lower initial calibration error, the slope of the error is still lower than for SNV. In Figure 11(b), the trend of the transmission spectra toward a low error slope is visible. One may say that the three proposed standardization techniques show a very similar noise robustness behavior and are significantly better than none or

Figure 10: Box-and-whisker plot representation of the root-mean-squared error of prediction (RMSEP) of the cross validation strategy for the phenolic antioxidant compound of ATF B. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV. Boxplots (4) to (6) are the proposed methods.

Table 2: Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during cross validation. The relative improvements and respective p-values compared with LSNV are also listed.

Method             Friction modifier, ATF A              Antioxidant, ATF B
                   Transmission       Absorbance         Transmission       Absorbance
Raw                0.83 ± 0.22        0.53 ± 0.16        0.90 ± 0.14        0.70 ± 0.11
Single SNV         0.34 ± 0.14        0.42 ± 0.17        0.61 ± 0.11        0.70 ± 0.13
LSNV               0.30 ± 0.07        0.31 ± 0.07        0.24 ± 0.04        0.24 ± 0.04
DLSNV              0.28 ± 0.06        0.26 ± 0.05        0.21 ± 0.04        0.21 ± 0.04
Rel. improvement   9% (p < 0.05)      16% (p < 0.001)    13% (p < 0.001)    13% (p < 0.001)
PSNV               0.28 ± 0.08        0.28 ± 0.06        0.33 ± 0.06        0.31 ± 0.07
Rel. improvement   8% (p > 0.05)      9% (p < 0.05)      −41% (p < 0.001)   −33% (p < 0.001)
PPSNV              0.26 ± 0.06        0.26 ± 0.06        0.17 ± 0.04        0.24 ± 0.05
Rel. improvement   13% (p < 0.01)     15% (p < 0.001)    29% (p < 0.001)    −3% (p > 0.05)


SNV preprocessing, indicating that the model did not overfit the data, as mentioned in Section 3.3.2.

In Figure 12, the performance of the regression model of the antioxidant for the prediction of noisy spectra is shown. The absorbance spectra with or without preprocessing show a similar noise trend as the SNV-transformed spectra. In Figure 12(b), the localized versions are shown in a zoomed view. The PPSNV preprocessing on the transmission spectra is characterized by the flattest noise dependency. These results demonstrate the superiority of the PPSNV method in this use case. As mentioned above, PSNV is not advantageous in this application and shows the lowest noise immunity, but it is preferable to the SNV across the whole spectrum.

4.2. Summary. The optimized parameters for the preprocessing methods are summarized in Table 3. The LSNV optimization process selects the same window size as DLSNV. Thus, the second window size run has no influence on the final result in these two cases, but the starting point produces an improvement.

As already mentioned, Table 2 summarizes the cross validation performances of the tested methods as mean values and standard deviations over all cross validation runs. For the friction modifier, the performances of DLSNV, PSNV, and PPSNV are very similar. Table 2 also lists the relative improvements against the benchmark preprocessing LSNV, accompanied by corresponding p values from a two-sided t-test, which tests the significance of the mean values being different (relative improvements can deviate even when the same rounded mean value is listed, because the improvements were calculated from exact values rather than rounded values).
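The relative improvement against the LSNV benchmark can be illustrated with a short calculation. The formula below is an assumption (the source does not spell it out), but it reproduces several of the listed percentages from the rounded means in Table 2 and also shows the rounding deviation mentioned above.

```python
def relative_improvement(rmsep_benchmark, rmsep_method):
    """Relative RMSEP reduction in percent with respect to the benchmark."""
    return 100.0 * (rmsep_benchmark - rmsep_method) / rmsep_benchmark


# Rounded mean RMSEP values taken from Table 2.
ppsnv_atf_b = relative_improvement(0.24, 0.17)  # about 29.2, listed as 29%
ppsnv_atf_a = relative_improvement(0.30, 0.26)  # about 13.3, listed as 13%
dlsnv_atf_a = relative_improvement(0.30, 0.28)  # about 6.7, listed as 9%
```

The last case differs from the listed value because the table entries were computed from unrounded means.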

The best mean RMSEP value for the regression model for the friction modifier, 0.26, is produced by DLSNV based on absorbance spectra. The antioxidant compound is modeled best by PPSNV preprocessing of the transmission spectra, which yields a very low prediction error of 0.17.

To summarize, all proposed methods performed very well, reducing both the mean and the standard deviation of the cross validation error compared with SNV. PSNV is not reasonable for the antioxidant additive, as its performance is poor compared with the benchmark preprocessing method LSNV.

Which preprocessing method is the best depends on the actual regression use case, but in general it is shown that PPSNV outperforms PSNV. This suggests that it is beneficial to drop spectral regions showing low or no dependency on the target value and to only consider highly correlated peaks.

For the antioxidant compound, PPSNV yielded an enormous improvement. This could be explained as the

Figure 11: RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A, calibrated by the unperturbed full data set (curves: without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance spectra). The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown, and in (b), the sophisticated standardizations are shown.


phenolic aging inhibitor is a compound with a very narrow vibrational band in ATF B and thus does not have a great impact when the SNV is carried out across the entire spectrum. This may lead to a suboptimal alignment of this band. In the case of the novel standardizations, the SNV is optimized to the highly correlated bands, and scatter effects can be compensated for these exact regions.

The fact that PSNV is unsuccessful for the antioxidant may be because the POIs are not centered in the middle of the SNV window and may have large left and right margins if they are far away from other POIs. As a result, they may not be optimally standardized. This is shown in Figure 6(b), where the single POI at about 2700 cm⁻¹ has a large single SNV window.

The study provides an overview of model performances when using transmission or absorbance spectra, suggesting that both cases can lead to valid regression models. However, for quantitative models built on transmission spectra, the SNV is vital, whereas in the absorbance case, the predictive power does not depend on SNV transformation. In absorbance spectra, the influence of the baseline constant is reduced because high transmission values are converted into low absorbance values.

To conclude, DLSNV, PSNV, and PPSNV were able to improve both transmission and absorbance predictive models. The scattering around the mean values is also drastically reduced because the model does not have to learn how to compensate for the baseline shift in each cross validation step, leading to more reproducible results. Each vibrational band is optimally aligned so that the additive depletion trend is encoded in the absolute signal intensity, and the model does not have to weigh a data point as background correction.

5. Conclusion

The results presented in this study demonstrate the ability of the proposed novel standardization strategies, Dynamic Localized SNV, Peak SNV, and Partial Peak SNV, to improve both the mean and scatter of RMSEP

Figure 12: RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B, calibrated by the original full data set (curves: without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance spectra). The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown, and in (b), the sophisticated standardizations are shown.

Table 3: Summary of the preprocessing parameters. For DLSNV, the window size and starting point; for PSNV, the agglomeration window; and for PPSNV, the window width around the POI are shown.

Method    ATF A                             ATF B
          Transmission     Absorbance       Transmission     Absorbance
LSNV      w_opt = 52       w_opt = 52       w_opt = 51       w_opt = 51
DLSNV     w_opt = 52       w_opt = 52       w_opt = 51       w_opt = 51
          s_opt = 5        s_opt = 7        s_opt = 48       s_opt = 48
PSNV      agg_opt = 5      agg_opt = 14     agg_opt = 6      agg_opt = 10
PPSNV     pw_opt = 17      pw_opt = 25      pw_opt = 15      pw_opt = 38


values in cross validation and the robustness against noise drastically with respect to SNV transformation executed on the entire spectrum. Against the benchmark LSNV, an enhancement of the predictive power of a ridge regression model by up to 16% and 29% could be achieved for the friction modifier and the antioxidant compound, respectively. The demonstrated optimization workflows for performing SNV on specific regions of the spectrum have been introduced here for the first time. The standardization methods used in this paper are therefore capable of eliminating nonlinearities by flexible rescaling in defined areas. To our knowledge, such standardization techniques have not been presented elsewhere.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] M. Zeaiter, J. M. Roger, and V. Bellon-Maurel, "Dynamic orthogonal projection: a new method to maintain the on-line robustness of multivariate calibrations. Application to NIR-based monitoring of wine fermentations," Chemometrics and Intelligent Laboratory Systems, vol. 80, no. 2, pp. 227–235, 2006.

[2] R. M. Balabin and R. Z. Safieva, "Motor oil classification by base stock and viscosity based on near infrared (NIR) spectroscopy data," Fuel, vol. 87, no. 12, pp. 2745–2752, 2008.

[3] A. Borin and R. J. Poppi, "Multivariate quality control of lubricating oils using Fourier transform infrared spectroscopy," Journal of the Brazilian Chemical Society, vol. 15, no. 4, pp. 570–576, 2004.

[4] J. Huang, D. Brennan, L. Sattler, J. Alderman, B. Lane, and C. O'Mathuna, "A comparison of calibration methods based on calibration data size and robustness," Chemometrics and Intelligent Laboratory Systems, vol. 62, no. 1, pp. 25–35, 2002.

[5] M. A. Al-Ghouti and L. Al-Atoum, "Virgin and recycled engine oil differentiation: a spectroscopic study," Journal of Environmental Management, vol. 90, no. 1, pp. 187–195, 2009.

[6] R. Kellner, J. M. Mermet, M. Otto, M. Valcarcel, and H. M. Widmer, Analytical Chemistry, John Wiley & Sons Australia, Limited, Milton, Australia, 2004.

[7] B. G. Osborne, T. Fearn, P. T. Hindle, and B. G. Osborne, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, Longman Scientific & Technical, Wiley, Harlow, Essex, UK, 1993.

[8] J. D. Donald and D. D. Kevin, Interpreting Diffuse Reflectance and Transmittance, NIR, Chichester, UK, 2007.

[9] A. Rinnan, F. van den Berg, and S. B. Engelsen, "Review of the most common pre-processing techniques for near-infrared spectra," TrAC Trends in Analytical Chemistry, vol. 28, no. 10, pp. 1201–1222, 2009.

[10] E. Arendse, O. Amos Fawole, L. Samukelo Magwaza, N. Helene, and U. Linus Opara, "Comparing the analytical performance of near and mid infrared spectrometers for evaluating pomegranate juice quality," LWT, vol. 91, pp. 180–190, 2018.

[11] J. C. Machado, M. A. Faria, I. M. P. L. V. O. Ferreira, R. N. M. J. Pascoa, and J. A. Lopes, "Varietal discrimination of hop pellets by near and mid infrared spectroscopy," Talanta, vol. 180, pp. 69–75, 2018.

[12] H. W. Siesler, "Vibrational spectroscopy," in Reference Module in Materials Science and Materials Engineering, Elsevier, New York, NY, USA, 2016.

[13] A. Paula Craig, B. G. Botelho, L. S. Oliveira, and A. S. Franca, "Mid infrared spectroscopy and chemometrics as tools for the classification of roasted coffees by cup quality," Food Chemistry, vol. 245, pp. 1052–1061, 2018.

[14] S. R. Khandasammy, M. A. Fikiet, E. Mistek et al., "Bloodstains, paintings, and drugs: Raman spectroscopy applications in forensic science," Forensic Chemistry, vol. 8, pp. 111–133, 2018.

[15] J. Engel, G. Jan, E. Szymanska et al., "Breaking with trends in pre-processing," TrAC Trends in Analytical Chemistry, vol. 50, pp. 96–106, 2013.

[16] H. Martens, S. Jensen, and P. Geladi, "Multivariate linearity transformation for near-infrared reflectance spectrometry," in Proceedings of Nordic Symposium on Applied Statistics, pp. 205–234, Stavanger, Norway, June 1983.

[17] P. Geladi, D. MacDougall, and H. Martens, "Linearization and scatter-correction for near-infrared reflectance spectra of meat," Applied Spectroscopy, vol. 39, no. 3, pp. 491–500, 1985.

[18] P. Kubelka and F. Munk, "Ein Beitrag zur Optik der Farbanstriche," Zeitschrift für Technische Physik, vol. 12, pp. 593–601, 1931.

[19] A. Claus Andersson, "Direct orthogonalization," Chemometrics and Intelligent Laboratory Systems, vol. 47, no. 1, pp. 51–63, 1999.

[20] R. J. Barnes, M. S. Dhanoa, and S. J. Lister, "Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra," Applied Spectroscopy, vol. 43, no. 5, pp. 772–777, 1989.

[21] M. Zeaiter, J.-M. Roger, and V. Bellon-Maurel, "Robustness of models developed by multivariate calibration. Part II: the influence of pre-processing methods," TrAC Trends in Analytical Chemistry, vol. 24, no. 5, pp. 437–445, 2005.

[22] T. Fearn, C. Riccioli, A. Garrido-Varo, and J. Emilio Guerrero-Ginel, "On the geometry of SNV and MSC," Chemometrics and Intelligent Laboratory Systems, vol. 96, no. 1, pp. 22–26, 2009.

[23] T. Isaksson and B. Kowalski, "Piece-wise multiplicative scatter correction applied to near-infrared diffuse transmittance data from meat products," Applied Spectroscopy, vol. 47, no. 6, pp. 702–709, 1993.

[24] Y. Bi, K. Yuan, W. Xiao et al., "A local pre-processing method for near-infrared spectra, combined with spectral segmentation and standard normal variate transformation," Analytica Chimica Acta, vol. 909, pp. 30–40, 2016.

[25] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[26] S. Raschka, Python Machine Learning, Packt Publishing, Birmingham, UK, 2015.

[27] E. L. Lehmann and G. Casella, Theory of Point Estimation, Springer Texts in Statistics, Springer, New York, NY, USA, 2003.

[28] O. Heinisch, "Steel, R. G. D., and J. H. Torrie: Principles and Procedures of Statistics (with special reference to the biological sciences). McGraw-Hill Book Company, New York, Toronto, London 1960, 481 S., 15 Abb., 81 s 6 d," Biometrische Zeitschrift, vol. 4, no. 3, pp. 207-208, 1962.

[29] K. Pearson, "Note on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, vol. 58, no. 1, pp. 240–242, 1895.


offset due to scattering light, Raman spectra often show a polynomial fluorescence background, and for mid-infrared spectra, the sample thickness and thus the spectroscopic response plays a crucial role [14, 15]. The information about the sample is present in the shape of the spectrum and is independent of the offset (additive effect) and the scaling of the absolute signal intensity (multiplicative effect). The task of preprocessing is to remove these interfering factors from the informative part of the spectrum, and there are different approaches for this.

A method for eliminating constant offset terms is to calculate the first derivative [9]. This procedure can be extended to higher-order derivatives, which also eliminate offset terms with linear or quadratic baseline curves. The disadvantage of calculating the derivative of a spectrum is that noise effects are amplified.

Multiplicative signal correction (MSC) is another tool which can deal with the two major effects. A reference spectrum, in most cases represented by the mean spectrum of the calibration data set, is defined, and the spectra are corrected for the baseline and the multiplicative amplification effects [16, 17]. The approach is associated with the Kubelka–Munk theory, which takes optical phenomena caused by light scattering into account [18, 19]. For each spectrum, the two correction parameters are estimated via a least squares regression calculation.
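As an illustration of the regression step, MSC for a single spectrum might be sketched as below. This is a generic textbook-style formulation under stated assumptions, not code from the study: each spectrum is modeled as a + b · reference, and the correction is (spectrum − a) / b.

```python
from statistics import mean


def msc(spectrum, reference):
    """Correct one spectrum against a reference (e.g. the mean spectrum).

    Fits spectrum = a + b * reference by ordinary least squares and
    returns (spectrum - a) / b.
    """
    ref_mean = mean(reference)
    spec_mean = mean(spectrum)
    sxy = sum((r - ref_mean) * (x - spec_mean)
              for r, x in zip(reference, spectrum))
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    b = sxy / sxx                   # multiplicative (scaling) term
    a = spec_mean - b * ref_mean    # additive (offset) term
    return [(x - a) / b for x in spectrum]


# Made-up example: a spectrum that is a shifted and scaled reference.
reference = [1.0, 2.0, 4.0, 2.0, 1.0]
distorted = [0.5 + 2.0 * r for r in reference]
corrected = msc(distorted, reference)  # approximately equal to reference
```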

Standard normal variate (SNV) removes a constant offset term by subtracting the mean value of the full spectrum and brings all spectra to the same scale by subsequent division by the standard deviation of the full spectrum [20]. Due to its simplicity, SNV is a popular preprocessing method [21]. SNV and MSC usually yield similar results and are often regarded as exchangeable [22]. Since the SNV transformation needs no extra regression step to estimate the correction parameters, and the models should be kept as simple as possible, the focus in the following lies on SNV.
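A minimal sketch of the SNV transformation shows how both the additive and the multiplicative effect disappear. The use of the sample standard deviation is an assumption of this sketch, and the numbers are invented for illustration.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Subtract the spectrum mean and divide by its standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (assumption)
    return [(x - m) / s for x in spectrum]


# Made-up spectrum and a copy with an offset and a scaling applied.
base = [1.0, 2.0, 4.0, 2.0, 1.0]
shifted_scaled = [3.0 * x + 10.0 for x in base]

# After SNV, both versions coincide up to floating-point error.
a = snv(base)
b = snv(shifted_scaled)
```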

Some efforts have been made to optimize standardization techniques. A piecewise MSC (PMSC) method has been proposed by Isaksson and Kowalski [23], which significantly improved the predictive power of several regression models based on near-infrared transmittance spectra. A Localized SNV (LSNV) approach has been introduced by Bi et al., performing the SNV not on the full spectrum but on subsequent sequences [24]. This strategy also yielded very promising results in several regression cases based on benchmark NIR data sets. In the following, a dynamic version of the LSNV algorithm, called DLSNV, is presented. By allowing for a dynamic starting point of the first and subsequent SNV windows, it is more flexible in aligning the SNV to important vibrational bands in the spectra. PSNV and PPSNV are based on the idea that the standardization can be optimized when performed on distinct wavenumber windows across highly specific regions of the spectrum.
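The window idea can be sketched as follows. This is a hedged illustration only: the exact window handling of LSNV and DLSNV is not specified in this passage, so the treatment of the points before the starting point (they form their own window) and of short or constant windows (only the mean is removed) are assumptions of this sketch, as are the parameter names `window` and `start`.

```python
from statistics import mean, stdev


def snv(values):
    """SNV of one window; falls back to mean-centering if std is zero."""
    m = mean(values)
    s = stdev(values) if len(values) > 1 else 0.0
    if s == 0.0:
        return [v - m for v in values]
    return [(v - m) / s for v in values]


def localized_snv(spectrum, window, start=0):
    """Apply SNV separately to consecutive windows of `window` points.

    start = 0 mimics a fixed grid of windows; start > 0 shifts the grid,
    with the leading points before `start` treated as one extra window.
    """
    edges = [0] if start > 0 else []
    edges += list(range(start, len(spectrum), window))
    edges.append(len(spectrum))
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        out.extend(snv(spectrum[lo:hi]))
    return out


# Made-up data: 20 points, windows of 5 points, grid shifted by 3 points.
spectrum = [float((i * i) % 11) for i in range(20)]
fixed_grid = localized_snv(spectrum, window=5)
shifted_grid = localized_snv(spectrum, window=5, start=3)
```

In an optimization loop, `window` and `start` would be varied and the combination with the best calibration performance kept.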

2. Experimental

As a sample set, data originating from an investigation of aging and interaction phenomena in Automatic Transmission Fluids (ATF) were used. Many ATF samples were stored for different periods at several temperatures to produce artificially aged samples.

The aim of the presented study was to transfer information coming from a highly specific, costly, and complex measurement method (High-Performance Liquid Chromatography coupled with Quadrupole Time-of-Flight Mass Spectrometry (HPLC-QToF-MS)) to data measured with a low-cost, flexible tabletop instrument (Fourier-Transform Infrared (FTIR) spectrometer). This was achieved by analyzing each sample coming from the storage experiment and determining the additive response signals in these samples by HPLC-QToF-MS. By using these additive responses as reference values, a calibration model was created in order to be able to predict the concentration of the additive compounds in the samples by evaluating the FTIR spectra. The new standardization techniques proposed here are tested on these regression models.

2.1. Additive Compounds. Two additive compounds from two different ATF oils were analyzed:

Within ATF A: an unsaturated ethoxylated amine, known as friction modifier
Within ATF B: a bis-tert-butyl-hydroxytoluene (BHT) derivate, known as phenolic antioxidant

2.2. Samples and Experiments. For the investigation of degradation phenomena in ATFs, a comprehensive storage experiment was set up to analyze the effects of different materials on ATFs and the impact of temperature on oil aging. To this end, the ATFs were stored under various conditions in an oven. Three parameters were varied: the storage temperature, the storage time, and the added materials. The storage times were adjusted to the temperatures so that a comparable load according to the Arrhenius law could be expected. The parameters are listed in Table 1.

For all time/temperature combinations, three interaction experiments were conducted:

(i) storage with pure oil
(ii) storage with oil plus copper alloy chips
(iii) storage with oil plus chips from copper alloy, iron, and PA66

The samples were prepared by storing 100 ml of fresh oil in a glass jar with a screw cap. The lid had been modified with a central hole that allowed air exchange.

2.3. Sample Measurements

2.3.1. FTIR. The FTIR spectra were collected in transmission with a Bruker Alpha instrument in combination with the QuickSnap™ transmission sample compartment in the wavenumber region ranging from 4000 to 600 cm⁻¹ with a spectral resolution of 4 cm⁻¹.

The samples were measured without any special sample preparation with two different setups: (1) a droplet of ATF between two potassium bromide (KBr) discs separated by a Teflon spacer with a thickness of about 50 μm and (2) a fixed KBr cuvette of 100 μm thickness filled with ATF.

After each sample measurement, the KBr discs and the cuvette were rinsed several times with petroleum ether in order to prevent cross contamination. The cuvette was dried with N2 gas after rinsing, and the KBr discs were dried under ambient air. For measurement type (1), 4 spectra per sample were recorded, and for type (2), one spectrum per sample was recorded.

Due to the sample layer thickness, the hydrocarbon bands are saturated, and therefore the spectra had to be cut in the wavenumber regions between 3000 and 2815 cm⁻¹ (C-H stretching mode) and between 1491 and 1424 cm⁻¹ (C-H bending and rocking mode). Additionally, the CO2 bands were eliminated by cutting out the region from 2387 to 2285 cm⁻¹ as well. The spectra of ATF A are shown in Figure 1: in transmission without any preprocessing, as measured; in Figure 1(b) after truncation and SNV transformation; and in Figure 1(c) SNV transformed after calculating the absorbance spectra by using A = −log(T). In Figure 2, the same diagrams are shown for ATF B. In both cases, two series of curves can be discriminated in the raw spectra by eye. The blue series comes from measurement type (1), and the red set comes from the cuvette measurements (2). Combining the two data sets from the measurement setups (1) and (2) is a challenging task for a predictive model, as the main variance is due to the thickness variation. The data set demonstrates the importance of suitable and sophisticated preprocessing methods in order to eliminate the difference in the spectra induced by the varying sample thickness. The standardization techniques presented here are able to meet this need.
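The truncation and the absorbance conversion can be sketched together. The region limits are taken from the text above; the base-10 logarithm, the transmittance given as a fraction between 0 and 1, and the inclusive region borders are assumptions of this sketch, and the data points are invented.

```python
import math

# Cut regions in cm^-1 (C-H stretching, CO2, C-H bending and rocking).
CUT_REGIONS = [(2815, 3000), (2285, 2387), (1424, 1491)]


def keep_point(wavenumber):
    """True if the wavenumber lies outside all cut regions."""
    return not any(lo <= wavenumber <= hi for lo, hi in CUT_REGIONS)


def to_absorbance(transmittance):
    """A = -log10(T), with T given as a fraction in (0, 1]."""
    return -math.log10(transmittance)


def truncate_and_convert(wavenumbers, transmittances):
    """Drop the cut regions and convert the remaining points to absorbance."""
    return [(wn, to_absorbance(t))
            for wn, t in zip(wavenumbers, transmittances)
            if keep_point(wn)]


# Made-up points: three of them fall into cut regions and are removed.
points = truncate_and_convert([3100, 2900, 2300, 1450, 1000],
                              [0.1, 0.5, 0.5, 0.5, 1.0])
```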

2.3.2. Liquid Chromatography Coupled with Mass Spectrometry. The measurements for the determination of the additive compound signals were performed with an Agilent liquid chromatograph 1260 coupled with a high-resolution QToF 6540 mass spectrometer with methanol/water/ammonium acetate and isopropanol as an eluent. Ionization was carried out by means of electrospray (ESI). The final compound peak area data set was created using the Agilent MassHunter Qualitative Analysis B.06.00 analysis software.

The response signals of the additive compounds are standardized by subtracting the mean and dividing by the standard deviation in order to bring all signal values to the same scale. The standardized signals are depicted in Figure 3.

3. Methods

3.1. Implementation. The proposed novel standardization methods and respective optimization processes were implemented via Python scripts.

3.2. Regression Algorithm: Ridge. For the prediction, the ridge regression estimator implemented in the Python scikit-learn framework for machine learning applications was used [25]. It is a linear model which solves a regression task via the least squares loss function J(w) with L2 regularization [26]. Regularization is an approach to minimize the issue of overfitting, which is particularly important for high-dimensional data such as FTIR spectra, by controlling the quadratic sum of the model coefficients w. This is done by adding the L2 penalty term weighted by the hyperparameter λ:

\[ \lambda \lVert w \rVert^{2} = \lambda \sum_{j=1}^{m} w_{j}^{2}. \tag{1} \]

Thus, the loss function is defined as

\[ J(w) = \sum_{i=1}^{n} \left( y_{i} - y_{i}^{\mathrm{pred}} \right)^{2} + \lambda \lVert w \rVert^{2}, \tag{2} \]

where y_i stands for the reference value of the ith sample and y_i^pred for the prediction of this sample. Since the performance of the preprocessing methods has to be assessed independently from the actually used predictive regression model, the same regression model with identical hyperparameter λ was applied to the various preprocessed data sets. For the regression of the friction modifier compound of ATF A, λ = 5 was used, and for the antioxidant of ATF B, λ = 3. These parameters turned out to be the best choices regarding cross validation and robustness for the SNV transformed data set in a previously conducted internal study.
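To make the loss function concrete, the following sketch evaluates J(w) for a given coefficient vector. It is an illustration only: it assumes a linear model without an intercept term, and the toy numbers are invented. In scikit-learn's `Ridge` estimator, the role of λ is played by the `alpha` parameter.

```python
def ridge_loss(X, y, w, lam):
    """Squared prediction error plus lam times the sum of squared weights."""
    penalty = lam * sum(wj ** 2 for wj in w)
    squared_error = 0.0
    for xi, yi in zip(X, y):
        y_pred = sum(wj * xij for wj, xij in zip(w, xi))  # no intercept
        squared_error += (yi - y_pred) ** 2
    return squared_error + penalty


# Toy example: two samples, two features, lam = 5.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
loss = ridge_loss(X, y, w=[1.0, 1.0], lam=5.0)  # 1 (error) + 10 (penalty)
```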

3.3. Model Performance Evaluation. To assess the performance of our models, two different approaches were chosen, namely, the predictive power under cross validation and under noise addition.

3.3.1. Cross Validation. For cross validation, the mean of the different measurements of one sample was calculated. The sample set was randomly divided 50 times into a calibration and a validation set by taking 70% of the data as training samples and 30% as test samples in each validation iteration, with different combinations. Each separation run was provided with a unique random seed to ensure that the data set was split into the same training and test sets for each model, enabling better comparability of results between the different models.
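
The seeded splitting scheme can be sketched as follows. This is a hedged illustration with NumPy only; the function name `repeated_splits`, the choice of seeds 0 to 49, and the sample count of 60 are assumptions made for the example and are not taken from the study.

```python
import numpy as np

def repeated_splits(n_samples, n_repeats=50, train_fraction=0.7):
    """Yield (seed, train_idx, test_idx) for seeded random 70/30 splits."""
    n_train = int(round(train_fraction * n_samples))
    for seed in range(n_repeats):
        # A fixed seed per run reproduces the same split for every model.
        order = np.random.default_rng(seed).permutation(n_samples)
        yield seed, np.sort(order[:n_train]), np.sort(order[n_train:])

# Two independent calls, e.g. for two differently preprocessed data sets.
splits_a = list(repeated_splits(n_samples=60))
splits_b = list(repeated_splits(n_samples=60))
print(len(splits_a), len(splits_a[0][1]), len(splits_a[0][2]))
```

Because every run is tied to its own seed, two preprocessing methods evaluated with this generator see identical calibration and validation samples in each of the 50 runs.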

3.3.2. Robustness against Noise. In order to assess the model performance under noisy input spectra, the model was calibrated on the full original data set. Random Gaussian-distributed white noise was added to each data point. These perturbed samples were predicted by the model, and the prediction error was monitored. This was done for different noise levels. The random numbers added to each data point were generated by a standard normal distributed (mean μ = 0 and standard deviation σ = 1) random

Table 1

Temperature (°C)    Storage time (h)
120                 500, 1000, 2000, 3000
140                 105, 210, 415, 625
160                 25, 50, 105, 165


number generator. The noise levels were defined by the factors (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, and 0.45), which were multiplied with the output of the random number generator. For each noise level, 50 simulated noisy data sets were generated and predicted by the pretrained model in order to be able to make well-founded statements about the model performance under noise perturbation.
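
The noise workflow can be sketched in a few lines. The example below is an assumption-laden illustration: `noise_robustness` is a hypothetical helper, and a fixed toy linear model on synthetic data stands in for the calibrated ridge model.

```python
import numpy as np

NOISE_FACTORS = np.arange(1, 10) * 0.05   # 0.05, 0.10, ..., 0.45

def noise_robustness(predict, X, y, factors=NOISE_FACTORS, n_repeats=50, seed=0):
    """Return mean and std of the RMSE per noise factor for a pretrained model."""
    rng = np.random.default_rng(seed)
    means, stds = [], []
    for factor in factors:
        rmses = []
        for _ in range(n_repeats):
            # Standard normal noise (mean 0, std 1) scaled by the noise factor.
            X_noisy = X + factor * rng.standard_normal(X.shape)
            rmses.append(np.sqrt(np.mean((y - predict(X_noisy)) ** 2)))
        means.append(np.mean(rmses))
        stds.append(np.std(rmses))
    return np.array(means), np.array(stds)

# Toy "pretrained" linear model standing in for the calibrated model.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))
w_true = np.zeros(100)
w_true[[5, 40]] = [1.0, -0.5]
y = X @ w_true
mean_rmse, std_rmse = noise_robustness(lambda A: A @ w_true, X, y)
print(mean_rmse)
```

Plotting `mean_rmse` with `std_rmse` as error bars against the noise factors gives the kind of error slope that is compared between preprocessing methods.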

The noise robustness workflow is a very helpful tool to investigate whether a good calibration error is a real advantage or whether the model ran into overfitting. When the same regression algorithm is used twice with different regularization parameters λ, the less regularized model will generate a lower initial calibration error than the more strongly regularized model. But if the models are tested for robustness, the latter tends to have a lower error slope when the noise level increases.

3.4. Evaluation Metrics. The built-in functions R² score and mean squared error (MSE) of the scikit-learn framework were used as performance metrics.

3.4.1. Mean Squared Error (MSE). The mean squared error (MSE) of a prediction is calculated from the squared differences between the predicted value $y_{i,\mathrm{pred}}$ and the reference value $y_i$

Figure 1: FTIR spectra of ATF A. (a) Raw full transmission spectra without any preprocessing. The two data sets with different measurement setups can be discriminated by eye. The blue spectra originate from measurement type (1) with two KBr discs separated by a Teflon spacer, and the red set of curves originates from the cuvette measurement (2). (b) SNV-transformed transmission spectra after truncation of the saturated C-H vibrational regions and CO2 areas and (c) SNV-transformed absorbance spectra after truncation. Axes: wavenumbers (cm⁻¹) versus transmission (a.u.) in (a) and (b) and absorbance (a.u.) in (c).

Figure 2: FTIR spectra of ATF B. (a) Raw full transmission spectra without any preprocessing, (b) SNV-transformed transmission spectra after truncation of the saturated C-H vibrational regions and CO2 areas, and (c) SNV-transformed absorbance spectra after truncation. Axes: wavenumbers (cm⁻¹) versus transmission (a.u.) in (a) and (b) and absorbance (a.u.) in (c).


of the ith sample. For a given data set with n samples, the MSE is the average value over all samples. It is given by the following formula [27]:

$$\mathrm{MSE}(y, y_{\mathrm{pred}}) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - y_{i,\mathrm{pred}} \right)^{2}. \tag{3}$$

The best possible MSE value is 0, and small values are desirable as the deviation from the correct prediction is low. From the MSE, the root-mean-squared error (RMSE) was calculated by taking the square root. The RMSE value has the same dimension as the original reference target values.
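
As a small worked example of the two error measures, the following sketch computes MSE and RMSE with NumPy. The study itself relied on the scikit-learn built-in function; the helpers `mse` and `rmse` and the four reference values are made up for illustration.

```python
import numpy as np

def mse(y, y_pred):
    """Mean of the squared differences between reference and prediction."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean((y - y_pred) ** 2))

def rmse(y, y_pred):
    """Square root of the MSE, in the units of the target values."""
    return float(np.sqrt(mse(y, y_pred)))

y_ref = [1.0, 2.0, 3.0, 4.0]   # hypothetical reference values
y_hat = [1.0, 2.0, 3.0, 6.0]   # hypothetical predictions, one off by 2
print(mse(y_ref, y_hat), rmse(y_ref, y_hat))   # 1.0 1.0
```

A single deviation of 2 among four samples gives a squared error sum of 4, hence an MSE of 1.0 and an RMSE of 1.0.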

3.4.2. R² Coefficient of Determination. R² describes the portion of the variance in the target values (dependent variables) that can be predicted from the spectra (independent variables) by the model [28]. The best possible score for R² is 1.0. R² is 0.0 for a constant model, which predicts a constant value regardless of the input features. For linear regression modeling with intercept, R² is equal to the square of the Pearson correlation coefficient between predicted and reference target values [29]. For a data set comprising n samples, the R² score is given as

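$$R^{2}(y, y_{\mathrm{pred}}) = 1 - \frac{\sum_{i=1}^{n} \left( y_i - y_{i,\mathrm{pred}} \right)^{2}}{\sum_{i=1}^{n} \left( y_i - \bar{y}_{\mathrm{pred}} \right)^{2}}, \tag{4}$$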

where $y_{i,\mathrm{pred}}$ is the model prediction of the ith sample, which has a reference value $y_i$, and $\bar{y}_{\mathrm{pred}}$ is the mean value of all predictions:

$$\bar{y}_{\mathrm{pred}} = \frac{1}{n} \sum_{i=1}^{n} y_{i,\mathrm{pred}}. \tag{5}$$
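
The score can be checked numerically with the sketch below. It implements the formula exactly as written above, with the denominator centered on the mean prediction, next to the common variant that centers the denominator on the mean of the reference values. To our understanding the scikit-learn `r2_score` function follows the second convention, so the two may differ slightly whenever the mean prediction deviates from the mean reference value. Both function names and the toy values are hypothetical.

```python
import numpy as np

def r2_eq4(y, y_pred):
    """R2 as written in equation (4): denominator uses the mean prediction."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y_pred.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def r2_reference_mean(y, y_pred):
    """Common variant: denominator uses the mean of the reference values."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_ref = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_eq4(y_ref, y_ref))                          # perfect prediction: 1.0
print(r2_eq4(y_ref, np.full(4, 2.0)))                # constant model: 0.0
print(r2_reference_mean(y_ref, np.full(4, 2.5)))     # mean predictor: 0.0
```

Under the formula as written, any constant prediction yields a score of 0.0, which matches the statement about the constant model.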

3.5. Standard Normal Variate. Each spectrum $x = (x_1, x_2, \ldots, x_k)$ with k measured data points is transformed to the standardized form $z = (z_1, z_2, \ldots, z_k)$ by bringing the spectrum to zero mean and unit variance. For this purpose, the mean $\bar{x}$ of the spectrum is subtracted from each data point $x_i$, and the result is divided by the standard deviation:

$$z_i = \frac{x_i - \bar{x}}{\sqrt{\sum_{j=1}^{k} \left( x_j - \bar{x} \right)^{2} / k}}, \tag{6}$$

with

$$\bar{x} = \frac{1}{k} \sum_{j=1}^{k} x_j. \tag{7}$$
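
A minimal NumPy sketch of the transformation is given below. It uses the population standard deviation (division by k); the function name `snv` and the synthetic band are illustrative assumptions.

```python
import numpy as np

def snv(spectra):
    """Row-wise SNV: zero mean and unit variance for each spectrum."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    # Population standard deviation (division by k).
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

# One synthetic band, plus the same band with a gain and an offset applied.
base = np.exp(-0.5 * ((np.arange(200) - 80) / 10.0) ** 2)
distorted = 2.0 * base + 3.0
z = snv(np.vstack([base, distorted]))
print(np.allclose(z[0], z[1]))   # True
```

The example shows why SNV is used for scatter correction: a multiplicative gain and an additive offset applied to a whole spectrum are removed completely.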

3.5.1. Dynamic Localized SNV (DLSNV). The DLSNV workflow is based on the SNV-transformed spectra data set (Figure 4(a)). To calculate the DLSNV data, the spectra are divided into multiple regions, and standardization is performed on each of these regions. To adjust the windows to important areas in the spectrum, a starting point can be defined. In Figure 4(b), the DLSNV spectra are shown with a starting point of 100 and a window size of 300 pixels.

DLSNV algorithm

(i) Perform SNV on a window of the spectrum ranging from the first data point to the sth one

(ii) Subdivide the spectra from the sth data point onward into windows of equal size ws and perform SNV on each window
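
The two steps of the algorithm can be sketched as follows. This is one possible reading rather than the original implementation: the function name `dlsnv`, the handling of a shorter last window, and the guard for zero variance are assumptions.

```python
import numpy as np

def dlsnv(spectra, start, window_size):
    """SNV on [0, start) and then on consecutive windows of `window_size`."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    k = spectra.shape[1]
    # Window edges: 0, s, s + ws, s + 2 * ws, ..., k (last window may be shorter).
    edges = sorted(set([0] + list(range(start, k, window_size)) + [k]))
    out = np.empty_like(spectra)
    for lo, hi in zip(edges[:-1], edges[1:]):
        block = spectra[:, lo:hi]
        std = block.std(axis=1, keepdims=True)
        std[std == 0] = 1.0   # guard for constant or single-point windows
        out[:, lo:hi] = (block - block.mean(axis=1, keepdims=True)) / std
    return out

rng = np.random.default_rng(0)
spectra = rng.normal(size=(4, 1000)) + np.linspace(0.0, 5.0, 1000)
out = dlsnv(spectra, start=100, window_size=300)
print(out.shape)   # (4, 1000)
```

With a starting point of 0 and a window as long as the spectrum, the function reduces to ordinary SNV; with a starting point of 0 and a smaller window, it corresponds to LSNV.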

To optimize the two parameters, window size ws and starting point s, a three-step approach is performed. In each step, the predictive power of the model is assessed via the

Figure 3: Standardized additive responses used as target value for the FTIR regression model for (a) the friction modifier compound and (b) the antioxidant, plotted against time for storage temperature 140°C for all three storage experiments (pure oil, copper chips, and material mix), measured by HPLC-QToF. Axes: storage time (h) versus standardized molecule signals (a.u.).


coefficient of determination R². The prediction performance of the chosen window size in combination with the regression model is benchmarked by fitting the same model to the single SNV data, indicated by a red line in Figure 4(c). The optimization steps can be summarized as follows:

(1) Perform LSNV with window sizes from 50 to 500 pixels and determine R² for all window sizes. Find the optimal window size ws_opt,1

(2) Perform LSNV with the optimal window size of step 1, ws_opt,1, vary the starting point from 0 to 2·ws_opt,1, and select the optimal starting point s_opt

(3) Perform LSNV with the optimal starting point s_opt and window sizes from 50 to 2·ws_opt,1 in order to find the best combination of window size ws_opt,2 and starting point s_opt
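
The three steps above can be sketched as a simple sequential search. The sketch assumes a user-supplied callable `score(start, window_size)` that returns R² for a given parameter pair; the step width of the scan and the toy score function are hypothetical and only serve to make the example runnable.

```python
def optimize_dlsnv(score, ws_min=50, ws_max=500, step=1):
    """Three-step search; score(start, window_size) returns R2 (higher is better)."""
    # Step 1: LSNV (starting point 0), scan the window sizes.
    ws1 = max(range(ws_min, ws_max + 1, step), key=lambda ws: score(0, ws))
    # Step 2: keep ws1 and scan the starting point from 0 to 2 * ws1.
    s_opt = max(range(0, 2 * ws1 + 1, step), key=lambda s: score(s, ws1))
    # Step 3: keep s_opt and rescan the window sizes from ws_min to 2 * ws1.
    ws2 = max(range(ws_min, 2 * ws1 + 1, step), key=lambda ws: score(s_opt, ws))
    return s_opt, ws2

def toy_score(start, window_size):
    # Hypothetical smooth score with its maximum at start=40, window_size=120.
    return -((window_size - 120) ** 2) - (start - 40) ** 2

best_start, best_ws = optimize_dlsnv(toy_score)
print(best_start, best_ws)   # 40 120
```

In practice, `score` would wrap the preprocessing, the regression fit, and the R² evaluation; the sequential scan avoids a full two-dimensional grid search at the cost of possibly missing a joint optimum.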

In Figure 4(d), the final DLSNV spectra after optimization are shown. Note that jumps can occur between the individual standardization windows, since each window has its own mean value subtracted. However, this does not affect the regression model.

In Figure 5(a), the ATF A samples are shown with SNV performed on the entire spectral region, and in Figure 5(b), the same spectra are depicted after DLSNV optimization. Figure 5(c) shows a zoom-in view of the highlighted region of Figure 5(a), and in Figure 5(d), the same region is depicted after DLSNV optimization. The baseline is removed for the exact spectra sequence, and thus, peaks are aligned in a way that the different aging levels of the samples can already be recognized by eye. The spectral snippet shown is the phenolic antioxidant region. Thus, the decrease of this band can be associated with the aging level. Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level.

3.5.2. Peak SNV. The idea behind the Peak SNV method is to standardize the important areas of the spectrum independently of each other. The optimization workflow for PSNV is shown in Figure 6, starting from the single SNV-transformed data set. Data points with a high correlation with the target values (points of interest, POI) are selected (Figure 6(a)), and the SNV transformation is performed on windows around the centroids. Once the POIs are identified, the PSNV transformation is conducted as follows:

PSNV algorithm

(i) Subdivide the spectra into sequences, each ranging from half the distance to the previous POI to half the distance to the next one (Figure 6(b)). SNV is performed across these windows
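
A possible realization of this window construction is sketched below. The function name `psnv`, the rounding of the midpoints, and the example centroids are assumptions; the first and last windows are taken to extend to the ends of the spectrum.

```python
import numpy as np

def psnv(spectra, centroids):
    """SNV on windows whose borders lie halfway between neighbouring POIs."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    k = spectra.shape[1]
    c = np.sort(np.asarray(centroids, dtype=float))
    # Midpoints between consecutive centroids become the window borders.
    borders = np.ceil((c[:-1] + c[1:]) / 2).astype(int)
    edges = np.unique(np.concatenate(([0], borders, [k])))
    out = np.empty_like(spectra)
    for lo, hi in zip(edges[:-1], edges[1:]):
        block = spectra[:, lo:hi]
        std = block.std(axis=1, keepdims=True)
        std[std == 0] = 1.0   # guard for constant windows
        out[:, lo:hi] = (block - block.mean(axis=1, keepdims=True)) / std
    return out

rng = np.random.default_rng(0)
spectra = rng.normal(size=(3, 1000))
out = psnv(spectra, centroids=[100, 300, 800])   # windows 0-200, 200-550, 550-1000
print(out.shape)   # (3, 1000)
```

Note that a POI far away from its neighbours receives a wide window in which it is not centered, a property of this construction that becomes relevant in the discussion of the results.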

To find the POIs, an initial regression model is fitted to the data. In order to identify important regions of the spectra, the model coefficients are assessed. The normalized absolute values of the coefficient vector are fed into a peak-picking algorithm. Since POIs may lie in close proximity, an agglomeration of the POIs is conducted in order to prevent very narrow standardization windows. Peak centroids are calculated as the mean value of the combined POIs. The task of the optimization process is to find the best window for POI agglomeration, agg_opt, which is done by analyzing the calibration R² for each agglomeration window and picking the window size with maximal correlation between the predicted and reference target (Figure 6(c)). The steps are summarized as follows:

(1) Fit the data set to the target values (only calibration)

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1

(3) Combine peaks which are within a certain window agg and calculate the centroid of the agglomerated POIs

(4) Perform PSNV across the centroids of the POIs

(5) Evaluate the performance via R² for agg between 10 and 50 data points and choose agg_opt according to maximal R²
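
Steps (2) and (3) can be illustrated with the sketch below. The peak-picking algorithm used in the study is not specified, so a plain local-maximum search stands in for it; likewise, merging POIs whose distance to the preceding POI does not exceed agg is only one possible interpretation of the agglomeration window. The names `pick_pois` and `agglomerate` are hypothetical.

```python
import numpy as np

def pick_pois(coef, threshold=0.1):
    """Indices of local maxima of |w| / max(|w|) that exceed the threshold."""
    a = np.abs(np.asarray(coef, dtype=float))
    a = a / a.max()
    inner = np.arange(1, a.size - 1)
    is_peak = (a[inner] > a[inner - 1]) & (a[inner] >= a[inner + 1]) & (a[inner] > threshold)
    return inner[is_peak]

def agglomerate(pois, agg):
    """Merge POIs lying within `agg` points of their predecessor; return centroids."""
    pois = np.sort(np.asarray(pois))
    if pois.size == 0:
        return np.array([])
    # Start a new group wherever the gap to the previous POI exceeds agg.
    groups = np.split(pois, np.where(np.diff(pois) > agg)[0] + 1)
    return np.array([g.mean() for g in groups])

coef = np.zeros(100)
coef[[10, 14, 60, 80]] = [1.0, -0.8, 0.5, 0.05]   # toy coefficient vector
pois = pick_pois(coef)                  # index 80 falls below the threshold
centroids = agglomerate(pois, agg=10)   # 10 and 14 merge into centroid 12
print(pois, centroids)
```

Scanning agg from 10 to 50 and evaluating R² after PSNV with the resulting centroids would then complete steps (4) and (5).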

After optimization, each window has an individual window size and ranges over the peak centroid of important signals in the spectrum. On these windows, SNV transformation provides an optimal baseline and scatter effect removal. The optimized spectrum is shown in Figure 6(d).

Figure 4: Demonstration of the workflow and optimization process for Dynamic Localized SNV. (a) Single SNV, (b) Dynamic Localized SNV with starting point 100 and window 300 for visualization, (c) three-stage optimization process for window, starting point, and final window optimization (R² plotted against the window size of the first run, the starting point, and the window size of the second run), and (d) optimized DLSNV. The spectra in (a), (b), and (d) are plotted against wavenumbers (cm⁻¹).


3.5.3. Partial Peak SNV. The idea behind Partial Peak SNV is similar to PSNV: picking the regions of the spectrum which show a high correlation with the target values, agglomerating POIs in close proximity, and standardizing these important spectral features (Figure 7(a)). But unlike for PSNV, not the whole spectrum is finally taken into account but only a small window around each POI. It may occur that the same data point appears several times in different standardizations (see overlapping regions in Figures 7(b) and 7(d)). Due to this workflow, the PPSNV spectrum may have more data points (due to overlapping) or fewer (because not the entire spectrum is taken into account) than the original spectrum. A PPSNV spectrum is calculated as follows:

PPSNV algorithm

(i) Perform SNV across the POIs with a left and rightmargin of pw
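
The following sketch shows one way to realize this step. The function name `ppsnv`, the clipping of windows at the spectrum borders, and the example POIs are assumptions made for illustration.

```python
import numpy as np

def ppsnv(spectra, pois, pw):
    """SNV on a window of +/- pw points around each POI; windows are concatenated."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    k = spectra.shape[1]
    pieces = []
    for poi in np.sort(np.asarray(pois, dtype=int)):
        lo, hi = max(poi - pw, 0), min(poi + pw + 1, k)   # clip at the borders
        block = spectra[:, lo:hi]
        std = block.std(axis=1, keepdims=True)
        std[std == 0] = 1.0   # guard for constant windows
        pieces.append((block - block.mean(axis=1, keepdims=True)) / std)
    # Overlapping windows are kept separately, so points can appear repeatedly.
    return np.hstack(pieces)

rng = np.random.default_rng(0)
spectra = rng.normal(size=(5, 1000))
out = ppsnv(spectra, pois=[100, 110, 500], pw=17)
print(out.shape)   # (5, 105): three windows of 35 points each
```

In this example the windows around POIs 100 and 110 overlap, so some data points enter the output twice, while most of the 1000 original points are dropped.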

The optimization focuses on the adjustment of the window size pw around the POIs in which the SNV is applied for maximal predictive power in calibration (Figure 7(c)). The optimization process is divided into the following steps:

(1) Fit the data set to the target values (only calibration)

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1

Figure 5: Demonstration of the improvement of peak alignment for DLSNV. (a) SNV-transformed transmission spectra with marked zoom area of (c). (b) Optimized DLSNV spectra with marked zoom area of (d). Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level. All panels are plotted against wavenumbers (cm⁻¹); the zoomed panels (c) and (d) cover 3500 to 3350 cm⁻¹.

Figure 6: Demonstration of the workflow and optimization process for Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process in order to find the optimal agglomeration window size (R² plotted against the agglomeration window size), and (d) optimized PSNV. The spectra in (a), (b), and (d) are plotted against wavenumbers (cm⁻¹).


Figure 8: Cross validation results for (a) SNV, (b) Dynamic Localized SNV, (c) Peak SNV, and (d) Partial Peak SNV preprocessed spectra in the regression case of the friction modifier additive. The reference values were measured by HPLC-QToF. Red dots indicate prediction of calibration samples, and blue dots represent prediction of the hold-out validation samples. The dashed arrow in (a) indicates the increasing degradation of the samples with increasing time as the friction modifier additive concentration gets lower. Axes: reference versus predicted friction modifier peak volume.

Figure 7: Demonstration of the workflow and optimization process for Partial Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process in order to find the optimal window size around the POIs (R² plotted against the window size), and (d) optimized PPSNV, plotted against data points. The spectra in (a) and (b) are plotted against wavenumbers (cm⁻¹).


(3) Perform PPSNV across the peaks with the window size pw

(4) Evaluate the performance via R² for pw between 1 and 200 data points and choose pw_opt according to maximal R²

4. Results and Discussion

4.1. Cross Validation. In Figure 8(a), the cross validation recovery function for predictions of the SNV-preprocessed spectra of the regression on the friction modifier compound is shown. A 50-fold cross validation strategy with a calibration/validation splitting of 70/30 was used. Red dots represent the prediction of calibration samples, and blue dots represent validation samples. It is obvious that the linear model struggles to predict the high and low compound intensity regions correctly. The nonlinearity is visualized by an arrow and a dashed line to guide the eye.

Figure 8 also shows the cross validation recovery function for predictions after Dynamic Localized SNV (Figure 8(b)), Peak SNV (Figure 8(c)), and Partial Peak SNV (Figure 8(d)) optimization. The saturation effect in the low intensity area of the compound response is almost completely removed in the latter three cases. It is also notable that the scattering around the green bisecting line is significantly reduced. Thus, the confidence interval for the predictions is improved.

The RMSEP values during the cross validation of the regression of the friction modifier component are summarized in Figure 9 in a box-and-whisker plot representation. The red line indicates the median; within the boxes, the interquartile range (IQR) (contains 50% of the data) is depicted, and the margins of the whiskers represent Q1 − 1.5·IQR and Q3 + 1.5·IQR for the lower and upper bound, respectively (Q1 is the value below which 25% of the data set lie, and Q3 is the value below which 75% lie). Subplot Figure 9(a) refers to the transmission spectra, and Figure 9(b) refers to the absorbance spectra. The labels are associated with (1) without standardization, (2) single SNV transformation on the full spectral range, (3) Localized SNV, (4) Dynamic Localized SNV, (5) Peak SNV, and (6) Partial Peak SNV.
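
For readers who want to reproduce the box statistics, a small sketch follows. The function `box_stats` and the five RMSEP values are hypothetical; linear interpolation between data points is assumed for the quartiles.

```python
import numpy as np

def box_stats(values):
    """Median, quartiles, and whisker bounds Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    return {"median": median, "q1": q1, "q3": q3,
            "lower": q1 - 1.5 * iqr, "upper": q3 + 1.5 * iqr}

# Hypothetical RMSEP values from five cross validation runs.
stats = box_stats([0.20, 0.25, 0.30, 0.35, 0.40])
print(stats)
```

For these values, Q1 is 0.25 and Q3 is 0.35, so the IQR is 0.10 and the whisker bounds are approximately 0.10 and 0.50.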

It is noticeable that the RMSEP is very poor in the case of the raw transmission spectra and that SNV improves them substantially, whereas the improvement after SNV is low for absorbance spectra.

For all sophisticated optimized standardization approaches, DLSNV, PSNV, and PPSNV, the median and the scattering around the median of RMSEP decrease drastically with respect to the SNV-transformed full spectra, but LSNV also seems to be a reasonable choice. DLSNV on absorbance spectra is characterized by the lowest median and the smallest scattering, as confirmed by Table 2, which summarizes the mean values and standard deviations of RMSEP.

Figure 9: Box-and-whisker plot representation of the root-mean-squared error of prediction (RMSEP) of the cross validation strategy (50 folds, random train/test split of 70/30 of the data) for the friction modifier compound of ATF A. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV. Boxplots (4) to (6) are marked as the proposed methods.


The summarized RMSEPs of the regression to the antioxidant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation, where Figure 10(a) refers to the transmission spectra and Figure 10(b) to the absorbance spectra. In this case, DLSNV, PSNV, and PPSNV reduce both the median and the scattering around the median enormously when compared with SNV on full spectra. The best performance is achieved by PPSNV conducted on the transmission spectra, as confirmed by Table 2. On transmission spectra, DLSNV and PPSNV perform better than LSNV, but PSNV only has a positive effect when compared to SNV. Relative to LSNV, PSNV reduces the predictive power. The fact that PPSNV is the best choice for this regression use case suggests that it is beneficial to use only spectral regions with high correlation with the target value and to drop regions with low or no correlation.

4.1.1. Noise Robustness. In Figure 11, the performance of the regression model for the prediction of noisy spectra is shown for the friction modifier. In subplot Figure 11(a), the curves for all preprocessings are depicted, and in Figure 11(b), a zoomed view is shown. Without any preprocessing, the initial calibration error for both transmission and absorbance spectra is very poor and rises very fast with the increasing noise level factor. Although the sophisticated preprocessing methods LSNV, DLSNV, PSNV, and PPSNV show a lower initial calibration error, the slope of the error is also lower than for SNV. In Figure 11(b), the trend of the transmission spectra having low error steepness is visible. One may say that the three proposed standardization techniques show a very similar noise robustness behavior and are significantly better than none or

Figure 10: Box-and-whisker plot representation of the root-mean-squared error of prediction (RMSEP) of the cross validation strategy for the phenolic antioxidant compound of ATF B. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV. Boxplots (4) to (6) are marked as the proposed methods.

Table 2: Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during cross validation. The relative improvements and respective p values compared with LSNV are also listed.

Method             Friction modifier, ATF A                 Antioxidant, ATF B
                   Transmission        Absorbance           Transmission        Absorbance
Raw                0.83 ± 0.22         0.53 ± 0.16          0.90 ± 0.14         0.70 ± 0.11
Single SNV         0.34 ± 0.14         0.42 ± 0.17          0.61 ± 0.11         0.70 ± 0.13
LSNV               0.30 ± 0.07         0.31 ± 0.07          0.24 ± 0.04         0.24 ± 0.04
DLSNV              0.28 ± 0.06         0.26 ± 0.05          0.21 ± 0.04         0.21 ± 0.04
Rel. improvement   9% (p < 0.05)       16% (p < 0.001)      13% (p < 0.001)     13% (p < 0.001)
PSNV               0.28 ± 0.08         0.28 ± 0.06          0.33 ± 0.06         0.31 ± 0.07
Rel. improvement   8% (p > 0.05)       9% (p < 0.05)        −41% (p < 0.001)    −33% (p < 0.001)
PPSNV              0.26 ± 0.06         0.26 ± 0.06          0.17 ± 0.04         0.24 ± 0.05
Rel. improvement   13% (p < 0.01)      15% (p < 0.001)      29% (p < 0.001)     −3% (p > 0.05)


SNV preprocessing, indicating that the model did not overfit the data, as mentioned in Section 3.3.2.

In Figure 12, the performance of the regression model of the antioxidant for the prediction of noisy spectra is shown. The absorbance spectra with or without preprocessing show a similar noise trend as the SNV-transformed spectra. In Figure 12(b), the localized versions are shown in a zoomed view. The PPSNV preprocessing on the transmission spectra is characterized by the flattest noise dependency. These results demonstrate the superiority of the PPSNV method in this use case. As mentioned above, PSNV is not advantageous in this application and shows the lowest noise immunity, but it is preferable to the SNV across the whole spectrum.

4.2. Summary. The optimized parameters for the preprocessing methods are summarized in Table 3. The LSNV optimization process selects the same window size as DLSNV. Thus, the second window size run has no influence on the final result in these two cases, but the starting point produces an improvement.

As already mentioned, the cross validation performances of the tested methods are summarized in Table 2 as mean values and standard deviations over all cross validation runs. For the friction modifier, the performances of DLSNV, PSNV, and PPSNV are very similar. Table 2 also lists the relative improvements against the benchmark preprocessing LSNV, accompanied by corresponding p values from a two-sided t-test, which tests whether the mean values differ significantly (relative improvements can differ even where the same mean value is listed because the improvements were calculated from exact values rather than rounded values).

The best mean RMSEP value for the regression model for the friction modifier, 0.26, is produced by DLSNV based on absorbance spectra. The antioxidant compound is modeled best by PPSNV preprocessing of the transmission spectra, which yields a very low prediction error of 0.17.

To summarize, one may say that all proposed methods performed very well, reducing both mean and standard deviation of the cross validation error compared with SNV. PSNV is not reasonable for the antioxidant additive, as the performance is poor compared with the benchmark preprocessing method LSNV.

Which preprocessing method is the best depends on the actual regression use case, but in general, it is shown that PPSNV outperforms PSNV. This suggests that it is beneficial to drop spectral regions showing low or no dependency on the target value and to consider only highly correlated peaks.

For the antioxidant compound, PPSNV yielded an enormous improvement. This could be explained as the

Figure 11: RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A, calibrated by the unperturbed full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance), and in (b), the sophisticated standardizations are shown (LSNV, DLSNV, PSNV, and PPSNV).


phenolic aging inhibitor is a compound with a very narrow vibrational band in ATF B and thus does not have a great impact when the SNV is carried out across the entire spectrum. This may lead to a suboptimal alignment of this band. In the case of the novel standardizations, the SNV is optimized to the highly correlated bands, and scatter effects can be compensated for these exact regions.

The fact that PSNV is unsuccessful for the antioxidant may be because the POIs are not centered in the middle of the SNV window and may have large left and right margins if they are far away from other POIs. As a result, they may not be optimally standardized. This is shown in Figure 6(b), where the single POI at about 2700 cm⁻¹ has a large single SNV window.

The study provides an overview of model performances when using transmission or absorbance spectra, suggesting that both cases can lead to valid regression models. However, for quantitative models built on transmission spectra, the SNV is vital, whereas in the absorbance case, the predictive power does not depend on SNV transformation. In absorbance spectra, the influence of the baseline constant is reduced because high transmission values are converted into low absorbance values.

To conclude, DLSNV, PSNV, and PPSNV were able to improve both transmission and absorbance predictive models. The scattering around the mean values is also drastically reduced because the model does not have to learn how to compensate for the baseline shift in each cross validation step, leading to more reproducible results. Each vibrational band is optimally aligned so that the additive depletion trend is encoded in the absolute signal intensity, and the model does not have to weigh a data point as background correction.

5. Conclusion

The results presented in this study demonstrate the superior performance of the proposed novel standardization strategies, Dynamic Localized SNV, Peak SNV, and Partial Peak SNV, which improve both the mean and scatter of RMSEP

Figure 12: RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B, calibrated by the original full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance), and in (b), the sophisticated standardizations are shown (LSNV, DLSNV, PSNV, and PPSNV).

Table 3: Summary of the preprocessing parameters. For DLSNV, the window size and starting point, for PSNV, the agglomeration window, and for PPSNV, the window width around the POI are shown.

Method    ATF A                                ATF B
          Transmission      Absorbance         Transmission      Absorbance
LSNV      w_opt = 52        w_opt = 52         w_opt = 51        w_opt = 51
DLSNV     w_opt,2 = 52      w_opt,2 = 52       w_opt,2 = 51      w_opt,2 = 51
          s_opt = 5         s_opt = 7          s_opt = 48        s_opt = 48
PSNV      agg_opt = 5       agg_opt = 14       agg_opt = 6       agg_opt = 10
PPSNV     pw_opt = 17       pw_opt = 25        pw_opt = 15       pw_opt = 38


values in cross validation and the robustness against noise drastically with respect to the SNV transformation executed on the entire spectrum. Against the benchmark LSNV, an enhancement of the predictive power of a ridge regression model by up to 16% and 29% could be achieved for the friction modifier and the antioxidant compound, respectively. The demonstrated optimization workflows for performing SNV on specific regions of the spectrum have been introduced here for the first time. The standardization methods used in this paper are therefore capable of eliminating nonlinearities by flexible rescaling in defined areas. To our knowledge, such standardization techniques have not been presented elsewhere.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] M. Zeaiter, J. M. Roger, and V. Bellon-Maurel, “Dynamic orthogonal projection: a new method to maintain the on-line robustness of multivariate calibrations. Application to nir-based monitoring of wine fermentations,” Chemometrics and Intelligent Laboratory Systems, vol. 80, no. 2, pp. 227–235, 2006.

[2] R. M. Balabin and R. Z. Safieva, “Motor oil classification by base stock and viscosity based on near infrared (nir) spectroscopy data,” Fuel, vol. 87, no. 12, pp. 2745–2752, 2008.

[3] A. Borin and R. J. Poppi, “Multivariate quality control of lubricating oils using fourier transform infrared spectroscopy,” Journal of the Brazilian Chemical Society, vol. 15, no. 4, pp. 570–576, 2004.

[4] J. Huang, D. Brennan, L. Sattler, J. Alderman, B. Lane, and C. O’Mathuna, “A comparison of calibration methods based on calibration data size and robustness,” Chemometrics and Intelligent Laboratory Systems, vol. 62, no. 1, pp. 25–35, 2002.

[5] M. A. Al-Ghouti and L. Al-Atoum, “Virgin and recycled engine oil differentiation: a spectroscopic study,” Journal of Environmental Management, vol. 90, no. 1, pp. 187–195, 2009.

[6] R. Kellner, J. M. Mermet, M. Otto, M. Valcarcel, and H. M. Widmer, Analytical Chemistry, John Wiley & Sons Australia, Limited, Milton, Australia, 2004.

[7] B. G. Osborne, T. Fearn, P. T. Hindle, and B. G. Osborne, Practical nir Spectroscopy with Applications in Food and Beverage Analysis, Longman Scientific & Technical, Wiley, Harlow, Essex, UK, 1993.

[8] J. D. Donald and D. D. Kevin, Interpreting Diffuse Reflectance and Transmittance, NIR, Chichester, UK, 2007.

[9] A. Rinnan, F. van den Berg, and S. B. Engelsen, “Review of the most common pre-processing techniques for near-infrared spectra,” TrAC Trends in Analytical Chemistry, vol. 28, no. 10, pp. 1201–1222, 2009.

[10] E. Arendse, O. Amos Fawole, L. Samukelo Magwaza, N. Helene, and U. Linus Opara, “Comparing the analytical performance of near and mid infrared spectrometers for evaluating pomegranate juice quality,” LWT, vol. 91, pp. 180–190, 2018.

[11] J. C. Machado, M. A. Faria, I. M. P. L. V. O. Ferreira, R. N. M. J. Pascoa, and J. A. Lopes, “Varietal discrimination of hop pellets by near and mid infrared spectroscopy,” Talanta, vol. 180, pp. 69–75, 2018.

[12] H. W. Siesler, “Vibrational spectroscopy,” in Reference Module in Materials Science and Materials Engineering, Elsevier, New York, NY, USA, 2016.

[13] A. Paula Craig, B. G. Botelho, L. S. Oliveira, and A. S. Franca, “Mid infrared spectroscopy and chemometrics as tools for the classification of roasted coffees by cup quality,” Food Chemistry, vol. 245, pp. 1052–1061, 2018.

[14] S. R. Khandasammy, M. A. Fikiet, E. Mistek et al., “Bloodstains, paintings, and drugs: raman spectroscopy applications in forensic science,” Forensic Chemistry, vol. 8, pp. 111–133, 2018.

[15] J. Engel, G. Jan, E. Szymanska et al., “Breaking with trends in pre-processing,” TrAC Trends in Analytical Chemistry, vol. 50, pp. 96–106, 2013.

[16] H. Martens, S. Jensen, and P. Geladi, “Multivariate linearity transformation for near-infrared reflectance spectrometry,” in Proceedings of Nordic symposium on Applied Statistics, pp. 205–234, Stavanger, Norway, June 1983.

[17] P. Geladi, D. MacDougall, and H. Martens, “Linearization and scatter-correction for near-infrared reflectance spectra of meat,” Applied Spectroscopy, vol. 39, no. 3, pp. 491–500, 1985.

[18] P. Kubelka and F. Munk, “Ein beitrag zur optik der farbanstriche,” Zeitschrift für Technische Physik, vol. 12, pp. 593–601, 1931.

[19] A. Claus Andersson, “Direct orthogonalization,” Chemometrics and Intelligent Laboratory Systems, vol. 47, no. 1, pp. 51–63, 1999.

[20] R. J. Barnes, M. S. Dhanoa, and S. J. Lister, “Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra,” Applied Spectroscopy, vol. 43, no. 5, pp. 772–777, 1989.

[21] M. Zeaiter, J.-M. Roger, and V. Bellon-Maurel, “Robustness of models developed by multivariate calibration. Part ii: the influence of pre-processing methods,” TrAC Trends in Analytical Chemistry, vol. 24, no. 5, pp. 437–445, 2005.

[22] T. Fearn, C. Riccioli, A. Garrido-Varo, and J. Emilio Guerrero-Ginel, “On the geometry of SNV and msc,” Chemometrics and Intelligent Laboratory Systems, vol. 96, no. 1, pp. 22–26, 2009.

[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993

[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016

[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011

[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015

[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003

[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York

Journal of Spectroscopy 13

Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962

[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895


a Teflon spacer with a thickness of about 50 μm and (2) a fixed KBr cuvette of 100 μm thickness filled with ATF.

After each sample measurement, the KBr discs and the cuvette were rinsed several times with petroleum ether in order to prevent cross contamination. The cuvette was dried with N₂ gas after rinsing, and the KBr discs were dried under ambient air. For measurement type (1), 4 spectra per sample were recorded, and for type (2), one spectrum per sample was recorded.

Due to the sample layer thickness, the hydrocarbon bands are saturated, and therefore the spectra had to be cut in the wavenumber regions between 3000 and 2815 cm⁻¹ (C-H stretching mode) and between 1491 and 1424 cm⁻¹ (C-H bending and rocking mode). Additionally, the CO₂ bands were eliminated by cutting out the region from 2387 to 2285 cm⁻¹ as well. The spectra of ATF A are shown in Figure 1(a) in transmission without any preprocessing, as measured; in Figure 1(b) after truncation and SNV transformation; and in Figure 1(c) SNV transformed after calculating the absorbance spectra by using A = −log(T). In Figure 2, the same diagrams are shown for ATF B. In both cases, two series of curves can be discriminated in the raw spectra by eye. The blue series comes from measurement type (1), and the red set comes from the cuvette measurements (2). Combining the two data sets from measurement setups (1) and (2) is a challenging task for a predictive model, as the main variance is due to the thickness variation. The data set demonstrates the importance of suitable and sophisticated preprocessing methods in order to eliminate the differences in the spectra induced by the varying sample thickness. The standardization techniques presented here are able to meet this need.
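The truncation and the absorbance conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `truncate_and_convert`, the inclusive region boundaries, the base-10 logarithm, and the synthetic spectrum are hypothetical and are not taken from the original scripts.

```python
import numpy as np

# Saturated C-H bands and the CO2 band named in the text, in cm^-1 (upper, lower).
CUT_REGIONS = [(3000, 2815), (2387, 2285), (1491, 1424)]

def truncate_and_convert(wavenumbers, transmission):
    """Drop the cut regions, then convert transmission T to absorbance A = -log10(T)."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    transmission = np.asarray(transmission, dtype=float)
    keep = np.ones(wavenumbers.shape, dtype=bool)
    for upper, lower in CUT_REGIONS:
        # Assumption: both region boundaries are removed as well.
        keep &= ~((wavenumbers <= upper) & (wavenumbers >= lower))
    # Assumption: base-10 logarithm; the text only states A = -log(T).
    absorbance = -np.log10(transmission[keep])
    return wavenumbers[keep], absorbance

# Hypothetical spectrum: 1 cm^-1 grid from 3500 to 1000 cm^-1, constant 10 % transmission.
wn = np.arange(3500, 999, -1)
t = np.full(wn.shape, 0.1)
wn_cut, a = truncate_and_convert(wn, t)
```

Because the cut is expressed as a boolean mask, the same mask can be reused for every spectrum recorded on the same wavenumber grid.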

2.3.2. Liquid Chromatography Coupled with Mass Spectrometry. The measurements for the determination of the additive compound signals were performed with an Agilent liquid chromatograph 1260 coupled with a high-resolution QToF 6540 mass spectrometer, with methanol/water/ammonium acetate and isopropanol as an eluent. Ionization was carried out by means of electrospray (ESI). The final compound peak area data set was created using the Agilent MassHunter Qualitative Analysis B.06.00 analysis software.

The response signals of the additive compounds are standardized by subtracting the mean and dividing by the standard deviation in order to bring all signal values onto the same scale. The standardized signals are depicted in Figure 3.

3. Methods

3.1. Implementation. The proposed novel standardization methods and the respective optimization processes were implemented via Python scripts.

3.2. Regression Algorithm: Ridge. For the prediction, the ridge regression estimator implemented in the Python scikit-learn framework for machine learning applications was used [25]. It is a linear model which solves a regression task via the least squares loss function J(w) with L2 regularization [26]. Regularization is an approach to minimize the issue of overfitting, which is particularly important for high-dimensional data such as FTIR spectra, by controlling the quadratic sum of the model coefficients w. This is done by adding the L2 penalty term weighted by the hyperparameter λ:

\[ \lambda \lVert w \rVert^2 = \lambda \sum_{j=1}^{m} w_j^2 \qquad (1) \]

Thus, the loss function is defined as

\[ J(w) = \sum_{i=1}^{n} \left( y_i - y_{i,\mathrm{pred}} \right)^2 + \lambda \lVert w \rVert^2 \qquad (2) \]

where $y_i$ stands for the reference value of the ith sample and $y_{i,\mathrm{pred}}$ for the prediction of this sample. Since the performance of the preprocessing methods has to be assessed independently of the predictive regression model actually used, the same regression model with an identical hyperparameter λ was applied to the various preprocessed data sets. For the regression of the friction modifier compound of ATF A, λ = 5 was used, and for the antioxidant of ATF B, λ = 3. These parameters turned out to be the best choices regarding cross validation and robustness for the SNV-transformed data set in a previously conducted internal study.
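As a minimal sketch of equations (1) and (2), the following NumPy code evaluates J(w) and minimizes it in closed form for a model without intercept. The helper names `ridge_loss` and `ridge_fit` and the synthetic data are hypothetical. The study itself used the scikit-learn ridge estimator, whose `alpha` parameter plays the role of λ in the same penalized least squares objective and which additionally fits an unpenalized intercept by default.

```python
import numpy as np

def ridge_loss(X, y, w, lam):
    """J(w) = sum_i (y_i - y_i,pred)^2 + lam * sum_j w_j^2, cf. equations (1) and (2)."""
    residuals = y - X @ w
    return float(residuals @ residuals + lam * (w @ w))

def ridge_fit(X, y, lam):
    """Closed-form minimizer of J(w) for a linear model without intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical data: 40 "spectra" with 200 data points each.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=40)

w_weak = ridge_fit(X, y, lam=1e-3)    # almost unregularized
w_strong = ridge_fit(X, y, lam=5.0)   # lambda = 5, the value quoted for ATF A
```

A larger λ shrinks the coefficient vector, which is the mechanism by which the penalty limits overfitting on data with many more variables than samples.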

3.3. Model Performance Evaluation. To assess the performance of our models, two different approaches were chosen, namely, the predictive power under cross validation and under noise addition.

3.3.1. Cross Validation. For cross validation, the mean of the different measurements of one sample was calculated. The sample set was randomly divided 50 times into a calibration and a validation set by taking 70% of the data as training samples and 30% as test samples in each validation iteration, with a different combination each time. Each separation run was provided with a unique random seed to ensure that the data set was split into the same training and test sets for each model, enabling better comparability of results between the different models.
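The seeded splitting scheme can be illustrated as follows. This is a sketch only: the generator name `seeded_splits`, the use of the run index as the seed, and the sample count of 60 are assumptions made for the example.

```python
import numpy as np

def seeded_splits(n_samples, n_runs=50, train_fraction=0.7):
    """Yield (seed, train_idx, test_idx); the seed fixes the split of each run."""
    n_train = int(round(train_fraction * n_samples))
    for seed in range(n_runs):
        # The same seed always reproduces the same permutation, so every
        # preprocessing method is evaluated on identical train/test sets.
        order = np.random.default_rng(seed).permutation(n_samples)
        yield seed, np.sort(order[:n_train]), np.sort(order[n_train:])

# Two independent passes over the splits, e.g. for two preprocessing methods.
splits_a = list(seeded_splits(n_samples=60))
splits_b = list(seeded_splits(n_samples=60))
```

Tying the seed to the run index means that differences in RMSEP between two preprocessing methods cannot be caused by differences in the sample partition.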

Table 1
Temperature (°C)    Storage time (h)
120                 500, 1000, 2000, 3000
140                 105, 210, 415, 625
160                 25, 50, 105, 165

3.3.2. Robustness against Noise. In order to assess the model performance under noisy input spectra, the model was calibrated with the full original data set. Random Gaussian-distributed white noise was added to each data point. These perturbed samples were predicted by the model, and the prediction error was monitored. This was done for different noise levels. The random numbers added to each data point were generated by a standard normal distributed (mean μ = 0 and standard deviation σ = 1) random number generator. The noise levels were defined by the factors (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, and 0.45), which were multiplied with the output of the random number generator. For each noise level, 50 simulated noisy data sets were generated and predicted by the pretrained model in order to be able to make well-founded statements about the model performance under noise perturbation.

The noise robustness workflow is a very helpful tool to investigate whether a good calibration error is a real advantage or whether the model ran into overfitting. When the same regression algorithm is used twice with different regularization parameters λ, the less regularized model will generate a lower initial calibration error than the more strongly regularized model. But if the models are tested for robustness, the latter tends to have a lower error slope when the noise level increases.
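The noise robustness workflow can be sketched as a loop over the noise factors and the 50 repetitions. The function name `noise_robustness`, the fixed seed, and the toy linear model below are assumptions for illustration; any already calibrated model can be passed in through the `predict` callable.

```python
import numpy as np

NOISE_FACTORS = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]

def noise_robustness(predict, X, y, factors=NOISE_FACTORS, n_repeats=50, seed=0):
    """Return the mean RMSE per noise factor for a fixed, already calibrated model."""
    rng = np.random.default_rng(seed)
    mean_rmse = []
    for factor in factors:
        errors = []
        for _ in range(n_repeats):
            # Standard normal noise (mu = 0, sigma = 1) scaled by the noise factor.
            noisy = X + factor * rng.standard_normal(X.shape)
            errors.append(np.sqrt(np.mean((y - predict(noisy)) ** 2)))
        mean_rmse.append(float(np.mean(errors)))
    return mean_rmse

# Hypothetical calibrated linear model: fixed weights w, no intercept.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))
w = rng.normal(size=100) / 10
y = X @ w
curve = noise_robustness(lambda spectra: spectra @ w, X, y)
```

Plotting `curve` against `NOISE_FACTORS` gives the error slope discussed above; a flatter curve indicates a model that is less sensitive to noise.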

3.4. Evaluation Metrics. The built-in functions R² score and mean squared error (MSE) of the scikit-learn framework were used as performance metrics.

3.4.1. Mean Squared Error (MSE). The mean squared error (MSE) of a prediction is calculated from the squared differences between the predicted value $y_{i,\mathrm{pred}}$ and the reference value $y_i$ of the ith sample. For a given data set with n samples, the MSE is the average value over all samples. It is given by the following formula [27]:

\[ \mathrm{MSE}(y, y_{\mathrm{pred}}) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - y_{i,\mathrm{pred}} \right)^2 \qquad (3) \]

The best possible MSE value is 0, and small values are desirable, as the deviation from the correct prediction is then low. From the MSE, the root-mean-squared error (RMSE) was calculated by taking the square root. The RMSE value has the same dimension as the original reference target values.

Figure 1: FTIR spectra of ATF A. (a) Raw full transmission spectra without any preprocessing. The two data sets with different measurement setups can be discriminated by eye. The blue spectra originate from measurement type (1) with two KBr discs separated by a Teflon spacer, and the red set of curves originates from the cuvette measurement (2). (b) SNV-transformed transmission spectra after truncation of the saturated C-H vibrational regions and CO₂ areas. (c) SNV-transformed absorbance spectra after truncation.

Figure 2: FTIR spectra of ATF B. (a) Raw full transmission spectra without any preprocessing. (b) SNV-transformed transmission spectra after truncation of the saturated C-H vibrational regions and CO₂ areas. (c) SNV-transformed absorbance spectra after truncation.

3.4.2. R² Coefficient of Determination. R² describes the portion of the variance in the target values (dependent variables) that can be predicted from the spectra (independent variables) by the model [28]. The best possible score for R² is 1.0. R² is 0.0 for a constant model, which predicts a constant value regardless of the input features. For linear regression modeling with intercept, R² is equal to the square of the Pearson correlation coefficient between predicted and reference target values [29]. For a data set comprising n samples, the R² score is given as

\[ R^2(y, y_{\mathrm{pred}}) = 1 - \frac{\sum_{i=1}^{n} \left( y_i - y_{i,\mathrm{pred}} \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y}_{\mathrm{pred}} \right)^2} \qquad (4) \]

where $y_{i,\mathrm{pred}}$ is the model prediction of the ith sample, which has a reference value $y_i$, and $\bar{y}_{\mathrm{pred}}$ is the mean value of all predictions:

\[ \bar{y}_{\mathrm{pred}} = \frac{1}{n} \sum_{i=1}^{n} y_{i,\mathrm{pred}} \qquad (5) \]
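Equations (3) to (5) translate directly into NumPy. The sketch below uses hypothetical function names and toy values. The centre of the denominator in `r2` is passed explicitly: the text above centres it on the mean of the predictions, whereas scikit-learn's `r2_score` centres it on the mean of the reference values, so both variants are shown for comparison.

```python
import numpy as np

def mse(y, y_pred):
    """Mean squared error, equation (3)."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean((y - y_pred) ** 2))

def rmse(y, y_pred):
    """Root-mean-squared error, in the same unit as the target values."""
    return mse(y, y_pred) ** 0.5

def r2(y, y_pred, center):
    """1 - SS_res / SS_tot, with SS_tot taken around `center`."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - center) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical reference values and predictions.
y = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.2, 2.1, 3.2, 4.3])

r2_as_printed = r2(y, y_pred, center=y_pred.mean())   # equations (4) and (5) as written
r2_reference_mean = r2(y, y_pred, center=y.mean())    # convention of scikit-learn's r2_score
```

For predictions whose mean matches the mean of the reference values, both variants coincide.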

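3.5. Standard Normal Variate. Each spectrum $x = (x_1, x_2, \ldots, x_k)$ with k measured data points is transformed to the standardized form $z = (z_1, z_2, \ldots, z_k)$ by bringing the spectrum to zero mean and unit variance. For this purpose, the mean value $\bar{x}$ of the spectrum is subtracted from each data point $x_i$, and the result is divided by the standard deviation:

\[ z_i = \frac{x_i - \bar{x}}{\sqrt{\dfrac{\sum_{j=1}^{k} \left( x_j - \bar{x} \right)^2}{k}}} \qquad (6) \]

with

\[ \bar{x} = \frac{1}{k} \sum_{j=1}^{k} x_j \qquad (7) \]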
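A minimal NumPy sketch of equations (6) and (7) is given below. The function name `snv` and the synthetic spectra are hypothetical; the three test spectra differ only by a constant offset and a positive scale factor, which is the kind of disturbance SNV removes.

```python
import numpy as np

def snv(spectra):
    """Row-wise SNV: subtract each spectrum's mean and divide by its standard deviation."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    # ddof=0 (the NumPy default) divides by k, as in equation (6).
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

# Hypothetical spectra that differ only by an offset and a scale factor.
base = np.sin(np.linspace(0, 6, 500))
spectra = np.vstack([base, 2.0 * base + 0.5, 0.3 * base - 1.0])
z = snv(spectra)
```

After the transformation, all three rows of `z` are identical, each with zero mean and unit standard deviation.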

3.5.1. Dynamic Localized SNV (DLSNV). The DLSNV workflow is based on the SNV-transformed spectra data set (Figure 4(a)). To calculate the DLSNV data, the spectra are divided into multiple regions. On each of these regions, standardization is performed. To adjust the windows to important areas in the spectrum, a starting point can be defined. In Figure 4(b), the DLSNV spectra are shown with a starting point of 100 and a window size of 300 pixels.

DLSNV algorithm:

(i) Perform SNV on a window of the spectrum ranging from the first data point to the sth one.

(ii) Subdivide the spectra from the sth data point onward into windows of equal size ws and perform SNV on each window.
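The two steps of the DLSNV algorithm can be sketched for a single spectrum as follows. The function names, the handling of a shorter last window, and the guard against constant windows are assumptions of this example and are not specified in the text.

```python
import numpy as np

def snv(x):
    """SNV of a 1-D segment; constant segments are only mean-centred (assumption)."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

def dlsnv(x, start, window):
    """SNV on [0, start), then on consecutive windows of length `window` from `start` on."""
    x = np.asarray(x, dtype=float)
    # Window edges: 0, start, start + window, ..., len(x); the last window may be shorter.
    edges = sorted(set([0] + list(range(start, len(x), window)) + [len(x)]))
    out = np.empty_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        out[lo:hi] = snv(x[lo:hi])
    return out

# Hypothetical spectrum with a sloping baseline; start = 100 and window = 300 as in Figure 4(b).
x = np.random.default_rng(0).normal(size=1000) + np.linspace(0, 5, 1000)
z = dlsnv(x, start=100, window=300)
```

With `start=0` the function reduces to LSNV, and with a window longer than the spectrum it reduces to a single SNV over the full range.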

Figure 3: Standardized additive responses used as target values for the FTIR regression model for (a) the friction modifier compound and (b) the antioxidant, plotted against storage time for a storage temperature of 140°C for all three storage experiments (pure oil, copper chips, and material mix), measured by HPLC-QToF.

To optimize the two parameters, window size ws and starting point s, a three-step approach is performed. In each step, the predictive power of the model is assessed via the coefficient of determination R². The prediction performance of the chosen window size in combination with the regression model is benchmarked by fitting the same model to the single SNV data, indicated by a red line in Figure 4(c). The optimization steps can be summarized as follows:

(1) Perform LSNV with window sizes from 50 to 500 pixels and determine R² for all window sizes. Find the optimal window size $ws_{\mathrm{opt},1}$.

(2) Perform LSNV with the optimal window size of step 1, $ws_{\mathrm{opt},1}$, vary the starting point from 0 to $2 \cdot ws_{\mathrm{opt},1}$, and select the optimal starting point $s_{\mathrm{opt}}$.

(3) Perform LSNV with the optimal starting point $s_{\mathrm{opt}}$ and window sizes from 50 to $2 \cdot ws_{\mathrm{opt},1}$ in order to find the best combination of window size $ws_{\mathrm{opt},2}$ and starting point $s_{\mathrm{opt}}$.
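The three optimization steps amount to three one-dimensional scans, which can be sketched as follows. The function name `optimize_dlsnv`, the step width of one pixel, and the toy scoring function are assumptions; in practice `score(ws, s)` would apply the windowed SNV with these parameters, fit the regression model, and return R².

```python
def optimize_dlsnv(score, ws_min=50, ws_max=500):
    """Three-step search; score(window_size, start) returns R^2 for one setting."""
    # Step 1: LSNV (start = 0), scan the window size.
    ws_opt1 = max(range(ws_min, ws_max + 1), key=lambda ws: score(ws, 0))
    # Step 2: keep ws_opt1, scan the starting point from 0 to 2 * ws_opt1.
    s_opt = max(range(0, 2 * ws_opt1 + 1), key=lambda s: score(ws_opt1, s))
    # Step 3: keep s_opt, rescan the window size from ws_min to 2 * ws_opt1.
    ws_opt2 = max(range(ws_min, 2 * ws_opt1 + 1), key=lambda ws: score(ws, s_opt))
    return ws_opt2, s_opt

# Hypothetical stand-in for "preprocess, fit the model, and return R^2".
def toy_score(ws, s):
    return 1.0 - ((ws - 120) / 500) ** 2 - ((s - 40) / 500) ** 2

best_ws, best_s = optimize_dlsnv(toy_score)
```

The sequential scans evaluate far fewer parameter combinations than a full grid over window size and starting point, at the price of possibly missing the joint optimum.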

In Figure 4(d), the final DLSNV spectra after optimization are shown. Note that jumps can occur between the individual standardization windows, since each window has its own mean value subtracted. However, this does not affect the regression model.

In Figure 5(a), the ATF A samples are shown with SNV performed on the entire spectral region, and in Figure 5(b), the same spectra are depicted after DLSNV optimization. Figure 5(c) shows a zoom-in view of the highlighted region of Figure 5(a), and in Figure 5(d), the same region is depicted after DLSNV optimization. The baseline is removed for the exact spectra sequence, and thus peaks are aligned in a way that the different aging levels of the samples can already be recognized by eye. The spectral section shown is the phenolic antioxidant region. Thus, the decrease of this band can be associated with the aging level. Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level.

3.5.2. Peak SNV. The idea behind the Peak SNV method is to standardize the important areas of the spectrum independently of each other. The optimization workflow for PSNV is shown in Figure 6, starting from the single SNV-transformed data set. Data points with a high correlation with the target values (points of interest, POIs) are selected (Figure 6(a)), and the SNV transformation is performed on windows around the centroids. Once the POIs are identified, the PSNV transformation is conducted as follows:

PSNV algorithm:

(i) Subdivide the spectra into sequences ranging from half the distance to the previous POI to half the distance to the next one (Figure 6(b)). SNV is performed across these windows.

To find the POIs, an initial regression model is fitted to the data. In order to identify important regions of the spectra, the model coefficients are assessed. The normalized absolute values of the coefficient vector are fed into a peak-picking algorithm. Since POIs may lie in close proximity to each other, the POIs are agglomerated in order to prevent very narrow standardization windows. Peak centroids are calculated as the mean value of the combined POIs. The task of the optimization process is to find the best window for POI agglomeration, $agg_{\mathrm{opt}}$, which is done by analyzing the calibration R² for each agglomeration window and picking the window size with maximal correlation between the predicted and reference target values (Figure 6(c)). The steps are summarized as follows:

(1) Fit the data set to the target values (only calibration).

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1.

(3) Combine peaks which are within a certain window agg and calculate the centroid of the agglomerated POIs.

(4) Perform PSNV across the centroids of the POIs.

(5) Evaluate the performance via R² for agg between 10 and 50 data points and choose $agg_{\mathrm{opt}}$ according to maximal R².
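Steps (2) to (4) can be sketched as follows for one spectrum and one coefficient vector. All function names are hypothetical, the simple local-maximum peak picker stands in for whatever peak-picking algorithm was actually used, and the chain-wise agglomeration rule (a peak joins the current group if it lies within agg data points of the group's last peak) is one possible reading of step (3).

```python
import numpy as np

def snv(x):
    """SNV of a 1-D segment; constant segments are only mean-centred (assumption)."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

def pick_peaks(w, threshold=0.1):
    """Indices of local maxima of |w| / max(|w|) that exceed the threshold."""
    a = np.abs(w) / np.abs(w).max()
    mid = a[1:-1]
    is_peak = (mid > a[:-2]) & (mid >= a[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1

def agglomerate(peaks, agg):
    """Merge neighbouring peaks closer than `agg` points; return the group centroids."""
    groups = [[peaks[0]]]
    for p in peaks[1:]:
        if p - groups[-1][-1] <= agg:
            groups[-1].append(p)
        else:
            groups.append([p])
    return [float(np.mean(g)) for g in groups]

def psnv(x, centroids):
    """SNV on windows bounded by the midpoints between neighbouring centroids."""
    x = np.asarray(x, dtype=float)
    mids = [int((a + b) // 2) for a, b in zip(centroids[:-1], centroids[1:])]
    bounds = [0] + mids + [len(x)]
    return np.concatenate([snv(x[lo:hi]) for lo, hi in zip(bounds[:-1], bounds[1:])])

# Hypothetical coefficient vector with three peaks, two of them close together.
w = np.zeros(100)
w[10], w[14], w[60] = 1.0, 0.5, -0.8
peaks = pick_peaks(w)
centroids = agglomerate(peaks, agg=10)
out = psnv(np.linspace(0, 1, 100) ** 2, centroids)
```

Step (5) would wrap this in a loop over agg from 10 to 50, refit the regression model on each PSNV result, and keep the value with the highest calibration R².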

After optimization, each window has an individual window size and ranges over the peak centroid of important signals in the spectrum. On these windows, the SNV transformation provides an optimal baseline and scatter effect removal. The optimized spectrum is shown in Figure 6(d).

Figure 4: Demonstration of the workflow and optimization process for Dynamic Localized SNV. (a) Single SNV. (b) Dynamic Localized SNV with starting point 100 and window 300 for visualization. (c) Three-stage optimization process for window size, starting point, and final window size optimization. (d) Optimized DLSNV.

3.5.3. Partial Peak SNV. The idea behind Partial Peak SNV is similar to PSNV: picking the regions of the spectrum which show a high correlation with the target values, agglomerating POIs in close proximity, and standardizing these important spectral features (Figure 7(a)). Unlike for PSNV, however, not the whole spectrum is finally taken into account but only a small window around each POI. It may occur that the same data point appears several times in different standardizations (see overlapping regions in Figures 7(b) and 7(d)). Due to this workflow, the PPSNV spectrum may have more data points (due to overlapping) or fewer (because not the entire spectrum is taken into account) than the original spectrum. A PPSNV spectrum is calculated as follows:

PPSNV algorithm:

(i) Perform SNV across the POIs with a left and right margin of pw.
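The PPSNV step can be sketched for one spectrum as shown below. The function names, the clipping of windows at the spectrum edges, and the example POI positions are assumptions for illustration.

```python
import numpy as np

def snv(x):
    """SNV of a 1-D segment; constant segments are only mean-centred (assumption)."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()

def ppsnv(x, pois, pw):
    """Concatenate SNV-transformed windows [poi - pw, poi + pw] around each POI."""
    x = np.asarray(x, dtype=float)
    pieces = []
    for poi in pois:
        lo = max(0, int(poi) - pw)                # clip at the start of the spectrum
        hi = min(len(x), int(poi) + pw + 1)       # clip at the end of the spectrum
        pieces.append(snv(x[lo:hi]))
    return np.concatenate(pieces)

# Hypothetical spectrum and POIs; the first two windows overlap.
x = np.random.default_rng(0).normal(size=100)
out = ppsnv(x, pois=[20, 25, 80], pw=5)
```

Here the result has 33 data points although the spectrum has 100: data points 20 to 25 appear twice because of the overlap, while everything outside the three windows is dropped.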

The optimization focuses on the adjustment of the window size pw around the POIs in which the SNV is applied, for maximal predictive power in calibration (Figure 7(c)). The optimization process is divided into the following steps:

(1) Fit the data set to the target values (only calibration).

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1.

(3) Perform PPSNV across the peaks with the window size pw.

(4) Evaluate the performance via R² for pw between 1 and 200 data points and choose $pw_{\mathrm{opt}}$ according to maximal R².

Figure 5: Demonstration of the improvement of peak alignment for DLSNV. (a) SNV-transformed transmission spectra with the zoom area of (c) marked. (b) Optimized DLSNV spectra with the zoom area of (d) marked. Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level.

Figure 6: Demonstration of the workflow and optimization process for Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum. (b) Spectrum separation according to agglomerated peaks. (c) Optimization process in order to find the optimal agglomeration window size. (d) Optimized PSNV.

Figure 7: Demonstration of the workflow and optimization process for Partial Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum. (b) Spectrum separation according to agglomerated peaks. (c) Optimization process in order to find the optimal window size around the POIs. (d) Optimized PPSNV.

Figure 8: Cross validation results for (a) SNV, (b) Dynamic Localized SNV, (c) Peak SNV, and (d) Partial Peak SNV preprocessed spectra in the regression case of the friction modifier additive (predicted versus reference friction modifier peak volume). The reference values were measured by HPLC-QToF. Red dots indicate predictions of calibration samples, and blue dots represent predictions of the hold-out validation samples. The dashed arrow in (a) indicates the increasing degradation of the samples with increasing time, as the friction modifier additive concentration gets lower.

4. Results and Discussion

4.1. Cross Validation. In Figure 8(a), the cross validation recovery function for predictions of the SNV-preprocessed spectra of the regression on the friction modifier compound is shown. A cross validation strategy with 50 random calibration/validation splits of 70/30 was used. Red dots represent the predictions of calibration samples, and blue dots represent validation samples. It is obvious that the linear model struggles to predict the high and low compound intensity regions correctly. The nonlinearity is visualized by an arrow and a dashed line to guide the eye.

Figure 8 also shows the cross validation recovery functions for predictions after Dynamic Localized SNV (Figure 8(b)), Peak SNV (Figure 8(c)), and Partial Peak SNV (Figure 8(d)) optimization. The saturation effect in the low intensity area of the compound response is almost completely removed in the latter three cases. It is also notable that the scattering around the green bisecting line is significantly reduced. Thus, the confidence interval for the predictions is improved.

The RMSEP values during the cross validation of the regression of the friction modifier component are summarized in Figure 9 in a box-and-whisker plot representation. The red line indicates the median; the boxes depict the interquartile range (IQR), which contains 50% of the data; and the margins of the whiskers represent Q1 − 1.5·IQR and Q3 + 1.5·IQR for the lower and upper bound, respectively (Q1 means that 25% of the data set are smaller than this value, and Q3 means that 75% are smaller than this value). Subplot Figure 9(a) refers to the transmission spectra, and Figure 9(b) refers to the absorbance spectra. The labels are associated with (1) without standardization, (2) single SNV transformation on the full spectral range, (3) Localized SNV, (4) Dynamic Localized SNV, (5) Peak SNV, and (6) Partial Peak SNV.
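The whisker margins quoted above follow directly from the quartiles. A short sketch, with a hypothetical function name and made-up RMSEP values, is:

```python
import numpy as np

def whisker_bounds(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) for a set of RMSEP values."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical RMSEP values 0.1, 0.2, ..., 0.9 from nine cross validation runs.
rmsep = np.arange(1, 10) / 10
lower, upper = whisker_bounds(rmsep)
```

Values outside these bounds are the ones a box-and-whisker plot marks as individual outliers.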

It is noticeable that the RMSEP is very poor in the case of the raw transmission spectra and that SNV improves it considerably, whereas the improvement after SNV is small for absorbance spectra.

For all sophisticated optimized standardization approaches (DLSNV, PSNV, and PPSNV), the median of the RMSEP and the scattering around the median decrease drastically with respect to the SNV-transformed full spectra, but LSNV also seems to be a reasonable choice. DLSNV on absorbance spectra is characterized by the lowest median and the smallest scattering, as confirmed by Table 2, which summarizes the mean values and standard deviations of the RMSEP.

Figure 9: Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 folds, random train/test split of 70/30 of the data) for the friction modifier compound of ATF A. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV; (4) to (6) are the proposed methods.

The summarized RMSEPs of the regression to the antioxidant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation, where Figure 10(a) refers to the transmission spectra and Figure 10(b) to the absorbance spectra. In this case, DLSNV, PSNV, and PPSNV reduce both the median and the scattering around the median enormously when compared with SNV on full spectra. The best performance is achieved by PPSNV conducted on the transmission spectra, as confirmed by Table 2. On transmission spectra, DLSNV and PPSNV perform better than LSNV, but PSNV only has a positive effect when compared with SNV. Relative to LSNV, PSNV reduces the predictive power. The fact that PPSNV is the best choice for this regression use case suggests that it is beneficial to use only spectral regions with high correlation with the target value and to drop regions with low or no correlation.

4.1.1. Noise Robustness. In Figure 11, the performance of the regression model for the prediction of noisy spectra is shown for the friction modifier. In subplot Figure 11(a), the curves for all preprocessings are depicted, and in Figure 11(b), a zoomed view is shown. Without any preprocessing, the initial calibration error for both transmission and absorbance spectra is very poor and rises very fast with increasing noise level factor. Although the sophisticated preprocessing methods LSNV, DLSNV, PSNV, and PPSNV show a lower initial calibration error, the slope of their error is also lower than that for SNV. In Figure 11(b), the trend of the transmission spectra having low error steepness is visible. One may say that the three proposed standardization techniques show a very similar noise robustness behavior and are significantly better than no preprocessing or SNV preprocessing, indicating that the model did not overfit the data, as mentioned in Section 3.3.2.

Figure 10: Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for the phenolic antioxidant compound of ATF B. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV; (4) to (6) are the proposed methods.

Table 2: Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during cross validation. The relative improvements and respective p-values compared with LSNV are also listed.

Method              Friction modifier, ATF A                  Antioxidant, ATF B
                    Transmission        Absorbance            Transmission        Absorbance
Raw                 0.83 ± 0.22         0.53 ± 0.16           0.90 ± 0.14         0.70 ± 0.11
Single SNV          0.34 ± 0.14         0.42 ± 0.17           0.61 ± 0.11         0.70 ± 0.13
LSNV                0.30 ± 0.07         0.31 ± 0.07           0.24 ± 0.04         0.24 ± 0.04
DLSNV               0.28 ± 0.06         0.26 ± 0.05           0.21 ± 0.04         0.21 ± 0.04
  Rel. improvement  9% (p < 0.05)       16% (p < 0.001)       13% (p < 0.001)     13% (p < 0.001)
PSNV                0.28 ± 0.08         0.28 ± 0.06           0.33 ± 0.06         0.31 ± 0.07
  Rel. improvement  8% (p > 0.05)       9% (p < 0.05)         −41% (p < 0.001)    −33% (p < 0.001)
PPSNV               0.26 ± 0.06         0.26 ± 0.06           0.17 ± 0.04         0.24 ± 0.05
  Rel. improvement  13% (p < 0.01)      15% (p < 0.001)       29% (p < 0.001)     −3% (p > 0.05)

In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum

4.2. Summary. The optimized parameters for the preprocessing methods are summarized in Table 3. The LSNV optimization process selects the same window size as DLSNV. Thus, the second window size run has no influence on the final result in these two cases, but the starting point produces an improvement.

As already mentioned, Table 2 summarizes the cross validation performances of the tested methods as mean values and standard deviations over all cross validation runs. For the friction modifier, the performances of DLSNV, PSNV, and PPSNV are very similar. Table 2 also lists the relative improvements against the benchmark preprocessing LSNV, accompanied by the corresponding p-values from a two-sided t-test, which tests the significance of the mean values being different. (Relative improvements can differ even where the same rounded mean value is listed, because the improvements were calculated from exact rather than rounded values.)
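As a rough illustration of how such a comparison could be computed, the sketch below derives a relative improvement and a two-sided p-value from two sets of per-split RMSEP values. The definition of the relative improvement (reduction of the mean RMSEP relative to the benchmark) and the use of an unpaired two-sample t-test are assumptions, since the text does not spell them out; the RMSEP arrays are randomly generated placeholders and not the study's data. As a plausibility check of the assumed definition, a reduction from 0.31 to 0.26 corresponds to about 16%.

```python
import numpy as np
from scipy import stats

def relative_improvement(rmsep_benchmark, rmsep_method):
    """Reduction of the mean RMSEP in percent (positive means better than benchmark)."""
    mean_benchmark = np.mean(rmsep_benchmark)
    return 100.0 * (mean_benchmark - np.mean(rmsep_method)) / mean_benchmark

# Hypothetical per-split RMSEP values for 50 splits each (NOT the paper's data).
rng = np.random.default_rng(0)
rmsep_lsnv = rng.normal(0.30, 0.07, size=50)
rmsep_dlsnv = rng.normal(0.28, 0.06, size=50)

improvement = relative_improvement(rmsep_lsnv, rmsep_dlsnv)
# Two-sided, unpaired two-sample t-test on the mean values (assumed test variant).
t_stat, p_value = stats.ttest_ind(rmsep_lsnv, rmsep_dlsnv)
print(f"relative improvement: {improvement:.1f} %, p = {p_value:.3f}")
```

If the same random splits were reused for both methods, a paired test (`scipy.stats.ttest_rel`) might be the more natural choice.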

The best mean RMSEP value for the regression model for the friction modifier, 0.26, is produced by DLSNV based on absorbance spectra. The antioxidant compound is modeled best by PPSNV preprocessing of the transmission spectra, which yields a very low prediction error of 0.17.

To summarize, all proposed methods performed very well, reducing both the mean and the standard deviation of the cross validation error compared with SNV. PSNV is not reasonable for the antioxidant additive, as its performance is poor compared with the benchmark preprocessing method LSNV.

Which preprocessing method is best depends on the actual regression use case, but in general, it is shown that PPSNV outperforms PSNV. This suggests that it is beneficial to drop spectral regions showing low or no dependency on the target value and to only consider highly correlated peaks.

For the antioxidant compound, PPSNV yielded an enormous improvement. This could be explained by the fact that the phenolic aging inhibitor is a compound with a very narrow vibrational band in ATF B, and thus this band does not have a great impact when the SNV is carried out across the entire spectrum. This may lead to a suboptimal alignment of this band. In the case of the novel standardizations, the SNV is optimized to the highly correlated bands, and scatter effects can be compensated for in these exact regions.

Figure 11: RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A, calibrated by the unperturbed full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance spectra), and in (b), the sophisticated standardizations (LSNV, DLSNV, PSNV, and PPSNV) are shown.

The fact that PSNV is unsuccessful for the antioxidant may be because the POIs are not centered in the middle of the SNV window and may have large left and right margins if they are far away from other POIs. As a result, they may not be optimally standardized. This is shown in Figure 6(b), where the single POI at about 2700 cm−1 has a large single SNV window.

The study provides an overview of model performances when using transmission or absorbance spectra, suggesting that both cases can lead to valid regression models. However, for quantitative models built on transmission spectra, the SNV is vital, whereas in the absorbance case, the predictive power does not depend on the SNV transformation. In absorbance spectra, the influence of the baseline constant is reduced because high transmission values are converted into low absorbance values.

To conclude, DLSNV, PSNV, and PPSNV were able to improve both transmission and absorbance predictive models. The scattering around the mean values is also drastically reduced because the model does not have to learn how to compensate for the baseline shift in each cross validation step, leading to more reproducible results. Each vibrational band is optimally aligned so that the additive depletion trend is encoded in the absolute signal intensity, and the model does not have to weigh a data point as background correction.

5. Conclusion

The results presented in this study demonstrate that the proposed novel standardization strategies Dynamic Localized SNV, Peak SNV, and Partial Peak SNV drastically improve both the mean and the scatter of the RMSEP values in cross validation, as well as the robustness against noise, with respect to the SNV transformation executed on the entire spectrum. Against the benchmark LSNV, an enhancement of the predictive power of a ridge regression model by up to 16% and 29% could be achieved for the friction modifier and the antioxidant compound, respectively. The demonstrated optimization workflows for performing SNV on specific regions of the spectrum have been introduced here for the first time. The standardization methods used in this paper are capable of eliminating nonlinearities by flexible rescaling in defined areas. To our knowledge, such standardization techniques have not been presented elsewhere.

Figure 12: RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B, calibrated by the original full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance spectra), and in (b), the sophisticated standardizations (LSNV, DLSNV, PSNV, and PPSNV) are shown.

Table 3: Summary of the preprocessing parameters. For DLSNV, the window size and starting point are shown; for PSNV, the agglomeration window; and for PPSNV, the window width around the POI.

Method   Parameter   ATF A                         ATF B
                     Transmission   Absorbance     Transmission   Absorbance
LSNV     w_opt       52             52             51             51
DLSNV    w_opt,2     52             52             51             51
         s_opt       5              7              48             48
PSNV     agg_opt     5              14             6              10
PPSNV    pw_opt      17             25             15             38

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] M. Zeaiter, J. M. Roger, and V. Bellon-Maurel, "Dynamic orthogonal projection. A new method to maintain the on-line robustness of multivariate calibrations. Application to NIR-based monitoring of wine fermentations," Chemometrics and Intelligent Laboratory Systems, vol. 80, no. 2, pp. 227–235, 2006.

[2] R. M. Balabin and R. Z. Safieva, "Motor oil classification by base stock and viscosity based on near infrared (NIR) spectroscopy data," Fuel, vol. 87, no. 12, pp. 2745–2752, 2008.

[3] A. Borin and R. J. Poppi, "Multivariate quality control of lubricating oils using Fourier transform infrared spectroscopy," Journal of the Brazilian Chemical Society, vol. 15, no. 4, pp. 570–576, 2004.

[4] J. Huang, D. Brennan, L. Sattler, J. Alderman, B. Lane, and C. O'Mathuna, "A comparison of calibration methods based on calibration data size and robustness," Chemometrics and Intelligent Laboratory Systems, vol. 62, no. 1, pp. 25–35, 2002.

[5] M. A. Al-Ghouti and L. Al-Atoum, "Virgin and recycled engine oil differentiation: a spectroscopic study," Journal of Environmental Management, vol. 90, no. 1, pp. 187–195, 2009.

[6] R. Kellner, J. M. Mermet, M. Otto, M. Valcarcel, and H. M. Widmer, Analytical Chemistry, John Wiley & Sons Australia, Limited, Milton, Australia, 2004.

[7] B. G. Osborne, T. Fearn, P. T. Hindle, and B. G. Osborne, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, Longman Scientific & Technical, Wiley, Harlow, Essex, UK, 1993.

[8] J. D. Donald and D. D. Kevin, Interpreting Diffuse Reflectance and Transmittance, NIR, Chichester, UK, 2007.

[9] A. Rinnan, F. van den Berg, and S. B. Engelsen, "Review of the most common pre-processing techniques for near-infrared spectra," TrAC Trends in Analytical Chemistry, vol. 28, no. 10, pp. 1201–1222, 2009.

[10] E. Arendse, O. Amos Fawole, L. Samukelo Magwaza, N. Helene, and U. Linus Opara, "Comparing the analytical performance of near and mid infrared spectrometers for evaluating pomegranate juice quality," LWT, vol. 91, pp. 180–190, 2018.

[11] J. C. Machado, M. A. Faria, I. M. P. L. V. O. Ferreira, R. N. M. J. Pascoa, and J. A. Lopes, "Varietal discrimination of hop pellets by near and mid infrared spectroscopy," Talanta, vol. 180, pp. 69–75, 2018.

[12] H. W. Siesler, "Vibrational spectroscopy," in Reference Module in Materials Science and Materials Engineering, Elsevier, New York, NY, USA, 2016.

[13] A. Paula Craig, B. G. Botelho, L. S. Oliveira, and A. S. Franca, "Mid infrared spectroscopy and chemometrics as tools for the classification of roasted coffees by cup quality," Food Chemistry, vol. 245, pp. 1052–1061, 2018.

[14] S. R. Khandasammy, M. A. Fikiet, E. Mistek et al., "Bloodstains, paintings, and drugs: Raman spectroscopy applications in forensic science," Forensic Chemistry, vol. 8, pp. 111–133, 2018.

[15] J. Engel, G. Jan, E. Szymanska et al., "Breaking with trends in pre-processing," TrAC Trends in Analytical Chemistry, vol. 50, pp. 96–106, 2013.

[16] H. Martens, S. Jensen, and P. Geladi, "Multivariate linearity transformation for near-infrared reflectance spectrometry," in Proceedings of Nordic Symposium on Applied Statistics, pp. 205–234, Stavanger, Norway, June 1983.

[17] P. Geladi, D. MacDougall, and H. Martens, "Linearization and scatter-correction for near-infrared reflectance spectra of meat," Applied Spectroscopy, vol. 39, no. 3, pp. 491–500, 1985.

[18] P. Kubelka and F. Munk, "Ein Beitrag zur Optik der Farbanstriche," Zeitschrift für Technische Physik, vol. 12, pp. 593–601, 1931.

[19] A. Claus Andersson, "Direct orthogonalization," Chemometrics and Intelligent Laboratory Systems, vol. 47, no. 1, pp. 51–63, 1999.

[20] R. J. Barnes, M. S. Dhanoa, and S. J. Lister, "Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra," Applied Spectroscopy, vol. 43, no. 5, pp. 772–777, 1989.

[21] M. Zeaiter, J.-M. Roger, and V. Bellon-Maurel, "Robustness of models developed by multivariate calibration. Part II: the influence of pre-processing methods," TrAC Trends in Analytical Chemistry, vol. 24, no. 5, pp. 437–445, 2005.

[22] T. Fearn, C. Riccioli, A. Garrido-Varo, and J. Emilio Guerrero-Ginel, "On the geometry of SNV and MSC," Chemometrics and Intelligent Laboratory Systems, vol. 96, no. 1, pp. 22–26, 2009.

[23] T. Isaksson and B. Kowalski, "Piece-wise multiplicative scatter correction applied to near-infrared diffuse transmittance data from meat products," Applied Spectroscopy, vol. 47, no. 6, pp. 702–709, 1993.

[24] Y. Bi, K. Yuan, W. Xiao et al., "A local pre-processing method for near-infrared spectra, combined with spectral segmentation and standard normal variate transformation," Analytica Chimica Acta, vol. 909, pp. 30–40, 2016.

[25] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[26] S. Raschka, Python Machine Learning, Packt Publishing, Birmingham, UK, 2015.

[27] E. L. Lehmann and G. Casella, Theory of Point Estimation, Springer Texts in Statistics, Springer, New York, NY, USA, 2003.

[28] O. Heinisch, "Steel, R. G. D., and J. H. Torrie: principles and procedures of statistics (with special reference to the biological sciences), McGraw-Hill Book Company, New York, Toronto, London 1960, 481 S., 15 Abb., 81 s 6 d," Biometrische Zeitschrift, vol. 4, no. 3, pp. 207–208, 1962.

[29] K. Pearson, "Note on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, vol. 58, no. 1, pp. 240–242, 1895.


number generator. The noise levels were defined by the factors (0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, and 0.45), which were multiplied with the output of the random number generator. For each noise level, 50 simulated noisy data sets were generated and predicted by the pretrained model in order to be able to make well-founded statements about the model performance under noise perturbation.

The noise robustness workflow is a very helpful tool to investigate whether a good calibration error is a real advantage or whether the model ran into overfitting. When the same regression algorithm is used twice with different regularization parameters λ, the less regularized model will generate a lower initial calibration error than the more strongly regularized model. However, if the models are tested for robustness, the latter tends to have a lower error slope when the noise level increases.
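The noise robustness workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the noise is taken to be additive and standard-normal (the exact random number generator is not recoverable from this part of the text), the spectra are replaced by synthetic placeholder data, and the ridge penalties `alpha` are arbitrary values chosen to contrast a weakly and a strongly regularized model.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def noise_robustness(model, X, y, factors, n_repeats=50, seed=0):
    """Mean and standard deviation of the RMSE per noise factor for a pretrained model."""
    rng = np.random.default_rng(seed)
    means, stds = [], []
    for factor in factors:
        rmse = []
        for _ in range(n_repeats):
            # Assumption: additive standard-normal noise scaled by the noise factor.
            X_noisy = X + factor * rng.standard_normal(X.shape)
            rmse.append(np.sqrt(mean_squared_error(y, model.predict(X_noisy))))
        means.append(np.mean(rmse))
        stds.append(np.std(rmse))
    return np.array(means), np.array(stds)

# Synthetic stand-in data (hypothetical, not the ATF spectra).
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 200))
y = X[:, :5].sum(axis=1)
factors = np.linspace(0.05, 0.45, 9)          # 0.05, 0.10, ..., 0.45

weak = Ridge(alpha=1e-3).fit(X, y)            # weakly regularized model
strong = Ridge(alpha=1e2).fit(X, y)           # strongly regularized model
weak_mean, weak_std = noise_robustness(weak, X, y, factors)
strong_mean, strong_std = noise_robustness(strong, X, y, factors)
```

Plotting `weak_mean` and `strong_mean` against `factors`, with the standard deviations as error bars, would give curves of the kind shown in the noise robustness figures.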

3.4. Evaluation Metrics. The built-in functions R2 score and mean squared error (MSE) of the scikit-learn framework were used as performance metrics.

Figure 1: FTIR spectra of ATF. (a) Raw full transmission spectra without any preprocessing. The two data sets with different measurement setups can be discriminated by eye. The blue spectra originate from measurement type (1) with two KBr discs separated by a Teflon spacer, and the red set of curves originates from the cuvette measurement (2). (b) SNV-transformed transmission spectra after truncation of the saturated C-H vibrational regions and CO2 areas. (c) SNV-transformed absorbance spectra after truncation. Axes: wavenumbers (cm−1) versus transmission or absorbance (a.u.).

Figure 2: FTIR spectra of ATF B. (a) Raw full transmission spectra without any preprocessing. (b) SNV-transformed transmission spectra after truncation of the saturated C-H vibrational regions and CO2 areas. (c) SNV-transformed absorbance spectra after truncation. Axes: wavenumbers (cm−1) versus transmission or absorbance (a.u.).

3.4.1. Mean Squared Error (MSE). The mean squared error (MSE) of a prediction is calculated from the squared differences between the predicted value $y_{i,\mathrm{pred}}$ and the reference value $y_i$ of the ith sample. For a given data set with n samples, the MSE is the average value over all samples. It is given by the following formula [27]:

$$\mathrm{MSE}(y, y_{\mathrm{pred}}) = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - y_{i,\mathrm{pred}}\right)^2 \tag{3}$$

The best possible MSE value is 0, and small values are desirable, as the deviation from the correct prediction is low. From the MSE, the root-mean-squared error (RMSE) was calculated by taking the square root. The RMSE value has the same dimension as the original reference target values.

3.4.2. R2 Coefficient of Determination. R2 describes the portion of the variance in the target values (dependent variables) that can be predicted from the spectra (independent variables) by the model [28]. The best possible score for R2 is 1.0. R2 is 0.0 for a constant model that predicts a constant value regardless of the input features. For linear regression modeling with intercept, R2 is equal to the square of the Pearson correlation coefficient between predicted and reference target values [29]. For a data set comprising n samples, the R2 score is given as

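$$R^2(y, y_{\mathrm{pred}}) = 1 - \frac{\sum_{i=1}^{n} \left(y_i - y_{i,\mathrm{pred}}\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}_{\mathrm{pred}}\right)^2} \tag{4}$$

where $y_{i,\mathrm{pred}}$ is the model prediction of the ith sample, which has a reference value $y_i$, and $\bar{y}_{\mathrm{pred}}$ is the mean value of all predictions:

$$\bar{y}_{\mathrm{pred}} = \frac{1}{n}\sum_{i=1}^{n} y_{i,\mathrm{pred}} \tag{5}$$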
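For orientation, the metrics can be evaluated with scikit-learn as in the short sketch below. The target values are made-up numbers for illustration only. Note that scikit-learn's `r2_score` centers the denominator on the mean of the reference values, which is what the manual cross-check in the sketch reproduces.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical reference and predicted target values (illustrative only).
y = np.array([1.0, 0.5, -0.2, -1.1, -1.8])
y_pred = np.array([0.9, 0.6, -0.1, -1.3, -1.6])

mse = mean_squared_error(y, y_pred)   # mean of the squared differences
rmse = np.sqrt(mse)                   # same unit as the target values
r2 = r2_score(y, y_pred)

# Manual evaluation of the same quantities for comparison.
mse_manual = np.mean((y - y_pred) ** 2)
r2_manual = 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)
```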

3.5. Standard Normal Variate. Each spectrum $x = (x_1, x_2, \ldots, x_k)$ with k measured data points is transformed to the standardized form $z = (z_1, z_2, \ldots, z_k)$ by bringing the spectrum to zero mean and unit variance. For this purpose, the mean value $\bar{x}$ of the spectrum is subtracted from each data point $x_i$, and the result is divided by the standard deviation of the spectrum:

$$z_i = \frac{x_i - \bar{x}}{\sqrt{\sum_{j=1}^{k} \left(x_j - \bar{x}\right)^2 / k}} \tag{6}$$

with

$$\bar{x} = \frac{1}{k}\sum_{j=1}^{k} x_j \tag{7}$$
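A minimal NumPy sketch of the SNV transformation, assuming the spectra are stored row-wise in a two-dimensional array, could look like this. The function name `snv` and the toy spectra are illustrative choices, not taken from the study.

```python
import numpy as np

def snv(X):
    """Row-wise SNV: every spectrum (row) is scaled to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)   # population standard deviation (divides by k)
    return (X - mean) / std

# Two hypothetical toy spectra with four data points each.
spectra = np.array([[1.0, 2.0, 3.0, 4.0],
                    [10.0, 30.0, 20.0, 40.0]])
Z = snv(spectra)
```

Because the second toy spectrum is a scaled and permuted version of the first, both rows end up with the same set of standardized values, which illustrates how SNV removes multiplicative and additive offsets per spectrum.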

3.5.1. Dynamic Localized SNV (DLSNV). The DLSNV workflow is based on the SNV-transformed spectra data set (Figure 4(a)). To calculate the DLSNV data, the spectra are divided into multiple regions. On each of these regions, standardization is performed. To adjust the windows to important areas in the spectrum, a starting point can be defined. In Figure 4(b), the DLSNV spectra are shown with a starting point of 100 and a window size of 300 pixels.

DLSNV algorithm:

(i) Perform SNV on a window of the spectrum ranging from the first data point to the sth one.

(ii) Subdivide the spectra from the sth data point onward into windows of equal size ws, and perform SNV on each of these windows.
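The two steps above might be sketched as follows. This is one possible reading, not the authors' implementation: the handling of the last (possibly shorter) window and the guard against windows with zero variance are assumptions added for the sketch, and the names `snv` and `dlsnv` are illustrative.

```python
import numpy as np

def snv(X):
    """Row-wise SNV with a guard for flat windows (the guard is our own addition)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    return (X - X.mean(axis=1, keepdims=True)) / std

def dlsnv(X, s, ws):
    """SNV on [0, s), then on consecutive windows of width ws starting at index s."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    # Window borders: 0, s, s + ws, s + 2 * ws, ..., k (the last window may be shorter).
    edges = ([0] if s > 0 else []) + list(range(s, k, ws)) + [k]
    out = np.empty_like(X)
    for lo, hi in zip(edges[:-1], edges[1:]):
        out[:, lo:hi] = snv(X[:, lo:hi])
    return out
```

With `s = 0`, the function reduces to a Localized SNV with fixed windows, so the same sketch covers the LSNV benchmark.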

Figure 3: Standardized additive responses used as target value for the FTIR regression model for (a) the friction modifier compound and (b) the antioxidant, plotted against time for storage temperature 140°C for all three storage experiments (pure oil, copper chips, and material mix), measured by HPLC-QToF. Axes: storage time (h) versus standardized molecule signals (a.u.).

To optimize the two parameters, window size ws and starting point s, a three-step approach is performed. In each step, the predictive power of the model is assessed via the coefficient of determination R2. The prediction performance of the chosen window size in combination with the regression model is benchmarked by fitting the same model to the single SNV data, indicated by a red line in Figure 4(c). The optimization steps can be summarized as follows:

(1) Perform LSNV with window sizes from 50 to 500 pixels and determine R2 for all window sizes. Find the optimal window size $ws_{\mathrm{opt},1}$.

(2) Perform LSNV with the optimal window size of step 1, $ws_{\mathrm{opt},1}$, vary the starting point from 0 to $2 \cdot ws_{\mathrm{opt},1}$, and select the optimal starting point $s_{\mathrm{opt}}$.

(3) Perform LSNV with the optimal starting point $s_{\mathrm{opt}}$ and window sizes from 50 to $2 \cdot ws_{\mathrm{opt},1}$ in order to find the best combination of window size $ws_{\mathrm{opt},2}$ and starting point $s_{\mathrm{opt}}$.
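The three-step search could be sketched along the following lines. Several details are assumptions made only for this illustration: R2 is estimated by 5-fold cross validation of a ridge model with `alpha=1.0`, the grids are scanned with a step of 10 data points, and the helper names (`dlsnv`, `r2_for`, `optimize_dlsnv`) are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def dlsnv(X, s, ws):
    """SNV on [0, s) and on consecutive windows of width ws from index s onward."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    edges = ([0] if s > 0 else []) + list(range(s, k, ws)) + [k]
    out = np.empty_like(X)
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg = X[:, lo:hi]
        std = seg.std(axis=1, keepdims=True)
        std[std == 0] = 1.0                       # guard for flat or one-point windows
        out[:, lo:hi] = (seg - seg.mean(axis=1, keepdims=True)) / std
    return out

def r2_for(X, y, s, ws):
    """Mean cross-validated R2 of a ridge model on DLSNV-transformed spectra (assumed setup)."""
    scores = cross_val_score(Ridge(alpha=1.0), dlsnv(X, s, ws), y, cv=5, scoring="r2")
    return scores.mean()

def optimize_dlsnv(X, y, ws_min=50, ws_max=500, step=10):
    # Step 1: scan the window size with starting point 0 (plain LSNV).
    ws1 = max(range(ws_min, ws_max + 1, step), key=lambda ws: r2_for(X, y, 0, ws))
    # Step 2: scan the starting point from 0 to 2 * ws1 at the window size of step 1.
    s_opt = max(range(0, 2 * ws1 + 1, step), key=lambda s: r2_for(X, y, s, ws1))
    # Step 3: scan the window size again, from ws_min to 2 * ws1, at the chosen starting point.
    ws2 = max(range(ws_min, 2 * ws1 + 1, step), key=lambda ws: r2_for(X, y, s_opt, ws))
    return ws1, s_opt, ws2
```

A call such as `optimize_dlsnv(X, y)` returns the first-run window size, the starting point, and the second-run window size, so the three values can be compared with the single SNV benchmark.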

In Figure 4(d), the final DLSNV spectra after optimization are shown. Note that jumps can occur between the individual standardization windows, since each window is centered by subtracting its own mean value. However, this does not affect the regression model.

In Figure 5(a), the ATF A samples are shown with SNV performed on the entire spectral region, and in Figure 5(b), the same spectra are depicted after DLSNV optimization. Figure 5(c) shows a zoomed-in view of the highlighted region of Figure 5(a), and in Figure 5(d), the same region is depicted after DLSNV optimization. The baseline is removed for the exact spectral sequence, and thus peaks are aligned in a way that the different aging levels of the samples can already be recognized by eye. The spectral snippet shown is the phenolic antioxidant region. Thus, the decrease of this band can be associated with the aging level. Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level.

3.5.2. Peak SNV. The idea behind the Peak SNV method is to standardize the important areas of the spectrum independently of each other. The optimization workflow for PSNV is shown in Figure 6, starting from the single SNV-transformed data set. Data points with a high correlation with the target values (points of interest, POIs) are selected (Figure 6(a)), and the SNV transformation is performed on windows around the centroids. Once the POIs are identified, the PSNV transformation is conducted as follows:

PSNV algorithm:

(i) Subdivide the spectra into sequences, each ranging from half the distance to the previous POI to half the distance to the next one (Figure 6(b)). SNV is performed across these windows.
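Given a list of POI indices, the window construction might be sketched as below. It is assumed here that the first window starts at the first data point and the last window ends at the last data point of the spectrum, and that window borders are placed at the integer midpoint between neighboring POIs; the names `snv` and `psnv` are illustrative.

```python
import numpy as np

def snv(X):
    """Row-wise SNV with a guard for flat windows (the guard is our own addition)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    return (X - X.mean(axis=1, keepdims=True)) / std

def psnv(X, pois):
    """SNV on windows whose borders lie midway between neighboring POIs."""
    X = np.asarray(X, dtype=float)
    pois = np.sort(np.asarray(pois))
    # Window borders: spectrum start, midpoints between adjacent POIs, spectrum end.
    mids = ((pois[:-1] + pois[1:]) // 2).tolist()
    edges = [0] + mids + [X.shape[1]]
    out = np.empty_like(X)
    for lo, hi in zip(edges[:-1], edges[1:]):
        out[:, lo:hi] = snv(X[:, lo:hi])
    return out
```

For POIs at indices 20, 60, and 90 in a spectrum with 100 points, the windows would be [0, 40), [40, 75), and [75, 100), which shows how an isolated POI can end up off-center in a wide window.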

To find the POIs, an initial regression model is fitted to the data. In order to identify important regions of the spectra, the model coefficients are assessed. The normalized absolute values of the coefficient vector are fed into a peak-picking algorithm. Since POIs may lie in close proximity, an agglomeration of the POIs is conducted in order to prevent very narrow standardization windows. Peak centroids are calculated via the mean value of the combined POIs. The task of the optimization process is to find the best window for POI agglomeration, $agg_{\mathrm{opt}}$, which is done by analyzing the calibration R2 for each agglomeration window and picking the window size with maximal correlation between the predicted and reference targets (Figure 6(c)). The steps are summarized as follows:

(1) Fit the data set to the target values (only calibration).

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); the threshold for peaks is 0.1.

(3) Combine peaks which are within a certain window agg and calculate the centroid of the agglomerated POIs.

(4) Perform PSNV across the centroids of the POIs.

(5) Evaluate the performance via R2 for agg between 10 and 50 data points and choose $agg_{\mathrm{opt}}$ according to the maximal R2.
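Steps (1) to (3) could be sketched as follows. The peak-picking algorithm is not named in the text, so `scipy.signal.find_peaks` is used here as a stand-in; the ridge model with `alpha=1.0`, the chaining rule for agglomeration (a peak joins a group if it lies within `agg` points of the group's last peak), and the function names are likewise assumptions.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.linear_model import Ridge

def agglomerate(peaks, agg):
    """Merge peaks lying within `agg` points of their predecessor; return the centroids."""
    groups = []
    for p in sorted(int(p) for p in peaks):
        if groups and p - groups[-1][-1] <= agg:
            groups[-1].append(p)
        else:
            groups.append([p])
    return [int(round(np.mean(g))) for g in groups]

def pick_pois(X, y, agg, threshold=0.1, alpha=1.0):
    """Agglomerated POI centroids (indices) from the coefficients of an initial ridge fit."""
    w = np.abs(Ridge(alpha=alpha).fit(X, y).coef_)
    w_norm = w / w.max()                              # |w| / max(|w|)
    peaks, _ = find_peaks(w_norm, height=threshold)   # local maxima above the threshold
    return agglomerate(peaks, agg)
```

For example, `agglomerate([10, 14, 18, 60, 100, 104], agg=5)` merges the peaks into three groups with the centroids 14, 60, and 102.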

After optimization, each window has an individual window size and ranges over the peak centroid of important signals in the spectrum. On these windows, the SNV transformation provides an optimal baseline and scatter effect removal. The optimized spectrum is shown in Figure 6(d).

Figure 4: Demonstration of the workflow and optimization process for Dynamic Localized SNV. (a) Single SNV, (b) Dynamic Localized SNV with starting point 100 and window 300 for visualization, (c) three-stage optimization process (R2 versus window size in the first run, starting point, and window size in the second run), and (d) optimized DLSNV.


3.5.3. Partial Peak SNV. The idea behind Partial Peak SNV is similar to PSNV: picking the regions of the spectrum which show a high correlation with the target values, agglomerating POIs in close proximity, and standardizing these important spectral features (Figure 7(a)). Unlike PSNV, however, not the whole spectrum is finally taken into account but only a small window around each POI. It may occur that the same data point appears several times in different standardizations (see overlapping regions in Figures 7(b) and 7(d)). Due to this workflow, the PPSNV spectrum may have more data points (due to overlapping) or fewer (because not the entire spectrum is taken into account) than the original spectrum. A PPSNV spectrum is calculated as follows:

PPSNV algorithm:

(i) Perform SNV across the POIs with a left and right margin of pw.
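A possible sketch of this step is given below. It assumes that each window covers the POI itself plus pw data points on either side, that windows are clipped at the spectrum borders, and that the standardized windows are concatenated in POI order; the names `snv` and `ppsnv` are illustrative.

```python
import numpy as np

def snv(X):
    """Row-wise SNV with a guard for flat windows (the guard is our own addition)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=1, keepdims=True)
    std[std == 0] = 1.0
    return (X - X.mean(axis=1, keepdims=True)) / std

def ppsnv(X, pois, pw):
    """Concatenate SNV-standardized windows [poi - pw, poi + pw] around each POI."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    pieces = []
    for poi in sorted(pois):
        lo, hi = max(poi - pw, 0), min(poi + pw + 1, k)   # clip at the spectrum borders
        pieces.append(snv(X[:, lo:hi]))
    return np.hstack(pieces)
```

With POIs at 20, 30, and 80 in a spectrum of 100 points, a margin of `pw = 8` yields 51 columns (fewer than the original), whereas `pw = 40` yields 192 columns because the overlapping windows repeat data points.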

The optimization focuses on the adjustment of the window size pw around the POIs in which the SNV is applied for maximal predictive power in calibration (Figure 7(c)). The optimization process is divided into the following steps:

(1) Fit the data set to the target values (only calibration).

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); the threshold for peaks is 0.1.

(3) Perform PPSNV across the peaks with the window size pw.

(4) Evaluate the performance via R2 for pw between 1 and 200 data points and choose $pw_{\mathrm{opt}}$ according to the maximal R2.

Figure 5: Demonstration of the improvement of peak alignment for DLSNV. (a) SNV-transformed transmission spectra with the marked zoom area of (c). (b) Optimized DLSNV spectra with the marked zoom area of (d). Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level.

Figure 6: Demonstration of the workflow and optimization process for Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process in order to find the optimal agglomeration window size (R2 versus agglomeration window size), and (d) optimized PSNV.

Figure 7: Demonstration of the workflow and optimization process for Partial Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process in order to find the optimal window size around the POIs (R2 versus window size), and (d) optimized PPSNV.

Figure 8: Cross validation results for (a) SNV, (b) Dynamic Localized SNV, (c) Peak SNV, and (d) Partial Peak SNV preprocessed spectra in the regression case of the friction modifier additive (predicted versus reference friction modifier peak volume). The reference values were measured by HPLC-QToF. Red dots indicate the prediction of calibration samples, and blue dots represent the prediction of the hold-out validation samples. The dashed arrow in (a) indicates the increasing degradation of the samples with increasing time, as the friction modifier additive concentration gets lower.

4. Results and Discussion

4.1. Cross Validation. In Figure 8(a), the cross validation recovery function for predictions of the SNV-preprocessed spectra of the regression on the friction modifier compound is shown. A 50-fold cross validation strategy with a calibration/validation splitting of 70/30 was used. Red dots represent the prediction of calibration samples, and blue dots represent validation samples. It is obvious that the linear model struggles to predict the high and low compound intensity regions correctly. The nonlinearity is visualized by an arrow and a dashed line to guide the eye.
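A cross validation scheme of this kind, with 50 random 70/30 calibration/validation splits, could be set up with scikit-learn roughly as follows. The use of `ShuffleSplit`, the ridge penalty `alpha=1.0`, and the function name `rmsep_distribution` are assumptions made for this sketch.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import ShuffleSplit

def rmsep_distribution(X, y, n_splits=50, test_size=0.3, alpha=1.0, seed=0):
    """RMSEP of a ridge model on the validation part of each random 70/30 split."""
    splitter = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    rmsep = []
    for cal_idx, val_idx in splitter.split(X):
        model = Ridge(alpha=alpha).fit(X[cal_idx], y[cal_idx])
        mse = mean_squared_error(y[val_idx], model.predict(X[val_idx]))
        rmsep.append(np.sqrt(mse))
    return np.array(rmsep)
```

The mean and standard deviation of the returned array correspond to the kind of summary statistics reported per preprocessing method, and the array itself can be passed to a box plot.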

In Figure 8, the cross validation recovery functions for predictions after Dynamic Localized SNV (Figure 8(b)), Peak SNV (Figure 8(c)), and Partial Peak SNV (Figure 8(d)) optimization are also shown. The saturation effect in the low intensity area of the compound response is almost completely removed in the latter three cases. It is also notable that the scattering around the green bisecting line is significantly reduced. Thus, the confidence interval for the predictions is improved.

The RMSEP values during the cross validation of the regression of the friction modifier component are summarized in Figure 9 in a box-and-whisker plot representation. The red line indicates the median; within the boxes, the interquartile range (IQR), which contains 50% of the data, is depicted; and the margins of the whiskers represent Q1 − 1.5·IQR and Q3 + 1.5·IQR for the lower and upper bound, respectively (Q1 is the value below which the smallest 25% of the data set lie, and Q3 is the value below which the smallest 75% lie). Subplot Figure 9(a) refers to the transmission spectra, and Figure 9(b) refers to the absorbance spectra. The labels are associated with (1) without standardization, (2) single SNV transformation on the full spectral range, (3) Localized SNV, (4) Dynamic Localized SNV, (5) Peak SNV, and (6) Partial Peak SNV.
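The whisker bounds described above can be reproduced with a few lines of NumPy. The RMSEP values in this sketch are made-up numbers for illustration, and the quartiles use NumPy's default linear interpolation, which may differ slightly from other quartile conventions.

```python
import numpy as np

def whisker_bounds(values):
    """Lower and upper whisker limits Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical RMSEP values from nine cross validation splits (illustrative only).
rmsep = np.array([0.20, 0.22, 0.24, 0.26, 0.28, 0.30, 0.32, 0.34, 0.60])
low, high = whisker_bounds(rmsep)
outliers = rmsep[(rmsep < low) | (rmsep > high)]
```

For these numbers, Q1 = 0.24 and Q3 = 0.32, so the bounds are 0.12 and 0.44, and the value 0.60 falls outside them and would be drawn as an individual outlier marker.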

It is noticeable that the RMSEP is very poor in the case of the crude transmission spectra and that SNV has a very beneficial impact on them, whereas the improvement after SNV is low for absorbance spectra.

For all sophisticated optimized standardization approaches, DLSNV, PSNV, and PPSNV, the median and the scattering around the median of the RMSEP decrease drastically with respect to the SNV-transformed full spectra, but LSNV also seems to be a reasonable choice. DLSNV on absorbance spectra is characterized by the lowest median and the smallest scattering, as confirmed by Table 2, which summarizes the mean values and standard deviations of the RMSEP.

Figure 9: Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 folds, random train/test split of 70/30 of the data) for the friction modifier compound of ATF A. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV.


The summarized RMSEPs of the regression to the antioxidant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation, where Figure 10(a) refers to the transmission spectra and Figure 10(b) to the absorbance spectra. In this case, DLSNV, PSNV, and PPSNV reduce both the median and the scattering around the median enormously when compared with SNV on full spectra. The best performance is achieved by PPSNV conducted on the transmission spectra, as confirmed by Table 2. On transmission spectra, DLSNV and PPSNV perform better than LSNV, but PSNV only has a positive effect when compared to SNV. In relation to LSNV, the predictive power is reduced when using PSNV. The fact that PPSNV is the best choice for this regression use case suggests that it is beneficial to only use spectral regions with high correlation with the target value and to drop regions with low or no correlation.

4.1.1. Noise Robustness. In Figure 11, the performance of the regression model for the prediction of noisy spectra is shown for the friction modifier. In Figure 11(a), the curves for all preprocessing methods are depicted, and in Figure 11(b), a zoomed view is shown. Without any preprocessing, the initial calibration error for both transmission and absorbance spectra is very poor and rises very fast with increasing noise level factor. The sophisticated preprocessing methods LSNV, DLSNV, PSNV, and PPSNV show a lower initial calibration error, and the slope of the error is also lower than for SNV. In Figure 11(b), the trend of the transmission spectra having a low error steepness is visible. One may say that the three proposed standardization techniques show a very similar noise robustness behavior and are significantly better than no preprocessing or SNV preprocessing, indicating that the model did not overfit the data, as mentioned in Section 3.3.2.

Figure 10: Box-and-whisker plot representation of the root-mean-squared error of prediction (RMSEP) of the cross validation strategy for the phenolic antioxidant compound of ATF B. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV. Boxplots (4) to (6) are the proposed methods.

Table 2: Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during cross validation. The relative improvements and respective p-values compared with LSNV are also listed.

Method              Friction modifier (ATF A)                Antioxidant (ATF B)
                    Transmission        Absorbance           Transmission        Absorbance
Raw                 0.83 ± 0.22         0.53 ± 0.16          0.90 ± 0.14         0.70 ± 0.11
Single SNV          0.34 ± 0.14         0.42 ± 0.17          0.61 ± 0.11         0.70 ± 0.13
LSNV                0.30 ± 0.07         0.31 ± 0.07          0.24 ± 0.04         0.24 ± 0.04
DLSNV               0.28 ± 0.06         0.26 ± 0.05          0.21 ± 0.04         0.21 ± 0.04
  Rel. improvement  9% (p < 0.05)       16% (p < 0.001)      13% (p < 0.001)     13% (p < 0.001)
PSNV                0.28 ± 0.08         0.28 ± 0.06          0.33 ± 0.06         0.31 ± 0.07
  Rel. improvement  8% (p > 0.05)       9% (p < 0.05)        −41% (p < 0.001)    −33% (p < 0.001)
PPSNV               0.26 ± 0.06         0.26 ± 0.06          0.17 ± 0.04         0.24 ± 0.05
  Rel. improvement  13% (p < 0.01)      15% (p < 0.001)      29% (p < 0.001)     −3% (p > 0.05)

In Figure 12, the performance of the regression model of the antioxidant for the prediction of noisy spectra is shown. The absorbance spectra with or without preprocessing show a similar noise trend as the SNV-transformed spectra. In Figure 12(b), the localized versions are shown in a zoomed view. The PPSNV preprocessing on the transmission spectra is characterized by the flattest noise dependency. These results demonstrate the superiority of the PPSNV method in this use case. As mentioned above, PSNV is not advantageous in this application and shows the lowest noise immunity, but it is preferable to SNV across the whole spectrum.

4.2. Summary. The optimized parameters for the preprocessing methods are summarized in Table 3. The LSNV optimization process selects the same window size as DLSNV. Thus, the second window size run has no influence on the final result in these two cases, but the starting point produces an improvement.

As already mentioned, Table 2 summarizes the cross validation performances of the tested methods as mean values and standard deviations over all cross validation runs. For the friction modifier, the performances of DLSNV, PSNV, and PPSNV are very similar. Table 2 also lists the relative improvements against the benchmark preprocessing LSNV, accompanied by the corresponding p-values from a two-sided t-test, which tests whether the mean values differ significantly. (Relative improvements can differ even where the same rounded mean value is listed, because the improvements were calculated from exact rather than rounded values.)
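The definition of the relative improvement in Table 2 is not spelled out in the text. A minimal sketch, assuming it is the percentage reduction of the mean RMSEP relative to the LSNV benchmark (the function name `relative_improvement` is hypothetical), illustrates the rounding caveat noted above:

```python
def relative_improvement(benchmark_rmsep, method_rmsep):
    """Percentage reduction of the RMSEP relative to the benchmark.

    Assumed definition: 100 * (benchmark - method) / benchmark.
    Positive values mean the method beats the benchmark.
    """
    if benchmark_rmsep <= 0:
        raise ValueError("benchmark RMSEP must be positive")
    return 100.0 * (benchmark_rmsep - method_rmsep) / benchmark_rmsep


# Rounded means from Table 2 (ATF B, transmission): LSNV 0.24, PPSNV 0.17.
ppsnv_gain = relative_improvement(0.24, 0.17)

# Rounded means from Table 2 (ATF A, transmission): LSNV 0.30, DLSNV 0.28.
# The table lists 9%, computed from unrounded means, so the rounded
# inputs give a different number here.
dlsnv_gain = relative_improvement(0.30, 0.28)
```

With the rounded inputs, the first call gives about 29%, in line with the table, while the second gives about 6.7% rather than the listed 9%.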

The best mean RMSEP value for the regression model of the friction modifier, 0.26, is produced by DLSNV based on absorbance spectra. The antioxidant compound is modeled best by PPSNV preprocessing of the transmission spectra, which yields a very low prediction error of 0.17.

To summarize, all proposed methods performed very well, reducing both the mean and the standard deviation of the cross validation error compared with SNV. PSNV is not a reasonable choice for the antioxidant additive, as its performance is poor compared with the benchmark preprocessing method LSNV.

Which preprocessing method is best depends on the actual regression use case, but in general, PPSNV outperforms PSNV. This suggests that it is beneficial to drop spectral regions showing low or no dependency on the target value and to consider only highly correlated peaks.

For the antioxidant compound, PPSNV yielded an enormous improvement (29% relative to LSNV on transmission spectra). This could be explained by the fact that the phenolic aging inhibitor is a compound with a very narrow vibrational band in ATF B, which therefore does not have a great impact when SNV is carried out across the entire spectrum. This may lead to a suboptimal alignment of this band. With the novel standardizations, the SNV is optimized to the highly correlated bands, and scatter effects can be compensated for these exact regions.

Figure 11: RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A, calibrated by the unperturbed full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance), and in (b), the sophisticated standardizations are shown.

The fact that PSNV is unsuccessful for the antioxidant may be because the POIs are not centered in the middle of the SNV window and may have large left and right margins if they are far away from other POIs. As a result, they may not be optimally standardized. This is shown in Figure 6(b), where the single POI at about 2700 cm−1 has a large single SNV window.

The study provides an overview of model performances when using transmission or absorbance spectra, suggesting that both cases can lead to valid regression models. However, for quantitative models built on transmission spectra, SNV is vital, whereas in the absorbance case, the predictive power does not depend on the SNV transformation. In absorbance spectra, the influence of the baseline constant is reduced because high transmission values are converted into low absorbance values.
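The conversion itself is not given in this section. As a sketch, assuming the usual decadic relation A = −log10(T) with the transmission T expressed as a fraction in (0, 1] (the function name is hypothetical), values near full transmission map to absorbances near zero:

```python
import math


def transmission_to_absorbance(transmission):
    """Convert fractional transmission values (0 < T <= 1) to absorbance.

    Assumes the decadic definition A = -log10(T); percent transmission
    would have to be divided by 100 first.
    """
    if any(t <= 0 for t in transmission):
        raise ValueError("transmission values must be positive")
    return [-math.log10(t) for t in transmission]


# Two high-transmission (baseline) values and one strongly absorbing band.
absorbance = transmission_to_absorbance([0.95, 0.90, 0.10])
```

Under this assumption, a baseline shift from 0.95 to 0.90 in transmission changes the absorbance by only about 0.02, which is small compared with the absorbance of 1.0 at the band.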

To conclude, DLSNV, PSNV, and PPSNV were able to improve both transmission and absorbance predictive models. The scattering around the mean values is also drastically reduced because the model does not have to learn how to compensate for the baseline shift in each cross validation step, leading to more reproducible results. Each vibrational band is optimally aligned so that the additive depletion trend is encoded in the absolute signal intensity, and the model does not have to weigh a data point as background correction.

Figure 12: RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B, calibrated by the original full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance), and in (b), the sophisticated standardizations are shown.

Table 3: Summary of the preprocessing parameters. For DLSNV, the window size and starting point are shown; for PSNV, the agglomeration window; and for PPSNV, the window width around the POI.

Method    ATF A                               ATF B
          Transmission      Absorbance        Transmission      Absorbance
LSNV      w_opt = 52        w_opt = 52        w_opt = 51        w_opt = 51
DLSNV     w_opt,2 = 52      w_opt,2 = 52      w_opt,2 = 51      w_opt,2 = 51
          s_opt = 5         s_opt = 7         s_opt = 48        s_opt = 48
PSNV      agg_opt = 5       agg_opt = 14      agg_opt = 6       agg_opt = 10
PPSNV     pw_opt = 17       pw_opt = 25       pw_opt = 15       pw_opt = 38

5. Conclusion

The results presented in this study demonstrate that the proposed novel standardization strategies, Dynamic Localized SNV, Peak SNV, and Partial Peak SNV, drastically improve both the mean and the scatter of the RMSEP values in cross validation, as well as the robustness against noise, with respect to an SNV transformation executed on the entire spectrum. Against the benchmark LSNV, an enhancement of the predictive power of a ridge regression model by up to 16% and 29% could be achieved for the friction modifier and the antioxidant compound, respectively. The demonstrated optimization workflows for performing SNV on specific regions of the spectrum have been introduced here for the first time. The standardization methods used in this paper are capable of eliminating nonlinearities by flexible rescaling in defined areas. To our knowledge, such standardization techniques have not been presented elsewhere.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] M. Zeaiter, J. M. Roger, and V. Bellon-Maurel, "Dynamic orthogonal projection. A new method to maintain the on-line robustness of multivariate calibrations. Application to NIR-based monitoring of wine fermentations," Chemometrics and Intelligent Laboratory Systems, vol. 80, no. 2, pp. 227–235, 2006.

[2] R. M. Balabin and R. Z. Safieva, "Motor oil classification by base stock and viscosity based on near infrared (NIR) spectroscopy data," Fuel, vol. 87, no. 12, pp. 2745–2752, 2008.

[3] A. Borin and R. J. Poppi, "Multivariate quality control of lubricating oils using Fourier transform infrared spectroscopy," Journal of the Brazilian Chemical Society, vol. 15, no. 4, pp. 570–576, 2004.

[4] J. Huang, D. Brennan, L. Sattler, J. Alderman, B. Lane, and C. O'Mathuna, "A comparison of calibration methods based on calibration data size and robustness," Chemometrics and Intelligent Laboratory Systems, vol. 62, no. 1, pp. 25–35, 2002.

[5] M. A. Al-Ghouti and L. Al-Atoum, "Virgin and recycled engine oil differentiation: a spectroscopic study," Journal of Environmental Management, vol. 90, no. 1, pp. 187–195, 2009.

[6] R. Kellner, J. M. Mermet, M. Otto, M. Valcarcel, and H. M. Widmer, Analytical Chemistry, John Wiley & Sons Australia, Limited, Milton, Australia, 2004.

[7] B. G. Osborne, T. Fearn, P. T. Hindle, and B. G. Osborne, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, Longman Scientific & Technical, Wiley, Harlow, Essex, UK, 1993.

[8] J. D. Donald and D. D. Kevin, Interpreting Diffuse Reflectance and Transmittance, NIR, Chichester, UK, 2007.

[9] A. Rinnan, F. van den Berg, and S. B. Engelsen, "Review of the most common pre-processing techniques for near-infrared spectra," TrAC Trends in Analytical Chemistry, vol. 28, no. 10, pp. 1201–1222, 2009.

[10] E. Arendse, O. Amos Fawole, L. Samukelo Magwaza, N. Helene, and U. Linus Opara, "Comparing the analytical performance of near and mid infrared spectrometers for evaluating pomegranate juice quality," LWT, vol. 91, pp. 180–190, 2018.

[11] J. C. Machado, M. A. Faria, I. M. P. L. V. O. Ferreira, R. N. M. J. Pascoa, and J. A. Lopes, "Varietal discrimination of hop pellets by near and mid infrared spectroscopy," Talanta, vol. 180, pp. 69–75, 2018.

[12] H. W. Siesler, "Vibrational spectroscopy," in Reference Module in Materials Science and Materials Engineering, Elsevier, New York, NY, USA, 2016.

[13] A. Paula Craig, B. G. Botelho, L. S. Oliveira, and A. S. Franca, "Mid infrared spectroscopy and chemometrics as tools for the classification of roasted coffees by cup quality," Food Chemistry, vol. 245, pp. 1052–1061, 2018.

[14] S. R. Khandasammy, M. A. Fikiet, E. Mistek et al., "Bloodstains, paintings, and drugs: Raman spectroscopy applications in forensic science," Forensic Chemistry, vol. 8, pp. 111–133, 2018.

[15] J. Engel, G. Jan, E. Szymanska et al., "Breaking with trends in pre-processing?," TrAC Trends in Analytical Chemistry, vol. 50, pp. 96–106, 2013.

[16] H. Martens, S. Jensen, and P. Geladi, "Multivariate linearity transformation for near-infrared reflectance spectrometry," in Proceedings of Nordic Symposium on Applied Statistics, pp. 205–234, Stavanger, Norway, June 1983.

[17] P. Geladi, D. MacDougall, and H. Martens, "Linearization and scatter-correction for near-infrared reflectance spectra of meat," Applied Spectroscopy, vol. 39, no. 3, pp. 491–500, 1985.

[18] P. Kubelka and F. Munk, "Ein Beitrag zur Optik der Farbanstriche," Zeitschrift für Technische Physik, vol. 12, pp. 593–601, 1931.

[19] A. Claus Andersson, "Direct orthogonalization," Chemometrics and Intelligent Laboratory Systems, vol. 47, no. 1, pp. 51–63, 1999.

[20] R. J. Barnes, M. S. Dhanoa, and S. J. Lister, "Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra," Applied Spectroscopy, vol. 43, no. 5, pp. 772–777, 1989.

[21] M. Zeaiter, J.-M. Roger, and V. Bellon-Maurel, "Robustness of models developed by multivariate calibration. Part II: the influence of pre-processing methods," TrAC Trends in Analytical Chemistry, vol. 24, no. 5, pp. 437–445, 2005.

[22] T. Fearn, C. Riccioli, A. Garrido-Varo, and J. Emilio Guerrero-Ginel, "On the geometry of SNV and MSC," Chemometrics and Intelligent Laboratory Systems, vol. 96, no. 1, pp. 22–26, 2009.

[23] T. Isaksson and B. Kowalski, "Piece-wise multiplicative scatter correction applied to near-infrared diffuse transmittance data from meat products," Applied Spectroscopy, vol. 47, no. 6, pp. 702–709, 1993.

[24] Y. Bi, K. Yuan, W. Xiao et al., "A local pre-processing method for near-infrared spectra, combined with spectral segmentation and standard normal variate transformation," Analytica Chimica Acta, vol. 909, pp. 30–40, 2016.

[25] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[26] S. Raschka, Python Machine Learning, Packt Publishing, Birmingham, UK, 2015.

[27] E. L. Lehmann and G. Casella, Theory of Point Estimation, Springer Texts in Statistics, Springer, New York, NY, USA, 2003.

[28] O. Heinisch, "Steel, R. G. D., and J. H. Torrie: principles and procedures of statistics (with special reference to the biological sciences), McGraw-Hill Book Company, New York, Toronto, London 1960, 481 S., 15 Abb., 81 s 6 d," Biometrische Zeitschrift, vol. 4, no. 3, pp. 207–208, 1962.

[29] K. Pearson, "Note on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, vol. 58, no. 1, pp. 240–242, 1895.


of the ith sample. For a given data set with n samples, the MSE is the average value over all samples. It is given by the following formula [27]:

$$\mathrm{MSE}(y, y_{\mathrm{pred}}) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_{i,\mathrm{pred}}\right)^2 \qquad (3)$$

The best possible MSE value is 0, and small values are desirable, as the deviation from the correct prediction is low. From the MSE, the root-mean-squared error (RMSE) was calculated by taking the square root. The RMSE value has the same dimension as the original reference target values.
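As a minimal illustration of the MSE and RMSE described here, the following sketch uses plain Python lists to stay dependency-free (the function names `mse` and `rmse` are chosen for this example and are not taken from the study's code):

```python
import math


def mse(y, y_pred):
    """Mean squared error: average of squared deviations over all samples."""
    if len(y) != len(y_pred) or not y:
        raise ValueError("y and y_pred must be non-empty and of equal length")
    return sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred)) / len(y)


def rmse(y, y_pred):
    """Root-mean-squared error: same unit as the reference target values."""
    return math.sqrt(mse(y, y_pred))


reference = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
error = rmse(reference, predicted)  # square root of 4/3
```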

3.4.2. R² Coefficient of Determination. R² describes the portion of the variance in the target values (dependent variables) that can be predicted from the spectra (independent variables) by the model [28]. The best possible score for R² is 1.0. R² is 0.0 for a constant model, which predicts a constant value regardless of the input features. For linear regression modeling with intercept, R² is equal to the square of the Pearson correlation coefficient between predicted and reference target values [29]. For a data set comprising n samples, the R² score is given as

$$R^2(y, y_{\mathrm{pred}}) = 1 - \frac{\sum_{i=1}^{n}\left(y_i - y_{i,\mathrm{pred}}\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}_{\mathrm{pred}}\right)^2} \qquad (4)$$

where $y_{i,\mathrm{pred}}$ is the model prediction of the ith sample, which has a reference value $y_i$, and $\bar{y}_{\mathrm{pred}}$ is the mean value of all predictions:

$$\bar{y}_{\mathrm{pred}} = \frac{1}{n}\sum_{i=1}^{n} y_{i,\mathrm{pred}} \qquad (5)$$
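Equations (4) and (5) can be sketched in a few lines of plain Python. The function name `r2_as_defined_here` is hypothetical. Note that this follows the definition as printed, with the mean of the predictions in the denominator; common library implementations such as scikit-learn's `r2_score` use the mean of the reference values instead, and the two agree whenever predictions and references share the same mean.

```python
def r2_as_defined_here(y, y_pred):
    """R^2 following equations (4) and (5): the total sum of squares is
    taken around the mean of the predictions."""
    if len(y) != len(y_pred) or not y:
        raise ValueError("y and y_pred must be non-empty and of equal length")
    n = len(y)
    mean_pred = sum(y_pred) / n                                # equation (5)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - mean_pred) ** 2 for yi in y)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when the denominator is zero")
    return 1.0 - ss_res / ss_tot                               # equation (4)


reference = [1.0, 2.0, 3.0, 4.0]
perfect = r2_as_defined_here(reference, reference)        # ideal model
constant = r2_as_defined_here(reference, [2.5] * 4)       # constant model at the mean
```

The two calls correspond to the limiting cases mentioned in the text: a perfect model and a constant model that predicts the mean target value.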

3.5. Standard Normal Variate. Each spectrum $x = (x_1, x_2, \ldots, x_k)$ with k measured data points is transformed to the standardized form $z = (z_1, z_2, \ldots, z_k)$ by bringing the spectrum to zero mean and unit variance. For this purpose, the mean value $\bar{x}$ of the spectrum is subtracted from each data point $x_i$, and the result is divided by the standard deviation of the spectrum:

$$z_i = \frac{x_i - \bar{x}}{\sqrt{\dfrac{\sum_{j=1}^{k}\left(x_j - \bar{x}\right)^2}{k}}} \qquad (6)$$

with

$$\bar{x} = \frac{1}{k}\sum_{j=1}^{k} x_j \qquad (7)$$
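A minimal sketch of equations (6) and (7) for a single spectrum, written with plain Python lists (NumPy would be the usual choice in practice). The handling of a completely flat spectrum is an assumption of this example, since the text does not cover that case.

```python
import math


def snv(spectrum):
    """Standard Normal Variate of one spectrum (equations (6) and (7)).

    Uses the population standard deviation (division by k), as in the text.
    """
    k = len(spectrum)
    if k == 0:
        return []
    mean = sum(spectrum) / k                                    # equation (7)
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / k)
    if std == 0.0:
        # Assumption: a flat spectrum carries no information, return zeros.
        return [0.0] * k
    return [(x - mean) / std for x in spectrum]                 # equation (6)


z = snv([1.0, 2.0, 3.0, 4.0])
```

After the transformation, the values in `z` have zero mean and unit variance.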

3.5.1. Dynamic Localized SNV (DLSNV). The DLSNV workflow is based on the SNV-transformed spectra data set (Figure 4(a)). To calculate the DLSNV data, the spectra are divided into multiple regions, and standardization is performed on each of these regions. To adjust the windows to important areas in the spectrum, a starting point can be defined. In Figure 4(b), the DLSNV spectra are shown with a starting point of 100 and a window size of 300 pixels.

DLSNV algorithm:

(i) Perform SNV on a window of the spectrum ranging from the first data point to the sth one.

(ii) Subdivide the spectrum from the sth data point onward into windows of equal size ws, and perform SNV on each window.
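The two steps of the DLSNV algorithm can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, indices are zero-based so the first window covers the first `start` points, and a shorter remainder window at the end of the spectrum is standardized on its own.

```python
import math


def snv(values):
    """SNV of one window: zero mean, unit population standard deviation."""
    k = len(values)
    if k == 0:
        return []
    mean = sum(values) / k
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / k)
    if std == 0.0:
        return [0.0] * k  # assumption: flat windows map to zeros
    return [(v - mean) / std for v in values]


def dlsnv(spectrum, window_size, start):
    """Dynamic Localized SNV for one spectrum."""
    if window_size <= 0 or not 0 <= start <= len(spectrum):
        raise ValueError("invalid window size or starting point")
    out = []
    # Step (i): one window from the first data point up to the starting point.
    out.extend(snv(spectrum[:start]))
    # Step (ii): equally sized windows from the starting point onward.
    for lo in range(start, len(spectrum), window_size):
        out.extend(snv(spectrum[lo:lo + window_size]))
    return out


spectrum = [1.0, 3.0, 2.0, 4.0, 6.0, 8.0, 5.0, 5.0, 7.0, 9.0]
transformed = dlsnv(spectrum, window_size=4, start=2)
```

With `start=0`, the sketch reduces to a localized SNV with fixed windows, which corresponds to the LSNV benchmark used in the text.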

Figure 3: Standardized additive responses (standardized molecule signals, a.u.) used as target value for the FTIR regression model for (a) the friction modifier compound and (b) the antioxidant, plotted against storage time (h) for a storage temperature of 140°C for all three storage experiments (pure oil, copper chips, and material mix), measured by HPLC-QToF.

To optimize the two parameters, window size ws and starting point s, a three-step approach is performed. In each step, the predictive power of the model is assessed via the coefficient of determination R². The prediction performance of the chosen window size in combination with the regression model is benchmarked by fitting the same model to the single SNV data, indicated by a red line in Figure 4(c). The optimization steps can be summarized as follows:

(1) Perform LSNV with window sizes from 50 to 500 pixels and determine R² for all window sizes. Find the optimal window size ws_opt,1.

(2) Perform LSNV with the optimal window size of step 1, ws_opt,1; vary the starting point from 0 to 2·ws_opt,1 and select the optimal starting point s_opt.

(3) Perform LSNV with the optimal starting point s_opt and window sizes from 50 to 2·ws_opt,1 in order to find the best combination of window size ws_opt,2 and starting point s_opt.
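The three optimization steps amount to a sequence of one-dimensional grid searches. The sketch below assumes a user-supplied callable `score(window_size, start)` that applies the localized SNV with these parameters, fits the regression model, and returns R²; this callable, the function name `optimize_dlsnv`, and the step size of one pixel are assumptions made for illustration.

```python
def optimize_dlsnv(score, ws_min=50, ws_max=500):
    """Three-step search for window size and starting point.

    `score(window_size, start)` is a hypothetical callable returning R^2.
    Returns (ws_opt_1, s_opt, ws_opt_2).
    """
    # Step 1: vary the window size with the starting point fixed at 0.
    ws_opt_1 = max(range(ws_min, ws_max + 1), key=lambda ws: score(ws, 0))
    # Step 2: vary the starting point from 0 to 2 * ws_opt_1.
    s_opt = max(range(0, 2 * ws_opt_1 + 1), key=lambda s: score(ws_opt_1, s))
    # Step 3: vary the window size again, now with the optimal starting point.
    ws_opt_2 = max(range(ws_min, 2 * ws_opt_1 + 1),
                   key=lambda ws: score(ws, s_opt))
    return ws_opt_1, s_opt, ws_opt_2


def toy_score(window_size, start):
    """Synthetic stand-in for R^2 with a single optimum at (60, 7)."""
    return -((window_size - 60) ** 2) - (start - 7) ** 2


result = optimize_dlsnv(toy_score)
```

In a real workflow, `toy_score` would be replaced by a function that runs the preprocessing and model fit; the search logic stays the same.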

In Figure 4(d), the final DLSNV spectra after optimization are shown. Note that jumps can occur between the individual standardization windows, since the mean value of the current window is subtracted for each window. However, this does not affect the regression model.

In Figure 5(a), the ATF A samples are shown with SNV performed on the entire spectral region, and in Figure 5(b), the same spectra are depicted after DLSNV optimization. Figure 5(c) shows a zoom-in view of the highlighted region of Figure 5(a), and in Figure 5(d), the same region is depicted after DLSNV optimization. The baseline is removed for the exact spectral sequence, and thus peaks are aligned in a way that the different aging levels of the samples can already be recognized by eye. The spectral snippet shown is the phenolic antioxidant region. Thus, the decrease of this band can be associated with the aging level. Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level.

3.5.2. Peak SNV. The idea behind the Peak SNV method is to standardize the important areas of the spectrum independently of each other. The optimization workflow for PSNV is shown in Figure 6, starting from the single SNV-transformed data set. Data points with a high correlation with the target values (points of interest, POI) are selected (Figure 6(a)), and the SNV transformation is performed on windows around the centroids. Once the POIs are identified, the PSNV transformation is conducted as follows:

PSNV algorithm:

(i) Subdivide the spectra into sequences ranging from half the distance from the previous POI to half the distance to the next one (Figure 6(b)). SNV is performed across these windows.
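A minimal sketch of this windowing, assuming the POIs are given as zero-based indices of peak centroids. The function names are hypothetical, and the treatment of the spectrum edges (the first window starts at the first data point, the last window ends at the last one) is an assumption, since the text does not specify it.

```python
import math


def snv(values):
    """SNV of one window: zero mean, unit population standard deviation."""
    k = len(values)
    if k == 0:
        return []
    mean = sum(values) / k
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / k)
    if std == 0.0:
        return [0.0] * k  # assumption: flat windows map to zeros
    return [(v - mean) / std for v in values]


def psnv(spectrum, pois):
    """Peak SNV: one SNV window per POI, split halfway between neighbors."""
    pois = sorted(set(pois))
    if not pois:
        return snv(spectrum)  # no POIs: fall back to a single full-range SNV
    # Window boundaries lie halfway between consecutive POIs.
    midpoints = [(a + b) // 2 for a, b in zip(pois, pois[1:])]
    bounds = [0] + midpoints + [len(spectrum)]
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        out.extend(snv(spectrum[lo:hi]))
    return out


spectrum = [1.0, 2.0, 4.0, 3.0, 10.0, 12.0, 11.0, 15.0, 14.0, 13.0]
transformed = psnv(spectrum, pois=[2, 7])  # windows [0:4] and [4:10]
```

Because the windows tile the spectrum, the output has the same number of data points as the input, which distinguishes PSNV from PPSNV.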

To find the POIs, an initial regression model is fitted to the data. In order to identify important regions of the spectra, the model coefficients are assessed. The normalized absolute values of the coefficient vector are fed into a peak-picking algorithm. Since POIs may lie in close proximity, the POIs are agglomerated in order to prevent very narrow standardization windows. Peak centroids are calculated as the mean value of the combined POIs. The task of the optimization process is to find the best window for POI agglomeration, agg_opt, which is done by analyzing the calibration R² for each agglomeration window and picking the window size with maximal correlation between the predicted and reference target (Figure 6(c)). The steps are summarized as follows:

(1) Fit the data set to the target values (only calibration).

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1.

(3) Combine peaks which are within a certain window agg and calculate the centroid of the agglomerated POIs.

(4) Perform PSNV across the centroid of the POIs.

(5) Evaluate the performance via R² for agg between 10 and 50 data points and choose agg_opt according to maximal R².
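Steps (2) and (3) can be illustrated as follows. The text does not name the peak-picking algorithm, so this sketch assumes a simple local-maximum rule on the normalized absolute coefficients; the chaining rule used for agglomeration and both function names are likewise assumptions.

```python
def pick_peaks(coefficients, threshold=0.1):
    """Indices of local maxima of |w| / max(|w|) at or above the threshold."""
    magnitudes = [abs(w) for w in coefficients]
    top = max(magnitudes, default=0.0)
    if top == 0.0:
        return []
    norm = [m / top for m in magnitudes]
    peaks = []
    for i in range(1, len(norm) - 1):
        is_local_max = norm[i] > norm[i - 1] and norm[i] >= norm[i + 1]
        if norm[i] >= threshold and is_local_max:
            peaks.append(i)
    return peaks


def agglomerate(peaks, agg):
    """Merge peaks closer than `agg` points and return group centroids."""
    groups = []
    for p in sorted(peaks):
        if groups and p - groups[-1][-1] <= agg:
            groups[-1].append(p)   # close to the previous peak: same group
        else:
            groups.append([p])     # start a new group
    # Centroid: mean index of the combined POIs, rounded to a data point.
    return [round(sum(g) / len(g)) for g in groups]


weights = [0.0, 0.05, 0.0, 1.0, 0.2, 0.5, 0.0, 0.0, 0.0, 0.0, -0.8, 0.0]
peaks = pick_peaks(weights)            # local maxima above 0.1
centroids = agglomerate(peaks, agg=3)  # neighbors within 3 points are merged
```

In this toy coefficient vector, the small bump at index 1 falls below the threshold, the peaks at indices 3 and 5 are merged into one centroid, and the negative coefficient at index 10 is kept because absolute values are used.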

After optimization, each window has an individual window size and ranges over the peak centroid of important signals in the spectrum. On these windows, the SNV transformation provides an optimal baseline and scatter effect removal. The optimized spectrum is shown in Figure 6(d).

Figure 4: Demonstration of the workflow and optimization process for Dynamic Localized SNV. (a) Single SNV, (b) Dynamic Localized SNV with starting point 100 and window 300 for visualization, (c) three-stage optimization process (R² against window size in the first run, starting point, and window size in the second run), and (d) optimized DLSNV. Panels (a), (b), and (d) are plotted against wavenumbers (cm−1).


3.5.3. Partial Peak SNV. The idea behind Partial Peak SNV is similar to PSNV: picking the regions of the spectrum which show a high correlation with the target values, agglomerating POIs in close proximity, and standardizing these important spectral features (Figure 7(a)). Unlike for PSNV, however, not the whole spectrum is finally taken into account but only a small window around each POI. It may occur that the same data point appears several times in different standardizations (see overlapping regions in Figures 7(b) and 7(d)). Due to this workflow, the PPSNV spectrum may have more data points (due to overlapping) or fewer (because not the entire spectrum is taken into account) than the original spectrum. A PPSNV spectrum is calculated as follows:

PPSNV algorithm:

(i) Perform SNV across the POIs with a left and right margin of pw.
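A minimal sketch of this step, assuming zero-based POI indices and that the standardized windows are simply concatenated in order of increasing POI position. The function names are hypothetical, and clipping of windows at the spectrum edges is an assumption.

```python
import math


def snv(values):
    """SNV of one window: zero mean, unit population standard deviation."""
    k = len(values)
    if k == 0:
        return []
    mean = sum(values) / k
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / k)
    if std == 0.0:
        return [0.0] * k  # assumption: flat windows map to zeros
    return [(v - mean) / std for v in values]


def ppsnv(spectrum, pois, pw):
    """Partial Peak SNV: SNV on a window of +/- pw points around each POI."""
    if pw < 0:
        raise ValueError("pw must be non-negative")
    out = []
    for poi in sorted(pois):
        lo = max(0, poi - pw)                  # clip at the left edge
        hi = min(len(spectrum), poi + pw + 1)  # clip at the right edge
        out.extend(snv(spectrum[lo:hi]))
    return out


spectrum = [float(i * i % 7) for i in range(20)]
overlapping = ppsnv(spectrum, pois=[5, 7], pw=2)  # windows [3:8] and [5:10]
single = ppsnv(spectrum, pois=[5], pw=2)          # one window of 5 points
```

The output length depends on the POIs and on `pw`: overlapping windows repeat data points, while regions far from any POI are dropped, as described in the text.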

The optimization focuses on the adjustment of the window size pw around the POIs in which the SNV is applied for maximal predictive power in calibration (Figure 7(c)). The optimization process is divided into the following steps:

(1) Fit the data set to the target values (only calibration).

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1.

(3) Perform PPSNV across the peaks with the window size pw.

(4) Evaluate the performance via R² for pw between 1 and 200 data points and choose pw_opt according to maximal R².

Figure 5: Demonstration of the improvement of peak alignment for DLSNV. (a) SNV-transformed transmission spectra with marked zoom level of (c). (b) Optimized DLSNV spectra with marked zoom area of (d). Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level. All panels are plotted against wavenumbers (cm−1).

Figure 6: Demonstration of the workflow and optimization process for Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process (R² against agglomeration window size) in order to find the optimal agglomeration window size, and (d) optimized PSNV.

Figure 7: Demonstration of the workflow and optimization process for Partial Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process (R² against window size) in order to find the optimal window size around the POIs, and (d) optimized PPSNV.

Figure 8: Cross validation results (predicted against reference friction modifier peak volume) for (a) SNV, (b) Dynamic Localized SNV, (c) Peak SNV, and (d) Partial Peak SNV preprocessed spectra in the regression case of the friction modifier additive. The reference values were measured by HPLC-QToF. Red dots indicate predictions of calibration samples, and blue dots represent predictions of the hold-out validation samples. The dashed arrow in (a) indicates the increasing degradation of the samples with increasing time, as the friction modifier additive concentration gets lower.

4. Results and Discussion

4.1. Cross Validation. In Figure 8(a), the cross validation recovery function for predictions of the SNV-preprocessed spectra of the regression on the friction modifier compound is shown. A 50-fold cross validation strategy with a calibration/validation split of 70/30 was used. Red dots represent the predictions of calibration samples, and blue dots represent validation samples. The linear model clearly struggles to predict the high and low compound intensity regions correctly. The nonlinearity is visualized by an arrow and a dashed line to guide the eye.

Figure 8 also shows the cross validation recovery function for predictions after Dynamic Localized SNV (Figure 8(b)), Peak SNV (Figure 8(c)), and Partial Peak SNV (Figure 8(d)) optimization. The saturation effect in the low intensity area of the compound response is almost completely removed in the latter three cases. It is also notable that the scattering around the green bisecting line is significantly reduced. Thus, the confidence interval for the predictions is improved.

The RMSEP values during the cross validation of the regression of the friction modifier component are summarized in Figure 9 in a box-and-whisker plot representation. The red line indicates the median; within the boxes, the interquartile range (IQR), which contains 50% of the data, is depicted, and the margins of the whiskers represent Q1 − 1.5 · IQR and Q3 + 1.5 · IQR for the lower and upper bound, respectively (Q1 means that 25% of the data set are smaller than this value, and Q3 means that 75% are smaller than this value). Figure 9(a) refers to the transmission spectra, and Figure 9(b) refers to the absorbance spectra. The labels are associated with (1) without standardization, (2) single SNV transformation on the full spectral range, (3) Localized SNV, (4) Dynamic Localized SNV, (5) Peak SNV, and (6) Partial Peak SNV.
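The whisker margins described above can be computed with the Python standard library. This is a sketch: the function name `whisker_bounds` is hypothetical, and the quartile convention (linear interpolation including the end points) is an assumption, since plotting libraries differ in how they estimate quartiles.

```python
import statistics


def whisker_bounds(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) for a list of RMSEP values."""
    # Quartiles with linear interpolation between data points; the
    # "inclusive" method treats the data as containing its own extremes.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


lower, upper = whisker_bounds([1.0, 2.0, 3.0, 4.0, 5.0])
```

Values outside these bounds are the points that a box-and-whisker plot typically draws as individual outlier markers.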

It is noticeable that the RMSEP is very poor in the case of the crude transmission spectra and that SNV has a very useful impact on them (mean RMSEP of 0.83 versus 0.34 in Table 2), whereas the improvement after SNV is low for absorbance spectra (0.53 versus 0.42).

For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP

RMSE

P Proposedmethods

Proposedmethods

(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)

+ ++

+ +

+++

++++

00

02

04

06

08

10

12

14

16

(a) (b)

Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV


The summarized RMSEPs of the regression to the antioxidant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation, where Figure 10(a) refers to the transmission spectra and Figure 10(b) to the absorbance spectra. In this case, DLSNV, PSNV, and PPSNV reduce both the median and the scattering around the median enormously when compared with SNV on full spectra. The best performance is achieved by PPSNV conducted on the transmission spectra, confirmed by Table 2. On transmission spectra, DLSNV and PPSNV perform better than LSNV, but PSNV only has a positive effect when compared to SNV; relative to LSNV, PSNV reduces the predictive power. The fact that PPSNV is the best choice for this regression use case suggests that it is beneficial to use only spectral regions with high correlation with the target value and to drop regions with low or no correlation.

4.1.1. Noise Robustness. In Figure 11, the performance of the regression model for the prediction of noisy spectra is shown for the friction modifier. In subplot Figure 11(a), the curves for all preprocessings are depicted, and in Figure 11(b), a zoomed view is shown. Without any preprocessing, the initial calibration error for both transmission and absorbance spectra is very poor and rises very fast with increasing noise level factor. The sophisticated preprocessing methods LSNV, DLSNV, PSNV, and PPSNV show a lower initial calibration error, and the slope of the error is also lower than for SNV. In Figure 11(b), the trend of the transmission spectra having low error steepness is visible. One may say that the three proposed standardization techniques show a very similar noise robustness behavior and are significantly better than none or

[Figure 10, panels (a) and (b): box plots of RMSEP (y-axis, 0.0 to 1.4) for methods (1) to (6) (x-axis), annotated "Proposed methods".]

Figure 10: Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for the phenolic antioxidant compound of ATF B. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV.

Table 2: Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during cross validation. The relative improvements and respective p-values compared with LSNV are also listed.

Method             Friction modifier ATF A                Antioxidant ATF B
                   Transmission       Absorbance          Transmission       Absorbance
Raw                0.83 ± 0.22        0.53 ± 0.16         0.90 ± 0.14        0.70 ± 0.11
Single SNV         0.34 ± 0.14        0.42 ± 0.17         0.61 ± 0.11        0.70 ± 0.13
LSNV               0.30 ± 0.07        0.31 ± 0.07         0.24 ± 0.04        0.24 ± 0.04
DLSNV              0.28 ± 0.06        0.26 ± 0.05         0.21 ± 0.04        0.21 ± 0.04
Rel. improvement   9% (p < 0.05)      16% (p < 0.001)     13% (p < 0.001)    13% (p < 0.001)
PSNV               0.28 ± 0.08        0.28 ± 0.06         0.33 ± 0.06        0.31 ± 0.07
Rel. improvement   8% (p > 0.05)      9% (p < 0.05)       -41% (p < 0.001)   -33% (p < 0.001)
PPSNV              0.26 ± 0.06        0.26 ± 0.06         0.17 ± 0.04        0.24 ± 0.05
Rel. improvement   13% (p < 0.01)     15% (p < 0.001)     29% (p < 0.001)    -3% (p > 0.05)


SNV preprocessing, indicating that the model did not overfit the data, as mentioned in Section 3.3.2.
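A noise robustness test of this kind could be sketched as follows. The exact noise model is defined in a section not reproduced here, so the sketch makes loud assumptions: zero-mean Gaussian noise is added whose standard deviation is the noise level factor times the standard deviation of each spectrum, and `predict` is a hypothetical callable standing in for the model calibrated on the unperturbed data. Only the 50 repetitions per noise level and the use of the standard deviation as error bar are taken from the figure captions.

```python
import random
from math import sqrt
from statistics import fmean, pstdev, stdev


def add_noise(spectrum, factor, rng):
    """Add Gaussian noise scaled by `factor` (assumed noise model)."""
    scale = factor * pstdev(spectrum)
    return [x + rng.gauss(0.0, scale) for x in spectrum]


def noise_curve(spectra, targets, predict, factors, n_repeats=50, seed=0):
    """Mean and standard deviation of the RMSE for each noise level factor."""
    rng = random.Random(seed)
    curve = {}
    for factor in factors:
        errors = []
        for _ in range(n_repeats):
            noisy = [add_noise(s, factor, rng) for s in spectra]
            preds = [predict(s) for s in noisy]
            squared = [(p - t) ** 2 for p, t in zip(preds, targets)]
            errors.append(sqrt(fmean(squared)))
        curve[factor] = (fmean(errors), stdev(errors))
    return curve


# Toy usage: the "model" simply predicts the mean of a spectrum (hypothetical).
spectra = [[0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]]
targets = [fmean(s) for s in spectra]
curve = noise_curve(spectra, targets, predict=fmean, factors=[0.0, 0.1, 0.2, 0.4])
```

In a real application, the noisy spectra would pass through the same preprocessing (SNV, LSNV, DLSNV, PSNV, or PPSNV) before prediction, so that the slope of the resulting curve reflects the noise sensitivity of the preprocessing and the model together.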

In Figure 12, the performance of the regression model of the antioxidant for the prediction of noisy spectra is shown. The absorbance spectra with or without preprocessing show a noise trend similar to that of the SNV-transformed spectra. In Figure 12(b), the localized versions are shown in a zoomed view. The PPSNV preprocessing on the transmission spectra is characterized by the flattest noise dependency. These results demonstrate the superiority of the PPSNV method in this use case. As mentioned above, PSNV is not advantageous in this application and shows the lowest noise immunity, but it is preferable to the SNV across the whole spectrum.

4.2. Summary. The optimized parameters for the preprocessing methods are summarized in Table 3. The LSNV optimization process selects the same window size as DLSNV. Thus, the second window size run has no influence on the final result in these two cases, but the starting point produces an improvement.

As already mentioned, Table 2 summarizes the cross validation performances of the tested methods as mean values and standard deviations over all cross validation runs. For the friction modifier, the performances of DLSNV, PSNV, and PPSNV are very similar. Table 2 also lists the relative improvements against the benchmark preprocessing LSNV, accompanied by the corresponding p values from a two-sided t-test, which tests the significance of the mean values being different (relative improvements may differ where the same rounded mean value is given, because the improvements were calculated from exact values rather than rounded values).
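The relative improvement against the LSNV benchmark can be illustrated with a short calculation. This is a sketch under the assumption that the improvement is the relative reduction of the mean RMSEP; the function name is hypothetical, and the inputs are the rounded means from Table 2, so the results can deviate from the listed percentages, which were derived from exact values.

```python
def relative_improvement(benchmark_rmsep, method_rmsep):
    """Relative reduction of the mean RMSEP against the benchmark, in percent."""
    return 100.0 * (benchmark_rmsep - method_rmsep) / benchmark_rmsep


# Rounded mean RMSEP values from Table 2 as (LSNV benchmark, method).
cases = {
    "DLSNV, friction modifier, absorbance": (0.31, 0.26),
    "PPSNV, antioxidant, transmission": (0.24, 0.17),
    "DLSNV, friction modifier, transmission": (0.30, 0.28),
}
for label, (benchmark, method) in cases.items():
    print(f"{label}: {relative_improvement(benchmark, method):.1f}%")
```

For the last case, the rounded means give a smaller percentage than the 9% listed in Table 2, which is in line with the remark that the listed improvements were calculated from unrounded values.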

The best mean RMSEP value for the regression model for the friction modifier, 0.26, is produced by DLSNV based on absorbance spectra. The antioxidant compound is modeled best by PPSNV preprocessing of the transmission spectra, which yields a very low prediction error of 0.17.

To summarize, one may say that all proposed methods performed very well, reducing both mean and standard deviation of the cross validation error compared with SNV. PSNV is not reasonable for the antioxidant additive, as the performance is poor compared with the benchmark preprocessing method LSNV.

Which preprocessing method is the best depends on the actual regression use case, but in general, it is shown that PPSNV outperforms PSNV. This suggests that it is beneficial to drop spectral regions showing low or no dependency on the target value and to consider only highly correlated peaks.

For the antioxidant compound, PPSNV yielded an enormous improvement. This could be explained as the

[Figure 11, panels (a) and (b): RMSE (y-axis) versus noise level factor (x-axis, 0.0 to 0.4). Legend: without standardization, SNV, LSNV, DLSNV, PSNV, PPSNV; transmission and absorbance. Panel (b) shows LSNV, DLSNV, PSNV, and PPSNV only.]

Figure 11: RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A, calibrated by the unperturbed full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown, and in (b), the sophisticated standardizations are shown.


phenolic aging inhibitor is a compound with a very narrow vibrational band in the ATF B and thus does not have a great impact when the SNV is carried out across the entire spectrum. This may lead to a suboptimal alignment of this band. In the case of the novel standardizations, the SNV is optimized to the highly correlated bands, and scatter effects can be compensated for these exact regions.

The fact that PSNV is unsuccessful for the antioxidant may be because the POIs are not centered in the middle of the SNV window and may have large left and right margins if they are far away from other POIs. As a result, they may not be optimally standardized. This is shown in Figure 6(b), where the single POI at about 2700 cm⁻¹ has a large single SNV window.

The study provides an overview of model performances when using transmission or absorbance spectra, suggesting that both cases can lead to valid regression models. However, for quantitative models built on transmission spectra, the SNV is vital, whereas in the absorbance case, the predictive power does not depend on SNV transformation. In absorbance spectra, the influence of the baseline constant is reduced because high transmission values are converted into low absorbance values.

To conclude, DLSNV, PSNV, and PPSNV were able to improve both transmission and absorbance predictive models. The scattering around the mean values is also drastically reduced because the model does not have to learn how to compensate for the baseline shift in each cross validation step, leading to more reproducible results. Each vibrational band is optimally aligned so that the additive depletion trend is encoded in the absolute signal intensity, and the model does not have to weigh a data point as background correction.

5. Conclusion

The results presented in this study demonstrate that the proposed novel standardization strategies, Dynamic Localized SNV, Peak SNV, and Partial Peak SNV, improve both the mean and scatter of RMSEP

[Figure 12, panels (a) and (b): RMSE (y-axis) versus noise level factor (x-axis, 0.0 to 0.4). Legend: without standardization, SNV, LSNV, DLSNV, PSNV, PPSNV; transmission and absorbance. Panel (b) shows LSNV, DLSNV, PSNV, and PPSNV only.]

Figure 12: RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B, calibrated by the original full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown, and in (b), the sophisticated standardizations are shown.

Table 3: Summary of the preprocessing parameters. For DLSNV, window size and starting point; for PSNV, agglomeration window; and for PPSNV, window width around the POI are shown.

Method   ATF A                           ATF B
         Transmission    Absorbance      Transmission    Absorbance
LSNV     w_opt = 52      w_opt = 52      w_opt = 51      w_opt = 51
DLSNV    w_opt = 52      w_opt = 52      w_opt = 51      w_opt = 51
         s_opt = 5       s_opt = 7       s_opt = 48      s_opt = 48
PSNV     agg_opt = 5     agg_opt = 14    agg_opt = 6     agg_opt = 10
PPSNV    pw_opt = 17     pw_opt = 25     pw_opt = 15     pw_opt = 38


values in cross validation and the robustness against noise drastically with respect to SNV transformation executed on the entire spectrum. Against the benchmark LSNV, an enhancement of the predictive power of a ridge regression model by up to 16% and 29% could be achieved for the friction modifier and the antioxidant compound, respectively. The demonstrated optimization workflows for performing SNV on specific regions of the spectrum have been introduced here for the first time. Therefore, the standardization methods used in this paper are capable of eliminating nonlinearities by flexible rescaling in defined areas. To our knowledge, such standardization techniques have not been presented elsewhere.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.




coefficient of determination R². The prediction performance of the chosen window size in combination with the regression model is benchmarked by fitting the same model to the single SNV data, indicated by a red line in Figure 4(c). The optimization steps can be summarized as follows:

(1) Perform LSNV with window sizes from 50 to 500 pixels and determine R² for all window sizes. Find the optimal window size ws_opt,1.

(2) Perform LSNV with the optimal window size of step 1, ws_opt,1, vary the starting point from 0 to 2·ws_opt,1, and select the optimal starting point s_opt.

(3) Perform LSNV with the optimal starting point s_opt with window sizes from 50 to 2·ws_opt,1 in order to find the best combination of window size ws_opt,2 and starting point s_opt.
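The three optimization steps above can be sketched in Python. This is a minimal illustration rather than the implementation used in this study: the function names are hypothetical, `score(window, start)` stands in for fitting the regression model to the transformed calibration data and returning R², the sample standard deviation and a step size of 1 are assumed, and the treatment of the data points before the starting point (here: their own leading segment) is also an assumption.

```python
from statistics import fmean, stdev


def snv(segment):
    """Standard Normal Variate: subtract the mean, divide by the standard deviation."""
    if len(segment) < 2:
        return [0.0 for _ in segment]
    mean = fmean(segment)
    sd = stdev(segment)  # sample standard deviation (n - 1), an assumption
    if sd == 0:
        return [0.0 for _ in segment]
    return [(x - mean) / sd for x in segment]


def localized_snv(spectrum, window, start=0):
    """Apply SNV separately on consecutive windows, the first one beginning at `start`."""
    n = len(spectrum)
    # Window edges: start, start + window, ...; points before `start`
    # form their own leading segment (assumption).
    edges = sorted({0, n, *range(start, n, window)})
    out = []
    for lo, hi in zip(edges, edges[1:]):
        out.extend(snv(spectrum[lo:hi]))
    return out


def optimize_dlsnv(score, ws_min=50, ws_max=500):
    """Three-stage search: window size, starting point, window size again."""
    ws_opt1 = max(range(ws_min, ws_max + 1), key=lambda w: score(w, 0))
    s_opt = max(range(0, 2 * ws_opt1 + 1), key=lambda s: score(ws_opt1, s))
    ws_opt2 = max(range(ws_min, 2 * ws_opt1 + 1), key=lambda w: score(w, s_opt))
    return ws_opt2, s_opt


# Toy usage: two windows with different offsets end up on the same scale.
print(localized_snv([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], window=3))
```

In the toy call, both windows are mapped to the same standardized values although their raw levels differ by an order of magnitude, which is the window-wise baseline and scale removal that the localized approach relies on.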

In Figure 4(d), the final DLSNV spectra after optimization are shown. Note that jumps can occur between the individual standardization windows since the mean value of the current window is subtracted for each window. However, this does not affect the regression model.

In Figure 5(a), the ATF A samples are shown with SNV performed on the entire spectral region, and in Figure 5(b), the same spectra are depicted after DLSNV optimization. Figure 5(c) shows a zoom-in view of the highlighted region of Figure 5(a), and in Figure 5(d), the same region is depicted after DLSNV optimization. The baseline is removed for the exact spectra sequence, and thus peaks are aligned in a way that the different aging levels of the samples can already be recognized by eye. The shown spectral snippet is the phenolic antioxidant region. Thus, the decrease of this band can be associated with the aging level. Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level.

3.5.2. Peak SNV. The idea behind the Peak SNV method is to standardize the important areas of the spectrum independently of each other. The optimization workflow for PSNV is shown in Figure 6, starting from the single SNV transformed data set. Data points with a high correlation with the target values (points of interest, POI) are selected (Figure 6(a)), and the SNV transformation is performed on windows around the centroids. Once the POIs are identified, the PSNV transformation is conducted as follows:

PSNV algorithm:

(i) Subdivide spectra into sequences ranging from half the distance from the previous POI to half the distance to the next one (Figure 6(b)). SNV is performed across these windows.
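The window construction of the PSNV algorithm can be sketched as follows. The names `psnv` and `snv` are hypothetical, POIs are given as data point indices, and two details are assumptions: the first and last windows are extended to the ends of the spectrum, and the sample standard deviation is used.

```python
from statistics import fmean, stdev


def snv(segment):
    """Standard Normal Variate of one window."""
    if len(segment) < 2:
        return [0.0 for _ in segment]
    mean = fmean(segment)
    sd = stdev(segment)  # sample standard deviation (n - 1), an assumption
    if sd == 0:
        return [0.0 for _ in segment]
    return [(x - mean) / sd for x in segment]


def psnv(spectrum, centroids):
    """Split the spectrum halfway between neighboring POI centroids, then SNV each part."""
    n = len(spectrum)
    pois = sorted(set(centroids))
    midpoints = [(a + b) // 2 for a, b in zip(pois, pois[1:])]
    # Outer windows reach to the spectrum ends (assumption).
    bounds = [0, *midpoints, n]
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        out.extend(snv(spectrum[lo:hi]))
    return out, bounds


# Toy usage with two hypothetical POI centroids at indices 2 and 7.
spectrum = [float(i) for i in range(10)]
transformed, bounds = psnv(spectrum, centroids=[2, 7])
print(bounds)
```

Because every data point belongs to exactly one window, the transformed spectrum keeps the length of the original one. A POI far from its neighbors obtains a wide window in which it is not necessarily centered.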

To find the POI, an initial regression model is fitted to the data. In order to identify important regions of the spectra, the model coefficients are assessed. The normalized absolute values of the coefficient vector are fed into a peak-picking algorithm. Since POIs may lie in close proximity, an agglomeration of the POIs is conducted in order to prevent very narrow standardization windows. Peak centroids are calculated via the mean value of the combined POIs. The task for the optimization process is to find the best window for POI agglomeration, agg_opt, which is done by analyzing the calibration R² for each agglomeration window and picking the window size with maximal correlation between the predicted and reference target (Figure 6(c)). The steps are summarized as follows:

(1) Fit the data set to the target values (only calibration).

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1.

(3) Combine peaks which are within a certain window agg, and calculate the centroid of the agglomerated POIs.

(4) Perform PSNV across the centroid of the POIs.

(5) Evaluate performance via R² for agg between 10 and 50 data points, and choose agg_opt according to maximal R².
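Steps (2) and (3) can be illustrated with a small sketch. The peak-picking algorithm used in this study is not specified in this section, so a plain local-maximum search above the threshold is assumed; the chained grouping rule in `agglomerate` (a peak joins a group if it lies within agg data points of the previous peak) is likewise an assumption, and all names are hypothetical.

```python
from statistics import fmean


def pick_pois(coefficients, threshold=0.1):
    """Indices of local maxima of |w| / max(|w|) that reach the threshold."""
    magnitudes = [abs(c) for c in coefficients]
    top = max(magnitudes)
    if top == 0:
        return []
    norm = [m / top for m in magnitudes]
    return [i for i in range(1, len(norm) - 1)
            if norm[i] >= threshold
            and norm[i] > norm[i - 1]
            and norm[i] >= norm[i + 1]]


def agglomerate(pois, agg):
    """Merge POIs closer than `agg` data points and return the group centroids."""
    groups = []
    for poi in sorted(pois):
        if groups and poi - groups[-1][-1] <= agg:
            groups[-1].append(poi)  # close to the previous POI: same group
        else:
            groups.append([poi])    # otherwise start a new group
    return [round(fmean(group)) for group in groups]


# Hypothetical coefficient vector of an initial regression model.
w = [0.0, 0.5, 0.0, 0.05, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.8, 0.0]
pois = pick_pois(w)
centroids = agglomerate(pois, agg=4)
print(pois, centroids)
```

The small coefficient at index 3 stays below the threshold of 0.1 and is therefore not treated as a POI, while the two neighboring peaks at the start are merged into one centroid.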

After optimization, each window has an individual window size and ranges over the peak centroid of important signals in the spectrum. On these windows, SNV transformation provides an optimal baseline and scatter effect removal. The optimized spectrum is shown in Figure 6(d).

[Figure 4, panels (a) to (d): spectra over wavenumbers (cm⁻¹, 3500 to 1000) for SNV, window SNV, and final window SNV; panel (c) plots R² against window size (1st run), starting point, and window size (2nd run).]

Figure 4: Demonstration of the workflow and optimization process for Dynamic Localized SNV: (a) single SNV, (b) Dynamic Localized SNV with starting point 100 and window 300 for visualization, (c) three-stage optimization process for window, starting point, and final window optimization, and (d) optimized DLSNV.


3.5.3. Partial Peak SNV. The idea behind Partial Peak SNV is similar to PSNV: picking the regions of the spectrum which show a high correlation with the target values, agglomerating POIs in close proximity, and standardizing these important spectral features (Figure 7(a)). But unlike for PSNV, not the whole spectrum is finally taken into account but only a small window around each POI. It may occur that the same data point appears several times in different standardizations (see overlapping regions in Figures 7(b) and 7(d)). Due to this workflow, the PPSNV spectrum may have more data points (due to overlapping) or fewer (because not the entire spectrum is taken into account) than the original spectrum. A PPSNV spectrum is calculated as follows:

PPSNV algorithm:

(i) Perform SNV across the POIs with a left and right margin of pw.
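A minimal sketch of the PPSNV transformation is given here. The function names are hypothetical, POIs are given as data point indices, windows are clipped at the spectrum ends, and the sample standard deviation is used; these details are assumptions for illustration.

```python
from statistics import fmean, stdev


def snv(segment):
    """Standard Normal Variate of one window."""
    if len(segment) < 2:
        return [0.0 for _ in segment]
    mean = fmean(segment)
    sd = stdev(segment)  # sample standard deviation (n - 1), an assumption
    if sd == 0:
        return [0.0 for _ in segment]
    return [(x - mean) / sd for x in segment]


def ppsnv(spectrum, centroids, pw):
    """Concatenate SNV-transformed windows of +/- pw data points around each POI."""
    n = len(spectrum)
    out = []
    for center in sorted(centroids):
        lo = max(0, center - pw)      # clip at the start of the spectrum
        hi = min(n, center + pw + 1)  # clip at the end of the spectrum
        out.extend(snv(spectrum[lo:hi]))
    return out


# Toy usage: the output length depends on pw and on window overlap.
spectrum = [float(i) for i in range(20)]
print(len(ppsnv(spectrum, centroids=[5, 7], pw=2)))
print(len(ppsnv(spectrum, centroids=[5, 7], pw=8)))
```

With a small pw, the result is shorter than the original spectrum because regions far from any POI are dropped; with a large pw, overlapping windows repeat data points and the result becomes longer.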

The optimization focuses on the adjustment of the window size pw around the POIs in which the SNV is applied for maximal predictive power in calibration (Figure 7(c)). The optimization process is divided into the following steps:

(1) Fit the data set to the target values (only calibration).

(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1.

[Figure 5, panels (a) to (d): spectra over wavenumbers (cm⁻¹); full range 3500 to 1000 and zoomed range 3500 to 3350.]

Figure 5: Demonstration of the improvement of peak alignment for DLSNV. (a) SNV-transformed transmission spectra with marked zoom level of (c). (b) Optimized DLSNV spectra with marked zoom area of (d). Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level.

[Figure 6, panels (a) to (d): model coefficients with selected peaks, spectrum divided into peak windows, and final Peak SNV over wavenumbers (cm⁻¹, 3500 to 1000); R² versus agglomeration window size (10 to 50).]

Figure 6: Demonstration of the workflow and optimization process for Peak SNV: (a) picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process in order to find the optimal agglomeration window size, and (d) optimized PSNV.


[Figure 8, panels (a) to (d): predicted versus reference friction modifier peak volume (both axes -3 to 2) for calibration and validation samples; panel (a) is annotated "Nonlinearity" and "Decrease in additive concentration".]

Figure 8: Cross validation results for (a) SNV, (b) Dynamic Localized SNV, (c) Peak SNV, and (d) Partial Peak SNV preprocessed spectra in the regression case of the friction modifier additive. The reference values were measured by HPLC-QToF. Red dots indicate prediction of calibration samples, and blue dots represent prediction of the hold-out validation samples. The dashed arrow in (a) indicates the increasing degradation of the samples with increasing time as the friction modifier additive concentration gets lower.

[Figure 7, panels (a) to (d): model coefficients with selected peaks and spectrum divided into peak windows over wavenumbers (cm⁻¹, 3500 to 1000); R² versus window size (0 to 200); final Partial Peak SNV over data points (0 to 1500).]

Figure 7: Demonstration of the workflow and optimization process for Partial Peak SNV: (a) picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process in order to find the optimal window size around the POIs, and (d) optimized PPSNV.


(3) Perform PPSNV across the peaks with the window size pw.

(4) Evaluate the performance via R² for pw between 1 and 200 data points, and choose pw_opt according to maximal R².
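The parameter search in the last step can be sketched generically. The coefficient of determination is written here in its common form, 1 - SS_res/SS_tot, which is an assumption because the definition used in this study is given in a section not reproduced here. `calibrate_and_predict` is a hypothetical callable that applies the preprocessing with the candidate parameter, fits the model, and returns the calibration predictions.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def best_parameter(candidates, calibrate_and_predict, y_true):
    """Return the candidate with maximal calibration R2 and all scores."""
    scores = {c: r_squared(y_true, calibrate_and_predict(c)) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Toy usage: a stand-in "model" whose prediction error is smallest at pw = 17.
y_cal = [1.0, 2.0, 3.0]
pw_opt, scores = best_parameter(
    range(1, 201),
    lambda pw: [value + 0.01 * abs(pw - 17) for value in y_cal],
    y_cal,
)
print(pw_opt)
```

The same helper could serve for the agglomeration window of PSNV by passing the corresponding candidate range.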

4. Results and Discussion

4.1. Cross Validation. In Figure 8(a), the cross validation recovery function for predictions of the SNV preprocessed spectra of the regression on the friction modifier compound is shown. A 50-fold cross validation strategy with a calibration/validation splitting of 70/30 was used. Red dots represent the prediction of calibration samples, and blue dots represent validation samples. It is obvious that the linear model struggles to predict the high and low compound intensity regions correctly. The nonlinearity is visualized by an arrow and a dashed line to guide the eye.

Figure 8 also shows the cross validation recovery functions for predictions after Dynamic Localized SNV (Figure 8(b)), Peak SNV (Figure 8(c)), and Partial Peak SNV (Figure 8(d)) optimization. The saturation effect in the low intensity area of the compound response is almost completely removed in the latter three cases. It is also notable that the scattering around the green bisecting line is significantly reduced. Thus, the confidence interval for the predictions is improved.
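The cross validation strategy, 50 random calibration/validation splits of 70/30, can be sketched as follows. The helper names are hypothetical, and `fit` and `predict` are placeholders for the regression model; the trivial mean model in the usage example only demonstrates the mechanics and is not the model used in this study.

```python
import random
from math import sqrt
from statistics import fmean


def rmsep(y_true, y_pred):
    """Root-mean-squared error of prediction."""
    return sqrt(fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def random_split(n_samples, rng, train_fraction=0.7):
    """Shuffle the sample indices and cut them into calibration and validation parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(train_fraction * n_samples)
    return indices[:cut], indices[cut:]


def repeated_holdout(X, y, fit, predict, n_repeats=50, seed=0):
    """Collect one RMSEP value per random 70/30 split."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_repeats):
        train, test = random_split(len(y), rng)
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in test])
        errors.append(rmsep([y[i] for i in test], predictions))
    return errors


# Toy usage with a placeholder model that always predicts the calibration mean.
X = [[float(i)] for i in range(10)]
y = [2.0] * 10
errors = repeated_holdout(
    X, y,
    fit=lambda X_cal, y_cal: fmean(y_cal),
    predict=lambda model, X_val: [model] * len(X_val),
)
print(len(errors), fmean(errors))
```

The resulting list of RMSEP values is the kind of input from which a mean value, a standard deviation, and a box-and-whisker summary can be derived.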

e RMSEP values during the cross validation of theregression of the friction modi13er component are sum-marized in Figure 9 in a box-and-whisker plot representa-tion e red line indicates the median within the boxes theinterquartile range (IQR) (contains 50 of the data) isdepicted and the margins of the whiskers representQ1 minus 15 middot IQR and Q3 + 15 middot IQR for the lower and upperbound respectively (Q1 means the smallest 25 of the dataset are smaller than this value and Q3 means the smallest75 are smaller than this value) Subplot Figure 9(a) refers tothe transmission spectra and Figure 9(b) refers to the ab-sorbance spectra e labels are associated with (1) withoutstandardization (2) single SNV transformation on the fullspectral range (3) Localized SNV (4) Dynamic LocalizedSNV (5) Peak SNV and (6) Partial Peak SNV

It is noticeable that the RMSEP is very poor in case of thecrude transmission spectra and that SNV has a very usefulimpact on them whereas the improvement after SNV is lowfor absorbance spectra

For all sophisticated optimized standardization ap-proaches DLSNV PSNV and PPSNV the median and thescattering around the median of RMSEP decreases drasti-cally with respect to the SNV-transformed full spectra butalso LSNV seems to be a reasonable choice DLSNV onabsorbance spectra is characterized by the lowest medianand the smallest scattering con13rmed by Table 2 summa-rizing the mean values and standard deviation of RMSEP

RMSE

P Proposedmethods

Proposedmethods

(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)

+ ++

+ +

+++

++++

00

02

04

06

08

10

12

14

16

(a) (b)

Figure 9 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 foldsrandom train test split of 7030 of the data) for the friction modi13er compound of ATF A In (a) the RMSEP values for the transmissionspectra are shown and in (b) the RMSEP values for the absorbance spectra are shown Boxplot (1) is without standardization (2) is witha single SNV transformation on the full spectrum (3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimizedPSNV and (6) is with optimized PPSNV

Journal of Spectroscopy 9

e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation

Figure 10: Box-and-whisker plot representation of the root-mean-squared error of prediction (RMSEP) of the cross validation strategy for the phenolic antioxidant compound of ATF B. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV. Boxplots (4) to (6) are the proposed methods.

Table 2: Summary of the model performances, described by the mean value and the standard deviation of the RMSEP values during cross validation. The relative improvements and respective p values compared with LSNV are also listed.

Method              Friction modifier (ATF A)              Antioxidant (ATF B)
                    Transmission       Absorbance          Transmission       Absorbance
Raw                 0.83 ± 0.22        0.53 ± 0.16         0.90 ± 0.14        0.70 ± 0.11
Single SNV          0.34 ± 0.14        0.42 ± 0.17         0.61 ± 0.11        0.70 ± 0.13
LSNV                0.30 ± 0.07        0.31 ± 0.07         0.24 ± 0.04        0.24 ± 0.04
DLSNV               0.28 ± 0.06        0.26 ± 0.05         0.21 ± 0.04        0.21 ± 0.04
Rel. improvement    9% (p < 0.05)      16% (p < 0.001)     13% (p < 0.001)    13% (p < 0.001)
PSNV                0.28 ± 0.08        0.28 ± 0.06         0.33 ± 0.06        0.31 ± 0.07
Rel. improvement    8% (p > 0.05)      9% (p < 0.05)       −41% (p < 0.001)   −33% (p < 0.001)
PPSNV               0.26 ± 0.06        0.26 ± 0.06         0.17 ± 0.04        0.24 ± 0.05
Rel. improvement    13% (p < 0.01)     15% (p < 0.001)     29% (p < 0.001)    −3% (p > 0.05)

4.1.1. Noise Robustness. In Figure 11, the performance of the regression model for the prediction of noisy spectra is shown for the friction modifier. In Figure 11(a), the curves for all preprocessings are depicted, and in Figure 11(b), a zoomed view is shown. Without any preprocessing, the initial calibration error for both transmission and absorbance spectra is very poor and rises very fast with increasing noise level factor. The sophisticated preprocessing methods LSNV, DLSNV, PSNV, and PPSNV show not only a lower initial calibration error than SNV but also a lower slope of the error. In Figure 11(b), the trend of the transmission spectra toward a low error steepness is visible. One may say that the three proposed standardization techniques show a very similar noise robustness behavior and are significantly better than no preprocessing or SNV preprocessing, indicating that the model did not overfit the data, as mentioned in Section 3.3.2.
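The noise test discussed in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the noise model (Gaussian noise whose standard deviation is the noise level factor times the standard deviation of each spectrum) is assumed here, since the exact scaling is defined in Section 3.3.2, and `predict` is a hypothetical callable that stands for the preprocessing plus the calibrated regression model.

```python
import random
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root-mean-squared error between reference and predicted values."""
    return (sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def noise_curve(predict, spectra, targets, factors, repetitions=50, seed=0):
    """Mean and standard deviation of the RMSE for each noise level factor.

    predict: hypothetical callable mapping one spectrum to one predicted value
             (preprocessing plus model, calibrated on the unperturbed data).
    """
    rng = random.Random(seed)
    curve = {}
    for factor in factors:
        errors = []
        for _ in range(repetitions):
            noisy = []
            for spectrum in spectra:
                # Assumed noise model: sigma proportional to the spectrum's own spread.
                sigma = factor * stdev(spectrum)
                noisy.append([v + rng.gauss(0.0, sigma) for v in spectrum])
            errors.append(rmse(targets, [predict(s) for s in noisy]))
        curve[factor] = (mean(errors), stdev(errors))
    return curve


# Toy usage with a placeholder "model" that predicts the mean intensity.
toy_spectra = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
toy_targets = [2.0, 3.0, 4.0]
curve = noise_curve(lambda s: mean(s), toy_spectra, toy_targets, [0.0, 0.1, 0.2, 0.3, 0.4])
```

The standard deviation over the 50 repetitions corresponds to the error bars in Figures 11 and 12; a flatter curve indicates a preprocessing that is more robust against noise.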

In Figure 12, the performance of the regression model of the antioxidant for the prediction of noisy spectra is shown. The absorbance spectra, with or without preprocessing, show a noise trend similar to that of the SNV-transformed spectra. In Figure 12(b), the localized versions are shown in a zoomed view. The PPSNV preprocessing on the transmission spectra is characterized by the flattest noise dependency. These results demonstrate the superiority of the PPSNV method in this use case. As mentioned above, PSNV is not advantageous in this application and shows the lowest noise immunity, but it is preferable to the SNV across the whole spectrum.

4.2. Summary. The optimized parameters for the preprocessing methods are summarized in Table 3. The LSNV optimization process selects the same window size as DLSNV. Thus, the second window size run has no influence on the final result in these two cases, but the starting point produces an improvement.

As already mentioned, Table 2 summarizes the cross validation performances of the tested methods as mean values and standard deviations over all cross validation runs. For the friction modifier, the performances of DLSNV, PSNV, and PPSNV are very similar. Table 2 also lists the relative improvements against the benchmark preprocessing LSNV, accompanied by the corresponding p values from a two-sided t-test, which tests whether the mean values differ significantly. Relative improvements may differ even where the same mean value is listed because the improvements were calculated from exact rather than rounded values.
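The remark on rounding can be checked with a short calculation. The sketch below assumes that the relative improvement is defined as the reduction of the mean RMSEP relative to LSNV, in percent; this definition is not stated explicitly in the text, and the function name is illustrative. It uses the rounded Table 2 means for the antioxidant on transmission spectra.

```python
def relative_improvement(benchmark_rmsep, method_rmsep):
    """Reduction of the mean RMSEP relative to the benchmark, in percent (assumed definition)."""
    return 100.0 * (benchmark_rmsep - method_rmsep) / benchmark_rmsep


# Rounded mean RMSEP values from Table 2 (antioxidant, transmission spectra).
lsnv = 0.24
rounded_means = {"DLSNV": 0.21, "PSNV": 0.33, "PPSNV": 0.17}

improvements = {
    name: relative_improvement(lsnv, value) for name, value in rounded_means.items()
}
for name, value in improvements.items():
    print(f"{name}: {value:.1f}%")
```

With the rounded means, this gives about 12.5% for DLSNV, −37.5% for PSNV, and 29.2% for PPSNV, against the listed 13%, −41%, and 29%. The gaps are of the size expected when the listed improvements are computed from unrounded means.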

The best mean RMSEP value for the regression model of the friction modifier, 0.26, is produced by DLSNV based on absorbance spectra. The antioxidant compound is modeled best by PPSNV preprocessing of the transmission spectra, which yields a very low prediction error of 0.17.

To summarize, one may say that all proposed methods performed very well, reducing both the mean and the standard deviation of the cross validation error compared with SNV. PSNV, however, is not a reasonable choice for the antioxidant additive, as its performance is poor compared with the benchmark preprocessing method LSNV.

Which preprocessing method is the best depends on the actual regression use case, but in general, it is shown that PPSNV outperforms PSNV. This suggests that it is beneficial to drop spectral regions showing low or no dependency on the target value and to consider only highly correlated peaks.

Figure 11: RMSE as a function of the noise level factor (0.0 to 0.4) for the regression of the friction modifier compound of ATF A, calibrated by the unperturbed full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance spectra), and in (b), the sophisticated standardizations (LSNV, DLSNV, PSNV, and PPSNV) are shown.

For the antioxidant compound, PPSNV yielded an enormous improvement. This could be explained by the fact that the phenolic aging inhibitor is a compound with a very narrow vibrational band in ATF B, which therefore does not have a great impact when the SNV is carried out across the entire spectrum. This may lead to a suboptimal alignment of this band. In the case of the novel standardizations, the SNV is optimized to the highly correlated bands, and scatter effects can be compensated for in exactly these regions.

The fact that PSNV is unsuccessful for the antioxidant may be because the POIs are not centered in the middle of the SNV window and may have large left and right margins if they are far away from other POIs. As a result, they may not be optimally standardized. This is shown in Figure 6(b), where the single POI at about 2700 cm⁻¹ has a large single SNV window.

The study provides an overview of model performances when using transmission or absorbance spectra, suggesting that both cases can lead to valid regression models. However, for quantitative models built on transmission spectra, the SNV is vital, whereas in the absorbance case, the predictive power does not depend on the SNV transformation. In absorbance spectra, the influence of the baseline constant is reduced because high transmission values are converted into low absorbance values.
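The effect of the conversion on a baseline offset can be illustrated numerically. The sketch below assumes the usual relation A = −log10(T), with the transmission T given as a fraction between 0 and 1; the conversion actually used for the data is not stated in this section, and the offset of 0.05 is an arbitrary illustrative value.

```python
import math


def transmission_to_absorbance(transmission):
    """Convert fractional transmission values (0 < T <= 1) to absorbance, A = -log10(T)."""
    return [-math.log10(t) for t in transmission]


# A constant baseline offset of 0.05 added in a high-transmission (baseline) region
# and in a low-transmission (strong band) region.
offset = 0.05
high_t, low_t = 0.90, 0.10

shift_high = (
    transmission_to_absorbance([high_t])[0]
    - transmission_to_absorbance([high_t + offset])[0]
)
shift_low = (
    transmission_to_absorbance([low_t])[0]
    - transmission_to_absorbance([low_t + offset])[0]
)
print(f"absorbance shift at T = 0.90: {shift_high:.3f}")
print(f"absorbance shift at T = 0.10: {shift_low:.3f}")
```

In the baseline region, where the transmission is high, the same offset changes the absorbance by only about 0.02, compared with about 0.18 at the strong band, which is one way to read the reduced influence of the baseline constant in absorbance spectra.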

To conclude, DLSNV, PSNV, and PPSNV were able to improve both transmission and absorbance predictive models. The scattering around the mean values is also drastically reduced because the model does not have to learn how to compensate for the baseline shift in each cross validation step, leading to more reproducible results. Each vibrational band is optimally aligned so that the additive depletion trend is encoded in the absolute signal intensity, and the model does not have to weight a data point as background correction.

Figure 12: RMSE as a function of the noise level factor (0.0 to 0.4) for the regression of the antioxidant compound of ATF B, calibrated by the original full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance spectra), and in (b), the sophisticated standardizations (LSNV, DLSNV, PSNV, and PPSNV) are shown.

Table 3: Summary of the preprocessing parameters. For LSNV, the window size; for DLSNV, the window size and starting point; for PSNV, the agglomeration window; and for PPSNV, the window width around the POI are shown.

Method    ATF A                                ATF B
          Transmission     Absorbance          Transmission     Absorbance
LSNV      w_opt = 52       w_opt = 52          w_opt = 51       w_opt = 51
DLSNV     w_opt = 52       w_opt = 52          w_opt = 51       w_opt = 51
          s_opt = 5        s_opt = 7           s_opt = 48       s_opt = 48
PSNV      agg_opt = 5      agg_opt = 14        agg_opt = 6      agg_opt = 10
PPSNV     pw_opt = 17      pw_opt = 25         pw_opt = 15      pw_opt = 38

5. Conclusion

The results presented in this study demonstrate that the proposed novel standardization strategies, Dynamic Localized SNV, Peak SNV, and Partial Peak SNV, drastically improve both the mean and the scatter of the RMSEP values in cross validation, as well as the robustness against noise, with respect to an SNV transformation executed on the entire spectrum. Against the benchmark LSNV, an enhancement of the predictive power of a ridge regression model by up to 16% and 29% could be achieved for the friction modifier and the antioxidant compound, respectively. The demonstrated optimization workflows for performing SNV on specific regions of the spectrum have been introduced here for the first time. The standardization methods used in this paper are therefore capable of eliminating nonlinearities by flexible rescaling in defined areas. To our knowledge, such standardization techniques have not been presented elsewhere.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.



Figure 5: Demonstration of the improvement of peak alignment for DLSNV. (a) SNV-transformed transmission spectra with marked zoom level of (c). (b) Optimized DLSNV spectra with marked zoom area of (d). Magenta indicates (relatively) fresh samples, whereas red indicates a strong degradation level. The horizontal axes show wavenumbers (cm⁻¹).

Figure 6: Demonstration of the workflow and optimization process for Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process (R² versus agglomeration window size) in order to find the optimal agglomeration window size, and (d) optimized PSNV.

Figure 7: Demonstration of the workflow and optimization process for Partial Peak SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum, (b) spectrum separation according to agglomerated peaks, (c) optimization process (R² versus window size) in order to find the optimal window size around the POIs, and (d) optimized PPSNV.

Figure 8: Cross validation results for (a) SNV, (b) Dynamic Localized SNV, (c) Peak SNV, and (d) Partial Peak SNV preprocessed spectra in the regression case of the friction modifier additive (predicted versus reference friction modifier peak volume). The reference values were measured by HPLC-QToF. Red dots indicate predictions of calibration samples, and blue dots represent predictions of the hold-out validation samples. The dashed arrow in (a) indicates the increasing degradation of the samples with increasing time as the friction modifier additive concentration gets lower.

3.5.3. Partial Peak SNV. The idea behind Partial Peak SNV is similar to that of PSNV: picking the regions of the spectrum which show a high correlation with the target values, agglomerating POIs in close proximity, and standardizing these important spectral features (Figure 7(a)). Unlike for PSNV, however, not the whole spectrum but only a small window around each POI is finally taken into account. It may occur that the same data point appears several times in different standardizations (see the overlapping regions in Figures 7(b) and 7(d)). Due to this workflow, the PPSNV spectrum may have more data points (due to overlapping) or fewer data points (because not the entire spectrum is taken into account) than the original spectrum. A PPSNV spectrum is calculated as follows:

PPSNV algorithm:

(i) Perform SNV across the POIs with a left and right margin of pw.

The optimization focuses on the adjustment of the window size pw around the POIs in which the SNV is applied for maximal predictive power in calibration (Figure 7(c)). The optimization process is divided into the following steps:

(1) Fit the data set to the target values (only calibration).
(2) Pick peaks from the normalized model coefficient vector (|w|/max(|w|)); threshold for peaks: 0.1.
(3) Perform PPSNV across the peaks with the window size pw.
(4) Evaluate the performance via R² for pw between 1 and 200 data points and choose pw_opt according to maximal R².
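The PPSNV steps above can be sketched in a few lines. This is a simplified illustration, not the authors' code: the peak picking uses a plain local-maximum rule, the agglomeration of nearby POIs is omitted, the sample standard deviation is assumed for the SNV, and `score_r2` is a hypothetical user-supplied callable that refits the model for a given pw and returns R². All function names are illustrative.

```python
from statistics import mean, stdev


def snv(segment):
    """Standard Normal Variate: center a segment and scale it to unit standard deviation."""
    m = mean(segment)
    s = stdev(segment)  # sample standard deviation (assumed convention); fails if constant
    return [(v - m) / s for v in segment]


def pick_pois(coefficients, threshold=0.1):
    """Indices of local maxima of |w| / max(|w|) that exceed the threshold."""
    abs_w = [abs(c) for c in coefficients]
    w_max = max(abs_w)
    norm = [a / w_max for a in abs_w]
    return [
        i
        for i in range(1, len(norm) - 1)
        if norm[i] > threshold and norm[i] >= norm[i - 1] and norm[i] > norm[i + 1]
    ]


def ppsnv(spectrum, pois, pw):
    """Apply SNV to the window [poi - pw, poi + pw] around each POI and concatenate."""
    out = []
    for p in pois:
        lo = max(0, p - pw)
        hi = min(len(spectrum), p + pw + 1)
        out.extend(snv(spectrum[lo:hi]))
    return out


def optimize_pw(score_r2, pw_values=range(1, 201)):
    """Return the window size with maximal R2; score_r2 is a hypothetical callable."""
    return max(pw_values, key=score_r2)


spectrum = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 8.0, 7.0, 9.0, 10.0]
narrow = ppsnv(spectrum, pois=[2, 4], pw=1)  # fewer points than the original spectrum
wide = ppsnv(spectrum, pois=[2, 4], pw=3)    # overlapping windows, more points
print(len(spectrum), len(narrow), len(wide))
```

In this toy case, the output has 6 points for pw = 1 and 13 points for pw = 3, which shows how the length of a PPSNV spectrum can fall below or exceed that of the original 10-point spectrum.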

4. Results and Discussion

4.1. Cross Validation. In Figure 8(a), the cross validation recovery function for predictions of the SNV preprocessed spectra of the regression on the friction modifier compound is shown. A 50-fold cross validation strategy with a random calibration/validation splitting of 70/30 was used. Red dots represent the predictions of calibration samples, and blue dots represent the predictions of validation samples. It is obvious that the linear model struggles to predict the high and low compound intensity regions correctly. The nonlinearity is visualized by an arrow and a dashed line to guide the eye.
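The validation scheme, 50 random 70/30 calibration/validation splits with an RMSEP per split, can be sketched as follows. This is a toy illustration only: a closed-form ridge fit on a single feature stands in for the ridge regression on full spectra, and the function names, the regularization strength `alpha`, and the synthetic data are assumptions made for the example.

```python
import random
from statistics import mean, stdev


def fit_ridge_1d(x, y, alpha):
    """Closed-form ridge fit for one feature with an unpenalized intercept (toy stand-in)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    w = sxy / (sxx + alpha)
    return lambda v: my + w * (v - mx)


def repeated_split_rmsep(x, y, runs=50, calibration_share=0.7, alpha=1.0, seed=0):
    """Mean and standard deviation of the RMSEP over repeated random splits."""
    rng = random.Random(seed)
    n_cal = int(round(calibration_share * len(x)))
    scores = []
    for _ in range(runs):
        idx = list(range(len(x)))
        rng.shuffle(idx)
        cal, val = idx[:n_cal], idx[n_cal:]
        model = fit_ridge_1d([x[i] for i in cal], [y[i] for i in cal], alpha)
        squared = [(y[i] - model(x[i])) ** 2 for i in val]
        scores.append((sum(squared) / len(squared)) ** 0.5)
    return mean(scores), stdev(scores)


# Synthetic, noisy linear data used purely for illustration.
data_rng = random.Random(1)
x_demo = [float(i) for i in range(40)]
y_demo = [2.0 * v + 1.0 + data_rng.gauss(0.0, 0.5) for v in x_demo]
rmsep_mean, rmsep_std = repeated_split_rmsep(x_demo, y_demo)
print(f"RMSEP = {rmsep_mean:.2f} ± {rmsep_std:.2f}")
```

Reporting the mean and the standard deviation over all runs mirrors the way the model performances are summarized in Table 2.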

Figure 8 also shows the cross validation recovery functions for predictions after Dynamic Localized SNV (Figure 8(b)), Peak SNV (Figure 8(c)), and Partial Peak SNV (Figure 8(d)) optimization. The saturation effect in the low intensity area of the compound response is almost completely removed in the latter three cases. It is also notable that the scattering around the green bisecting line is significantly reduced. Thus, the confidence interval for the predictions is improved.

The RMSEP values during the cross validation of the regression of the friction modifier component are summarized in Figure 9 in a box-and-whisker plot representation. The red line indicates the median; the boxes depict the interquartile range (IQR), which contains 50% of the data; and the margins of the whiskers represent Q1 − 1.5 · IQR and Q3 + 1.5 · IQR for the lower and upper bound, respectively (Q1 is the value below which the smallest 25% of the data set lie, and Q3 is the value below which the smallest 75% lie). Figure 9(a) refers to the transmission spectra and Figure 9(b) refers to the absorbance spectra. The labels are associated with (1) without standardization, (2) single SNV transformation on the full spectral range, (3) Localized SNV, (4) Dynamic Localized SNV, (5) Peak SNV, and (6) Partial Peak SNV.
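The quantities behind such a box-and-whisker plot can be computed directly. The sketch below uses Python's `statistics.quantiles` with the inclusive method; quartile conventions differ between tools, so the exact values of the plotting library used for Figure 9 may deviate slightly. The function name and the example values are illustrative, and treating points outside the whisker bounds as outliers is an assumption.

```python
from statistics import quantiles


def box_stats(values):
    """Median, quartiles, IQR, and whisker bounds Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return {
        "median": q2,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower": lower,
        "upper": upper,
        # Points outside the whisker bounds are treated here as outliers.
        "outliers": [v for v in values if v < lower or v > upper],
    }


# Illustrative RMSEP values from hypothetical cross validation runs.
stats = box_stats([0.22, 0.25, 0.26, 0.27, 0.28, 0.30, 0.31, 0.55])
print(stats)
```

A smaller IQR corresponds to the reduced scattering around the median that is discussed for the proposed methods.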

It is noticeable that the RMSEP is very poor in the case of the crude transmission spectra and that SNV has a very useful impact on them, whereas the improvement after SNV is low for absorbance spectra.

For all sophisticated optimized standardization approaches, DLSNV, PSNV, and PPSNV, the median and the scattering around the median of the RMSEP decrease drastically with respect to the SNV-transformed full spectra, but LSNV also seems to be a reasonable choice. DLSNV on absorbance spectra is characterized by the lowest median and the smallest scattering, as confirmed by Table 2, which summarizes the mean values and standard deviations of the RMSEP.

Figure 9: Box-and-whisker plot representation of the root-mean-squared error of prediction (RMSEP) of the cross validation strategy (50 folds, random train/test split of 70/30 of the data) for the friction modifier compound of ATF A. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV. Boxplots (4) to (6) are the proposed methods.


[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002

[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009

[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004

[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993

[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007

[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009

[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for

evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018

[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018

[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016

[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018

[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018

[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013

[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983

[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985

[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931

[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999

[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989

[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005

[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009

[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993

[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016

[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011

[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015

[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003

[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York

Journal of Spectroscopy 13

Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962

[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895

14 Journal of Spectroscopy

TribologyAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

International Journal ofInternational Journal ofPhotoenergy

Hindawiwwwhindawicom Volume 2018

Journal of

Chemistry

Hindawiwwwhindawicom Volume 2018

Advances inPhysical Chemistry

Hindawiwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2018

Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018

SpectroscopyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Medicinal ChemistryInternational Journal of

Hindawiwwwhindawicom Volume 2018

NanotechnologyHindawiwwwhindawicom Volume 2018

Journal of

Applied ChemistryJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

Journal of

SpectroscopyAnalytical ChemistryInternational Journal of

Hindawiwwwhindawicom Volume 2018

MaterialsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

BioMed Research International Electrochemistry

International Journal of

Hindawiwwwhindawicom Volume 2018

Na

nom

ate

ria

ls

Hindawiwwwhindawicom Volume 2018

Journal ofNanomaterials

Submit your manuscripts atwwwhindawicom


Figure 8: Cross validation results for (a) SNV, (b) Dynamic Localized SNV, (c) Peak SNV, and (d) Partial Peak SNV preprocessed spectra in the regression case of the friction modifier additive. Each panel plots the predicted against the reference friction modifier peak volume (both axes from −3 to 2) for calibration and validation samples. The reference values were measured by HPLC-QToF. Red dots indicate predictions of calibration samples, and blue dots represent predictions of the hold-out validation samples. The dashed arrow in (a), annotated with "Nonlinearity" and "Decrease in additive concentration", indicates the increasing degradation of the samples with increasing time as the friction modifier additive concentration gets lower.

Figure 7: Demonstration of the workflow and optimization process for Dynamic Localized SNV. (a) Picked peaks of the normalized absolute coefficient vector and indication of the POIs in one spectrum (model coefficients and selected peaks plotted against wavenumbers, 3500 to 1000 cm⁻¹), (b) spectrum separation according to agglomerated peaks, (c) optimization process in order to find the optimal window size around the POIs (R² plotted against window sizes from 0 to 200), and (d) optimized PPSNV (final Partial Peak SNV plotted against data points).


(3) Perform PPSNV across the peaks with the window size pw.

(4) Evaluate the performance via R² for pw between 1 and 200 data points, and choose pw_opt according to the maximal R².
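Steps (3) and (4) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, each POI is assumed to receive its own window of pw points (the agglomeration of neighbouring peaks is left out), the population standard deviation is an assumed choice, and the score function stands in for whatever cross-validated R² is computed for a given pw.

```python
from statistics import mean, pstdev


def snv(values):
    """Standard Normal Variate: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def partial_peak_snv(spectrum, pois, pw):
    """Keep a window of pw points around each POI, standardize each window
    on its own, and drop every data point outside the windows."""
    out = []
    for p in sorted(pois):
        lo = max(0, p - pw // 2)
        hi = min(len(spectrum), lo + pw)
        out.extend(snv(spectrum[lo:hi]))
    return out


def choose_pw(score, pw_values=range(1, 201)):
    """Return the window size with the highest score, e.g. a cross-validated R^2.
    `score` is a hypothetical callable mapping pw to a number."""
    return max(pw_values, key=score)
```

A caller would pass a function that preprocesses the spectra with a given pw, fits the regression model, and returns R², so that `choose_pw` performs the grid search over 1 to 200 data points.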

4. Results and Discussion

4.1. Cross Validation. Figure 8(a) shows the cross validation recovery function for predictions of the SNV preprocessed spectra in the regression on the friction modifier compound. A 50-fold cross validation strategy with a calibration/validation splitting of 70/30 was used. Red dots represent the predictions of calibration samples, and blue dots represent the predictions of validation samples. The linear model clearly struggles to predict the high and low compound intensity regions correctly. The nonlinearity is visualized by an arrow and a dashed line to guide the eye.
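The validation scheme, 50 random 70/30 splits with one RMSEP value per split, can be sketched as follows. This is an illustration under assumptions: `fit` and `predict` are hypothetical callables standing in for the regression model, and the seed handling is only there to make the sketch reproducible.

```python
import math
import random


def rmsep(y_true, y_pred):
    """Root-mean-squared error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def repeated_holdout(X, y, fit, predict, repeats=50, train_fraction=0.7, seed=0):
    """Draw `repeats` random calibration/validation splits and return one
    RMSEP value per split, computed on the held-out validation samples."""
    rng = random.Random(seed)
    indices = list(range(len(y)))
    n_train = int(round(train_fraction * len(y)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in test])
        scores.append(rmsep([y[i] for i in test], predictions))
    return scores
```

The mean and standard deviation of the returned list correspond to the kind of summary reported per method, and the list itself is what a box-and-whisker plot would display.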

Figure 8 also shows the cross validation recovery function for predictions after Dynamic Localized SNV (Figure 8(b)), Peak SNV (Figure 8(c)), and Partial Peak SNV (Figure 8(d)) optimization. The saturation effect in the low intensity area of the compound response is almost completely removed in the latter three cases. The scattering around the green bisecting line is also significantly reduced. Thus, the confidence interval for the predictions is improved.

The RMSEP values during the cross validation of the regression of the friction modifier component are summarized in Figure 9 in a box-and-whisker plot representation. The red line indicates the median; the boxes depict the interquartile range (IQR), which contains 50% of the data; and the margins of the whiskers represent Q1 − 1.5 · IQR and Q3 + 1.5 · IQR for the lower and upper bound, respectively (Q1 is the value below which 25% of the data set lie, and Q3 is the value below which 75% lie). Figure 9(a) refers to the transmission spectra, and Figure 9(b) refers to the absorbance spectra. The labels are associated with (1) without standardization, (2) single SNV transformation on the full spectral range, (3) Localized SNV, (4) Dynamic Localized SNV, (5) Peak SNV, and (6) Partial Peak SNV.
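The whisker margins described above follow directly from the quartiles. The short sketch below computes them with the Python standard library; the quartile convention (linear interpolation, `method="inclusive"`) is an assumption, since plotting libraries differ slightly in how they interpolate.

```python
from statistics import quantiles


def whisker_bounds(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) for a list of RMSEP values."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr
```

Values outside these bounds are the ones a box-and-whisker plot typically draws as individual outlier markers.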

The RMSEP is very poor for the crude transmission spectra, and SNV improves it substantially (mean RMSEP of 0.83 versus 0.34 in Table 2), whereas the improvement after SNV is small for absorbance spectra (0.53 versus 0.42).

For all sophisticated optimized standardization approaches (DLSNV, PSNV, and PPSNV), the median and the scattering around the median of RMSEP decrease drastically with respect to the SNV-transformed full spectra, but LSNV also seems to be a reasonable choice. DLSNV on absorbance spectra is characterized by the lowest median and the smallest scattering, as confirmed by Table 2, which summarizes the mean values and standard deviations of RMSEP.

Figure 9: Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 folds, random train/test split of 70/30 of the data) for the friction modifier compound of ATF A. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV. The RMSEP axis ranges from 0.0 to 1.6, and the proposed methods are marked in both panels.

The summarized RMSEPs of the regression to the antioxidant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation, where Figure 10(a) refers to the transmission spectra and Figure 10(b) to the absorbance spectra. In this case, DLSNV, PSNV, and PPSNV reduce both the median and the scattering around the median enormously when compared with SNV on full spectra. The best performance is achieved by PPSNV conducted on the transmission spectra, as confirmed by Table 2. On transmission spectra, DLSNV and PPSNV perform better than LSNV, but PSNV only has a positive effect when compared to SNV. In relation to LSNV, using PSNV reduces the predictive power. The fact that PPSNV is the best choice for this regression use case suggests that it is beneficial to use only spectral regions with high correlation with the target value and to drop regions with low or no correlation.

Figure 10: Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for the phenolic antioxidant compound of ATF B. In (a), the RMSEP values for the transmission spectra are shown, and in (b), the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV. The RMSEP axis ranges from 0.0 to 1.4, and the proposed methods are marked in both panels.

Table 2: Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during cross validation. The relative improvements and respective p values compared with LSNV are also listed.

Method             Friction modifier, ATF A                Antioxidant, ATF B
                   Transmission       Absorbance           Transmission       Absorbance
Raw                0.83 ± 0.22        0.53 ± 0.16          0.90 ± 0.14        0.70 ± 0.11
Single SNV         0.34 ± 0.14        0.42 ± 0.17          0.61 ± 0.11        0.70 ± 0.13
LSNV               0.30 ± 0.07        0.31 ± 0.07          0.24 ± 0.04        0.24 ± 0.04
DLSNV              0.28 ± 0.06        0.26 ± 0.05          0.21 ± 0.04        0.21 ± 0.04
Rel. improvement   9% (p < 0.05)      16% (p < 0.001)      13% (p < 0.001)    13% (p < 0.001)
PSNV               0.28 ± 0.08        0.28 ± 0.06          0.33 ± 0.06        0.31 ± 0.07
Rel. improvement   8% (p > 0.05)      9% (p < 0.05)        −41% (p < 0.001)   −33% (p < 0.001)
PPSNV              0.26 ± 0.06        0.26 ± 0.06          0.17 ± 0.04        0.24 ± 0.05
Rel. improvement   13% (p < 0.01)     15% (p < 0.001)      29% (p < 0.001)    −3% (p > 0.05)

4.1.1. Noise Robustness. Figure 11 shows the performance of the regression model for the prediction of noisy spectra for the friction modifier. In Figure 11(a), the curves for all preprocessings are depicted, and in Figure 11(b), a zoomed view is shown. Without any preprocessing, the initial calibration error for both transmission and absorbance spectra is very poor and rises very fast with the increasing noise level factor. The sophisticated preprocessing methods LSNV, DLSNV, PSNV, and PPSNV show not only a lower initial calibration error but also a lower error slope than SNV. In Figure 11(b), the trend of the transmission spectra having low error steepness is visible. The three proposed standardization techniques show a very similar noise robustness behavior and are significantly better than no preprocessing or SNV preprocessing, indicating that the model did not overfit the data, as mentioned in Section 3.3.2.
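The noise robustness test can be sketched as below. The noise model is an assumption made only for illustration, namely zero-mean Gaussian noise whose standard deviation is the noise level factor times the standard deviation of each spectrum; the definition actually used in the study is given in Section 3.3.2. The `predict` callable is hypothetical and stands for preprocessing plus the calibrated regression model.

```python
import math
import random
from statistics import mean, pstdev


def add_noise(spectrum, factor, rng):
    """Assumed noise model: Gaussian noise scaled by factor * std of the spectrum."""
    scale = factor * pstdev(spectrum)
    return [v + rng.gauss(0.0, scale) for v in spectrum]


def noise_curve(spectra, targets, predict, factors=(0.1, 0.2, 0.3, 0.4),
                repeats=50, seed=0):
    """For each noise level factor, repeat the noise addition `repeats` times
    and return {factor: (mean RMSE, standard deviation of RMSE)}."""
    rng = random.Random(seed)
    curve = {}
    for factor in factors:
        errors = []
        for _ in range(repeats):
            predictions = [predict(add_noise(s, factor, rng)) for s in spectra]
            squared = [(p - t) ** 2 for p, t in zip(predictions, targets)]
            errors.append(math.sqrt(mean(squared)))
        curve[factor] = (mean(errors), pstdev(errors))
    return curve
```

The mean gives one point of an RMSE curve per noise level factor, and the standard deviation over the 50 repetitions gives the corresponding error bar.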

Figure 12 shows the performance of the regression model of the antioxidant for the prediction of noisy spectra. The absorbance spectra with or without preprocessing show a similar noise trend as the SNV-transformed spectra. In Figure 12(b), the localized versions are shown in a zoomed view. The PPSNV preprocessing on the transmission spectra is characterized by the flattest noise dependency. These results demonstrate the superiority of the PPSNV method in this use case. As mentioned above, PSNV is not advantageous in this application and shows the lowest noise immunity, but it is preferable to the SNV across the whole spectrum.

4.2. Summary. The optimized parameters for the preprocessing methods are summarized in Table 3. The LSNV optimization process selects the same window size as DLSNV. Thus, the second window size run has no influence on the final result in these two cases, but the starting point produces an improvement.
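The role of the two DLSNV parameters, window size w and starting point s, can be illustrated with a short sketch. It is a hedged reading of the method rather than the authors' code: the function names are hypothetical, the data points before the starting point are assumed to form one segment of their own, and the population standard deviation is an assumed choice. With s = 0, the sketch reduces to LSNV.

```python
from statistics import mean, pstdev


def snv(values):
    """Standard Normal Variate on one segment."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def dynamic_localized_snv(spectrum, w, s):
    """Apply SNV separately to consecutive windows of w points,
    with the first full window starting at index s."""
    out = []
    if s > 0:
        # Assumption: points before the starting point form one segment.
        out.extend(snv(spectrum[:s]))
    for start in range(s, len(spectrum), w):
        out.extend(snv(spectrum[start:start + w]))
    return out
```

Shifting s moves every window boundary at once, which is one way to see why the starting point can still improve a model when the optimal window size is unchanged.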

As already mentioned, Table 2 summarizes the cross validation performances of the tested methods as mean values and standard deviations over all cross validation runs. For the friction modifier, the performances of DLSNV, PSNV, and PPSNV are very similar. Table 2 also lists the relative improvements against the benchmark preprocessing LSNV, accompanied by the corresponding p values from a two-sided t-test, which tests the significance of the mean values being different. The relative improvements can differ even where the same rounded mean value is given because they were calculated from exact values rather than rounded values.
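The relative improvement and the test statistic behind the p values can be reproduced approximately from the rounded entries of Table 2. The sketch below is an illustration under assumptions: it uses 50 RMSEP values per method (one per cross validation run), the function names are hypothetical, and it stops at the t statistic because the standard library has no t-distribution to convert it into a p value.

```python
import math


def relative_improvement(rmsep_benchmark, rmsep_method):
    """Relative improvement over the benchmark in percent (positive means better)."""
    return 100.0 * (rmsep_benchmark - rmsep_method) / rmsep_benchmark


def t_statistic(mean_a, sd_a, mean_b, sd_b, n):
    """Two-sample t statistic for two groups of equal size n.
    For equal group sizes, the pooled and the Welch form coincide."""
    return (mean_a - mean_b) / math.sqrt(sd_a ** 2 / n + sd_b ** 2 / n)
```

For PPSNV on the ATF B transmission spectra, the rounded means 0.24 (LSNV) and 0.17 give roughly 29%, matching the table. For PSNV, the rounded means 0.24 and 0.33 give −37.5% rather than the listed −41%, which illustrates the rounding effect described above.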

The best mean RMSEP value for the regression model for the friction modifier, 0.26, is produced by DLSNV based on absorbance spectra. The antioxidant compound is modeled best by PPSNV preprocessing of the transmission spectra, which yields a very low prediction error of 0.17.

To summarize, all proposed methods performed very well, reducing both the mean and the standard deviation of the cross validation error compared with SNV. PSNV is not reasonable for the antioxidant additive, as its performance is poor compared with the benchmark preprocessing method LSNV.

Which preprocessing method is best depends on the actual regression use case, but in general, PPSNV outperforms PSNV. This suggests that it is beneficial to drop spectral regions showing low or no dependency on the target value and to consider only highly correlated peaks.

Figure 11: RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A, calibrated by the unperturbed full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance spectra), and in (b), the sophisticated standardizations are shown (LSNV, DLSNV, PSNV, and PPSNV). The noise level factor ranges from 0.1 to 0.4 in both subplots.

For the antioxidant compound, PPSNV yielded an enormous improvement. A possible explanation is that the phenolic aging inhibitor is a compound with a very narrow vibrational band in ATF B and thus does not have a great impact when the SNV is carried out across the entire spectrum. This may lead to a suboptimal alignment of this band. In the case of the novel standardizations, the SNV is optimized to the highly correlated bands, and scatter effects can be compensated for these exact regions.

The fact that PSNV is unsuccessful for the antioxidant may be because the POIs are not centered in the middle of the SNV window and may have large left and right margins if they are far away from other POIs. As a result, they may not be optimally standardized. This is shown in Figure 6(b), where the single POI at about 2700 cm⁻¹ has a large single SNV window.

The study provides an overview of model performances when using transmission or absorbance spectra, suggesting that both cases can lead to valid regression models. However, for quantitative models built on transmission spectra, the SNV is vital, whereas in the absorbance case, the predictive power does not depend on SNV transformation. In absorbance spectra, the influence of the baseline constant is reduced because high transmission values are converted into low absorbance values.
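The conversion mentioned above is the usual logarithmic relation A = −log10(T). The sketch below assumes that transmission is given as a fraction between 0 and 1; percent transmission would first have to be divided by 100.

```python
import math


def transmission_to_absorbance(transmission):
    """Convert fractional transmission values (0 < T <= 1) to absorbance."""
    return [-math.log10(t) for t in transmission]
```

A transmission of 1.0 maps to an absorbance of 0, and 0.1 maps to 1, so the high-transmission baseline of a spectrum ends up near zero in absorbance.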

To conclude, DLSNV, PSNV, and PPSNV were able to improve both transmission and absorbance predictive models. The scattering around the mean values is also drastically reduced because the model does not have to learn how to compensate for the baseline shift in each cross validation step, leading to more reproducible results. Each vibrational band is optimally aligned so that the additive depletion trend is encoded in the absolute signal intensity, and the model does not have to weigh a data point as background correction.

Figure 12: RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B, calibrated by the original full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV, each for transmission and absorbance spectra), and in (b), the sophisticated standardizations are shown (LSNV, DLSNV, PSNV, and PPSNV). The noise level factor ranges from 0.1 to 0.4 in both subplots.

Table 3: Summary of the preprocessing parameters. For DLSNV, the window size and starting point; for PSNV, the agglomeration window; and for PPSNV, the window width around the POI are shown.

Method   Parameter   ATF A                          ATF B
                     Transmission   Absorbance     Transmission   Absorbance
LSNV     w_opt       52             52             51             51
DLSNV    w_opt       52             52             51             51
         s_opt       5              7              48             48
PSNV     agg_opt     5              14             6              10
PPSNV    pw_opt      17             25             15             38

5. Conclusion

The results presented in this study demonstrate that the proposed novel standardization strategies, Dynamic Localized SNV, Peak SNV, and Partial Peak SNV, drastically improve both the mean and the scatter of the RMSEP values in cross validation, as well as the robustness against noise, with respect to SNV transformation executed on the entire spectrum. Against the benchmark LSNV, an enhancement of the predictive power of a ridge regression model by up to 16% and 29% could be achieved for the friction modifier and the antioxidant compound, respectively. The demonstrated optimization workflows for performing SNV on specific regions of the spectrum have been introduced here for the first time. The standardization methods used in this paper are capable of eliminating nonlinearities by flexible rescaling in defined areas. To our knowledge, such standardization techniques have not been presented elsewhere.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] M. Zeaiter, J. M. Roger, and V. Bellon-Maurel, "Dynamic orthogonal projection: a new method to maintain the on-line robustness of multivariate calibrations. Application to NIR-based monitoring of wine fermentations," Chemometrics and Intelligent Laboratory Systems, vol. 80, no. 2, pp. 227–235, 2006.

[2] R. M. Balabin and R. Z. Safieva, "Motor oil classification by base stock and viscosity based on near infrared (NIR) spectroscopy data," Fuel, vol. 87, no. 12, pp. 2745–2752, 2008.

[3] A. Borin and R. J. Poppi, "Multivariate quality control of lubricating oils using Fourier transform infrared spectroscopy," Journal of the Brazilian Chemical Society, vol. 15, no. 4, pp. 570–576, 2004.

[4] J. Huang, D. Brennan, L. Sattler, J. Alderman, B. Lane, and C. O'Mathuna, "A comparison of calibration methods based on calibration data size and robustness," Chemometrics and Intelligent Laboratory Systems, vol. 62, no. 1, pp. 25–35, 2002.

[5] M. A. Al-Ghouti and L. Al-Atoum, "Virgin and recycled engine oil differentiation: a spectroscopic study," Journal of Environmental Management, vol. 90, no. 1, pp. 187–195, 2009.

[6] R. Kellner, J. M. Mermet, M. Otto, M. Valcarcel, and H. M. Widmer, Analytical Chemistry, John Wiley & Sons Australia, Limited, Milton, Australia, 2004.

[7] B. G. Osborne, T. Fearn, P. T. Hindle, and B. G. Osborne, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, Longman Scientific & Technical, Wiley, Harlow, Essex, UK, 1993.

[8] J. D. Donald and D. D. Kevin, Interpreting Diffuse Reflectance and Transmittance, NIR, Chichester, UK, 2007.

[9] A. Rinnan, F. van den Berg, and S. B. Engelsen, "Review of the most common pre-processing techniques for near-infrared spectra," TrAC Trends in Analytical Chemistry, vol. 28, no. 10, pp. 1201–1222, 2009.

[10] E. Arendse, O. Amos Fawole, L. Samukelo Magwaza, N. Helene, and U. Linus Opara, "Comparing the analytical performance of near and mid infrared spectrometers for evaluating pomegranate juice quality," LWT, vol. 91, pp. 180–190, 2018.

[11] J. C. Machado, M. A. Faria, I. M. P. L. V. O. Ferreira, R. N. M. J. Pascoa, and J. A. Lopes, "Varietal discrimination of hop pellets by near and mid infrared spectroscopy," Talanta, vol. 180, pp. 69–75, 2018.

[12] H. W. Siesler, "Vibrational spectroscopy," in Reference Module in Materials Science and Materials Engineering, Elsevier, New York, NY, USA, 2016.

[13] A. Paula Craig, B. G. Botelho, L. S. Oliveira, and A. S. Franca, "Mid infrared spectroscopy and chemometrics as tools for the classification of roasted coffees by cup quality," Food Chemistry, vol. 245, pp. 1052–1061, 2018.

[14] S. R. Khandasammy, M. A. Fikiet, E. Mistek et al., "Bloodstains, paintings, and drugs: Raman spectroscopy applications in forensic science," Forensic Chemistry, vol. 8, pp. 111–133, 2018.

[15] J. Engel, G. Jan, E. Szymanska et al., "Breaking with trends in pre-processing?," TrAC Trends in Analytical Chemistry, vol. 50, pp. 96–106, 2013.

[16] H. Martens, S. Jensen, and P. Geladi, "Multivariate linearity transformation for near-infrared reflectance spectrometry," in Proceedings of Nordic Symposium on Applied Statistics, pp. 205–234, Stavanger, Norway, June 1983.

[17] P. Geladi, D. MacDougall, and H. Martens, "Linearization and scatter-correction for near-infrared reflectance spectra of meat," Applied Spectroscopy, vol. 39, no. 3, pp. 491–500, 1985.

[18] P. Kubelka and F. Munk, "Ein Beitrag zur Optik der Farbanstriche," Zeitschrift für Technische Physik, vol. 12, pp. 593–601, 1931.

[19] A. Claus Andersson, "Direct orthogonalization," Chemometrics and Intelligent Laboratory Systems, vol. 47, no. 1, pp. 51–63, 1999.

[20] R. J. Barnes, M. S. Dhanoa, and S. J. Lister, "Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra," Applied Spectroscopy, vol. 43, no. 5, pp. 772–777, 1989.

[21] M. Zeaiter, J.-M. Roger, and V. Bellon-Maurel, "Robustness of models developed by multivariate calibration. Part II: the influence of pre-processing methods," TrAC Trends in Analytical Chemistry, vol. 24, no. 5, pp. 437–445, 2005.

[22] T. Fearn, C. Riccioli, A. Garrido-Varo, and J. Emilio Guerrero-Ginel, "On the geometry of SNV and MSC," Chemometrics and Intelligent Laboratory Systems, vol. 96, no. 1, pp. 22–26, 2009.

[23] T. Isaksson and B. Kowalski, "Piece-wise multiplicative scatter correction applied to near-infrared diffuse transmittance data from meat products," Applied Spectroscopy, vol. 47, no. 6, pp. 702–709, 1993.

[24] Y. Bi, K. Yuan, W. Xiao et al., "A local pre-processing method for near-infrared spectra, combined with spectral segmentation and standard normal variate transformation," Analytica Chimica Acta, vol. 909, pp. 30–40, 2016.

[25] F. Pedregosa, G. Varoquaux, A. Gramfort et al., "Scikit-learn: machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.

[26] S. Raschka, Python Machine Learning, Packt Publishing, Birmingham, UK, 2015.

[27] E. L. Lehmann and G. Casella, Theory of Point Estimation, Springer Texts in Statistics, Springer, New York, NY, USA, 2003.

[28] O. Heinisch, "Steel, R. G. D., and J. H. Torrie: Principles and Procedures of Statistics (with special reference to the biological sciences). McGraw-Hill Book Company, New York, Toronto, London 1960, 481 S., 15 Abb., 81 s 6 d," Biometrische Zeitschrift, vol. 4, no. 3, pp. 207-208, 1962.

[29] K. Pearson, "Note on regression and inheritance in the case of two parents," Proceedings of the Royal Society of London, vol. 58, no. 1, pp. 240–242, 1895.


Page 9: DynamicLocalizedSNV,PeakSNV,andPartialPeakSNV ...downloads.hindawi.com/journals/jspec/2018/5037572.pdf · same dimension as the original reference target values. 3.4.2. R2 Coecient

(3) Perform PPSNV across the peaks with the windowsize pw

(4) Evaluate the performance via R2 for pw between 1and 200 data points and choose pwopt according tomaximal R2

4 Results and Discussion

41 Cross Validation In Figure 8(a) the cross validationrecovery function for predictions of the SNV preprocessedspectra of the regression on the friction modi13er compoundis shown A 50-fold cross validation strategy with a cali-brationvalidation splitting of 7030 was used Red dotsrepresent the prediction of calibration and blue dots rep-resent validation samples It is obvious that the linear modelstruggles to predict the high and low compound intensityregions correctly e nonlinearity is visualized by an arrowand a dashed line to guide the eye

Figure 8 also shows the cross validation recovery functions for predictions after Dynamic Localized SNV (Figure 8(b)), Peak SNV (Figure 8(c)), and Partial Peak SNV (Figure 8(d)) optimization. The saturation effect in the low intensity area of the compound response is almost completely removed in the latter three cases. It is also notable that the scattering around the green bisecting line is significantly reduced. Thus, the confidence interval for the predictions is improved.

The RMSEP values during the cross validation of the regression of the friction modifier component are summarized in Figure 9 in a box-and-whisker plot representation. The red line indicates the median. Within the boxes, the interquartile range (IQR), which contains 50% of the data, is depicted, and the margins of the whiskers represent Q1 − 1.5 · IQR and Q3 + 1.5 · IQR for the lower and upper bound, respectively (Q1 is the value below which 25% of the data lie, and Q3 is the value below which 75% of the data lie). Figure 9(a) refers to the transmission spectra and Figure 9(b) to the absorbance spectra. The labels denote (1) without standardization, (2) single SNV transformation on the full spectral range, (3) Localized SNV, (4) Dynamic Localized SNV, (5) Peak SNV, and (6) Partial Peak SNV.
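The quantities behind such a box-and-whisker plot can be computed as in the following sketch. The function name `box_stats` and the choice of the interpolating ("inclusive") quartile method are assumptions; plotting libraries may use slightly different quartile conventions.

```python
from statistics import quantiles


def box_stats(values):
    """Summary statistics behind a box-and-whisker plot."""
    # Quartiles by linear interpolation between the sorted data points.
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return {
        "median": med,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_bound": lower,
        "upper_bound": upper,
        # Values beyond the bounds are usually drawn as individual markers.
        "outliers": [v for v in values if v < lower or v > upper],
    }
```

For the values 1 to 9, for example, Q1 = 3 and Q3 = 7, so IQR = 4 and the bounds are −3 and 13.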

It is noticeable that the RMSEP is very high for the raw transmission spectra and that SNV improves it substantially, whereas the improvement after SNV is small for the absorbance spectra.

For all optimized standardization approaches (DLSNV, PSNV, and PPSNV), the median and the scattering around the median of the RMSEP decrease drastically with respect to the SNV-transformed full spectra, but LSNV also seems to be a reasonable choice. DLSNV on absorbance spectra is characterized by the lowest median and the smallest scattering, as confirmed by Table 2, which summarizes the mean values and standard deviations of the RMSEP.

[Figure 9, panels (a) and (b). y-axis: RMSEP; x-axis: methods (1) to (6), with the proposed methods labelled.]

Figure 9: Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy (50 folds, random train/test split of 70/30 of the data) for the friction modifier compound of ATF A. In (a) the RMSEP values for the transmission spectra are shown, and in (b) the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV.

Journal of Spectroscopy 9

The summarized RMSEPs of the regression on the antioxidant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation, where Figure 10(a) refers to the transmission spectra and Figure 10(b) to the absorbance spectra. In this case, DLSNV, PSNV, and PPSNV reduce both the median and the scattering around the median enormously when compared with SNV on full spectra. The best performance is achieved by PPSNV conducted on the transmission spectra, as confirmed by Table 2. On transmission spectra, DLSNV and PPSNV perform better than LSNV, but PSNV only has a positive effect when compared to SNV; in relation to LSNV, PSNV reduces the predictive power. The fact that PPSNV is the best choice for this regression use case suggests that it is beneficial to use only spectral regions with high correlation with the target value and to drop regions with low or no correlation.

[Figure 10, panels (a) and (b). y-axis: RMSEP; x-axis: methods (1) to (6), with the proposed methods labelled.]

Figure 10: Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for the phenolic antioxidant compound of ATF B. In (a) the RMSEP values for the transmission spectra are shown, and in (b) the RMSEP values for the absorbance spectra are shown. Boxplot (1) is without standardization, (2) is with a single SNV transformation on the full spectrum, (3) is with optimized LSNV, (4) is with optimized DLSNV, (5) is with optimized PSNV, and (6) is with optimized PPSNV.

Table 2: Summary of the model performances, described by the mean value and the standard deviation of the RMSEP values during cross validation. The relative improvements and respective p-values compared with LSNV are also listed.

Method             Friction modifier (ATF A)              Antioxidant (ATF B)
                   Transmission       Absorbance          Transmission       Absorbance
Raw                0.83 ± 0.22        0.53 ± 0.16         0.90 ± 0.14        0.70 ± 0.11
Single SNV         0.34 ± 0.14        0.42 ± 0.17         0.61 ± 0.11        0.70 ± 0.13
LSNV               0.30 ± 0.07        0.31 ± 0.07         0.24 ± 0.04        0.24 ± 0.04
DLSNV              0.28 ± 0.06        0.26 ± 0.05         0.21 ± 0.04        0.21 ± 0.04
Rel. improvement   9% (p < 0.05)      16% (p < 0.001)     13% (p < 0.001)    13% (p < 0.001)
PSNV               0.28 ± 0.08        0.28 ± 0.06         0.33 ± 0.06        0.31 ± 0.07
Rel. improvement   8% (p > 0.05)      9% (p < 0.05)       −41% (p < 0.001)   −33% (p < 0.001)
PPSNV              0.26 ± 0.06        0.26 ± 0.06         0.17 ± 0.04        0.24 ± 0.05
Rel. improvement   13% (p < 0.01)     15% (p < 0.001)     29% (p < 0.001)    −3% (p > 0.05)

4.1.1. Noise Robustness. Figure 11 shows the performance of the regression model for the prediction of noisy spectra for the friction modifier. In Figure 11(a), the curves for all preprocessing methods are depicted, and in Figure 11(b), a zoomed view is shown. Without any preprocessing, the initial calibration error for both transmission and absorbance spectra is very high and rises very fast with increasing noise level factor. The sophisticated preprocessing methods LSNV, DLSNV, PSNV, and PPSNV show a lower initial calibration error, and the slope of the error is also lower than for SNV. Figure 11(b) shows that the error increases less steeply for the transmission spectra. The three proposed standardization techniques show a very similar noise robustness behavior and are significantly better than no preprocessing or SNV preprocessing, indicating that the model did not overfit the data, as mentioned in Section 3.3.2.

Figure 12 shows the performance of the regression model of the antioxidant for the prediction of noisy spectra. The absorbance spectra with or without preprocessing show a similar noise trend as the SNV-transformed spectra. In Figure 12(b), the localized versions are shown in a zoomed view. The PPSNV preprocessing on the transmission spectra is characterized by the flattest noise dependency. These results demonstrate the superiority of the PPSNV method in this use case. As mentioned above, PSNV is not advantageous in this application and shows the lowest noise immunity, but it is preferable to SNV across the whole spectrum.
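The noise test discussed in this section can be sketched as follows: a model calibrated on the unperturbed data predicts spectra with added noise, and the RMSE is averaged over 50 repetitions per noise level factor. The noise model is an assumption made for this illustration (Gaussian noise whose standard deviation is the noise level factor times the standard deviation of each spectrum), as are the names `noise_curve` and `predict`.

```python
import random
from math import sqrt
from statistics import mean, pstdev


def rmse(y_true, y_pred):
    """Root-mean-squared error between reference and predicted values."""
    return sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def noise_curve(spectra, y, predict, factors=(0.0, 0.1, 0.2, 0.3, 0.4),
                n_repeats=50, seed=0):
    """Return {factor: (mean RMSE, standard deviation of RMSE)}.

    `predict` is a hypothetical callable wrapping preprocessing plus an
    already calibrated model; it maps a list of spectra to predictions.
    """
    rng = random.Random(seed)
    curve = {}
    for factor in factors:
        errors = []
        for _ in range(n_repeats):
            # Assumed noise model: sigma = factor * spread of each spectrum.
            noisy = [[x + rng.gauss(0.0, factor * pstdev(s)) for x in s]
                     for s in spectra]
            errors.append(rmse(y, predict(noisy)))
        curve[factor] = (mean(errors), pstdev(errors))
    return curve
```

The mean values give the curves of RMSE against the noise level factor, and the standard deviations correspond to the error bars.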

4.2. Summary. The optimized parameters for the preprocessing methods are summarized in Table 3. The LSNV optimization process selects the same window size as DLSNV. Thus, the second window size run has no influence on the final result in these two cases, but the starting point produces an improvement.
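The roles of the window size and the starting point can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, and it is assumed here that the windows are consecutive blocks of w data points beginning at index s, that the data points before s form a shorter window of their own, and that the last window may be shorter than w. With s = 0, the sketch reduces to a plain localized SNV.

```python
from statistics import mean, pstdev


def snv(segment):
    """Standard Normal Variate on one segment (population standard deviation)."""
    m = mean(segment)
    s = pstdev(segment)
    if s == 0:  # flat (or one-point) segment: only centring is possible
        return [0.0 for _ in segment]
    return [(x - m) / s for x in segment]


def dlsnv(spectrum, w, s=0):
    """Apply SNV window by window, with windows of w points starting at index s."""
    # Window boundaries: an optional leading block [0, s), then blocks of w points.
    bounds = [0] if s > 0 else []
    bounds += list(range(s, len(spectrum), w)) + [len(spectrum)]
    out = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        out.extend(snv(spectrum[lo:hi]))
    return out
```

Shifting s moves every window boundary, which changes which bands are standardized together even when w stays the same.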

In Table 2, the cross validation performances of the tested methods are summarized as mean values and standard deviations over all cross validation runs. For the friction modifier, the performances of DLSNV, PSNV, and PPSNV are very similar. Table 2 also lists the relative improvements against the benchmark preprocessing LSNV, accompanied by the corresponding p values from a two-sided t-test, which tests whether the mean values differ significantly (relative improvements may differ even where the same rounded mean value is listed, because the improvements were calculated from exact rather than rounded values).
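As a worked illustration, the relative improvement and a two-sided significance test can be computed from the summary statistics alone. The sketch below makes several assumptions that are not stated in the text: the relative improvement is taken as the reduction of the mean RMSEP relative to LSNV, the test is computed in its unequal-variance (Welch) form with 50 values per method, and the p value uses a large-sample normal approximation instead of the exact t distribution. Because only the rounded values from Table 2 are used, not every table entry is reproduced exactly.

```python
from math import sqrt
from statistics import NormalDist


def relative_improvement(rmsep_benchmark, rmsep_method):
    """Reduction of the mean RMSEP relative to the benchmark, in percent."""
    return 100.0 * (rmsep_benchmark - rmsep_method) / rmsep_benchmark


def welch_t(mean_a, sd_a, mean_b, sd_b, n_a=50, n_b=50):
    """t statistic for two means with possibly unequal variances."""
    return (mean_a - mean_b) / sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)


def two_sided_p_normal(t):
    """Two-sided p value from a normal approximation (an assumption for large n)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# LSNV vs. PPSNV on the ATF B transmission spectra (rounded values from Table 2).
improvement = relative_improvement(0.24, 0.17)
t_value = welch_t(0.24, 0.04, 0.17, 0.04)
p_value = two_sided_p_normal(t_value)
```

With these rounded inputs, the improvement evaluates to about 29% and the approximate p value lies far below 0.001, in line with the corresponding entry of Table 2.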

The best mean RMSEP value for the regression model for the friction modifier, 0.26, is produced by DLSNV based on absorbance spectra. The antioxidant compound is modeled best by PPSNV preprocessing of the transmission spectra, which yields a very low prediction error of 0.17.

To summarize, all proposed methods performed very well, reducing both the mean and the standard deviation of the cross validation error compared with SNV. PSNV is not reasonable for the antioxidant additive, as its performance is poor compared with the benchmark preprocessing method LSNV.

Which preprocessing method is best depends on the actual regression use case, but in general, it is shown that PPSNV outperforms PSNV. This suggests that it is beneficial to drop spectral regions showing low or no dependency on the target value and to consider only highly correlated peaks.

[Figure 11, panels (a) and (b). x-axis: noise level factor; y-axis: RMSE. Legend: w/o standardization, SNV, LSNV, DLSNV, PSNV, PPSNV; transmission and absorbance.]

Figure 11: RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A, calibrated by the unperturbed full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a) all curves are shown, and in (b) the sophisticated standardizations are shown.

For the antioxidant compound, PPSNV yielded an enormous improvement. This could be explained by the fact that the phenolic aging inhibitor is a compound with a very narrow vibrational band in ATF B and thus does not have a great impact when the SNV is carried out across the entire spectrum. This may lead to a suboptimal alignment of this band. In the case of the novel standardizations, the SNV is optimized to the highly correlated bands, and scatter effects can be compensated for these exact regions.

The fact that PSNV is unsuccessful for the antioxidant may be because the POIs are not centered in the middle of the SNV window and may have large left and right margins if they are far away from other POIs. As a result, they may not be optimally standardized. This is shown in Figure 6(b), where the single POI at about 2700 cm⁻¹ has a large single SNV window.

The study provides an overview of model performances when using transmission or absorbance spectra, suggesting that both cases can lead to valid regression models. However, for quantitative models built on transmission spectra, SNV is vital, whereas in the absorbance case, the predictive power does not depend on the SNV transformation. In absorbance spectra, the influence of the baseline constant is reduced because high transmission values are converted into low absorbance values.
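The last point can be made concrete with the usual decadic conversion A = −log10(T). The short sketch below assumes that the transmission values are given as fractions between 0 and 1 (not in percent); the function name is hypothetical.

```python
from math import log10


def transmission_to_absorbance(transmission):
    """Convert fractional transmission values (0 < T <= 1) to absorbance."""
    return [-log10(t) for t in transmission]


# High transmission maps to low absorbance: T = 1 gives A = 0, T = 0.1 gives A = 1.
absorbance = transmission_to_absorbance([1.0, 0.1, 0.01])
```

Regions of the spectrum with transmission close to 1, where little is absorbed, therefore end up close to zero on the absorbance scale.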

To conclude, DLSNV, PSNV, and PPSNV were able to improve both transmission and absorbance predictive models. The scattering around the mean values is also drastically reduced because the model does not have to learn how to compensate for the baseline shift in each cross validation step, leading to more reproducible results. Each vibrational band is optimally aligned so that the additive depletion trend is encoded in the absolute signal intensity, and the model does not have to weigh a data point as background correction.

[Figure 12, panels (a) and (b). x-axis: noise level factor; y-axis: RMSE. Legend: w/o standardization, SNV, LSNV, DLSNV, PSNV, PPSNV; transmission and absorbance.]

Figure 12: RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B, calibrated by the original full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a) all curves are shown, and in (b) the sophisticated standardizations are shown.

Table 3: Summary of the preprocessing parameters. For DLSNV, window size and starting point; for PSNV, agglomeration window; and for PPSNV, window width around the POI are shown.

Method   ATF A                              ATF B
         Transmission     Absorbance        Transmission     Absorbance
LSNV     w_opt = 52       w_opt = 52        w_opt = 51       w_opt = 51
DLSNV    w_opt = 52       w_opt = 52        w_opt = 51       w_opt = 51
         s_opt = 5        s_opt = 7         s_opt = 48       s_opt = 48
PSNV     agg_opt = 5      agg_opt = 14      agg_opt = 6      agg_opt = 10
PPSNV    pw_opt = 17      pw_opt = 25       pw_opt = 15      pw_opt = 38

5. Conclusion

The results presented in this study demonstrate that the proposed novel standardization strategies Dynamic Localized SNV, Peak SNV, and Partial Peak SNV drastically improve both the mean and the scatter of the RMSEP values in cross validation, as well as the robustness against noise, with respect to SNV transformation executed on the entire spectrum. Against the benchmark LSNV, an enhancement of the predictive power of a ridge regression model by up to 16% and 29% could be achieved for the friction modifier and the antioxidant compound, respectively. The demonstrated optimization workflows for performing SNV on specific regions of the spectrum have been introduced here for the first time. The standardization methods used in this paper are thus capable of eliminating nonlinearities by flexible rescaling in defined areas. To our knowledge, such standardization techniques have not been presented elsewhere.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


TribologyAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

International Journal ofInternational Journal ofPhotoenergy

Hindawiwwwhindawicom Volume 2018

Journal of

Chemistry

Hindawiwwwhindawicom Volume 2018

Advances inPhysical Chemistry

Hindawiwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2018

Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018

SpectroscopyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Medicinal ChemistryInternational Journal of

Hindawiwwwhindawicom Volume 2018

NanotechnologyHindawiwwwhindawicom Volume 2018

Journal of

Applied ChemistryJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

Journal of

SpectroscopyAnalytical ChemistryInternational Journal of

Hindawiwwwhindawicom Volume 2018

MaterialsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

BioMed Research International Electrochemistry

International Journal of

Hindawiwwwhindawicom Volume 2018

Na

nom

ate

ria

ls

Hindawiwwwhindawicom Volume 2018

Journal ofNanomaterials

Submit your manuscripts atwwwhindawicom

Page 10: DynamicLocalizedSNV,PeakSNV,andPartialPeakSNV ...downloads.hindawi.com/journals/jspec/2018/5037572.pdf · same dimension as the original reference target values. 3.4.2. R2 Coecient

e summarized RMSEPs of the regression to the antiox-idant additive of ATF B are shown in Figure 10 in a box-and-whisker plot representation where Figure 10(a) refers to thetransmission spectra and Figure 10(b) to the absorbance spectraIn this case DLSNV PSNV and PPSNV reduce both themedian and the scattering around themedian enormously whencompared with SNV on full spectra e best performance isachieved by PPSNV conducted on the transmission spectracon13rmed by Table 2 On transmission spectra DLSNV andPPSNV perform better than LSNV but PSNV only has a pos-itive eshyect when compared to SNV In relation to LSNV usingPSNV the predictive power is reduced e fact that PPSNV isthe best choice for this regression use case suggests that it isbene13cial to only use spectral regions with high correlation withthe target value and drop regions without or low correlation

411 Noise Robustness In Figure 11 the performance of theregression model for the prediction of noisy spectra isshown for the friction modi13er In subplot Figure 11(a)the curves for all preprocessings are depicted and inFigure 11(b) a zoomed view is shown Without any pre-processing the initial calibration error for both trans-mission and absorbance spectra is very poor and rises veryfast with the increasing noise level factor Although thesophisticated preprocessing methods LSNV DLSNVPNSV and PPSNV show a lower initial calibration errorthe slope of the error is lower than for SNV In Figure 11(b)the trend of the transmission spectra having low errorsteepness is visible One may say that the three proposedstandardization techniques show a very similar noise ro-bustness behavior and are signi13cantly better than none or

(a) (b)

RMSE

P

Proposedmethods

Proposedmethods

(1) (2) (3) (4) (5) (6) (1) (2) (3) (4) (5) (6)00

02

04

06

08

10

12

14

+

+

++

++

+

++

+

Figure 10 Box-and-whisker plot representation of the root-mean-squared error of prediction of the cross validation strategy for thephenolic antioxidant compound of ATF B In (a) the RMSEP values for the transmission spectra are shown and in (b) the RMSEP valuesfor the absorbance spectra are shown Boxplot (1) is without standardization (2) is with a single SNV transformation on the full spectrum(3) is with optimized LSNV (4) is with optimized Dynamic DLSNV (5) is with optimized PSNV and (6) is with optimized PPSNV

Table 2 Summary of the model performances described by the mean value and the standard deviation of the RMSEP values during crossvalidation e relative improvements and respective p-values compared with LSNV are also listed

MethodFriction modi13er ATF A Antioxidant ATF B

Transmission Absorbance Transmission AbsorbanceRaw 083plusmn 022 053plusmn 016 090plusmn 014 070plusmn 011Single SNV 034plusmn 014 042plusmn 017 061plusmn 011 070plusmn 013LSNV 030plusmn 007 031plusmn 007 024plusmn 004 024plusmn 004DLSNV 028plusmn 006 026 plusmn 005 021plusmn 004 021plusmn 004Rel improvement 9 (plt 005) 16 (plt 0001) 13 (plt 0001) 13 (plt 0001)PSNV 028plusmn 008 028plusmn 006 033plusmn 006 031plusmn 007Rel improvement 8 (pgt 005) 9 (plt 005) minus41 (plt 0001) minus33 (plt 0001)PPSNV 026plusmn 006 026plusmn 006 017 plusmn 004 024plusmn 005Rel improvement 13 (plt 001) 15 (plt 0001) 29 (plt 0001) minus3 (pgt 005)

10 Journal of Spectroscopy

SNV preprocessing indicating that the model did notoverfit the data as mentioned in Section 332

In Figure 12 the performance of the regression model ofthe antioxidant for the prediction of noisy spectra is shown(e absorbance spectra with or without preprocessing showa similar noise trend as the SNV-transformed spectra InFigure 12(b) the localized versions are shown in a zoomedview (e PPSNV preprocessing on the transmission spectrais characterized by the flattest noise dependency (ese resutsdemonstrate the superiority of the PPSNVmethod in this usecase As mentioned above PSNV is not advantageous in thisapplication and shows the lowest noise immunity but it ispreferable to the SNV across the whole spectrum

42 Summary (e optimized parameters for the pre-processing methods are summarized in Table 3 (e LSNVoptimization process selects the same window size asDLSNV (us the second window size run has no influenceon the final result in these two cases but the starting pointproduces an improvement

As already mentioned in Table 2 the cross validationperformances of the tested methods are summarized as meanvalues and standard deviation for all cross validation runs Forthe friction modifier the performances of DLSNV PSNVand PPSNV are very similar Table 2 also lists the relative

improvements against the benchmark preprocessing LSNVaccompanied by corresponding p values from a two-sidedt-test which tests the significance of the mean values beingdifferent (the deviation for relative improvements when thesame mean value is given due to the fact that the improve-ments were calculated from exact values rather than roundedvalues)

(e best mean RMSEP value for the regression model forthe frictionmodifier of 026 is produced by DLSNV based onabsorbance spectra (e antioxidant compound is modeledbest by PPSNV preprocessing of the transmission spectraand yields a very low prediction error of 017

To summarize one may say that all proposed methodsperformed very well reducing both mean and standarddeviation of the cross validation error compared with SNVPSNV is not reasonable for the antioxidant additive as theperformance is poor compared with the benchmark pre-processing method LSNV

Which preprocessing method is the best depends on theactual regression use case but in general it is shown thatPPSNV outperforms PSNV (is suggests that it is bene-ficial to drop spectral regions showing low or no de-pendency on the target value and to only consider highlycorrelated peaks

For the antioxidant compound PPSNV yielded anenormous improvement (is could be explained as the

Noise levelfactor

Wo standardizationSNVLSNVDLSNV

PSNVPPSNVTransmissionAbsorbance

14

12

10

08

RMSE

06

04

02

0001 02 03 04

(a)

RMSE

Noise levelfactor

LSNVDLSNVPSNV

PPSNVTransmissionAbsorbance

045

040

035

030

025

020

01501 02 03 04

(b)

Figure 11 RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A calibrated by theunperturbed full data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50repetitions of noise addition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown

Journal of Spectroscopy 11

phenolic aging inhibitor is a compound with very narrowvibrational band in the ATF B and thus does not have a greatimpact when the SNV is carried out across the entirespectrum (is may lead to a suboptimal alignment of thisband In case of novel standardizations the SNV is opti-mized to the high correlative bands and scatter effects can becompensated for these exact regions

(e fact that PSNV is unsuccessful for the antioxidantmay be because the POIs are not centered to the middle ofthe SNVwindow andmay have large left and right margins ifthey are far away from other POIs As a result they may notbe optimally standardized (is is shown in Figure 6(b)where the single POI at about 2700 cmminus1 has a large singleSNV window

(e study provides an overview of model performanceswhen using transmission or absorbance spectra suggestingthat both cases can lead to valid regressionmodels Howeverfor quantitative models built on transmission spectra theSNV is vital whereas in the absorbance case the predictivepower does not depend on SNV transformation In absor-bance spectra the influence of the baseline constant is re-duced because high transmission values are converted intolow absorbance values

To conclude DLSNV PSNV and PPSNV were able toimprove both transmission and absorbance predictivemodels (e scattering around the mean values are also

drastically reduced because the model does not have to learnhow to compensate for the baseline shift in each crossvalidation step leading to more reproducible results Eachvibrational band is optimally aligned so that the additivedepletion trend is encoded in the absolute signal intensityand the model does not have to weigh a data point asbackground correction

5 Conclusion

(e results presented in this study demonstrate the out-performance of the proposed novel standardizationstrategies Dynamic Localized SNV Peak SNV and PartialPeak SNV to improve both the mean and scatter of RMSEP

Noise level factor

RMSE

Wo standardizationSNVLSNVDLSNV

PSNVPPSNVTransmissionAbsorbance

20

15

10

05

0001 02 03 04

(a)

LSNVDLSNVPSNV

PPSNVTransmissionAbsorbance

RMSE

Noise level factor

07

06

05

04

03

02

0101 02 03 04

(b)

Figure 12 RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B calibrated by the originalfull data set (e error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noiseaddition In subplot (a) all curves are shown and in (b) the sophisticated standardizations are shown

Table 3 Summary of the preprocessing parameters For DLSNVwindow size and starting point for PSN agglomeration windowand for PPSNV window width around POI is shown

MethodATF A ATF B

Transmission Absorbance Transmission AbsorbanceLSNV wopt 52 wopt 52 wopt 51 wopt 2 51DLSNV wopt 2 52 wopt 2 52 wopt 2 51 wopt 2 51

sopt 5 sopt 7 sopt 48 sopt 48PSNV aggopt 5 aggopt 14 aggopt 6 aggopt 10PPSNV pwopt 17 pwopt 25 pwopt 15 pwopt 38

12 Journal of Spectroscopy

values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere

Data Availability

(e data used to support the findings of this study are in-cluded within the article

Conflicts of Interest

(e authors declare that they have no conflicts of interest



SNV preprocessing, indicating that the model did not overfit the data, as mentioned in Section 3.3.2.

In Figure 12, the performance of the regression model of the antioxidant for the prediction of noisy spectra is shown. The absorbance spectra, with or without preprocessing, show a similar noise trend as the SNV-transformed spectra. In Figure 12(b), the localized versions are shown in a zoomed view. The PPSNV preprocessing on the transmission spectra is characterized by the flattest noise dependency. These results demonstrate the superiority of the PPSNV method in this use case. As mentioned above, PSNV is not advantageous in this application and shows the lowest noise immunity among the localized methods, but it is still preferable to SNV across the whole spectrum.
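The noise test behind Figures 11 and 12 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, `predict` stands in for the model calibrated on the unperturbed data, and the noise model (zero-mean Gaussian noise whose standard deviation is the noise level factor times the standard deviation of each spectrum) is an assumption, since the exact definition is not restated in this section. Only the 50 repetitions are taken from the figure captions.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def noise_robustness(predict, spectra, targets, noise_level, repetitions=50, seed=0):
    """Mean and standard deviation of the RMSE over repeated noise additions.

    Assumption: the noise is zero-mean Gaussian with a standard deviation of
    noise_level times the standard deviation of the respective spectrum.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(repetitions):
        noisy = []
        for spectrum in spectra:
            sigma = noise_level * statistics.pstdev(spectrum)
            noisy.append([x + rng.gauss(0.0, sigma) for x in spectrum])
        errors.append(rmse(targets, [predict(s) for s in noisy]))
    return statistics.mean(errors), statistics.stdev(errors)


# Toy usage: the "model" simply predicts the mean of a spectrum.
toy_spectra = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
toy_targets = [2.0, 4.0]
toy_predict = lambda s: sum(s) / len(s)
mean_err, std_err = noise_robustness(toy_predict, toy_spectra, toy_targets, 0.4)
```

Plotting the returned mean with the standard deviation as error bar for each noise level factor gives curves of the kind shown in Figures 11 and 12; a flatter curve indicates a preprocessing that is less sensitive to noise.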

4.2. Summary. The optimized parameters for the preprocessing methods are summarized in Table 3. The LSNV optimization process selects the same window size as DLSNV. Thus, the second window size run has no influence on the final result in these two cases, but the starting point produces an improvement.
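The relation between the two parameters can be illustrated with a short sketch. It assumes that the data points before the starting point form their own shorter window, and it uses the population standard deviation; both choices, like the function names, are assumptions made for illustration and are not taken from the paper.

```python
import statistics


def snv(segment):
    """Standard Normal Variate: center a segment and scale it to unit variance.

    The population standard deviation is used here; the sample standard
    deviation is an equally common choice.
    """
    mean = statistics.fmean(segment)
    std = statistics.pstdev(segment)
    if std == 0.0:  # flat segment: center only, avoid division by zero
        return [x - mean for x in segment]
    return [(x - mean) / std for x in segment]


def window_bounds(n_points, window, start=0):
    """Index ranges of the windows; start > 0 adds a shorter leading window."""
    edges = [0] if start > 0 else []
    edges += list(range(start, n_points, window))
    edges.append(n_points)
    return list(zip(edges[:-1], edges[1:]))


def dlsnv(spectrum, window, start=0):
    """Apply SNV separately inside each window; start=0 reduces to plain LSNV."""
    out = []
    for lo, hi in window_bounds(len(spectrum), window, start):
        out.extend(snv(spectrum[lo:hi]))
    return out


example = dlsnv([1.0, 3.0, 2.0, 4.0, 6.0, 8.0], window=4, start=2)
```

With the same window size, changing only `start` shifts every window edge, which is why DLSNV can improve on LSNV even when both select the same window size.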

As already mentioned, the cross validation performances of the tested methods are summarized in Table 2 as mean values and standard deviations over all cross validation runs. For the friction modifier, the performances of DLSNV, PSNV, and PPSNV are very similar. Table 2 also lists the relative improvements against the benchmark preprocessing LSNV, accompanied by the corresponding p values from a two-sided t-test, which tests whether the mean values differ significantly. Where the same mean value is given but the relative improvements differ, this is because the improvements were calculated from exact values rather than from rounded values.
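The rounding remark can be made concrete with a small worked example. The two RMSEP values below are hypothetical and are not taken from Table 2; they only show how two means that print identically after rounding can still yield a nonzero relative improvement.

```python
def relative_improvement(benchmark_rmsep, method_rmsep):
    """Relative RMSEP reduction against the benchmark, in percent."""
    return 100.0 * (benchmark_rmsep - method_rmsep) / benchmark_rmsep


# Hypothetical exact cross validation means (not values from the paper).
lsnv_exact, dlsnv_exact = 0.2648, 0.2552

print(round(lsnv_exact, 2), round(dlsnv_exact, 2))  # prints: 0.26 0.26
print(round(relative_improvement(lsnv_exact, dlsnv_exact), 1))  # prints: 3.6
```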

The best mean RMSEP value for the regression model for the friction modifier, 0.26, is produced by DLSNV based on absorbance spectra. The antioxidant compound is modeled best by PPSNV preprocessing of the transmission spectra, which yields a very low prediction error of 0.17.

To summarize, all proposed methods performed very well, reducing both the mean and the standard deviation of the cross validation error compared with SNV. PSNV, however, is not a reasonable choice for the antioxidant additive, as its performance is poor compared with the benchmark preprocessing method LSNV.

Which preprocessing method is the best depends on the actual regression use case, but in general, it is shown that PPSNV outperforms PSNV. This suggests that it is beneficial to drop spectral regions showing low or no dependency on the target value and to only consider highly correlated peaks.
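The idea of keeping only highly correlated regions can be sketched as below. This is a simplified illustration under stated assumptions, not the published PPSNV procedure: a point of interest (POI) is taken to be any spectral index whose absolute Pearson correlation with the target reaches a threshold, a window of fixed half width is placed around each POI, overlapping windows are merged, and SNV is applied per region while everything else is dropped. The names `threshold` and `half_width` are hypothetical parameters.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def points_of_interest(spectra, targets, threshold):
    """Spectral indices whose absolute correlation with the target is high."""
    n_points = len(spectra[0])
    return [j for j in range(n_points)
            if abs(pearson([s[j] for s in spectra], targets)) >= threshold]


def partial_peak_regions(pois, half_width, n_points):
    """Merge windows of +/- half_width around each POI into index ranges."""
    regions = []
    for p in sorted(pois):
        lo, hi = max(0, p - half_width), min(n_points, p + half_width + 1)
        if regions and lo <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], hi))
        else:
            regions.append((lo, hi))
    return regions


def ppsnv(spectrum, regions):
    """Apply SNV inside each region and drop all other spectral points."""
    out = []
    for lo, hi in regions:
        seg = spectrum[lo:hi]
        mean, std = statistics.fmean(seg), statistics.pstdev(seg)
        out.extend([(x - mean) / std if std > 0 else 0.0 for x in seg])
    return out


# Toy data: only index 2 follows the target.
toy_spectra = [[5, 1, 1, 3, 0, 2], [5, 0, 2, 1, 0, 2],
               [5, 0, 3, 1, 0, 2], [5, 1, 4, 3, 0, 2]]
toy_targets = [1, 2, 3, 4]
pois = points_of_interest(toy_spectra, toy_targets, threshold=0.9)
regions = partial_peak_regions(pois, half_width=1, n_points=6)
reduced = [ppsnv(s, regions) for s in toy_spectra]
```

In this toy case, the regression model would see three points per spectrum instead of six, each standardized only against its immediate neighborhood.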

For the antioxidant compound, PPSNV yielded an enormous improvement. This could be explained by the fact that the phenolic aging inhibitor is a compound with a very narrow vibrational band in ATF B and thus does not have a great impact when the SNV is carried out across the entire spectrum. This may lead to a suboptimal alignment of this band. In the case of the novel standardizations, the SNV is optimized to the highly correlated bands, and scatter effects can be compensated for these exact regions.

Figure 11: RMSE as a function of the noise level factor for the regression of the friction modifier compound of ATF A, calibrated by the unperturbed full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV), and in (b), only the localized standardizations (LSNV, DLSNV, PSNV, and PPSNV) are shown, each for transmission and absorbance spectra.

The fact that PSNV is unsuccessful for the antioxidant may be because the POIs are not centered in the middle of the SNV window and may have large left and right margins if they are far away from other POIs. As a result, they may not be optimally standardized. This is shown in Figure 6(b), where the single POI at about 2700 cm⁻¹ has a large single SNV window.

The study provides an overview of model performances when using transmission or absorbance spectra, suggesting that both cases can lead to valid regression models. However, for quantitative models built on transmission spectra, the SNV is vital, whereas in the absorbance case, the predictive power does not depend on SNV transformation. In absorbance spectra, the influence of the baseline constant is reduced because high transmission values are converted into low absorbance values.
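The conversion referred to here is the usual logarithmic relation A = -log10(T). The sketch below assumes transmission is expressed as a fraction between 0 and 1 (not in percent); the numerical values are illustrative only.

```python
import math


def absorbance(transmission):
    """Convert a transmission value (fraction in (0, 1]) to absorbance."""
    return -math.log10(transmission)


# Baseline regions with high transmission map to absorbance values near zero,
# whereas a strongly absorbing band maps to a large absorbance value.
baseline = absorbance(0.90)  # about 0.046
band = absorbance(0.10)      # about 1.0
```

On the absorbance scale, the baseline therefore contributes little to the overall signal level compared with the absorbing bands.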

To conclude, DLSNV, PSNV, and PPSNV were able to improve both transmission and absorbance predictive models. The scatter around the mean values is also drastically reduced because the model does not have to learn how to compensate for the baseline shift in each cross validation step, leading to more reproducible results. Each vibrational band is optimally aligned so that the additive depletion trend is encoded in the absolute signal intensity, and the model does not have to weigh a data point as background correction.

Figure 12: RMSE as a function of the noise level factor for the regression of the antioxidant compound of ATF B, calibrated by the original full data set. The error bars represent the standard deviation of the prediction error calculated from the statistics of 50 repetitions of noise addition. In subplot (a), all curves are shown (without standardization, SNV, LSNV, DLSNV, PSNV, and PPSNV), and in (b), only the localized standardizations (LSNV, DLSNV, PSNV, and PPSNV) are shown, each for transmission and absorbance spectra.

Table 3: Summary of the preprocessing parameters. For DLSNV, the window size and starting point are shown; for PSNV, the agglomeration window; and for PPSNV, the window width around the POI.

Method   Parameter   ATF A                         ATF B
                     Transmission   Absorbance     Transmission   Absorbance
LSNV     w_opt       52             52             51             51
DLSNV    w_opt       52             52             51             51
DLSNV    s_opt       5              7              48             48
PSNV     agg_opt     5              14             6              10
PPSNV    pw_opt      17             25             15             38

5. Conclusion

The results presented in this study demonstrate that the proposed novel standardization strategies, Dynamic Localized SNV, Peak SNV, and Partial Peak SNV, drastically improve both the mean and the scatter of RMSEP values in cross validation, as well as the robustness against noise, with respect to SNV transformation executed on the entire spectrum. Against the benchmark LSNV, an enhancement of the predictive power of a ridge regression model by up to 16% and 29% could be achieved for the friction modifier and the antioxidant compound, respectively. The demonstrated optimization workflows for performing SNV on specific regions of the spectrum have been introduced here for the first time. Therefore, the standardization methods used in this paper are capable of eliminating nonlinearities by flexible rescaling in defined areas. To our knowledge, such standardization techniques have not been presented elsewhere.



values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere

Data Availability

(e data used to support the findings of this study are in-cluded within the article

Conflicts of Interest

(e authors declare that they have no conflicts of interest

References

[1] M Zeaiter J M Roger and V Bellon-Maurel ldquoDynamicorthogonal projection a new method to maintain the on-linerobustness of multivariate calibrations application to nir-based monitoring of wine fermentationsrdquo Chemometrics andIntelligent Laboratory Systems vol 80 no 2 pp 227ndash2352006

[2] R M Balabin and R Z Safieva ldquoMotor oil classificationby base stock and viscosity based on near infrared (nir)spectroscopy datardquo Fuel vol 87 no 12 pp 2745ndash27522008

[3] A Borin and R J Poppi ldquoMultivariate quality control oflubricating oils using fourier transform infrared spectros-copyrdquo Journal of the Brazilian Chemical Society vol 15 no 4pp 570ndash576 2004

[4] J Huang D Brennan L Sattler J Alderman B Lane andC OrsquoMathuna ldquoA comparison of calibration methods basedon calibration data size and robustnessrdquo Chemometrics andIntelligent Laboratory Systems vol 62 no 1 pp 25ndash352002

[5] M A Al-Ghouti and L Al-Atoum ldquoVirgin and recycledengine oil differentiation a spectroscopic studyrdquo Journal ofEnvironmental Management vol 90 no 1 pp 187ndash195 2009

[6] R Kellner J M Mermet M Otto M Valcarcel andH M Widmer Analytical Chemistry John Wiley amp SonsAustralia Limited Milton Australia 2004

[7] B G Osborne T Fearn P T Hindle and B G OsbornePractical nir Spectroscopy with Applications in Food andBeverage Analysis Longman Scientific amp Technical WileyHarlow Essex UK 1993

[8] J D Donald and D D Kevin Interpreting Diffuse Reflectanceand Transmittance NIR Chichester UK 2007

[9] A Rinnan F van den Berg and S B Engelsen ldquoReview of themost common pre-processing techniques for near-infraredspectrardquo TrAC Trends in Analytical Chemistry vol 28 no 10pp 1201ndash1222 2009

[10] E Arendse O Amos Fawole L Samukelo MagwazaN Helene and U Linus Opara ldquoComparing the analyticalperformance of near and mid infrared spectrometers for

evaluating pomegranate juice qualityrdquo LWT vol 91pp 180ndash190 2018

[11] J C Machado M A Faria I M P L V O FerreiraR N M J Pascoa and J A Lopes ldquoVarietal discrimination ofhop pellets by near and mid infrared spectroscopyrdquo Talantavol 180 pp 69ndash75 2018

[12] H W Siesler ldquoVibrational spectroscopyrdquo in ReferenceModule in Materials Science and Materials EngineeringElsevier New York NY USA 2016

[13] A Paula Craig B G Botelho L S Oliveira and A S FrancaldquoMid infrared spectroscopy and chemometrics as tools for theclassification of roasted coffees by cup qualityrdquo FoodChemistry vol 245 pp 1052ndash1061 2018

[14] S R Khandasammy M A Fikiet E Mistek et al ldquoBloodstainspaintings and drugs raman spectroscopy applications in fo-rensic sciencerdquo Forensic Chemistry vol 8 pp 111ndash133 2018

[15] J Engel G Jan E Szymanska et al ldquoBreaking with trends inpre-processingrdquo TrAC Trends in Analytical Chemistryvol 50 pp 96ndash106 2013

[16] H Martens S Jensen and P Geladi ldquoMultivariate linearitytransformation for near-infrared reflectance spectrometryrdquo inProceedings of Nordic symposium on Applied Statisticspp 205ndash234 Stavanger Norway June 1983

[17] P Geladi D MacDougall and HMartens ldquoLinearization andscatter-correction for near-infrared reflectance spectra ofmeatrdquo Applied Spectroscopy vol 39 no 3 pp 491ndash500 1985

[18] P Kubelka and F Munk ldquoEin beitrag zur optik der far-banstricherdquo Zeitschrift fur Technische Physik vol 12pp 593ndash601 1931

[19] A Claus Andersson ldquoDirect orthogonalizationrdquo Chemo-metrics and Intelligent Laboratory Systems vol 47 no 1pp 51ndash63 1999

[20] R J Barnes M S Dhanoa and S J Lister ldquoStandard normalvariate transformation and de-trending of near-infrareddiffuse reflectance spectrardquo Applied Spectroscopy vol 43no 5 pp 772ndash777 1989

[21] M Zeaiter J-M Roger and V Bellon-Maurel ldquoRobustnessof models developed by multivariate calibration part ii theinfluence of pre-processing methodsrdquo TrAC Trends in Ana-lytical Chemistry vol 24 no 5 pp 437ndash445 2005

[22] T Fearn C Riccioli A Garrido-Varo and J Emilio Guer-rero-Ginel ldquoOn the geometry of SNV and mscrdquo Chemo-metrics and Intelligent Laboratory Systems vol 96 no 1pp 22ndash26 2009

[23] T Isaksson and B Kowalski ldquoPiece-wise multiplicative scattercorrection applied to near-infrared diffuse transmittance datafrom meat productsrdquo Applied Spectroscopy vol 47 no 6pp 702ndash709 1993

[24] Y Bi K Yuan W Xiao et al ldquoA local pre-processing methodfor near-infrared spectra combined with spectral segmen-tation and standard normal variate transformationrdquo Analy-tica Chimica Acta vol 909 pp 30ndash40 2016

[25] F Pedregosa G Varoquaux A Gramfort et al ldquoScikit-learnmachine learning in pythonrdquo Journal of Machine LearningResearch vol 12 pp 2825ndash2830 2011

[26] S Raschka Python Machine Learning Packt PublishingBirmingham UK 2015

[27] E L Lehmann and G Casella Deory of Point EstimationSpringer Texts in Statistics Springer New York NY USA2003

[28] O Heinisch ldquoSteel R G D and J H Torrie principles andprocedures of statistics (with special reference to the bi-ological sciences) mcgraw-hill book company New York

Journal of Spectroscopy 13

Toronto London 1960 481 s 15 abb 81 s 6 drdquo BiometrischeZeitschrift vol 4 no 3 pp 207-208 1962

[29] K Pearson ldquoNote on regression and inheritance in the case oftwo parentsrdquo Proceedings of the Royal Society of Londonvol 58 no 1 pp 240ndash242 1895

14 Journal of Spectroscopy

TribologyAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

International Journal ofInternational Journal ofPhotoenergy

Hindawiwwwhindawicom Volume 2018

Journal of

Chemistry

Hindawiwwwhindawicom Volume 2018

Advances inPhysical Chemistry

Hindawiwwwhindawicom

Analytical Methods in Chemistry

Journal of

Volume 2018

Bioinorganic Chemistry and ApplicationsHindawiwwwhindawicom Volume 2018

SpectroscopyInternational Journal of

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Medicinal ChemistryInternational Journal of

Hindawiwwwhindawicom Volume 2018

NanotechnologyHindawiwwwhindawicom Volume 2018

Journal of

Applied ChemistryJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Biochemistry Research International

Hindawiwwwhindawicom Volume 2018

Enzyme Research

Hindawiwwwhindawicom Volume 2018

Journal of

SpectroscopyAnalytical ChemistryInternational Journal of

Hindawiwwwhindawicom Volume 2018

MaterialsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

BioMed Research International Electrochemistry

International Journal of

Hindawiwwwhindawicom Volume 2018

Na

nom

ate

ria

ls

Hindawiwwwhindawicom Volume 2018

Journal ofNanomaterials

Submit your manuscripts atwwwhindawicom

Page 13: DynamicLocalizedSNV,PeakSNV,andPartialPeakSNV ...downloads.hindawi.com/journals/jspec/2018/5037572.pdf · same dimension as the original reference target values. 3.4.2. R2 Coecient

values in cross validation and the robustness against noisedrastically with respect to SNV transformation executed onthe entire spectrum Against the benchmark LSNV anenhancement of the predictive power of a ridge regressionmodel by up to 16 and 29 could be achieved for thefriction modifier and the antioxidant compound re-spectively (e demonstrated optimization workflows forperforming SNV on specific regions of the spectrum havebeen introduced here for the first time (erefore thestandardization methods used in this paper are capable ofeliminating nonlinearities by flexible rescaling in definedareas To our knowledge such standardization techniqueshave not been presented elsewhere

Data Availability

(e data used to support the findings of this study are in-cluded within the article

Conflicts of Interest

(e authors declare that they have no conflicts of interest

