

  • TRAC 2660 24-10-00

    computer corner

    MULTIVAR. A program for multivariate calibration incorporating net analyte signal calculations

    Héctor C. Goicoechea
    Cátedra de Química Analítica I, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral, Ciudad Universitaria, Santa Fe (3000), Argentina

    Alejandro C. Olivieri*
    Departamento de Química Analítica, Facultad de Ciencias Bioquímicas y Farmacéuticas, Universidad Nacional de Rosario, Suipacha 531, Rosario (2000), Argentina

    1. Introduction

    A useful chemometric tool is the use of multivariate calibration applied to spectroscopic data [1], which enables the efficient extraction of information concerning certain analytes of interest from spectra of multicomponent mixtures. Two popular methods are principal component regression (PCR) and partial least-squares (PLS) [2], both of which use inverse calibration steps combined with a prior optimisation of the calibration information. They have the following advantages: (1) use of full spectra, (2) knowledge of only the concentrations of the analytes of interest in the calibration samples is required, and (3) spectral decomposition into factors avoids the problems associated with spectral collinearities [2]. They are ideally suited for the study of complex biological samples, as in drug or metabolite monitoring in blood [3,4], or in pharmaceutical analysis of certain multicomponent preparations where not all the excipients may be known [5]. A common requirement of all these multivariate methods is that the background should be modelled during calibration, i.e. compounds of no interest (but present in unknown samples) should be contained in the calibration samples.

    With the introduction and development of the useful concept of net analyte signal (NAS) [6], a new family of multivariate calibration methods has arisen. The NAS is the part of the signal which is directly related to the concentration predicted by the calibration model. In mathematical terms, it is the part of a spectrum which is orthogonal to the space spanned by the spectra of all analytes except one [6].

    In this article, we present a Visual Basic program which, besides the classical PCR and PLS-1 techniques, can be conveniently used for multivariate calibration with several NAS-based methods. It runs under Windows 95/98 or NT, and allows users to produce ASCII files containing NAS results and other relevant statistical information. One can also perform wavelength selection by resorting to net analyte signal regression plots (NASRP), a methodology recently introduced in the multivariate calibration context [7,8].

    2. Program

    2.1. General characteristics and program availability

    The program MULTIVAR can be easily installed under Windows 95/98 or NT, and has all the capabilities of the powerful programming system Visual Basic [9]. Among other interesting features, it includes dialog boxes for easy data input, file browsers, and reports presented on the screen as text boxes with scroll bars. Fig. 1 shows the flow diagram for MULTIVAR, which is typical of a multivariate calibration program. However, its unique feature as compared to commercially available packages is the possibility of carrying out a variety of NAS-based calculations. Installation files for MULTIVAR are available at the following Internet address: ftp://www.fbioyf.unr.edu.ar/cientifico/multivar.exe, as a single compressed file. Installation disks can also be obtained by writing to the authors.

    In the next sections the theory upon which NAS-based methods are built is briefly reviewed.

    0165-9936/00/$ – see front matter © 2000 Publication by Elsevier Science B.V. PII: S0165-9936(00)00045-5

    *Corresponding author. Tel.: +54-341-4372704; Fax: +54-341-4372704. E-mail: [email protected]

    trends in analytical chemistry, vol. 19, no. 10, 2000 599

    2.2. Net analyte signals and NAS-based multivariate methods

    In this section we briefly refer to NAS-based methods. The following matrices and vectors will be used: an $I \times J$ data matrix $\mathbf{R}$ composed of the calibration responses of $I$ samples at $J$ sensors, a $J \times 1$ vector $\mathbf{s}_k$ containing the pure spectrum of analyte $k$ at unit concentration and an $I \times 1$ vector $\mathbf{c}_k$ of calibration concentrations of analyte $k$.

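    The net analyte signal (NAS) for analyte $k$ ($\mathbf{r}_k^{*}$) is defined as the part of its spectrum which is orthogonal to the space spanned by the spectra of all other analytes [6], and is given by the following equation:

    $$\mathbf{r}_k^{*} = \mathbf{P}_{\mathrm{NAS},k}\,\mathbf{r} = \left[\mathbf{I} - \mathbf{R}_{-k}\mathbf{R}_{-k}^{+}\right]\mathbf{r} \qquad (1)$$

    where $\mathbf{P}_{\mathrm{NAS},k}$ is a $J \times J$ orthogonal projection matrix which projects a given vector onto the NAS space, $\mathbf{r}$ is the spectrum of a given sample (when $\mathbf{r}$ is the spectrum of pure $k$ at unit concentration, Eq. 1 becomes $\mathbf{s}_k^{*} = \mathbf{P}_{\mathrm{NAS},k}\,\mathbf{s}_k$), $\mathbf{I}$ is the $J \times J$ identity matrix and $\mathbf{R}_{-k}$ is a $J \times I$ matrix whose columns span the space of the spectra of all analytes except $k$ ($\mathbf{R}_{-k}^{+}$ is the pseudo-inverse of $\mathbf{R}_{-k}$, usually computed by singular value decomposition using $A$ factors).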
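    The projection in Eq. 1 can be illustrated with a minimal sketch in plain Python. It is not the MULTIVAR implementation: it assumes the interfering spectra are available as lists, uses all of their independent directions rather than a truncated set of A factors, and swaps the singular value decomposition for Gram-Schmidt orthonormalisation, which gives the same projector in the full-rank case. The function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip numerically dependent vectors
            basis.append([wi / norm for wi in w])
    return basis

def net_analyte_signal(r, other_spectra):
    """Part of r orthogonal to the span of other_spectra (Eq. 1, all factors)."""
    r_star = list(r)
    for q in orthonormal_basis(other_spectra):
        coeff = dot(r_star, q)
        r_star = [ri - coeff * qi for ri, qi in zip(r_star, q)]
    return r_star

# Toy data (invented): three sensors, one interfering spectrum.
interferent = [1.0, 1.0, 0.0]
sample = [2.0, 0.0, 1.0]
r_star = net_analyte_signal(sample, [interferent])
# r_star is orthogonal to the interferent spectrum.
```

    In this toy case the component of the sample along the interferent is removed and what remains has zero inner product with the interfering spectrum, which is the defining property of the NAS.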

    Several alternatives exist for the calculation of the matrix $\mathbf{R}_{-k}$. In the method developed by Xu and Schechter [10], each calibration spectrum $\mathbf{r}_{i,\mathrm{cal}}$ is divided by its concentration of analyte $k$, $c_{ik}$ (excluding those samples in which $k$ is absent to avoid a division by zero), and then the average of the resulting spectra is calculated as:

    $$\mathbf{s}_{\mathrm{cal}} = \frac{1}{I'} \sum_{i=1}^{I'} \frac{\mathbf{r}_{i,\mathrm{cal}}}{c_{ik}} \qquad (2)$$

    where $I'$ is the number of calibration samples for which $c_{ik} \neq 0$. The contribution of the average spectrum $\mathbf{s}_{\mathrm{cal}}$ is subtracted from the data matrix by the following operation:

    $$(\mathbf{R}_{-k})_{i',j} = \frac{R_{i',j}}{c_{i'k}} - (\mathbf{s}_{\mathrm{cal}}^{T})_{j} \qquad (3)$$

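    The calibration spectra with $c_{ik} = 0$ are then appended to the matrix obtained in Eq. 3, yielding the desired matrix $\mathbf{R}_{-k}$. The net sensitivity vector $\mathbf{s}_k^{*}$ is calculated by projection of the least-squares approximation $\mathbf{s}_{k,\mathrm{LS}}$ onto the NAS space:

    $$\mathbf{s}_k^{*} = \mathbf{P}_{\mathrm{NAS},k}\,\mathbf{s}_{k,\mathrm{LS}} = \mathbf{P}_{\mathrm{NAS},k}\,\frac{\mathbf{R}^{T}\mathbf{c}_k}{\mathbf{c}_k^{T}\mathbf{c}_k} \qquad (4)$$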

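    Fig. 1. Flow diagram of the program MULTIVAR.

    The concentration of $k$ in unknown samples is obtained from the spectrum $\mathbf{r}$ as:

    $$c_{\mathrm{un},k} = \frac{\mathbf{s}_k^{T}\mathbf{P}_{\mathrm{NAS},k}\,\mathbf{r}}{\mathbf{s}_k^{T}\mathbf{P}_{\mathrm{NAS},k}\,\mathbf{s}_k} = \frac{\mathbf{s}_k^{T}\mathbf{P}_{\mathrm{NAS},k}\mathbf{P}_{\mathrm{NAS},k}\,\mathbf{r}}{\mathbf{s}_k^{T}\mathbf{P}_{\mathrm{NAS},k}\mathbf{P}_{\mathrm{NAS},k}\,\mathbf{s}_k} = \frac{\mathbf{s}_k^{*T}\mathbf{r}_k^{*}}{\|\mathbf{s}_k^{*}\|^{2}} \qquad (5)$$

    which is the basis of the prediction step in NAS-based methods [6].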
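    The equalities in Eq. 5 rest on two properties of the projector that the text leaves implicit. The following is a brief sketch, assuming that $\mathbf{R}_{-k}\mathbf{R}_{-k}^{+}$ is the Moore-Penrose orthogonal projector onto the column space of $\mathbf{R}_{-k}$ and, as an idealisation that is not stated in the source, that the sample is noise-free and its interferent contributions lie entirely in that space (the coefficient vector $\mathbf{b}$ is hypothetical):

```latex
\begin{align*}
\mathbf{P}_{\mathrm{NAS},k}^{T} &= \mathbf{P}_{\mathrm{NAS},k}, \qquad
\mathbf{P}_{\mathrm{NAS},k}^{2} = \mathbf{P}_{\mathrm{NAS},k} \\
\mathbf{s}_k^{T}\mathbf{P}_{\mathrm{NAS},k}\,\mathbf{r}
 &= \mathbf{s}_k^{T}\mathbf{P}_{\mathrm{NAS},k}^{T}\mathbf{P}_{\mathrm{NAS},k}\,\mathbf{r}
  = (\mathbf{P}_{\mathrm{NAS},k}\,\mathbf{s}_k)^{T}(\mathbf{P}_{\mathrm{NAS},k}\,\mathbf{r})
  = \mathbf{s}_k^{*T}\mathbf{r}_k^{*} \\
\mathbf{r} = c\,\mathbf{s}_k + \mathbf{R}_{-k}\mathbf{b}
 &\;\Rightarrow\; \mathbf{r}_k^{*} = c\,\mathbf{s}_k^{*}
 \;\Rightarrow\; c_{\mathrm{un},k}
  = \frac{\mathbf{s}_k^{*T}\,(c\,\mathbf{s}_k^{*})}{\|\mathbf{s}_k^{*}\|^{2}} = c
\end{align*}
```

    The last line uses $\mathbf{P}_{\mathrm{NAS},k}\mathbf{R}_{-k} = \mathbf{R}_{-k} - \mathbf{R}_{-k}\mathbf{R}_{-k}^{+}\mathbf{R}_{-k} = \mathbf{0}$, so under these assumptions the interferents drop out and the true concentration is recovered.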

    Although Xu and Schechter used all the $I$ factors (when $I < J$) of $\mathbf{R}_{-k}$ for prediction, in order to build a method free from optimum factor estimation, we suggest selecting the optimum number $A$ by the cross-validation procedure described by Haaland and Thomas [2]. In this way, the method can be used even when the number of samples exceeds the recommended limit of one third the number of sensors [10].

    Another alternative is hybrid linear analysis (HLA) [11], in which $\mathbf{R}_{-k}$ is directly given by $\mathbf{R} - \mathbf{c}_k\mathbf{s}_k^{T}$ [11]. It should be noticed that HLA can be applied provided a very accurately measured spectrum of pure $k$ is available, and thus it cannot be used when interactions among sample components occur, or when $\mathbf{s}_k$ is unknown.

    Two additional NAS-based methods have been recently introduced which do not require the pure spectrum $\mathbf{s}_k$ to be known. In one of them, described by Goicoechea and Olivieri [4], the mean (non-centred) calibration spectrum is first obtained:

    $$\mathbf{r}_{\mathrm{cal}} = \frac{1}{I} \sum_{i=1}^{I} \mathbf{r}_{i,\mathrm{cal}} \qquad (6)$$

    and the contribution of analyte $k$ is subtracted from the data matrix $\mathbf{R}$ in the following way:

    $$\mathbf{R}_{-k} = \mathbf{R} - \frac{\mathbf{c}_k\,\mathbf{r}_{\mathrm{cal}}^{T}}{c_{k,\mathrm{cal}}} \qquad (7)$$

    where $c_{k,\mathrm{cal}}$ is the mean (non-centred) calibration concentration of analyte $k$. The calculation of $\mathbf{s}_k^{*}$ is carried out with Eq. 4. We call this method HLA/GO, and that developed by Xu and Schechter will be denoted as HLA/XS.
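    Eqs. 6 and 7 can be checked on a small invented data set. The sketch below is plain Python with hypothetical function names, not the program's own code. The toy mixtures are built from one analyte spectrum and one interferent spectrum, so that after subtraction every row should lie along the interferent spectrum alone.

```python
def mean_spectrum(R):
    """Eq. 6: mean (non-centred) calibration spectrum; rows of R are spectra."""
    n = len(R)
    return [sum(row[j] for row in R) / n for j in range(len(R[0]))]

def remove_analyte_hla_go(R, c_k):
    """Eq. 7: subtract the analyte contribution estimated from the mean spectrum."""
    r_cal = mean_spectrum(R)
    c_mean = sum(c_k) / len(c_k)  # mean calibration concentration of analyte k
    return [[R[i][j] - c_k[i] * r_cal[j] / c_mean for j in range(len(r_cal))]
            for i in range(len(R))]

# Invented example: analyte spectrum s = [1, 2], interferent spectrum t = [3, 1].
s, t = [1.0, 2.0], [3.0, 1.0]
c_k = [1.0, 2.0, 3.0]   # analyte concentrations
d = [1.0, 1.0, 1.0]     # interferent concentrations
R = [[c * s[j] + di * t[j] for j in range(2)] for c, di in zip(c_k, d)]

R_minus_k = remove_analyte_hla_go(R, c_k)
# Each row of R_minus_k is a multiple of t: the analyte direction is gone.
```

    For two-component mixtures of this kind the result follows algebraically, since the rows of $\mathbf{R} - \mathbf{c}_k\mathbf{r}_{\mathrm{cal}}^{T}/c_{k,\mathrm{cal}}$ reduce to multiples of the interferent spectrum; real data with noise and several interferents would only approximate this.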

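    A final possibility involves using the least-squares approximation $\mathbf{s}_{k,\mathrm{LS}}$ both to obtain the matrix $\mathbf{R}_{-k}$ through Eq. 8 and $\mathbf{s}_k^{*}$ through Eq. 4:

    $$\mathbf{R}_{-k} = \mathbf{R} - \mathbf{c}_k\,\mathbf{s}_{k,\mathrm{LS}}^{T} \qquad (8)$$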

    This latter method can be shown to consist of the following steps: (1) pre-processing the raw data matrix $\mathbf{R}$ by projecting it onto the space orthogonal to that spanned by all analytes except $k$, leading to the net analyte data matrix $\mathbf{R}_k^{*}$ (i.e. $\mathbf{R}_k^{*} = \mathbf{P}_{\mathrm{NAS},k}\mathbf{R}$), and (2) correlating this latter matrix to the analyte concentrations $\mathbf{c}_k$ through a classical least-squares procedure. We thus call this method NAP/CLS (net analyte pre-processing combined with classical least-squares) [12].

    All the possibilities described above for NAS calculations are covered by the presently discussed program. Furthermore, NAS values are also computed for PCR and PLS methods, using previously discussed methodologies [6].

    2.3. Figures of merit

    Selectivity, sensitivity and limit of determination can be calculated and used for method comparison or to study the quality of a given analytical technique. The selectivity is a measure of the degree of overlap, and indicates the part of the total signal which is not lost due to spectral overlap [6]. In multivariate calibration it can be defined by resorting to NAS calculations [6]:

    $$\mathrm{SEL} = \frac{\|\mathbf{s}_k^{*}\|}{\|\mathbf{s}_k\|} \qquad (9)$$

    On the other hand, the sensitivity indicates to what extent the response due to a particular analyte varies as a function of its concentration [6], and is given by:

    $$\mathrm{SEN} = \|\mathbf{s}_k^{*}\| \qquad (10)$$

    The program MULTIVAR calculates the sensitivity and selectivity for each applied method. Finally, the following equation has been proposed for estimating the LOD [13]:

    $$\mathrm{LOD} = \frac{3\,\|\boldsymbol{\varepsilon}\|}{\|\mathbf{s}_k^{*}\|} \qquad (11)$$

    where $\|\boldsymbol{\varepsilon}\|$ is a measure of the instrumental noise. The value of $\|\boldsymbol{\varepsilon}\|$ may be estimated, in turn, by registering spectra for several blank samples, calculating the norm of the NAS for each sample, and the corresponding standard deviation. The latter can be taken as an approximation to $\|\boldsymbol{\varepsilon}\|$.

    A typical report produced by MULTIVAR concerning calibration performance is shown in Table 1.

    Table 1
    Typical calibration and prediction reports produced by MULTIVAR

    Calibration results               Prediction results
    Method: NAP/CLS                   Sample filename: UNK-1.DAT
    Data are mean centred             Cpred: 1.97E+00
    Component: bromhexine             Confidence limit (P=0.95): 4.58E-03
    Range of sensors: 1–64            Norm of sample NAS: 1.37E-01
    Number of factors: 2              Spectral residue: 0.43%
    Sensitivity: 1.23E+00             Error indicator: 2.89E-02
    Selectivity: 6.83E-01
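    The figures of merit of Eqs. 9 to 11 reduce to a few vector norms. The following plain-Python sketch uses invented numbers and hypothetical function names, and assumes the noise term is taken as the sample standard deviation of the NAS norms of the blanks, as described above; it is an illustration rather than the program's own routine.

```python
import math
import statistics

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def selectivity(s_star, s):
    """Eq. 9: fraction of the pure-analyte signal that survives the projection."""
    return norm(s_star) / norm(s)

def sensitivity(s_star):
    """Eq. 10: norm of the net sensitivity vector."""
    return norm(s_star)

def lod(blank_nas_norms, s_star):
    """Eq. 11, with the noise estimated from the NAS norms of blank samples."""
    noise = statistics.stdev(blank_nas_norms)
    return 3.0 * noise / norm(s_star)

# Invented values for illustration only.
s_pure = [3.0, 4.0]            # pure spectrum, norm 5
s_net = [3.0, 0.0]             # net sensitivity vector, norm 3
blanks = [0.010, 0.012, 0.014]  # NAS norms of three blank samples

sel = selectivity(s_net, s_pure)
sen = sensitivity(s_net)
detection = lod(blanks, s_net)
```

    With these numbers the selectivity is 0.6 and the sensitivity is 3; in a report such as Table 1 the two printed values are related in the same way, through the norm of the pure spectrum.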

    2.4. Error indicator based on deviations of NAS linearity and sensor selection

    MULTIVAR calculates, for each unknown sample predicted, the confidence limit in the calculated concentration, as well as the corresponding spectral residuals. Furthermore, it also gives an indication of the errors brought about by deviations of the expected linearity of the NASRP (a typical report is shown in Table 1), a plot of the elements of the vector $\mathbf{r}_k^{*}$ as a function of those of $\mathbf{s}_k^{*}$. In the absence of error or non-modelled interferents, the NASRP should be linear. Significant deviations from linearity are strong indicators of the presence of components which were not adequately modelled during calibration. A quantitative measure of the deviation from linearity is given by the following error indicator (EI) function:

    $$\mathrm{EI} = \left\{ s^{2}\left[ 1 + \frac{N^{2}s^{2}}{4\,\|\mathbf{r}_k^{*}\|^{2}} \right] \right\}^{1/2} \|\mathbf{r}_k^{*}\|^{-1} \qquad (12)$$

    where $s$ is the standard deviation from the best fitted straight line to the NASRP (in a given spectral region) and $N$ is the number of points in the latter plot.
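    A minimal plain-Python sketch of the error indicator follows. It rests on two assumptions that the source does not confirm: Eq. 12 is read as EI = {s²[1 + N²s²/(4‖r*‖²)]}^(1/2)/‖r*‖, and s is taken as the residual standard deviation of an ordinary least-squares line with intercept, using N − 2 degrees of freedom. The function name and data are hypothetical.

```python
import math

def error_indicator(s_star, r_star):
    """EI for a NASRP: elements of r_star plotted against those of s_star."""
    n = len(s_star)
    mx = sum(s_star) / n
    my = sum(r_star) / n
    sxx = sum((x - mx) ** 2 for x in s_star)
    sxy = sum((x - mx) * (y - my) for x, y in zip(s_star, r_star))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(s_star, r_star))
    s = math.sqrt(ss_res / (n - 2))  # assumed: n - 2 degrees of freedom
    r_norm = math.sqrt(sum(y * y for y in r_star))
    return math.sqrt(s * s * (1.0 + n * n * s * s / (4.0 * r_norm ** 2))) / r_norm

# Invented data: a perfectly linear NASRP and a distorted one.
s_star = [0.1, 0.2, 0.3, 0.4]
ei_linear = error_indicator(s_star, [0.2, 0.4, 0.6, 0.8])
ei_distorted = error_indicator(s_star, [0.2, 0.45, 0.55, 0.8])
```

    A linear plot gives an EI that is zero to rounding error, while the distorted plot gives a clearly larger value, which is the behaviour the text relies on when flagging non-modelled components.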

    Table 2
    Calibration and validation parameters for the analyte bromhexine obtained by MULTIVAR, and results for samples with non-modelled interferences, in the wavelength range 285–348 nm

    Calibration results
    Parameter           PCR       PLS-1     HLA       HLA/XS    HLA/GO    NAP/CLS
    Optimum A           3         3         2         2         2         2
    REP(%)^a            0.77      0.73      0.95      1.34      0.87      0.75
    R^2                 0.9985    0.9987    0.9978    0.9955    0.9981    0.9986
    SEN^b               1.26      1.25      1.17      1.33      1.20      1.23
    SEL                 0.70      0.69      0.63      0.74      0.66      0.68

    Validation results
    Parameter           PCR       PLS-1     HLA       HLA/XS    HLA/GO    NAP/CLS
    REP(%)^c            2.74      2.72      3.60      2.95      2.98      2.72

    Samples with interferents
                                  Bromhexine (mol l⁻¹ × 10⁴) found^d
    Interferent added   Actual    PCR       PLS-1     HLA       HLA/XS    HLA/GO    NAP/CLS
    Dextromethorphan    2.66      2.1 (1)   2.1 (1)   2.2 (1)   2.1 (1)   2.1 (1)   2.1 (1)
    Sodium benzoate     2.75      2.9 (1)   2.9 (1)   3.0 (1)   2.95 (5)  3.00 (4)  2.95 (5)

    ^a $R^{2} = 1 - \dfrac{\sum_{1}^{I}(c_{\mathrm{act}} - c_{\mathrm{pred}})^{2}}{\sum_{1}^{I}(c_{\mathrm{act}} - \bar{c})^{2}}$; $\bar{c}$ is the average component concentration in the $I$ calibration mixtures; $\mathrm{REP}\% = \dfrac{100}{\bar{c}}\left[\dfrac{1}{I}\sum_{1}^{I}(c_{\mathrm{act}} - c_{\mathrm{pred}})^{2}\right]^{1/2}$.
    ^b SEN is given in units of 10⁴ (absorbance units) × l mol⁻¹.
    ^c This value of REP(%) corresponds to the validation set and may be different to that for the calibration.
    ^d The confidence limit of the predicted concentration is given in parentheses.
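    The two statistics defined in footnote a of Table 2 can be written out directly. The sketch below is plain Python with hypothetical function names and invented concentrations; it reads the footnote as R² = 1 − Σ(c_act − c_pred)² / Σ(c_act − c̄)² and REP% = (100/c̄)[(1/I)Σ(c_act − c_pred)²]^(1/2), with c̄ the mean actual concentration.

```python
import math

def r_squared(c_act, c_pred):
    """Squared correlation coefficient as defined in footnote a of Table 2."""
    mean = sum(c_act) / len(c_act)
    ss_res = sum((a - p) ** 2 for a, p in zip(c_act, c_pred))
    ss_tot = sum((a - mean) ** 2 for a in c_act)
    return 1.0 - ss_res / ss_tot

def rep_percent(c_act, c_pred):
    """Relative error of prediction, in percent of the mean concentration."""
    n = len(c_act)
    mean = sum(c_act) / n
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(c_act, c_pred)) / n)
    return 100.0 * rmse / mean

# Invented concentrations for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.1, 1.9, 3.0]
r2 = r_squared(actual, predicted)
rep = rep_percent(actual, predicted)
```

    Both functions take the actual and predicted concentrations of one set of samples, so the same code serves for the calibration (cross-validation) and validation rows of the table.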


    A sensor selection approach, based on a moving spectral window strategy in which a search for the minimum EI is conducted at all possible spectral ranges [8], has been developed recently and applied to several experimental cases. MULTIVAR employs this methodology, which is available on any of the NAS-based variants.
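    The search over all possible spectral ranges can be sketched as an exhaustive loop over contiguous sensor windows. This is a hedged plain-Python illustration, not the published algorithm: the `score` callable stands in for an EI evaluation in each window, and `min_width` and the toy scoring function are hypothetical choices made only so that the example runs.

```python
def best_window(n_sensors, score, min_width=3):
    """Return (start, stop, value) minimising score(start, stop) over all
    contiguous sensor ranges [start, stop) with at least min_width sensors."""
    best = None
    for start in range(n_sensors - min_width + 1):
        for stop in range(start + min_width, n_sensors + 1):
            value = score(start, stop)
            if best is None or value < best[2]:
                best = (start, stop, value)
    return best

# Toy stand-in for EI: mean "interference" in the window, with a small
# bonus for wider windows so the widest clean range is preferred.
profile = [5.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0, 5.0]

def toy_score(start, stop):
    window = profile[start:stop]
    return sum(window) / len(window) - 0.01 * len(window)

start, stop, value = best_window(len(profile), toy_score)
# Sensors 3..6 (stop index 7 is exclusive) form the selected range.
```

    The number of windows grows quadratically with the number of sensors, which is manageable for a spectrum of a few dozen points such as the 64 sensors listed in Table 1.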

    3. Results

    In order to illustrate the results provided by the program MULTIVAR, samples of a decongestant syrup containing bromhexine as the active principle were selected. The analyte of interest is embedded in a complex mixture of eight components. A training set of 12 mixtures, containing bromhexine in the known linear absorbance–concentration range and a blank syrup sample (i.e. a mixture of ethanol, glycerol, sorbitol, tartaric acid, sodium benzoate, carboxymethylcellulose, orange essence and amaranth in methanolic HCl 0.1 mol l⁻¹), was prepared for calibration. For further details see [5]. A validation set of 11 mixtures was also prepared. To some of these samples, potential interferences absent in the calibration set were added, such as the cough suppressant dextromethorphan or the preservative sodium benzoate.

    Fig. 2. Plots of predicted vs. nominal concentrations for the 11 validation samples: (A) PLS-1 and (B) NAP/CLS, both in the 285–348 nm range; (C) PLS-1 and (D) NAP/CLS in the 314–344 nm range. The solid lines are the best linear fits to the calculated points, with the correlation coefficient r² indicated in each plot.

    Fig. 3. Norm of NAS of the 11 validation samples as a function of the actual concentration of bromhexine, calculated with the NAP/CLS method.

    Fig. 4. (A) NASRP for one of the validation samples in the 285–348 nm wavelength range, (B) NASRP for the sample containing dextromethorphan in the same range, and (C) NASRP of the latter sample in the 315–344 nm range, as predicted from the analysis of the EI parameter. All calculations were done with the NAP/CLS method. The solid lines are the best linear fits to the calculated points, and the values of EI are indicated in each plot.

    Analysis of the spectra (see Fig. 1 of [5]) suggests that the 200–280 nm region is uninformative, and that a convenient working region for MULTIVAR is 285–348 nm, where a certain degree of spectral overlapping between bromhexine and the excipients still persists. Table 2 shows cross-validation results for all tested methods in the spectral range 285–348 nm. They include the optimum number of factors selected according to Haaland and Thomas [2], the square of the correlation coefficient (R²) and relative error of prediction (REP), which give an indication of the quality of fit of all the data, as well as relevant figures of merit. Table 2 also presents the prediction results for the validation set, and Fig. 2A and B show plots of predicted vs. nominal concentrations (for clarity only PLS-1 and NAP/CLS methods were selected). As can be seen, all methods yield excellent recoveries, although the NAS-based methods require fewer factors due to the reduction of the dimensionality of the model which accompanies the calculation of the matrix $\mathbf{R}_{-k}$. A comparison of the results presented in Table 2 indicates that the performance of the newly introduced methods that employ NAS calculations is comparable to that of the classical PCR and PLS techniques. However, the former provide a natural access to relevant figures of merit such as sensitivity and selectivity.

    An interesting property of the NAS is reflected in Fig. 3, which shows a plot of the Euclidean norm of the NAS, calculated with the aid of MULTIVAR for the unknown samples, as a function of the actual concentration of the analyte. A linear relationship exists between the values of $\|\mathbf{r}_k^{*}\|$ and $c_{\mathrm{un},k}$, which allows the presentation of the multivariate data in a pseudo-univariate mode [14].

    In order to study the effect of non-modelled interferences, test samples containing dextromethorphan and sodium benzoate were subjected to a strategy based on NAS/EI calculations in order to identify the potential presence of an interference and to alleviate its effects. Table 2 shows the results for two of these samples using the calibration in the 285–348 nm range. In this case all methods display less predictive power towards these samples (see Table 2), as can be judged both from the poor recoveries and from the rather large confidence limits for the predicted concentrations (which should be compared to values of 0.005–0.01 × 10⁻⁴ mol l⁻¹, typical for the normal validation samples). Furthermore, in contrast to other unknowns (see Fig. 4A for the NASRP of a selected validation sample and its corresponding EI value), the NASRP for the sample containing dextromethorphan in the full 285–348 nm spectral region is not linear, strongly indicating the presence of non-modelled interferences (see Fig. 4B). The EI quoted in Fig. 4B provides a quantitative measure of the degree of departure from linearity. A search for the minimum value of EI for this particular sample as a function of all possible wavelength ranges indicates that the corresponding NASRP in the restricted region 315–344 nm (Fig. 4C) is seemingly linear. It may be noticed that the latter spectral range avoids the contribution from dextromethorphan [5]. Similar results were obtained with the sample containing sodium benzoate.

    Table 3
    Calibration and validation parameters for the analyte bromhexine obtained by MULTIVAR, and results for samples with non-modelled interferences, in the wavelength range 315–344 nm

    Calibration results
    Parameter^a         PCR       PLS-1     HLA       HLA/XS    HLA/GO    NAP/CLS
    Optimum A           2         2         1         1         1         1
    REP(%)              1.72      1.71      1.60      1.79      1.67      1.70
    R^2                 0.9927    0.9927    0.9933    0.9920    0.9930    0.9928
    SEN                 0.60      0.60      0.59      0.60      0.58      0.58
    SEL                 0.45      0.45      0.41      0.45      0.44      0.44

    Validation results
    Parameter           PCR       PLS-1     HLA       HLA/XS    HLA/GO    NAP/CLS
    REP(%)              4.33      4.33      4.70      4.30      4.35      4.37

    Samples with interferents
                                  Bromhexine (mol l⁻¹ × 10⁴) found
    Interferent added   Actual    PCR       PLS-1     HLA       HLA/XS    HLA/GO    NAP/CLS
    Dextromethorphan    2.66      2.68 (2)  2.68 (2)  2.68 (2)  2.68 (2)  2.69 (2)  2.68 (2)
    Sodium benzoate     2.75      2.78 (5)  2.78 (5)  2.76 (2)  2.79 (3)  2.77 (3)  2.77 (3)

    ^a See Table 2 for the definition of parameters.

    Table 3 shows the calibration and validation parameters in the restricted region 315–344 nm. As can be seen, the statistical indicators are not significantly different among the tested methods, although a lower quality is apparent on comparison with the full range figures presented in Table 2 (see Fig. 2C and D for a plot of predicted vs. nominal concentrations in the restricted range). However, despite the lower calibration performance, the selected spectral range is appropriate for studying samples with the tested non-modelled components dextromethorphan and sodium benzoate, as inferred from the predictions shown in Table 3. Moreover, the confidence limits for the calculated concentrations are in the order of those corresponding to the validation samples.

    The above discussion illustrates the value of the sensor selection technique incorporated in MULTIVAR. Note that this selection procedure should be applied for each unknown sample showing unsatisfactory EI values, after which the model should be recalibrated before predictions are made in the new spectral range (see Fig. 1).

    Acknowledgements

    Financial support from CONICET (Consejo Nacional de Investigaciones Científicas y Técnicas), the University of Rosario and the Agencia de Promoción Científica y Tecnológica (Project No. 06-00000-01765) is gratefully acknowledged. H.C.G. thanks FOMEC (Programa para el Mejoramiento de la Calidad de la Enseñanza Universitaria) for a fellowship. We also thank Prof. Arsenio Muñoz de la Peña (University of Extremadura, Spain) for valuable discussions.

    References

    [1] H. Martens, T. Naes, Multivariate Calibration, Wiley, New York, 1989.
    [2] D.M. Haaland, E.V. Thomas, Anal. Chem. 60 (1988) 1193.
    [3] H.C. Goicoechea, A.C. Olivieri, Anal. Chim. Acta 384 (1999) 95.
    [4] H.C. Goicoechea, A.C. Olivieri, Anal. Chem. 71 (1999) 4361.
    [5] H.C. Goicoechea, A.C. Olivieri, Talanta 47 (1998) 103.
    [6] A. Lorber, K. Faber, B.R. Kowalski, Anal. Chem. 69 (1997) 1620.
    [7] J. Ferré, F.X. Rius, Anal. Chem. 70 (1998) 1999.
    [8] H.C. Goicoechea, A.C. Olivieri, Analyst 124 (1999) 725.
    [9] Microsoft Visual Basic, Microsoft Press, Redmond, WA, 1997.
    [10] L. Xu, I. Schechter, Anal. Chem. 69 (1997) 3722.
    [11] A.J. Berger, T.-W. Koo, I. Itzkan, M.S. Feld, Anal. Chem. 70 (1998) 623.
    [12] H.C. Goicoechea, A.C. Olivieri, Chemom. Intell. Lab. Syst. (submitted for publication).
    [13] K.S. Booksh, B.R. Kowalski, Anal. Chem. 66 (1994) 782A.
    [14] N.M. Faber, Chemom. Intell. Lab. Syst. 50 (2000) 107.