
Technical Report FLLL–TR–0212

Generating Correlation and Regression Models from High Dimensional Measurement Data - Advanced Aspects, Strategies and Validation

Edwin Lughofer
Fuzzy Logic Laboratorium Linz-Hagenberg
e-mail [email protected]

Werner Groissböck
Fuzzy Logic Laboratorium Linz-Hagenberg
e-mail [email protected]

Abstract — The automatic generation of data-based models is often of fundamental importance for the identification and analysis of real-time processes, the prediction of future process output values, and fault diagnosis for new incoming data sets. One possible choice of model architecture is given by correlation and regression models, which, in comparison to fuzzy inference systems or neural networks, have the advantage that they describe dependencies between process variables in a closed analytical formula and can be computed in a fast and efficient way.

Key words — identification, prediction, fault diagnosis, model architecture, correlation and regression models, fuzzy inference systems, neural networks, process output values, process variables

Johannes Kepler Universität Linz
Institut für Algebra, Stochastik und wissensbasierte mathematische Systeme
A-4040 Linz, Austria

Fuzzy Logic Laboratorium Linz-Hagenberg (FLLL)
Softwarepark Hagenberg, Hauptstrasse 99
A-4232 Hagenberg, Austria


1 Motivation and Basic Facts

Regression and especially correlation models are suited to describing functional dependencies between 2 (-> correlation models) or more (-> multidimensional regression models) measured values without knowledge of the physical background, hence they can be regarded as true black-box models. In practice, measured values are often available as measurement and calculation channels resulting from a (complete or partial) test procedure and can also be seen as statistical series. Applying statistical methods for building up regression/correlation models to data resulting from a complete test run amounts to offline training, which can be combined with an offline plausibility check if the models are used in a fault detection process. Opposed to that, online training, where models have to be updated from time to time and trained in real time synchronously with the plausibility check of newly measured data points, can only be realized with local model generation methods (see chapter 3.3) if no online adaptation algorithms are used.

The goal of statistical methods for building up regression and correlation models is to describe dependencies between 2 or more channels via closed analytical formulas, hence by

$y = f(x_1, \ldots, x_n) \quad \forall (x_1, \ldots, x_n) \in \Omega^n$ (1)

where $\Omega^n$ covers the whole input space of the process variables, also called channels, $x_1, x_2, \ldots, x_n$.

In the case of fault diagnosis, measurement channels and calculation channels (computed from measurement channels), both of which can appear in a data set, are treated in the same way if correlation models are built up, but differently if multidimensional regression is used. Correlation models are only 2-dimensional, so a calculation channel occurring on one side of the functional dependency does not affect the accuracy and quality of the model. Such models simply deliver no new contribution to the fault detection rate, but they do not worsen the overdetection rate either; therefore a channel pre-selection would be useful only for computational reasons. Multidimensional regression methods take a variety of channels (measured and calculated ones) as input, so it can happen that a calculation channel worsens the efficiency of the models with regard to the detection rate. For example, consider a calculation channel which is taken as input for reconstructing a measurement channel that in turn is an input in the (real physical) formula for the calculation channel. Obviously, this calculation channel will have a strong influence on the reconstruction of the measurement channel, mostly the strongest influence of all channels going into the regression, so much stronger than the others that those can almost be neglected. The generated regression formula then delivers no contribution to the improvement of the detection rate, as a fault will of course appear in both the measurement channel and the calculation channel, and therefore no deviation of the model-estimated output from the real process output can be recognized. On the other hand, if the channel is known to be a calculation channel and is therefore a priori not taken into account, a regression model for the chosen measurement channel can probably be generated which can indeed be a useful improvement of the detection rate.

For useful channel selection methods refer to [1] and [2].


2 Theoretical Aspects

2.1 Least Squares Estimation

The goal of least squares methods for building up regression and correlation models is to describe dependencies between 2 or more channels through closed analytical formulas which are linear in the parameters $\beta_i$, hence by

$y = f(x) = \beta_0 + \beta_1 r_1 + \ldots + \beta_m r_m$ (2)

such that the functional curve fits the measurements $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ as well as possible, i.e. such that the loss function $e = \| y - \hat{y} \|$ is minimized. In the above equation the $r_i$ are the regressors; they can either contain data of the original channels ($\Rightarrow$ linear regression hyperplane) or data obtained by transforming the original measurements with mathematical terms such as $x_i^2, x_i^3, \ldots, x_i^t$ for $i = 1, \ldots, n$ ($\Rightarrow$ polynomial models), or $\ln(x_i)$, $\frac{1}{x_i}$, $\sqrt{x_i}, \ldots$ for arbitrary general functional models.

Taking into account that $y - \hat{y} = y - X\beta$, where $X$ is the regression matrix containing all selected regressors $r_i$, and choosing the quadratic norm, the derivative of the error function is given by

$\frac{\partial}{\partial \beta} (y - X\beta)^T (y - X\beta) = -2(X^T y - X^T X \beta)$ (3)

Setting the derivative to zero, the linear parameters can be obtained by

$\hat{\beta} = (X^T X)^{-1} X^T y$ (4)

This system of linear equations can be solved, for example, by the Gauss-Jordan elimination method including the search for a pivot element, as described in [3].
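The normal equations of Eq. (4) can be sketched in a few lines. This is not the report's C++ implementation; it is a minimal NumPy illustration with made-up data, and it solves the linear system directly rather than forming the explicit inverse:

```python
import numpy as np

def least_squares(X, y):
    """Solve the normal equations X^T X beta = X^T y of Eq. (4)."""
    # Solving the system is numerically preferable to computing
    # (X^T X)^{-1} explicitly; the report uses Gauss-Jordan elimination.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic example: y = 2 + 0.5 * x, fitted with a constant regressor
# r0 = 1 and a linear regressor r1 = x.
x = np.linspace(0.0, 10.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 0.5 * x
beta = least_squares(X, y)   # -> approximately [2.0, 0.5]
```

Transformed regressors such as $x^2$ or $\ln(x)$ are handled identically: they simply become additional columns of $X$.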

In practice, using all channels of a measurement system, where up to a few hundred channels can be recorded and their data collected, would generally result in a few thousand regressors, which would lead to an unmanageable computational complexity (for a detailed description of the computational complexity refer to chapter 5.3). Hence the following 2 basic ideas, described in the next 2 chapters, reflect a strategy for coping with a large number of measurement channels.

2.2 Generation of Correlation Models

As stated in [1], which deals with dimension reduction algorithms, correlation analysis can be applied by computing partial low dimensional models over a variety of input channels. As correlation models are always 2-dimensional, for a data matrix consisting of $n$ input channels theoretically $\binom{n}{2}$ correlation models have to be built up, which would result in a high computational time if $n$ is large. For this reason a so-called correlation coefficient is computed, which indicates whether building up a model between 2 channels is reasonable. Different correlation coefficients, suited to different kinds of functional dependencies with different attributes, can be applied. For example, the empirical correlation coefficient, calculated through the formula

$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$ (5)


Figure 1: Uncorrelated channels X and P31 for a diesel engine; the data points are widely spread over the whole space

indicates a strong linear relationship between the channels $x$ and $y$; $\bar{x}$ and $\bar{y}$ are the mean values of the channels. Another correlation coefficient is the so-called Spearman rank correlation coefficient

$r_{xy} = \frac{\sum_{i=1}^{n}(Rg(x_i) - \frac{n+1}{2})(Rg(y_i) - \frac{n+1}{2})}{\sqrt{\sum_{i=1}^{n}(Rg(x_i) - \frac{n+1}{2})^2}\,\sqrt{\sum_{i=1}^{n}(Rg(y_i) - \frac{n+1}{2})^2}}$ (6)

which is a measure for a monotone dependency between $x$ and $y$. $Rg(x_i)$ and $Rg(y_i)$ are the ranks of the values $x_i$ and $y_i$, computed according to their position in the sorted data channels of $x$ and $y$.

Another correlation coefficient which also gives a good measure for monotone dependency is Kendall's rank correlation coefficient

$r_{xy} = \frac{2}{n(n-1)} \sum_{i<j} \mathrm{sign}(Rg(x_j) - Rg(x_i))\,\mathrm{sign}(Rg(y_j) - Rg(y_i))$ (7)

where again $Rg(x_i)$ and $Rg(y_i)$ are the rank values of $x_i$ and $y_i$.

For all correlation coefficients described above, a value around 0 indicates that there is no dependency between the 2 channels (Figure 1) and therefore no correlation model is useful; a value near 1 indicates a high dependency in the same direction (Figure 2); a value near -1 indicates a high dependency in opposite directions (Figure 3). In both of the latter cases building up a correlation model is valid.
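The three coefficients of Eqs. (5)-(7) can be sketched directly from their definitions. The following illustration assumes tie-free data (tie handling is omitted for brevity) and uses the fact that, without ties, the sign of a value difference equals the sign of the corresponding rank difference:

```python
import numpy as np

def empirical_corr(x, y):
    """Empirical (Pearson) correlation coefficient, Eq. (5)."""
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / (np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2)))

def ranks(x):
    """Rank Rg(x_i) of each value in the sorted channel, starting at 1."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(1, len(x) + 1)
    return r

def spearman_corr(x, y):
    """Spearman rank correlation, Eq. (6): Eq. (5) applied to the ranks."""
    return empirical_corr(ranks(x), ranks(y))

def kendall_corr(x, y):
    """Kendall's rank correlation, Eq. (7), summed over pairs i < j."""
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) * np.sign(y[j] - y[i])
            for i in range(n) for j in range(i + 1, n))
    return 2.0 * s / (n * (n - 1))

# A strictly monotone but nonlinear dependency: the two rank coefficients
# report a perfect score of 1, the empirical coefficient a smaller one.
x = np.linspace(1.0, 5.0, 20)
y = np.exp(x)
```

This behavior is exactly why the rank coefficients are the better hint when the expected dependency is monotone but not linear.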

If the absolute value of the correlation coefficient lies near 1, it is reasonable and valid to build up a 2-dimensional correlation model, which can for example be described by a linear model

$f(x) = \beta_0 + \beta_1 x$ (8)

but also by a model containing a combination of arbitrary well-conditioned mathematical terms, for example by

$f(x) = \beta_0 + \beta_1 x + \beta_2 \frac{1}{x} + \beta_3 \ln(x)$ (9)


Figure 2: Correlation between P_MI and BH for an Otto engine; the data points are spread along a flat curve

Figure 3: Correlation between LAMBDA and LBDREZ for a diesel engine; the data points are spread along a steep curve


with a degree of freedom (= number of regressors) equal to 4. In both cases the parameters can be estimated by the least squares method described in chapter 2.1.

2.3 Generation of Regression Models

In contrast to correlation models, regression models consist of arbitrarily many input channels. In order to identify the complete system process, theoretically $n$ (= number of measurement and calculation channels) least squares estimations with at least $n-1$ inputs each have to be carried out, i.e. each channel is reproduced by a combination of the other $n-1$ channels. For practical usage, if $n$ is large, these estimations would require too high a computational effort, so a channel pre-selection is conducted for each target output channel, returning a list of regressors which consist of combinations of original channels and are most significant for reproducing the output channel. Methods for accomplishing this task can be found in [1] and [2]. After that, the actual generation of the regression models through least squares estimation with a couple of most significant regressors can be performed within an acceptable time frame.

2.4 Advanced Aspects

2.4.1 Regularization

Sometimes it can happen that the Hessian $X^T X$ in the least squares algorithm is poorly conditioned or even singular. It is well known that the likelihood of poor conditioning increases with the number of regressors, hence with the matrix dimension. Therefore, the variance of the worst estimated parameters increases with increasing model flexibility. Methods for controlling this variance, and in our case for avoiding singular Hessians, are called regularization techniques. In the context of least squares approaches the loss function is extended to

$\| y - \hat{y} \| + \alpha |\beta|^2 \rightarrow \min_{\beta}$ (10)

Then the estimation of the parameters leads to

$\hat{\beta} = (X^T X + \alpha I)^{-1} X^T y$ (11)

Those parameters that are not important for solving the least squares problem are driven towards zero in order to decrease the penalty term. Therefore, only the significant parameters will be used, since their error reduction effect is larger than their penalty term.
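The effect of Eq. (11) can be sketched on deliberately collinear regressors, where the plain Hessian is ill-conditioned. The data and the regularization parameter $\alpha = 0.1$ are illustrative assumptions, not values from the report:

```python
import numpy as np

def ridge(X, y, alpha):
    """Regularized estimate of Eq. (11): (X^T X + alpha*I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
# Two nearly identical regressor columns: X^T X is almost singular.
X = np.column_stack([x, x + 1e-8 * rng.standard_normal(200)])
y = 3.0 * x

beta = ridge(X, y, alpha=0.1)
# The regularized parameters stay small and split the weight between the
# two redundant regressors (roughly 1.5 each) instead of exploding to
# huge values of opposite sign, while the fit X @ beta stays close to y.
```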

2.4.2 Weighting of Data Points

The weighted least squares approach is applied in all situations where the data samples, and therefore the corresponding errors, have different relevance or importance for the estimation. For example, in the case of local models, where the data points most adjacent to one special (check) data point are selected for constructing the regressor matrix $X$ and weighted with a tricube function (see chapter 3.3), a weighted least squares estimation, defined by

$\hat{\beta} = (X^T Q X)^{-1} X^T Q y$ (12)


has to be performed. $Q$ is called the weighting matrix and has a diagonal structure,

$Q = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_n)$

so each single squared error value $e(i)$ is weighted by the corresponding matrix entry $\omega_i$ in $Q$.

2.4.3 Orthogonal Regressors

An important special case of the least squares solution results for mutually orthogonal regressors, i.e. $r_i^T r_j = 0$ for $i \neq j$. Then the Hessian becomes

$H = X^T X = \mathrm{diag}(r_1^T r_1, r_2^T r_2, \ldots, r_n^T r_n)$ (13)

Hence the inversion of the Hessian is trivial and the parameter estimation becomes

$\hat{\beta} = (X^T X)^{-1} X^T y = \mathrm{diag}\left(\frac{1}{r_1^T r_1}, \frac{1}{r_2^T r_2}, \ldots, \frac{1}{r_n^T r_n}\right) X^T y$ (14)

such that each parameter $\beta_i$ can be determined separately through

$\beta_i = \frac{r_i^T y}{r_i^T r_i}$ (15)

For orthogonal regressors, 2 main advantages appear when solving the least squares estimation:

- Improvement of the computational complexity: no matrix inversion needs to be performed, so the complexity is reduced from $O(n^3)$ to $O(n^2)$, where $n$ is the dimension of the Hessian.

- Independence of the parameters from all other regressors, as each parameter can be estimated separately. Hence it is possible to include or remove regressors without affecting the other parameter estimates, and therefore to build up complex models from simple ones incrementally.

Usually, regressors obtained from real-life data are not orthogonal, but they can be made orthogonal by first applying principal component analysis to the data (see [1]) and then taking the most significant components as input into the regression matrix $X$.
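The per-parameter formula of Eq. (15) can be verified numerically. As a stand-in for the principal-component step, the sketch below obtains mutually orthogonal columns via a QR decomposition of random data (an assumption made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 3))
X, _ = np.linalg.qr(A)            # columns of X are mutually orthogonal
y = X @ np.array([2.0, -1.0, 0.5])

# Eq. (15): each parameter from one inner product and one division,
# with no matrix inversion at all.
beta = np.array([X[:, i] @ y / (X[:, i] @ X[:, i]) for i in range(3)])

# Identical to the full least squares solution of Eq. (4):
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
```

Because each $\beta_i$ depends only on $r_i$ and $y$, a regressor can be added or dropped without re-estimating the others, which is the incremental model building mentioned above.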

3 Approaches for Generating Correlation and Regression Models

3.1 Global Models

Global models means nothing else than applying correlation and regression analysis to the whole data set without dividing the data into segments or applying localization methods. Hence the generated models represent functional dependencies between the system variables over the complete process.


Figure 4: Left side: global correlation model between the measurements N_MI and POEL of a diesel engine with APE = 3.65%; right side: localization of models -> APE = 2.3%

Figure 5: 3 clustered (linear) correlation models; no global correlation model valid

3.2 Clustered Models

In comparison to global models, clustered models can be seen as models which describe partial dependencies between 2 (correlation models) or more (regression models) channels in the data set. Clustering the data before generating the models is mainly performed for 2 reasons:

- More accurate models can be generated (see figure 4; APE = average percent error, calculated through equation 18).

- Models may only be reasonable if a localization is performed in a preliminary step (see figure 5); this fact stems from possible variations of the system's parameters or environment during the data recording.

Hence, in the case of generating correlation models, a clustering algorithm such as direct k-means ([4]), fuzzy c-means ([5]) or subtractive clustering ([6]) is applied as a data pre-processing step to every 2-combination of input data channels. After that, the correlation coefficient for each cluster in each 2-channel combination is computed as a hint whether a correlation model for the actual cluster in the actual channel pair should be generated. As cluster analysis methods demand the computation of Euclidean distances, a normalization of the data needs to be done


before starting the actual model generation algorithm. Hence the following skeleton of an algorithm for clustered correlation analysis is obtained:

Input:  training data set, parameters
Output: a list of 2-dimensional clustered correlation models,
        cluster centers for each cluster and channel pair

Step 1: Normalize the input data
Step 2: For each channel pair
Step 3:   Perform cluster analysis
          For each cluster in the channel pair do
Step 4:     Calculate the correlation coefficient
            If correlation coefficient > threshold then
Step 5:       Perform least squares estimation
Step 6:       Collect the resulting parameters into the
              list of correlation models
            Else
              Continue
            End if
          End for
        End for

Steps 4 and 5 are somewhat more sophisticated than described here and will be demonstrated in chapter 4. For multidimensional regression models the above algorithm can be applied in a similar way.
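The algorithm skeleton can be sketched end to end in a few dozen lines. This is a simplified stand-in, not the report's implementation: a tiny Lloyd-style k-means (with deterministic initialization) replaces the cited clustering methods, the empirical correlation coefficient of Eq. (5) serves as the model hint, and the threshold and cluster count are illustrative assumptions:

```python
import numpy as np
from itertools import combinations

def kmeans(P, k, iters=25):
    """Plain Lloyd iteration; centers start at evenly spaced data points."""
    centers = P[np.linspace(0, len(P) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((P[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = P[labels == j].mean(axis=0)
    return labels, centers

def pearson(a, b):
    da, db = a - a.mean(), b - b.mean()
    return (da @ db) / np.sqrt((da @ da) * (db @ db))

def clustered_correlation(data, k=2, threshold=0.9):
    """data: (samples, channels) array; returns linear cluster models."""
    norm = (data - data.mean(axis=0)) / data.std(axis=0)        # Step 1
    models = []
    for i, j in combinations(range(data.shape[1]), 2):          # Step 2
        P = norm[:, [i, j]]
        labels, centers = kmeans(P, k)                          # Step 3
        for c in range(k):
            x, y = P[labels == c, 0], P[labels == c, 1]
            if len(x) > 2 and abs(pearson(x, y)) > threshold:   # Step 4
                X = np.column_stack([np.ones_like(x), x])
                beta = np.linalg.solve(X.T @ X, X.T @ y)        # Step 5
                models.append((i, j, c, centers[c], beta))      # Step 6
    return models

# Two channels with a different linear dependency in each of two operating
# regions (compare figure 5), plus one unrelated noise channel:
rng = np.random.default_rng(2)
ch0 = np.concatenate([rng.uniform(0.0, 1.0, 50), rng.uniform(5.0, 6.0, 50)])
ch1 = np.concatenate([2.0 * ch0[:50], -3.0 * ch0[50:] + 20.0])
ch2 = rng.standard_normal(100)
models = clustered_correlation(np.column_stack([ch0, ch1, ch2]))
```

For the channel pair (ch0, ch1) the sketch produces one model per cluster, with slopes of opposite sign, while no global linear model would fit both regions.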

3.3 Local Models

As opposed to the generation of global and clustered models, where the complete measurement data is taken as input at once, local models are generated in dependency on the data points in the environment of the last measured data point. Hence local models always represent the up-to-date situation of a system process without demanding a special adaptive algorithm.

With the so-called Loess algorithm a locally weighted least squares estimation can be performed, where the weights for the data points are calculated in the following way: starting from an arbitrary point $x_i$, all other points in the data set are weighted according to their Euclidean distance to $x_i$. Let therefore $\Delta_i(x_k) = d(x_k, x_i)$ be the Euclidean distance between $x_k$ and $x_i$, and let $\Delta_{(i)}(x)$ be these distances ordered from smallest to largest. The tricube weight function $T$ is defined by:

$T(u, t) = \begin{cases} (1 - (\frac{u}{t})^3)^3 & 0 \leq u < t \\ 0 & u \geq t \end{cases}$ (16)

As a next step a so-called neighbourhood parameter $\alpha > 0$ can be defined, which controls the smoothness of the regression prediction at the actual test data point $x_i$. There are now two cases for defining the weights for a tuple $(x_i, y_i)$:


- For $\alpha \leq 1$: $\omega_i(x) = T(\Delta_i(x), \Delta_{(q)}(x))$, where $q$ is equal to $\alpha n$, truncated to an integer.

- For $\alpha > 1$: $\omega_i(x) = T(\Delta_i(x), \Delta_{(n)}(x)\,\alpha)$

The $\omega_i(x)$ are called neighbourhood weights and are computed for just a subset of the input space in the first case and for all data points $x_k$ in the second case. The weights decrease or stay constant as the $x_k$ move further apart.

After calculating the weights for the data points, a weighted least squares estimation as described in chapter 2.4.2 is performed. Another, much faster approach, and therefore one more suitable for online systems, was tested in a large number of test runs within the scope of an industrial project: it weights the most adjacent data points (by Euclidean distance) with 1 and all others with 0. This leads not only to a fast data point selection (no complex tricube weight function values need to be evaluated), but also to a faster parameter estimation, as conventional least squares can be applied.
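The tricube weighting of Eq. (16), combined with the weighted least squares of chapter 2.4.2, can be sketched for a one-dimensional local linear model. The neighbourhood size q (the $\alpha \leq 1$ case) and the test function are illustrative choices; a tiny epsilon keeps the q-th neighbour from getting an exactly zero weight:

```python
import numpy as np

def tricube(u, t):
    """T(u, t) from Eq. (16)."""
    return np.where(u < t, (1.0 - (u / t) ** 3) ** 3, 0.0)

def local_linear_fit(x, y, x0, q):
    """Predict y at x0 from a line fitted to the q nearest points."""
    d = np.abs(x - x0)                # Euclidean distances Delta_i(x0)
    t = np.sort(d)[q - 1]             # q-th smallest distance Delta_(q)(x0)
    w = tricube(d, t + 1e-12)         # neighbourhood weights omega_i(x0)
    X = np.column_stack([np.ones_like(x), x])
    Q = np.diag(w)
    beta = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y)   # Eq. (12)
    return beta[0] + beta[1] * x0

x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.sin(x)
y_hat = local_linear_fit(x, y, x0=np.pi / 2.0, q=15)
```

The binary 0/1 weighting mentioned above corresponds to replacing the tricube values with an indicator of the q nearest points, after which plain, unweighted least squares over that subset suffices.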

4 Implementation Scheme

The core strategy for global, clustered and local correlation and regression analysis was developed not only through theoretical considerations, but also through extensive test runs within the scope of an industrial project, taking into account the improvement of performance without neglecting computational time. It is demonstrated in the flow chart in figure 6. All parts of the flow chart were described in the previous chapters, except the selection of models according to their quality; this topic will be handled in chapter 5.1.

For generating regression models the implementation scheme is nearly the same, with the exception that the channel selection is performed through variable selection and dimension reduction methods as described in [1] and [2], instead of inspecting all 2-combinations of channels and computing correlation coefficients as hints for valid correlation models.

5 Validation

5.1 Quality of Models

In the literature there exists a large variety of measures characterizing model quality. The quality measure should reflect the trustability, also called the confidence level, of a model given the data's nature. This trustability can be described in many ways, above all depending on the model architecture. One famous and widely used measure is the so-called bias error, which is that part of the model error which arises due to the restricted flexibility of the model. In reality most processes are quite complex, hence the model typically applied is not capable of representing the process exactly. This leads to an error between process and model, which is called the bias error:

$err = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$ (17)

where $\hat{y}_i = f(x_i)$ denotes the estimated output value of the model described by $f$, $y_i$ the real process output value and $n$ the number of training data points. In order to avoid overfitting,


Figure 6: Core strategy for building up local, global or clustered correlation models


the input data set should first be divided into 2 parts, one part containing all the data points used for training (→ training data), the other part containing all the points used for the evaluation of the bias error (→ check data). Generally, overfitting can be detected when the bias checking error (difference between the model output and the check data output) starts increasing while the bias training error (difference between the model output and the training data output) is still decreasing.

If the data was not normalized before model generation, the bias error, defined as an absolute measure in equation 17, cannot be compared between models containing channels lying in different ranges. A better relative measure, which was also used as the quality measure in figure 4, is the so-called average percent error:

$APE = \frac{1}{n} \sum_{i=1}^{n} \frac{|\hat{y}_i - y_i|}{|y_i|} \cdot 100\%$ (18)

If the range of the output target channel $y$ lies near 0, the above equation obviously becomes numerically unstable, hence another, better normalized measure for the model's quality is defined by r-squared:

$R^2 = \frac{ssr}{sst}$ (19)

where $ssr = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, $sst = \sum_{i=1}^{n} (y_i - \bar{y})^2$ and $\bar{y}$ is the mean value of the output target channel $y$. r-squared can be seen as a kind of multiple correlation coefficient; for correlation models r-squared is equal to the square of the empirical correlation coefficient described in equation 5. The great deficiency of r-squared lies in the fact that it does not take interpolation into account, as the degrees of freedom (= the number of effective parameters, which is mostly but not always as large as the number of nominal parameters - see [7]) are not incorporated into the calculation. Normally, interpolation, which originates from the equality of the degrees of freedom and the number of training data points, causes the strongest possible overfitting. In that case each training data point can be approximated exactly, which leads to a maximal r-squared value of 1, although it should not, of course. Hence the so-called r-squared-adjusted value, given by

$R^2_{adjusted} = 1 - \frac{(n-1)(1-R^2)}{n - df}$ (20)

which normalizes the r-squared value, taking into account the discrepancy between the number of training data points ($n$) and the number of degrees of freedom ($df$).

All of the measures stated above for describing the trustability and quality of a model, namely bias error, average percent error, r-squared and r-squared-adjusted, have one thing in common: they can all be applied to any data-based model with an arbitrary architecture and structure, for example to fuzzy inference systems, neural networks, self-organizing maps etc. For correlation and regression models in particular there exist further techniques for characterizing a model's quality, such as confidence intervals $[a, b]$ for the parameters, computed by:

$a = x\hat{\beta} - t_{1-\alpha/2}(n - m - 1) \sqrt{\frac{n}{n - m - 1}} \sqrt{\sigma^2 \left( x (X^T X)^{-1} x^T \right)}$ (21)

and

$b = x\hat{\beta} + t_{1-\alpha/2}(n - m - 1) \sqrt{\frac{n}{n - m - 1}} \sqrt{\sigma^2 \left( x (X^T X)^{-1} x^T \right)}$ (22)


where $X$ is the regression matrix containing all regressors, $x$ the current measurement point transformed by the regressors, $\hat{\beta}$ the estimated parameters and $t_{1-\alpha/2}$ the $(1-\alpha/2)$-quantile of the Student t distribution. An adaptation of these fixed values over the complete domain where the model was generated can be performed by so-called errorbars, which take into account extrapolation, the density of the data in certain sub-areas and noise assumptions (see [7] and [8]).

5.2 Amount and Quality of Required Test Data

5.2.1 Amount

For statistical methods there exists a theoretical bound on the number of needed data points. Let therefore $f(x_1, x_2, \ldots, x_n)$ be a (global or partial local) regression model computed by a statistical algorithm; then from the theoretical point of view the minimum number of samples needed is given by $m$, where $m$ is the number of degrees of freedom. Hence, for example, for a linear model given by

$f(x_1, x_2, \ldots, x_n) = \beta_0 + \beta_1 x_1 + \ldots + \beta_n x_n$ (23)

the degrees of freedom are exactly $n + 1$, where $n$ is the dimension of the input space. If other basis functions are used to shape the regression model, for instance

$f(x_1, x_2, \ldots, x_n) = \beta_0 + \beta_1 x_1 + \beta_2 \ln(x_1) + \beta_3 \frac{x_1}{x_2} + \ldots + \beta_k x_n x_{n-1}$ (24)

then, of course, the degrees of freedom are again just the number of parameters (betas) in the model. Nevertheless, this bound is only of theoretical use; from the practical point of view, in the literature (see [9]) and also according to our experience from an industrial project, the amount of data for a useful, accurate regression model should be $10m$, where $m$ again is the number of degrees of freedom.

5.2.2 Quality

The required quality of the training data goes hand in hand with the needed amount. In other words, the training data should lie widely spread in the input space and, for example, not be crowded in one segment and as a consequence cover only one special case of a test run. Furthermore, the training data should not contain too many faults and too much noise, such as undefined values and big errors (outliers). These can in fact be filtered out, but the result might be a data matrix too small to fulfil the requirement on the amount of data. Another risk lies in training data containing many errors or faults which cannot be pre-filtered and therefore influence the generation of the models in a bad way: models are built up which do not reflect the real situation and dependency of the channels, and a large overdetection rate would be the result (error-free data points would not fit the models). If the number of faults is small, statistical methods for building up models are hardly disturbed, as the errors are effectively ignored.

5.3 Computational Complexity

As all 2-combinations of channels are taken into account, the computational complexity for generating partial correlation models depends strongly on the number of correlated data columns in the historic data and, of course, on the amount of training data itself. As a consequence of a large number


of test runs with real-life engine data, it turned out that for the $\binom{n}{2} = \frac{n(n-1)}{2}$ possible channel combinations approximately $n$ solvings of the equation system

$X^T X \beta = X^T y$ (25)

need to be performed. As the matrix $X$ consists of original or transformed real-life data and is generally neither sparse nor of any special form, the computation time for the above equation system with Gauss-Jordan elimination is $O(k^3)$, which leads to an approximate overall complexity of $O(n k^3)$.

Note: as stated above, $n$ is the number of channels while $k$ is the number of degrees of freedom (parameters $\beta_i$), such that for multidimensional linear regression analysis, which takes all channels as input for the reconstruction of one special channel, the complexity becomes $O(n^4)$. The computation of the empirical correlation coefficient as well as the Spearman rank correlation coefficient is of complexity $O(n)$ and can therefore be neglected here. In practical usage the multiplication $X^T X$ is indeed the most time-consuming part, because its complexity amounts to $O(k^2 m)$ computations, where $m$ is the number of data points and is mostly much higher than the number of degrees of freedom and even the number of channels. For local models the complexity gets even worse, because for every test data point a set of partial correlation models is computed; this theoretically leads to a complexity of $O(m n k^3)$, where $m$ is the number of test data points, but during practical test runs it turned out that only a fraction of the $n$ models need to be generated.

Table 1 states the computation times needed by the global and clustered correlation methods and the global regression method. All results are based on the following configuration:

Host: PC
Processor: Athlon 800 MHz
RAM: 256 MB
Operating System: Windows 2000
Compiler: Visual C++ 6.0
Settings: more or less default entries, optimization flag O2 (speed increase)
Input data set: 1810 records and 80 channels

Virtual memory requirements also depend strongly on the number of correlated data columns, because for each correlation model the parameters β_i, the corresponding cluster centers, the normalization parameters and the quality of the model need to be collected in an open list of correlation models and written back at the end of a data processing run.
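The per-model bookkeeping described above could be captured by a record such as the following Python sketch. The field names are assumptions chosen to mirror the listed items (parameters β_i, cluster centers, normalization parameters, model quality); they are not the report's actual data structures:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CorrelationModel:
    """One entry of the open list of correlation models that is
    kept in memory during a data processing run and written back
    at its end."""
    beta: List[float]                                    # parameters beta_i
    cluster_centers: List[List[float]] = field(default_factory=list)
    norm_params: List[float] = field(default_factory=list)
    quality: float = 0.0                                 # model quality measure

# the open list grows with the number of correlated channel pairs,
# which is what drives the memory requirements
open_list: List[CorrelationModel] = []
open_list.append(CorrelationModel(beta=[0.1, 2.3], quality=0.97))
```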

6 Conclusion

All approaches and algorithms described in this technical report for the generation of correlation and regression models (except the generation of local models) have one great deficit: they are offline algorithms, which means nothing else than that they can only be applied when the complete data matrix is sent into the system at once. This circumstance entails big problems with respect to industrial requirements on the identification system, such as the speed of the algorithms for online training, auto-adaptation of models to similar test objects, or hybrid models. Hybrid models here means coupling physical knowledge in the form of rough models as starting points (for example some linguistic rules) with data-based adjustments of parameters for the refinement of



Method                 Configuration                       Computational Time
---------------------  ----------------------------------  ----------------------------------------
Global Correlation     linear regressors                   2.75 seconds for 71 partial models
Global Correlation     linear regressors                   4.82 seconds for 356 partial models
Global Correlation     regressors ln(x), 1/x and x used    4.26 seconds for 71 partial models
Clustered Correlation  linear regressors, 10 clusters      11.08 seconds
Clustered Correlation  linear regressors, 20 clusters      14.34 seconds
Global Regression      linear regressors                   88.05 seconds (67 regression models
                                                           with 66 degrees of freedom)

Table 1: Computation time for generating correlation and regression models

the models during the measurement process. Hence, while local models are more or less open-loop compatible by nature, for global models special algorithms need to be developed to achieve open-loop ability. These methods are called adaptive methods or adaptation algorithms and will be dealt with in a further technical report.
