1. Introduction to multiway analysis

Theoretical and Applied Chemometrics (Quimiometria Teórica e Aplicada)

Instituto de Química - UNICAMP

Why build models of chemical data?

• Data exploration

– e.g. find important sources of variation in complex environmental samples

• Compound identification and calibration in mixtures

– e.g. identification and quantification of pollutants in river water

• Statistical process control

– e.g. detect disturbances in product quality

• Models are useful approximations of reality

– first-principles models are based on chemical/physical knowledge: do they fit well with the measured data?

– empirical models (e.g. PCA, PLS) are purely mathematical: do they have a chemical meaning?

Multiway data

• Multiway data is becoming more common in chemistry. Examples are:

– Chromatography: sample number × elution time × wavelength

– On-line analysis: experiment number × time × wavelength/temperature/pressure

– Tandem mass spectrometry (MS-MS): sample number × parent ion mass × daughter ion mass

– Image analysis: experiment number × time × x-position × y-position

Multiway data – an example

• Batch process data:

[Figure: one batch is a matrix X (J × K) with axes time and process variable; a series of batches is a three-way array X (I × J × K) with axes batch, time and process variable.]

Multiway modelling

• The PARAFAC (or ‘CANDECOMP’) and Tucker models were developed by psychometricians 30 years ago, but are especially useful in chemistry, because chemical data often has a multilinear structure.

• PARAFAC and Tucker are different generalizations of PCA for higher-order data.

• There also exist generalizations of PLS for higher-order data, e.g. N-PLS.

Two-way modelling

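• Two-way data can be modelled using bilinear models:

(1) $X = TP^{\mathsf{T}} + E$ (PCA)

(2) $X = USV^{\mathsf{T}} + E$ (SVD)

(3) $X = AGB^{\mathsf{T}} + E$ (TMCA)

These models give the same residuals, E.

[Figure: the matrix X (axes time and process variable) drawn as the product of the factor matrices plus the residual matrix E, once for each model.]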
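The equivalence of the three bilinear models can be checked numerically. The following is a minimal NumPy sketch, not part of the original slides: the data are random, the number of components `R` is an arbitrary choice, no mean-centring is applied, and the TMCA factors are simply taken to be the truncated singular vectors, which is only one of many valid choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))  # hypothetical data: 20 time points x 8 process variables
R = 3                         # number of components (assumed for illustration)

# (2) SVD: X = U S V^T + E, truncated to R components
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_R, S_R, V_R = U[:, :R], np.diag(s[:R]), Vt[:R, :].T
E_svd = X - U_R @ S_R @ V_R.T

# (1) PCA: loadings P from the eigenvectors of X^T X, scores T = X P
evals, evecs = np.linalg.eigh(X.T @ X)  # eigenvalues in ascending order
P = evecs[:, ::-1][:, :R]               # R leading eigenvectors
T = X @ P
E_pca = X - T @ P.T

# (3) TMCA: X = A G B^T + E with orthonormal A, B and a small core matrix G
A, B = U_R, V_R
G = A.T @ X @ B
E_tmca = X - A @ G @ B.T

same_pca_svd = np.allclose(E_pca, E_svd)
same_tmca_svd = np.allclose(E_tmca, E_svd)
print(same_pca_svd, same_tmca_svd)
```

The models differ only in how the fitted part of X is split into factors; the R-dimensional subspaces, and therefore the residuals, coincide.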

Multiway models - PARAFAC

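• Multiway data can be modelled using multilinear models, such as the PARAFAC model...

$X = A(C \odot B)^{\mathsf{T}} + E$

where X and E are unfolded to I × JK and $\odot$ is the column-wise Kronecker (Khatri-Rao) product.

[Figure: the three-way array X (axes batch, time and process variable) drawn as the loading matrices A, B and C plus the residual array E.]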
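To make the trilinear structure concrete, the sketch below builds a noise-free PARAFAC array element by element, $x_{ijk} = \sum_r a_{ir} b_{jr} c_{kr}$, and checks that its I × JK unfolding equals $A(C \odot B)^{\mathsf{T}}$. The dimensions, the random loadings and the helper name `khatri_rao` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r equals kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J = B.shape[0]
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

rng = np.random.default_rng(1)
I, J, K, R = 4, 5, 6, 2  # hypothetical sizes and number of components
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

# Trilinear model without residuals: x_ijk = sum_r a_ir * b_jr * c_kr
X = np.einsum('ir,jr,kr->ijk', A, B, C)

# Unfold to I x JK: the K frontal slabs (each I x J) are placed side by side
X_unfolded = X.transpose(0, 2, 1).reshape(I, K * J)

parafac_matches = np.allclose(X_unfolded, A @ khatri_rao(C, B).T)
print(parafac_matches)
```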

Multiway models - Tucker

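• ...or the Tucker model:

$X = AG(C \otimes B)^{\mathsf{T}} + E$

where G is the (unfolded) core array and $\otimes$ is the Kronecker product.

[Figure: the three-way array X (axes batch, time and process variable) drawn as the loading matrices A, B and C, linked through the core array G, plus the residual array E.]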
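The Tucker model can be illustrated in the same way. This sketch assumes hypothetical sizes, random loadings and a random core array, builds $x_{ijk} = \sum_{pqr} g_{pqr} a_{ip} b_{jq} c_{kr}$, and checks it against the unfolded form $AG(C \otimes B)^{\mathsf{T}}$. Unlike PARAFAC, each mode may have a different number of components.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, K = 4, 5, 6  # hypothetical array size
P, Q, R = 2, 3, 2  # number of components in each mode (may differ)
A = rng.normal(size=(I, P))
B = rng.normal(size=(J, Q))
C = rng.normal(size=(K, R))
G = rng.normal(size=(P, Q, R))  # core array

# Tucker model without residuals
X = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)

# Unfold X to I x JK and G to P x QR, using the same slab ordering for both
X_unfolded = X.transpose(0, 2, 1).reshape(I, K * J)
G_unfolded = G.transpose(0, 2, 1).reshape(P, R * Q)

tucker_matches = np.allclose(X_unfolded, A @ G_unfolded @ np.kron(C, B).T)
print(tucker_matches)
```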

Unfolding

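• Another option is to matricize (or ‘unfold’) the data and use standard two-way methods:

[Figure: the three-way array X (I × J × K) is cut into slabs, labelled X1 ... XI in the figure, which are placed side by side to form the unfolded matrix X (I × JK).]

• Can also unfold along other modes: X (J × KI) and X (K × IJ).

• But if a multiway structure exists in the data, multiway methods have some important advantages!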
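Unfolding amounts to slicing the array and placing the slices side by side. A small NumPy sketch follows; the array is a toy example and the column ordering shown is one common convention (others differ only by a permutation of columns).

```python
import numpy as np

I, J, K = 2, 3, 4
X = np.arange(I * J * K).reshape(I, J, K)  # toy three-way array

# I x JK: the K slabs X[:, :, k] (each I x J) side by side
X_I_JK = np.hstack([X[:, :, k] for k in range(K)])

# J x KI: the I slabs X[i, :, :] (each J x K) side by side
X_J_KI = np.hstack([X[i, :, :] for i in range(I)])

# K x IJ: the J slabs X[:, j, :] (each I x K), transposed, side by side
X_K_IJ = np.hstack([X[:, j, :].T for j in range(J)])

print(X_I_JK.shape, X_J_KI.shape, X_K_IJ.shape)

# Unfolding only rearranges elements, so the array can be folded back exactly
X_refolded = X_I_JK.reshape(I, K, J).transpose(0, 2, 1)
roundtrip_ok = np.array_equal(X_refolded, X)
```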

Advantages of multiway

• Multiway models use one set of loadings for each mode – results are much easier to plot and understand.

• Multiway models need fewer model parameters to describe the data, e.g. a three-component model of X (30 × 800 × 200) uses

– 480090 parameters for unfold-PCA: 3 × (30 + 800 × 200)

– 3090 parameters for PARAFAC: 3 × (30 + 800 + 200)

• PARAFAC is more parsimonious than unfold-PCA.
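The parameter counts follow from the sizes of the score and loading matrices. As a quick check (the function names are illustrative only): unfold-PCA of an I × JK matrix needs R score vectors of length I and R loading vectors of length JK, whereas PARAFAC needs R loading vectors in each of the three modes.

```python
def n_params_unfold_pca(I, J, K, R):
    # R score vectors of length I plus R loading vectors of length J*K
    return R * (I + J * K)

def n_params_parafac(I, J, K, R):
    # R loading vectors in each of the three modes
    return R * (I + J + K)

I, J, K, R = 30, 800, 200, 3
unfold_pca_count = n_params_unfold_pca(I, J, K, R)  # 3 * (30 + 160000) = 480090
parafac_count = n_params_parafac(I, J, K, R)        # 3 * (30 + 800 + 200) = 3090
print(unfold_pca_count, parafac_count)
```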

Disadvantages of multiway

• PARAFAC and Tucker models are usually calculated using a technique called ‘alternating least squares’ (ALS).

• This is sometimes slow, and sometimes gives convergence problems if an inappropriate model is used.

• However, ALS algorithms are easy to understand and there is now some high-quality, free MATLAB code available on the internet:

– The N-way Toolbox (Andersson & Bro, http://www.models.kvl.dk)
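The idea behind ALS can be sketched in a few lines: each loading matrix is updated in turn by ordinary least squares while the other two are held fixed, and the cycle repeats until the fit stops improving. The code below is a bare-bones illustration in NumPy under stated assumptions (random initialisation, a simple stopping rule, noise-free synthetic data); it is not the N-way Toolbox algorithm, and the names `parafac_als`, `unfold` and `khatri_rao` are hypothetical.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product: row (p * n_Q + q) holds P[p, r] * Q[q, r]."""
    return (P[:, None, :] * Q[None, :, :]).reshape(P.shape[0] * Q.shape[0], -1)

def unfold(X, mode):
    """Put `mode` first and flatten the remaining two modes (C order)."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def parafac_als(X, n_components, n_iter=2000, tol=1e-12, seed=0):
    """Fit a three-way PARAFAC model by alternating least squares (sketch)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    B = rng.normal(size=(J, n_components))  # random start (an assumption)
    C = rng.normal(size=(K, n_components))
    X0, X1, X2 = unfold(X, 0), unfold(X, 1), unfold(X, 2)
    total = float(np.sum(X ** 2))
    prev_sse = np.inf
    for _ in range(n_iter):
        # Each step solves a linear least-squares problem for one mode
        A = X0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X2 @ np.linalg.pinv(khatri_rao(A, B).T)
        sse = float(np.sum((X0 - A @ khatri_rao(B, C).T) ** 2))
        if abs(prev_sse - sse) <= tol * total:  # fit no longer improving
            break
        prev_sse = sse
    return A, B, C, sse

# Synthetic, noise-free two-component data (hypothetical sizes)
rng = np.random.default_rng(42)
A_true = rng.normal(size=(10, 2))
B_true = rng.normal(size=(8, 2))
C_true = rng.normal(size=(6, 2))
X = np.einsum('ir,jr,kr->ijk', A_true, B_true, C_true)

A, B, C, sse = parafac_als(X, n_components=2)
rel_err = np.sqrt(sse / np.sum(X ** 2))
print(f"relative residual: {rel_err:.2e}")
```

Because every update is a least-squares solution, the sum of squared residuals cannot increase from one step to the next, but the number of iterations needed depends strongly on the data and on the chosen number of components.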

Conclusions

• PARAFAC and Tucker are both generalizations of the PCA model for multiway data.

• PARAFAC and Tucker models use fewer parameters and are easier to interpret than unfold-PCA.

• Models can be calculated in MATLAB using the ‘N-way Toolbox’ (or ‘PLS_Toolbox’).
