1
1. Introduction to multiway analysis
Quimiometria Teórica e Aplicada (Theoretical and Applied Chemometrics)
Instituto de Química - UNICAMP
2
Why build models of chemical data?
• Data exploration
– e.g. find important sources of variation in complex environmental samples
• Compound identification and calibration in mixtures
– e.g. identification and quantification of pollutants in river water
• Statistical process control
– e.g. detect disturbances in product quality
• Models are useful approximations of reality
– first-principles models are based on chemical/physical knowledge: do they fit the measured data well?
– empirical models (e.g. PCA, PLS) are purely mathematical: do they have a chemical meaning?
3
Multiway data
• Multiway data are becoming increasingly common in chemistry. Examples include:
– Chromatography: sample number × elution time × wavelength
– On-line analysis: experiment number × time × wavelength/temperature/pressure
– Tandem mass spectrometry (MS-MS): sample number × parent ion mass × daughter ion mass
– Image analysis: experiment number × time × x-position × y-position
4
Multiway data – an example
• Batch process data:
[Figure: one batch gives a two-way matrix X (J × K) of process variables × time; a series of batches gives a three-way array X (I × J × K) of batches × process variables × time]
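To make the batch layout concrete, here is a minimal sketch in NumPy. The slides use MATLAB, and Python is used here only for illustration. It stacks I batch matrices, each J × K, into one I × J × K array.

The sizes and the random numbers are hypothetical. The sketch also assumes every batch has already been aligned to the same number of time points K. Real batch data often need such alignment before they can be stacked.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: I batches, J process variables, K time points per batch
I, J, K = 10, 4, 50

# Each batch is a J x K matrix (process variables x time); simulated here,
# and assumed to be already aligned to the same K time points
batches = [rng.normal(size=(J, K)) for _ in range(I)]

# Stacking along a new first axis gives the I x J x K three-way array
X = np.stack(batches, axis=0)
print(X.shape)  # (10, 4, 50)
```

Each slice `X[i]` is then the J × K matrix of batch i.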
5
Multiway modelling
• The PARAFAC (or ‘CANDECOMP’) and Tucker models were developed by psychometricians in the 1960s and 1970s. They are especially useful in chemistry because chemical data often have a multilinear structure.
• PARAFAC and Tucker are different generalizations of PCA for higher-order data.
• There also exist generalizations of PLS for higher-order data, e.g. N-PLS.
6
Two-way modelling
• Two-way data can be modelled using bilinear models:
[Figure: a two-way matrix X (process variable × time) drawn as the sum of TP^T and E]
(1) $X = TP^{T} + E$ (PCA)
(2) $X = USV^{T} + E$ (SVD)
(3) $X = AGB^{T} + E$ (TMCA)
• For the same number of components, these three models give the same residuals, E.
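The claim that the three bilinear models give the same residuals, for the same number of components, can be checked numerically. The NumPy sketch below works as follows:

- It uses random, column-centred data.
- It truncates the SVD to R components.
- It forms the PCA scores and loadings as T = US and P = V.
- It builds a model of the form AGB^T by rotating the SVD loadings with arbitrary invertible matrices and absorbing their inverses into a non-diagonal G.

Treating model (3) as this kind of rotated, non-diagonal-core model is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 8))   # hypothetical two-way data
X = X - X.mean(axis=0)         # column-centre, as is usual before PCA
R = 2                          # number of components

# (2) SVD, truncated to R components: X ~ U S V^T
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_r, S_r, Vt_r = U[:, :R], np.diag(s[:R]), Vt[:R, :]
E_svd = X - U_r @ S_r @ Vt_r

# (1) PCA: scores T = U S, loadings P = V
T = U_r @ S_r
P = Vt_r.T
E_pca = X - T @ P.T

# (3) A G B^T: rotate the loadings by arbitrary invertible matrices
# and absorb the inverse rotations into a (non-diagonal) G
Ra = rng.normal(size=(R, R))
Rb = rng.normal(size=(R, R))
A = U_r @ Ra
B = Vt_r.T @ Rb
G = np.linalg.inv(Ra) @ S_r @ np.linalg.inv(Rb).T
E_tucker = X - A @ G @ B.T

print(np.allclose(E_svd, E_pca), np.allclose(E_svd, E_tucker))  # True True
```

The three models differ only in how the same rank-R fit is split into factors, so the residual matrix E is identical.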
7
Multiway models - PARAFAC
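• Multiway data can be modelled using multilinear models, such as the PARAFAC model...
[Figure: three-way array X (batch × process variable × time) decomposed into loading matrices A, B and C plus residuals E]
$X = A(C \odot B)^{T} + E$
where X and E are unfolded to I × JK matrices and $\odot$ is the Khatri-Rao (column-wise Kronecker) product.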
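Element-wise, a PARAFAC model with F components reads $x_{ijk} = \sum_f a_{if} b_{jf} c_{kf} + e_{ijk}$.

The sketch below builds a noise-free X from random, hypothetical loadings A, B and C using `np.einsum`. It then checks the unfolded matrix form against a hand-written Khatri-Rao product. It assumes an I × JK unfolding in which the J index varies fastest along the columns.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column f is kron(C[:, f], B[:, f])."""
    F = C.shape[1]
    return (C[:, None, :] * B[None, :, :]).reshape(-1, F)

def parafac_reconstruct(A, B, C):
    # x_ijk = sum_f a_if * b_jf * c_kf
    return np.einsum('if,jf,kf->ijk', A, B, C)

rng = np.random.default_rng(2)
I, J, K, F = 5, 4, 3, 2          # hypothetical sizes and number of components
A = rng.normal(size=(I, F))      # e.g. batch mode
B = rng.normal(size=(J, F))      # e.g. process-variable mode
C = rng.normal(size=(K, F))      # e.g. time mode
X = parafac_reconstruct(A, B, C)

# I x JK unfolding with j varying fastest: column index j + J*k
X1 = X.reshape(I, J * K, order='F')
print(np.allclose(X1, A @ khatri_rao(C, B).T))  # True
```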
8
Multiway models - Tucker
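• ...or the Tucker model:
[Figure: three-way array X (batch × process variable × time) decomposed into loading matrices A, B and C, a core array G, and residuals E]
$X = AG(C \otimes B)^{T} + E$
where X, E and the core array G are unfolded to matrices and $\otimes$ is the Kronecker product.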
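Element-wise, a Tucker model reads $x_{ijk} = \sum_{p,q,r} g_{pqr} a_{ip} b_{jq} c_{kr} + e_{ijk}$. Each mode can have its own number of components, and the core array G links them.

The NumPy sketch below uses random, hypothetical loadings and core. It checks the unfolded form with `np.kron`, assuming the same "second index fastest" column ordering for both X and G. It also shows that PARAFAC is the special case of a superdiagonal core.

```python
import numpy as np

def tucker_reconstruct(G, A, B, C):
    # x_ijk = sum_{p,q,r} g_pqr * a_ip * b_jq * c_kr
    return np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)

rng = np.random.default_rng(3)
I, J, K = 6, 5, 4
P, Q, R = 3, 2, 2               # hypothetical numbers of components per mode
A = rng.normal(size=(I, P))
B = rng.normal(size=(J, Q))
C = rng.normal(size=(K, R))
G = rng.normal(size=(P, Q, R))  # core array
X = tucker_reconstruct(G, A, B, C)

# Unfolded form: X_(I x JK) = A G_(P x QR) (C kron B)^T
X1 = X.reshape(I, J * K, order='F')
G1 = G.reshape(P, Q * R, order='F')
print(np.allclose(X1, A @ G1 @ np.kron(C, B).T))  # True

# PARAFAC = Tucker with a superdiagonal F x F x F core of ones
F = 2
G_diag = np.zeros((F, F, F))
G_diag[np.arange(F), np.arange(F), np.arange(F)] = 1.0
A2, B2, C2 = A[:, :F], B[:, :F], C[:, :F]
print(np.allclose(tucker_reconstruct(G_diag, A2, B2, C2),
                  np.einsum('if,jf,kf->ijk', A2, B2, C2)))  # True
```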
9
Unfolding
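• Another option is to matricize (or ‘unfold’) the data and use standard two-way methods:
[Figure: the three-way array X (I × J × K) cut into slabs that are placed side by side, giving the two-way matrix $X_{I \times JK}$]
• It is also possible to unfold along the other modes, giving $X_{J \times KI}$ and $X_{K \times IJ}$.
• But if a multiway structure exists in the data, multiway methods have some important advantages!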
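Here is a minimal NumPy sketch of the three matricizations. The helper `unfold` is hypothetical and is not taken from the N-way Toolbox. It puts the chosen mode first, keeps the other two modes in cyclic order, and lets the first of them vary fastest. This is one common convention among several.

```python
import numpy as np

def unfold(X, mode):
    """Matricize a three-way array: mode 0 -> I x JK, 1 -> J x KI, 2 -> K x IJ.
    The remaining modes are taken in cyclic order, the first varying fastest."""
    order = [mode, (mode + 1) % 3, (mode + 2) % 3]
    Xt = np.transpose(X, order)
    return Xt.reshape(Xt.shape[0], -1, order='F')

X = np.arange(24).reshape(2, 3, 4)   # I = 2, J = 3, K = 4
for m in range(3):
    print(m, unfold(X, m).shape)
# 0 (2, 12)
# 1 (3, 8)
# 2 (4, 6)
```

Unfold-PCA then amounts to ordinary PCA (for example, an SVD) of `unfold(X, 0)`.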
10
Advantages of multiway
• Multiway models use one set of loadings for each mode – results are much easier to plot and understand.
• Multiway models need fewer parameters to describe the data. For example, a three-component model of X (30 × 800 × 200) uses:
– 480,090 parameters for unfold-PCA (X unfolded to 30 × 160,000)
– 3,090 parameters for PARAFAC
• PARAFAC is more parsimonious than unfold-PCA.
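The parameter counts follow directly from the sizes of the loading matrices:

- For unfold-PCA with X unfolded to I × JK, an F-component model needs F(I + JK) parameters.
- PARAFAC needs F(I + J + K) parameters.

The helper functions below are hypothetical. Like the slide, they ignore scaling and rotational indeterminacies. The Tucker3 count (three loading matrices plus a P × Q × R core) is not on the slide and is included only for comparison.

```python
def n_params_unfold_pca(I, J, K, F):
    # PCA of the I x JK unfolded matrix: scores (I x F) + loadings (JK x F)
    return F * (I + J * K)

def n_params_parafac(I, J, K, F):
    # One loading matrix per mode: A (I x F), B (J x F), C (K x F)
    return F * (I + J + K)

def n_params_tucker3(I, J, K, P, Q, R):
    # Loadings A (I x P), B (J x Q), C (K x R) plus a P x Q x R core array
    return I * P + J * Q + K * R + P * Q * R

print(n_params_unfold_pca(30, 800, 200, 3))        # 480090
print(n_params_parafac(30, 800, 200, 3))           # 3090
print(n_params_tucker3(30, 800, 200, 3, 3, 3))     # 3117
```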
11
Disadvantages of multiway
• PARAFAC and Tucker models are usually calculated using a technique called ‘alternating least squares’ (ALS).
• This is sometimes slow...
...and sometimes gives convergence problems if an inappropriate model is used.
• However, ALS algorithms are easy to understand, and high-quality, free MATLAB code is now available on the internet:
– The N-way Toolbox (Andersson & Bro, http://www.models.kvl.dk)
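To show why ALS is easy to understand, here is a bare-bones PARAFAC-ALS sketch in NumPy. Each step holds two loading matrices fixed and solves an ordinary least-squares problem for the third, using the unfolded forms of the model.

This is not the N-way Toolbox algorithm. It uses:

- random initialization;
- plain pseudo-inverse updates;
- a simple stopping rule based on the change in fit.

It has no line search, no constraints, and no safeguards against degenerate solutions. The function names are hypothetical.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: column f is kron(U[:, f], V[:, f])."""
    F = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, F)

def unfold(X, mode):
    """I x JK, J x KI or K x IJ matricization (first remaining mode fastest)."""
    order = [mode, (mode + 1) % 3, (mode + 2) % 3]
    Xt = np.transpose(X, order)
    return Xt.reshape(Xt.shape[0], -1, order='F')

def parafac_als(X, F, n_iter=1000, tol=1e-12, seed=0):
    """Fit x_ijk ~ sum_f a_if b_jf c_kf by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.normal(size=(I, F))
    B = rng.normal(size=(J, F))
    C = rng.normal(size=(K, F))
    X0, X1, X2 = unfold(X, 0), unfold(X, 1), unfold(X, 2)
    ss_total = np.sum(X ** 2)
    sse_old = np.inf
    for _ in range(n_iter):
        # One least-squares update per mode, the other two modes held fixed
        A = X0 @ np.linalg.pinv(khatri_rao(C, B).T)   # X_(I x JK) = A (C kr B)^T
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)   # X_(J x KI) = B (A kr C)^T
        C = X2 @ np.linalg.pinv(khatri_rao(B, A).T)   # X_(K x IJ) = C (B kr A)^T
        sse = np.sum((X0 - A @ khatri_rao(C, B).T) ** 2)
        # Stop when the fit no longer improves noticeably
        if sse_old - sse < tol * ss_total:
            break
        sse_old = sse
    return A, B, C, sse

# Usage: fit a noise-free, exactly trilinear array (hypothetical data)
rng = np.random.default_rng(42)
A0, B0, C0 = (rng.normal(size=(n, 2)) for n in (8, 7, 6))
X = np.einsum('if,jf,kf->ijk', A0, B0, C0)
A, B, C, sse = parafac_als(X, F=2)
print(sse / np.sum(X ** 2))  # close to 0 for this noise-free example
```

With noisy data or too many components, the same loop can become slow or stall. This matches the convergence problems mentioned on the slide.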
12
Conclusions
• PARAFAC and Tucker are both generalizations of the PCA model for multiway data.
• PARAFAC and Tucker models use fewer parameters and are easier to interpret than unfold-PCA.
• Models can be calculated in MATLAB using the ‘N-way Toolbox’ (or ‘PLS_Toolbox’).