tensor computations: e ciency or productivity?hpac.cs.umu.se/~pauldj/talks/siam-cse19.pdftensor...
TRANSCRIPT
Tensor Computations: Efficiency Or Productivity?
Paolo Bientinesi – Umea Universitet & RWTH Aachen UniversityRasmus Bro – University of Copenhagen
Edoardo Di Napoli – Julich Supercomputing Centre
Lars Karlsson – Umea Universitet
March 1, 2019SIAM Conference on Computational Science and Engineering
Spokane, Washington, USA
Part I
The HPC perspective
Disclaimer: Talk about awareness, not results
2 / 21
Typical HPC “workflow” – not limited to tensor computations
I 1. Target problemI Fixed problem size?I Target architecture? (parallel paradigm)
I 2. Identify building blocks, bottlenecks, hotspots, critical paths, . . .
I 3. Algorithmic & code optimizationsObjectives: speedups, scalability, portability, . . .
I : High exploitation of platform’s potential, time savings, green computing(publications, “more science per hour”, . . . )
I : However, ... often limited integration into actual scientific codes(little funding, not researchy/publishable effort, no credits, . . . )
⇒ Caveat: Gains in the building blocks... often lost at the higher levels
3 / 21
Typical HPC “workflow” – not limited to tensor computations
I 1. Target problemI Fixed problem size?I Target architecture? (parallel paradigm)
I 2. Identify building blocks, bottlenecks, hotspots, critical paths, . . .
I 3. Algorithmic & code optimizationsObjectives: speedups, scalability, portability, . . .
I : High exploitation of platform’s potential, time savings, green computing(publications, “more science per hour”, . . . )
I : However, ... often limited integration into actual scientific codes(little funding, not researchy/publishable effort, no credits, . . . )
⇒ Caveat: Gains in the building blocks... often lost at the higher levels
3 / 21
Typical HPC “workflow” – not limited to tensor computations
I 1. Target problemI Fixed problem size?I Target architecture? (parallel paradigm)
I 2. Identify building blocks, bottlenecks, hotspots, critical paths, . . .
I 3. Algorithmic & code optimizationsObjectives: speedups, scalability, portability, . . .
I : High exploitation of platform’s potential, time savings, green computing(publications, “more science per hour”, . . . )
I : However, ... often limited integration into actual scientific codes(little funding, not researchy/publishable effort, no credits, . . . )
⇒ Caveat: Gains in the building blocks... often lost at the higher levels
3 / 21
Typical HPC “workflow” – not limited to tensor computations
I 1. Target problemI Fixed problem size?I Target architecture? (parallel paradigm)
I 2. Identify building blocks, bottlenecks, hotspots, critical paths, . . .
I 3. Algorithmic & code optimizationsObjectives: speedups, scalability, portability, . . .
I : High exploitation of platform’s potential, time savings, green computing(publications, “more science per hour”, . . . )
I : However, ... often limited integration into actual scientific codes(little funding, not researchy/publishable effort, no credits, . . . )
⇒ Caveat: Gains in the building blocks... often lost at the higher levels
3 / 21
Typical HPC “workflow” – not limited to tensor computations
I 1. Target problemI Fixed problem size?I Target architecture? (parallel paradigm)
I 2. Identify building blocks, bottlenecks, hotspots, critical paths, . . .
I 3. Algorithmic & code optimizationsObjectives: speedups, scalability, portability, . . .
I : High exploitation of platform’s potential, time savings, green computing(publications, “more science per hour”, . . . )
I : However, ... often limited integration into actual scientific codes(little funding, not researchy/publishable effort, no credits, . . . )
⇒ Caveat: Gains in the building blocks... often lost at the higher levels
3 / 21
Typical HPC “workflow” – not limited to tensor computations
I 1. Target problemI Fixed problem size?I Target architecture? (parallel paradigm)
I 2. Identify building blocks, bottlenecks, hotspots, critical paths, . . .
I 3. Algorithmic & code optimizationsObjectives: speedups, scalability, portability, . . .
I : High exploitation of platform’s potential, time savings, green computing(publications, “more science per hour”, . . . )
I : However, ... often limited integration into actual scientific codes(little funding, not researchy/publishable effort, no credits, . . . )
⇒ Caveat: Gains in the building blocks... often lost at the higher levels
3 / 21
Examples (1/2) Genome-Wide Association Studies (GWAS)
I What is it? Data correlation analysis. 2D grid of generalized least squares problems (GLS)
γ µ
α
η
m
SNPs
t Ph-s
p
B
bγ,α
=XTγ
−1
Mα Xγ
−1
XTγ
−1
Mα yα
bγ,η
=XTγ
−1
Mη Xγ
−1
XTγ
−1
Mη yη bµ,η
=XTµ
−1
Mη Xµ
−1
XTµ
−1
Mη yη
4 / 21
Examples (1/2) Genome-Wide Association Studies (GWAS)
I How is it related to tensors? 1D×1D cartesian product of GLSs, 2D output = 4D data
γ µ
α
η
m
SNPs
t Ph-s
p
B
bγ,α
=XTγ
−1
Mα Xγ
−1
XTγ
−1
Mα yα
bγ,η
=XTγ
−1
Mη Xγ
−1
XTγ
−1
Mη yη bµ,η
=XTµ
−1
Mη Xµ
−1
XTµ
−1
Mη yη
4 / 21
Computing Petaflops over Terabytes of Data:The Case of Genome-Wide Association Studies.
ACM TOMS, 2014
γ µ
α
η
m
SNPs
t Ph-s
p
B
bγ,α
=XTγ
−1
Mα Xγ
−1
XTγ
−1
Mα yα
bγ,η
=XTγ
−1
Mη Xγ
−1
XTγ
−1
Mη yη bµ,η
=XTµ
−1
Mη Xµ
−1
XTµ
−1
Mη yη
I Algorithmic improvement: lower computational complexity
I HPC optimizations: asynch I/O, overlap, BLAS-3, parallelism, . . .
I 100x – 1000x speedups! Library available: OmicABEL
I But...I Interface: C vs. RI Data management: data formats, overwriting, multiple filesI Data manipulation: imputation, filtering, selectionI Workflow not as fixed as first understood: M vs no-M, . . .I “The pre-processing is slower than the analysis”
⇒ Performance is important, but not as much as we like to think
5 / 21
Computing Petaflops over Terabytes of Data:The Case of Genome-Wide Association Studies.
ACM TOMS, 2014
γ µ
α
η
m
SNPs
t Ph-s
p
B
bγ,α
=XTγ
−1
Mα Xγ
−1
XTγ
−1
Mα yα
bγ,η
=XTγ
−1
Mη Xγ
−1
XTγ
−1
Mη yη bµ,η
=XTµ
−1
Mη Xµ
−1
XTµ
−1
Mη yη
I Algorithmic improvement: lower computational complexity
I HPC optimizations: asynch I/O, overlap, BLAS-3, parallelism, . . .
I 100x – 1000x speedups! Library available: OmicABEL
I But...I Interface: C vs. RI Data management: data formats, overwriting, multiple filesI Data manipulation: imputation, filtering, selectionI Workflow not as fixed as first understood: M vs no-M, . . .I “The pre-processing is slower than the analysis”
⇒ Performance is important, but not as much as we like to think
5 / 21
Computing Petaflops over Terabytes of Data:The Case of Genome-Wide Association Studies.
ACM TOMS, 2014
γ µ
α
η
m
SNPs
t Ph-s
p
B
bγ,α
=XTγ
−1
Mα Xγ
−1
XTγ
−1
Mα yα
bγ,η
=XTγ
−1
Mη Xγ
−1
XTγ
−1
Mη yη bµ,η
=XTµ
−1
Mη Xµ
−1
XTµ
−1
Mη yη
I Algorithmic improvement: lower computational complexity
I HPC optimizations: asynch I/O, overlap, BLAS-3, parallelism, . . .
I 100x – 1000x speedups! Library available: OmicABEL
I But...
I Interface: C vs. RI Data management: data formats, overwriting, multiple filesI Data manipulation: imputation, filtering, selectionI Workflow not as fixed as first understood: M vs no-M, . . .I “The pre-processing is slower than the analysis”
⇒ Performance is important, but not as much as we like to think
5 / 21
Computing Petaflops over Terabytes of Data:The Case of Genome-Wide Association Studies.
ACM TOMS, 2014
γ µ
α
η
m
SNPs
t Ph-s
p
B
bγ,α
=XTγ
−1
Mα Xγ
−1
XTγ
−1
Mα yα
bγ,η
=XTγ
−1
Mη Xγ
−1
XTγ
−1
Mη yη bµ,η
=XTµ
−1
Mη Xµ
−1
XTµ
−1
Mη yη
I Algorithmic improvement: lower computational complexity
I HPC optimizations: asynch I/O, overlap, BLAS-3, parallelism, . . .
I 100x – 1000x speedups! Library available: OmicABEL
I But...I Interface: C vs. RI Data management: data formats, overwriting, multiple filesI Data manipulation: imputation, filtering, selectionI Workflow not as fixed as first understood: M vs no-M, . . .I “The pre-processing is slower than the analysis”
⇒ Performance is important, but not as much as we like to think
5 / 21
Computing Petaflops over Terabytes of Data:The Case of Genome-Wide Association Studies.
ACM TOMS, 2014
γ µ
α
η
m
SNPs
t Ph-s
p
B
bγ,α
=XTγ
−1
Mα Xγ
−1
XTγ
−1
Mα yα
bγ,η
=XTγ
−1
Mη Xγ
−1
XTγ
−1
Mη yη bµ,η
=XTµ
−1
Mη Xµ
−1
XTµ
−1
Mη yη
I Algorithmic improvement: lower computational complexity
I HPC optimizations: asynch I/O, overlap, BLAS-3, parallelism, . . .
I 100x – 1000x speedups! Library available: OmicABEL
I But...I Interface: C vs. RI Data management: data formats, overwriting, multiple filesI Data manipulation: imputation, filtering, selectionI Workflow not as fixed as first understood: M vs no-M, . . .I “The pre-processing is slower than the analysis”
⇒ Performance is important, but not as much as we like to think
5 / 21
Examples (2/2) High-Performance Tensor Kernels
Paul Springer
I Tensor Transpositions
I Summations
I Tensor Contractions
6 / 21
Examples (2/2) High-Performance Tensor Kernels
Paul Springer
I Tensor Transpositions
Bi1i2...iN ← α · Aπ(i1i2...iN) + β · Bi1i2...iN
I Summations — linear summation over tensor transpositions
Bi0i1i2 ← 2Ai0i1i2 −Ai2i1i0 −Ai0i2i1
Bi0i1i2 ← 4Ai0i1i2 − 2Ai1i0i2 − 2Ai2i1i0 +Ai1i2i0 − 2Ai0i2i1 +Ai2i0i1
Bi0i1i2i3 ← 2Ai0i1i2i3 −Ai2i1i0i3 −Ai0i2i1i3 −Ai0i1i3i2
I Tensor Contractions
CπC(Im∪In) ← α · AπA(Im∪Ik ) × BπB(In∪Ik ) + β · CπC(Im∪In)
6 / 21
Examples (2/2) High-Performance Tensor KernelsPaul Springer
I Tensor Transpositions
TTC: A high-performance Compiler for Tensor Transpositions. ACM TOMS, 2017
Compiler: https://github.com/HPAC/TTC Library: https://github.com/HPAC/hptt
I Summations — linear summation over tensor transpositions
Spin Summations: A High-Performance Perspective. ACM TOMS, 2019
Generator: https://github.com/springer13/spin-summations
I Tensor Contractions
Design of a high-performance GEMM-like Tensor-Tensor Multiplication. ACM TOMS, 2018
Compiler: https://github.com/HPAC/tccg Library: https://github.com/springer13/tcl
6 / 21
But...
I “Wrong” level of abstraction for domain scientists
I Kernels: good for developers – too low level for most end users
I Mismatch → mapping problem
7 / 21
But...
I “Wrong” level of abstraction for domain scientists
I Kernels: good for developers – too low level for most end users
I Mismatch → mapping problem
7 / 21
But...
I “Wrong” level of abstraction for domain scientists
I Kernels: good for developers – too low level for most end users
I Mismatch → mapping problem
7 / 21
But...
I “Wrong” level of abstraction for domain scientists
I Kernels: good for developers – too low level for most end users
I Mismatch → mapping problem
7 / 21
2D case: “Right” level of abstraction
Generalized Least Squares b := (XTM−1X )−1XTM−1y n > m; M ∈ Rn×n, SPD; X ∈ Rn×m; y ∈ Rn×1
Signal Processing x :=(A−TBTBA−1 + RTLR
)−1A−TBTBA−1y
Kalman Filter Kk := PbkH
T (HPbkH
T + R)−1; xak := xbk + Kk (zk − Hxbk ); Pak := (I − KKH)Pb
k
Ensemble Kalman Filter X a := X b +(B−1 + HTR−1H
)−1 (Y − HX b
)Image Restoration xk := (HTH + λσ2In)−1(HT y + λσ2(vk−1 − uk−1))
Rand. Matrix Inversion Xk+1 := S(STAS)−1ST + (In − S(STAS)−1STA)Xk (In − AS(STAS)−1ST )
Stochastic Newton Bk := kk−1
Bk−1(In − ATWk ((k − 1)Il + WTk ABk−1A
TWk )−1WTk ABk−1)
Optimization xf := WAT (AWAT )−1(b − Ax); xo := W (AT (AWAT )−1Ax − c)
Tikhonov Regularization x := (ATA + ΓT Γ)−1ATb A ∈ Rn×m; Γ ∈ Rm×m; b ∈ Rn×1
Gen. Tikhonov Reg. x := (ATPA + Q)−1(ATPb + Qx0) P ∈ Rn×n, SSPD; Q ∈ Rm×m, SSPD; x0 ∈ Rm×1
LMMSE estimator Kt+1 := CtAT (ACtAT + Cz )−1; xt+1 := xt + Kt+1(y − Axt); Ct+1 := (I − Kt+1A)Ct
8 / 21
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .9 / 21
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
y := αx + y LU = A · · · C := αAB + βC
X := A−1B C := ABT + BAT + C X := L−1ML−T QR = A
BLAS . . . LAPACK . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
LINEARALGEBRAMAPPINGPROBLEM
(LAMP)
10 / 21
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
y := αx + y LU = A · · · C := αAB + βC
X := A−1B C := ABT + BAT + C X := L−1ML−T QR = A
BLAS . . . LAPACK . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
LINEARALGEBRAMAPPINGPROBLEM
(LAMP)
10 / 21
x := A(BTB + ATRTΛRA)−1BTBA−1y
{C† := PCPT + Q
K := C†HT (HC†H
T )−1
E := Q−1U(I + UTQ−1U)−1UT . . .
y := αx + y LU = A · · · C := αAB + βC
X := A−1B C := ABT + BAT + C X := L−1ML−T QR = A
BLAS . . . LAPACK . . .
MUL ADD MOV
MOVAPD
VFMADDPD . . .
LINEARALGEBRAMAPPINGPROBLEM
(LAMP)
10 / 21
LAMP: A problem often ignoredLinnea: A compiler for linear algebra – Henrik Barthels, Christos Psarras
Linnea’s speedups
Jl: Julia, Arma: Armadillo, Eig: Eigen, Mat: Matlab. n/r: naive/recommended implementation11 / 21
Tensor App #1 Tensor App #2 . . . Tensor App #N
MUL ADD MOV
MOVAPD
VFMADDPD . . .
12 / 21
Tensor App #1 Tensor App #2 . . . Tensor App #N
Transposition Contraction · · · Alternating LS
Khatri-Rao SpMTTKRP · · · TTV, TTM
HPTT TCL . . . BLAS
MUL ADD MOV
MOVAPD
VFMADDPD . . .
13 / 21
nD case: Exemplary applicationsCoupled-Cluster methods
Finite Element 3D diffusion operator
credits to D. Matthews, E. Solomonik, J. Stanton, and J. Gauss
credits to A. Fisher – https://github.com/LLNL/acrotensor
14 / 21
nD case: Exemplary applicationsCoupled-Cluster methods Finite Element 3D diffusion operator
credits to D. Matthews, E. Solomonik, J. Stanton, and J. Gauss credits to A. Fisher – https://github.com/LLNL/acrotensor
14 / 21
Awareness
I Performance:Speedups in building blocks do not always translate to speedups in applications
I Research Problem: Mapping onto building blocks
I Solved by hand → loss of productivity (possibly loss of efficiency too)I Solved automatically → loss of efficiency
Beware: It’s challenging even for “simple” matrix computations!
15 / 21
Awareness
I Performance:Speedups in building blocks do not always translate to speedups in applications
I Research Problem: Mapping onto building blocks
I Solved by hand → loss of productivity (possibly loss of efficiency too)I Solved automatically → loss of efficiency
Beware: It’s challenging even for “simple” matrix computations!
15 / 21
Part II
The computational scientists’ perspective
“The fastest FLOPS are those that are not executed.”– Lars
16 / 21
Chromatography-MS
PARAFAC Tucker PARAFAC2
Transposition Contraction · · · Alternating LS
Khatri-Rao SpMTTKRP · · · TTV, TTM
HPTT TCL . . . BLAS
MUL ADD MOV
MOVAPD
VFMADDPD . . .17 / 21
Example application: Untargeted chemical profilingChromatography with mass spectrometry detection
I Problem: Identify components in a sample
Jessica Torres – Bitesize Bio
18 / 21
Example application: Untargeted chemical profilingChromatography with mass spectrometry detection
I 3-way data: Mass-spectrum × elution time × sample
Rasmus Bro
18 / 21
Example application: Untargeted chemical profilingChromatography with mass spectrometry detection
I 3-way tensor → Individual components
→
Rasmus Bro
18 / 21
Workflow
I 1. (Pre-processing: alignment)
I 2. Choose time intervals across samples
I 3. (Data preparation: remove background)
I 4. Fit model: 1–15 components; if needed: non-negativity constraintsPARAFAC — PARAFAC2 — TUCKER
I 5. Determine whether or not one of the models is “right”
I : Determine which of the components represent chemical information
I : Start over; add/change constraints, change model
19 / 21
Workflow
I 1. (Pre-processing: alignment)
I 2. Choose time intervals across samples
I 3. (Data preparation: remove background)
I 4. Fit model: 1–15 components; if needed: non-negativity constraintsPARAFAC — PARAFAC2 — TUCKER
I 5. Determine whether or not one of the models is “right”
I : Determine which of the components represent chemical information
I : Start over; add/change constraints, change model
19 / 21
Workflow
I 1. (Pre-processing: alignment)
I 2. Choose time intervals across samples
I 3. (Data preparation: remove background)
I 4. Fit model: 1–15 components; if needed: non-negativity constraintsPARAFAC — PARAFAC2 — TUCKER
I 5. Determine whether or not one of the models is “right”
I : Determine which of the components represent chemical information
I : Start over; add/change constraints, change model
19 / 21
Workflow
I 1. (Pre-processing: alignment)
I 2. Choose time intervals across samples
I 3. (Data preparation: remove background)
I 4. Fit model: 1–15 components; if needed: non-negativity constraintsPARAFAC — PARAFAC2 — TUCKER
I 5. Determine whether or not one of the models is “right”
I : Determine which of the components represent chemical information
I : Start over; add/change constraints, change model
19 / 21
Awareness & ConclusionsI Performance:
Speedups in building blocks do not always translate to speedups in applications
I Research Problem: Mapping onto building blocksI Solved by hand → loss of productivity (possibly loss of efficiency too)I Solved automatically → loss of efficiency
Beware: It’s challenging even for “simple” matrix computations!
I Application as a wholeI Massive redundancy! Careful with black-box librariesI Man-in-the-middle workflow. Manual “check & decide”
I Gap: Domain scientists’ needs ↔ Computer scientists’ artifacts
I Problem: How to maximize scientific output?I Speedups in algorithms and building blocksI Efficient mappingI Reduction in computation in the application
Thank you
Questions?
20 / 21
Awareness & ConclusionsI Performance:
Speedups in building blocks do not always translate to speedups in applications
I Research Problem: Mapping onto building blocksI Solved by hand → loss of productivity (possibly loss of efficiency too)I Solved automatically → loss of efficiency
Beware: It’s challenging even for “simple” matrix computations!
I Application as a wholeI Massive redundancy! Careful with black-box librariesI Man-in-the-middle workflow. Manual “check & decide”
I Gap: Domain scientists’ needs ↔ Computer scientists’ artifacts
I Problem: How to maximize scientific output?I Speedups in algorithms and building blocksI Efficient mappingI Reduction in computation in the application
Thank you
Questions?
20 / 21
Awareness & ConclusionsI Performance:
Speedups in building blocks do not always translate to speedups in applications
I Research Problem: Mapping onto building blocksI Solved by hand → loss of productivity (possibly loss of efficiency too)I Solved automatically → loss of efficiency
Beware: It’s challenging even for “simple” matrix computations!
I Application as a wholeI Massive redundancy! Careful with black-box librariesI Man-in-the-middle workflow. Manual “check & decide”
I Gap: Domain scientists’ needs ↔ Computer scientists’ artifacts
I Problem: How to maximize scientific output?I Speedups in algorithms and building blocksI Efficient mappingI Reduction in computation in the application
Thank you
Questions?
20 / 21
Awareness & ConclusionsI Performance:
Speedups in building blocks do not always translate to speedups in applications
I Research Problem: Mapping onto building blocksI Solved by hand → loss of productivity (possibly loss of efficiency too)I Solved automatically → loss of efficiency
Beware: It’s challenging even for “simple” matrix computations!
I Application as a wholeI Massive redundancy! Careful with black-box librariesI Man-in-the-middle workflow. Manual “check & decide”
I Gap: Domain scientists’ needs ↔ Computer scientists’ artifacts
I Problem: How to maximize scientific output?I Speedups in algorithms and building blocksI Efficient mappingI Reduction in computation in the application
Thank you
Questions?
20 / 21
Awareness & ConclusionsI Performance:
Speedups in building blocks do not always translate to speedups in applications
I Research Problem: Mapping onto building blocksI Solved by hand → loss of productivity (possibly loss of efficiency too)I Solved automatically → loss of efficiency
Beware: It’s challenging even for “simple” matrix computations!
I Application as a wholeI Massive redundancy! Careful with black-box librariesI Man-in-the-middle workflow. Manual “check & decide”
I Gap: Domain scientists’ needs ↔ Computer scientists’ artifacts
I Problem: How to maximize scientific output?I Speedups in algorithms and building blocksI Efficient mappingI Reduction in computation in the application
Thank you
Questions?
20 / 21
I Spin Summations: A High-Performance Perspective,P. Springer, D. Matthews and P. Bientinesi.ACM Transactions on Mathematical Software, 2018. Accepted.
I Design of a High-Performance GEMM-like Tensor-Tensor Multiplication,P. Springer and P. Bientinesi.ACM Transactions on Mathematical Software, Vol. 44(3), 28:1–28:29, January 2018.
I The Generalized Matrix Chain Algorithm,H. Barthels, M. Copik and P. Bientinesi.Proceedings of the International Symposium on Code Generation and Optimization, February 2018.
I TTC: A High-Performance Compiler for Tensor Transpositions,P. Springer, J. Hammond and P. Bientinesi.ACM Transactions on Mathematical Software, Vol. 44(2), 15:1–15:21, August 2017.
I Gas chromatography – mass spectrometry data processing made easy,L. Johnsena, P. Skoub, B. Khakimovb, R. Bro.Journal of Chromatography A, 1503 (2017) 57–64.
I Computing Petaflops over Terabytes of Data: The Case of Genome-Wide Association Studies,D. Fabregat-Traver and P. Bientinesi.ACM Transactions on Mathematical Software, Vol. 40(4), 27:1–27:22, June 2014.
I Towards an Efficient Use of the BLAS Library for Multilinear Tensor Contractions,E. Di Napoli, D. Fabregat-Traver, G. Quintana-Orti and P. Bientinesi.Applied Mathematics and Computation, Vol. 235, 454–468, May 2014.
I Solving GC-MS problems with PARAFAC2,J. Amigo, T. Skov, J. Coello, S. Maspoch, R. Bro.Trends in Analytical Chemistry, Vol. 27, No. 8, 2008.
21 / 21