shallow learning with kernels for dictionary-free magnetic...
TRANSCRIPT
![Page 1: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/1.jpg)
Shallow Learning with Kernels
for Dictionary-Free Magnetic Resonance Fingerprinting
Gopal Nataraj∗, Mingjie Gao∗, Jakob Asslander†, Clayton Scott∗, & Jeffrey A. Fessler∗
ISMRM Workshop on Magnetic Resonance Fingerprinting
∗Dept. of Electrical Engineering and Computer Science, University of Michigan
†Center for Biomedical Imaging, NYU School of Medicine
![Page 3: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/3.jpg)
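Problem Statement
Given: at every voxel, measurement vector $\mathbf{y} = \mathbf{s}(\mathbf{x}) + \boldsymbol{\epsilon}$
[Figure: MRF “component” images $\mathbf{y}$ (more later...), color scale -1 to 4 a.u., mapped by the estimator $\hat{\mathbf{x}}(\cdot)$ to $\hat{\mathbf{x}}(\mathbf{y})$: a T1 map (color scale 600 to 2000 ms) and a T2 map (color scale 50 to 200 ms)]
Task: design fast voxel-by-voxel estimator $\hat{\mathbf{x}}(\cdot)$ that scales well with #unknowns per voxel, $L$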
![Page 6: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/6.jpg)
Machine Learning at Different “Depths” for QMRI
Idea: “learn” separate scalar estimators $\hat{x}_1(\mathbf{y}), \ldots, \hat{x}_L(\mathbf{y})$ from simulated training data
Deep Learning
• promising for QMRI [Cohen et al., 2017, Virtue et al., 2017]
• needs many training points to avoid overfitting
• trained via non-convex optimization
• limited theoretical basis
Shallow Learning
• simpler structure needs fewer training points
• fast training via convex optimization
![Page 8: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/8.jpg)
Shallow Learning with Kernels for QMRI
Idea: “learn” separate scalar estimators $\hat{x}_1(\mathbf{y}), \ldots, \hat{x}_L(\mathbf{y})$ from simulated training data
• sample $(\mathbf{x}_1, \boldsymbol{\epsilon}_1), \ldots, (\mathbf{x}_N, \boldsymbol{\epsilon}_N)$ and simulate $\mathbf{y}_1, \ldots, \mathbf{y}_N$ via signal model $\mathbf{s}$
• design nonlinear functions $\hat{x}_l(\cdot) := \hat{g}_l(\cdot) + \hat{b}_l$ that seek to map each $\mathbf{y}_n$ to $x_{l,n}$:
$$(\hat{g}_l, \hat{b}_l) \in \arg\min_{g_l,\; b_l \in \mathbb{R}} \frac{1}{N} \sum_{n=1}^{N} \left( g_l(\mathbf{y}_n) + b_l - x_{l,n} \right)^2 \qquad \text{(ill-posed!)}$$
Solution: Parameter Estimation via Regression with Kernels (PERK)
[Nataraj et al., 2017b, arXiv:1710.02441]
• restrict optimization to a certain rich function space $\mathcal{G}$ with kernel $k$
• optimal $\hat{g}_l \in \mathcal{G}$ takes form $\hat{g}_l(\cdot) = \sum_{n=1}^{N} a_{l,n} k(\cdot, \mathbf{y}_n)$ [Scholkopf et al., 2001]
Fast, simple implementation: nonlinear lifting + high-dimensional linear regression
![Page 10: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/10.jpg)
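Shallow Learning with Kernels for QMRI
Idea: “learn” separate scalar estimators $\hat{x}_1(\mathbf{y}), \ldots, \hat{x}_L(\mathbf{y})$ from simulated training data
• sample $(\mathbf{x}_1, \boldsymbol{\epsilon}_1), \ldots, (\mathbf{x}_N, \boldsymbol{\epsilon}_N)$ and simulate $\mathbf{y}_1, \ldots, \mathbf{y}_N$ via signal model $\mathbf{s}$
• design nonlinear functions $\hat{x}_l(\cdot) := \hat{g}_l(\cdot) + \hat{b}_l$ that seek to map each $\mathbf{y}_n$ to $x_{l,n}$:
$$(\hat{g}_l, \hat{b}_l) \in \arg\min_{g_l \in \mathcal{G},\; b_l \in \mathbb{R}} \frac{1}{N} \sum_{n=1}^{N} \left( g_l(\mathbf{y}_n) + b_l - x_{l,n} \right)^2 + \rho_l \left\| g_l \right\|_{\mathcal{G}}^2 \tag{1}$$
Solution: Parameter Estimation via Regression with Kernels (PERK)
[Nataraj et al., 2017b, arXiv:1710.02441]
• restrict optimization to a certain rich function space $\mathcal{G}$ with kernel $k$
• optimal $\hat{g}_l \in \mathcal{G}$ takes form $\hat{g}_l(\cdot) = \sum_{n=1}^{N} a_{l,n} k(\cdot, \mathbf{y}_n)$ [Scholkopf et al., 2001]
Fast, simple implementation: nonlinear lifting + high-dimensional linear regression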
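The "lifting + linear regression" remark can be unpacked with a short worked derivation. This is a sketch of standard kernel ridge regression algebra, not a derivation taken from the slides. It assumes $\mathcal{G}$ is the reproducing kernel Hilbert space of $k$, so that $\|g_l\|_{\mathcal{G}}^2 = \mathbf{a}_l^T \mathbf{K} \mathbf{a}_l$ for functions of the representer form. The Gram matrix $\mathbf{K}$ (entries $k(\mathbf{y}_n, \mathbf{y}_m)$), the coefficient vector $\mathbf{a}_l$, the regressand vector $\mathbf{x}_l$, and the de-meaning matrix $\mathbf{M}$ are notation used for this sketch.

```latex
% Substitute g_l(.) = sum_n a_{l,n} k(., y_n) into (1), using
% g_l(y_m) = [K a_l]_m and ||g_l||_G^2 = a_l^T K a_l:
\min_{\mathbf{a}_l \in \mathbb{R}^N,\; b_l \in \mathbb{R}}
  \frac{1}{N} \left\| \mathbf{K}\mathbf{a}_l + b_l \mathbf{1}_N - \mathbf{x}_l \right\|_2^2
  + \rho_l \, \mathbf{a}_l^T \mathbf{K} \mathbf{a}_l

% Minimizing over b_l for fixed a_l gives the mean residual:
\hat{b}_l = \frac{1}{N} \mathbf{1}_N^T \left( \mathbf{x}_l - \mathbf{K}\mathbf{a}_l \right)

% With M := I_N - (1/N) 1_N 1_N^T, one minimizer over a_l is
\hat{\mathbf{a}}_l = \left( \mathbf{M}\mathbf{K}\mathbf{M} + N\rho_l \mathbf{I}_N \right)^{-1} \mathbf{M}\mathbf{x}_l

% so the estimator is affine in the lifted vector k(y) = [k(y, y_1), ..., k(y, y_N)]^T:
\hat{x}_l(\mathbf{y}) = \hat{b}_l + \hat{\mathbf{a}}_l^T \mathbf{k}(\mathbf{y})
```

Under these assumptions the problem is convex and finite-dimensional: the measurement is lifted to the $N$-vector $\mathbf{k}(\mathbf{y})$ and then passed through an affine map, which is one way to read "nonlinear lifting + high-dimensional linear regression".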
![Page 14: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/14.jpg)
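PERK for Magnetic Resonance Fingerprinting (MRF)
To control lifting dimension, desirable for $\mathbf{y}$ to be low-dimensional
[Figure: k-space data (axes kx, ky) for flip 1, ..., flip 840 [Asslander et al., 2017]; plot versus Time (ms), 0 to 3500, vertical range 0 to 0.15; component images, color scale -1 to 4 a.u.]
data-sharing across flips; gridding; FFT; PCA: $\mathbf{V} \in \mathbb{C}^{840 \times 6}$
$$\min_{\mathbf{Y}} \left\| \mathbf{k} - \mathcal{A}\!\left( \mathbf{Y}\mathbf{V}^H \right) \right\|_2^2, \qquad \mathbf{Y} \in \mathbb{C}^{n_{\mathrm{voxels}} \times 6}$$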
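The subspace step can be illustrated with a minimal sketch, assuming NumPy. It builds a rank-6 temporal basis $\mathbf{V}$ from simulated signals via the SVD and compresses each 840-sample signal to 6 coefficients. The toy relaxation curves stand in for real fingerprints and are purely hypothetical, and the k-space operator $\mathcal{A}$ from the reconstruction problem above is omitted entirely.

```python
import numpy as np

# Hypothetical stand-in for fingerprints: 840 time points of a toy relaxation curve.
t = np.linspace(0.0, 3500.0, 840)                                  # ms
rng = np.random.default_rng(0)
T1 = rng.uniform(600.0, 2000.0, size=1000)                         # ms
T2 = rng.uniform(50.0, 200.0, size=1000)                           # ms
S = (1.0 - np.exp(-t / T1[:, None])) * np.exp(-t / T2[:, None])    # (1000, 840)

# PCA via SVD without mean removal, so the 6-dimensional subspace is linear.
_, sing_vals, Vh = np.linalg.svd(S, full_matrices=False)
V = Vh[:6].conj().T                                                # (840, 6), orthonormal columns

# Compress: each signal becomes 6 coefficients, with S approximately Y V^H.
Y = S @ V                                                          # (1000, 6)
rel_err = np.linalg.norm(S - Y @ V.conj().T) / np.linalg.norm(S)
print("relative error of rank-6 approximation:", rel_err)
```

Working with 6 coefficients per voxel rather than 840 samples keeps the input dimension of the kernel small, which is the stated motivation for the compression.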
![Page 19: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/19.jpg)
In vivo results
[Figure: k-space data (axes kx, ky) for flip 1, ..., flip 840; T1 maps (color scale 600 to 2000 ms) and T2 maps (color scale 50 to 200 ms) from Dictionary-based Grid Search and Dictionary-Free PERK]
Dictionary-based Grid Search: 28 s/slice; 50 slices: ∼1400 s
Dictionary-Free PERK: 4 s train; 0.2 s/slice test; 50 slices: 4 s train; ∼10 s test
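The timing comparison reduces to simple arithmetic on the per-slice figures. The snippet below only restates the numbers reported on the slide; the variable names are illustrative.

```python
slices = 50
grid_search_total = 28.0 * slices      # 28 s/slice, no training phase
perk_total = 4.0 + 0.2 * slices        # 4 s training once, then 0.2 s/slice
speedup = grid_search_total / perk_total
print(f"grid search: {grid_search_total:.0f} s, PERK: {perk_total:.0f} s, speedup: {speedup:.0f}x")
```

For the 50-slice case this works out to about 14 s for PERK against about 1400 s for grid search, roughly a 100-fold reduction. Because training is a one-time cost, the ratio approaches 28 / 0.2 = 140 as the number of slices grows.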
![Page 22: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/22.jpg)
Summary
Contributions:
• PERK: fast, dictionary-free ML method for QMRI [Nataraj et al., 2017b]
• demonstrated PERK for in vivo MRF T1,T2 estimation
Future Work:
• QMRI problems involving more unknowns for which we expect
orders-of-magnitude computational gains [Nataraj et al., 2017a, #5076]
• comparison with other ML methods, e.g. deep learning
![Page 25: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/25.jpg)
References i
Asslander, J., Lattanzi, R., Sodickson, D. K., and Cloos, M. A. (2017). Relaxation in spherical coordinates: analysis and optimization of pseudo-SSFP based MR-fingerprinting. arXiv:1703.00481.
Cohen, O., Zhu, B., and Rosen, M. (2017). Deep learning for fast MR fingerprinting reconstruction. In Proc. Intl. Soc. Mag. Res. Med., page 0688.
Nataraj, G., Nielsen, J.-F., and Fessler, J. A. (2017a). Myelin water fraction estimation from optimized steady-state sequences using kernel ridge regression. In Proc. Intl. Soc. Mag. Res. Med., page 5076.
![Page 26: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/26.jpg)
References ii
Nataraj, G., Nielsen, J.-F., Scott, C., and Fessler, J. A. (2017b). Dictionary-free MRI PERK: Parameter estimation via regression with kernels. IEEE Trans. Med. Imag. Submitted.
Rahimi, A. and Recht, B. (2007). Random features for large-scale kernel machines. In NIPS.
Scholkopf, B., Herbrich, R., and Smola, A. J. (2001). A generalized representer theorem. In Proc. Computational Learning Theory (COLT), pages 416–426. LNCS 2111.
![Page 27: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/27.jpg)
References iii
Virtue, P., Yu, S. X., and Lustig, M. (2017). Better than real: Complex-valued neural nets for MRI fingerprinting. In Proc. IEEE Intl. Conf. on Image Processing. To appear.
![Page 28: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/28.jpg)
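PERK solution
Closed-form solution for each $l \in \{1, \ldots, L\}$:
$$\hat{x}_l(\cdot) = \mathbf{x}_l^T \left( \frac{1}{N}\mathbf{1}_N + \mathbf{M} \left( \mathbf{M}\mathbf{K}\mathbf{M} + N\rho_l \mathbf{I}_N \right)^{-1} \left( \mathbf{k}(\cdot) - \frac{1}{N}\mathbf{K}\mathbf{1}_N \right) \right) \tag{2}$$
• $\mathbf{x}_l := [x_{l,1}, \ldots, x_{l,N}]^T$: training point regressands
• $\mathbf{K} := \begin{bmatrix} k(\mathbf{y}_1, \mathbf{y}_1) & \cdots & k(\mathbf{y}_1, \mathbf{y}_N) \\ \vdots & \ddots & \vdots \\ k(\mathbf{y}_N, \mathbf{y}_1) & \cdots & k(\mathbf{y}_N, \mathbf{y}_N) \end{bmatrix}$: Gram matrix
• $\mathbf{M} := \mathbf{I}_N - \frac{1}{N}\mathbf{1}_N\mathbf{1}_N^T$: de-meaning operator
• $\mathbf{k}(\cdot) := [k(\cdot, \mathbf{y}_1), \ldots, k(\cdot, \mathbf{y}_N)]^T$: nonlinear kernel embedding
Can we scale computation with $L$ more gracefully?
• Yes, in fact (2) separable in $l \in \{1, \ldots, L\}$ by construction
• However, explicitly computing $\mathbf{K}$ may be undesirable...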
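The closed form (2) can be sketched in a few lines of NumPy. Several choices here are assumptions made for illustration and do not come from the slides: the Gaussian kernel and its bandwidth, a single $\rho$ shared across all $l$, and the hypothetical `toy_signal` model (exponential decay at three sample times) standing in for the signal model $\mathbf{s}$.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A (n, D) and B (m, D)."""
    sq_dist = (A**2).sum(axis=1)[:, None] + (B**2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))

def perk_train(Y_train, X_train, rho, bandwidth):
    """Precompute the parts of (2) that do not depend on the test point.

    Y_train: (N, D) simulated measurements; X_train: (N, L) latent parameters.
    One rho is shared across l for brevity (the slides allow a separate rho_l).
    """
    N = Y_train.shape[0]
    K = gaussian_kernel(Y_train, Y_train, bandwidth)     # Gram matrix
    M = np.eye(N) - np.ones((N, N)) / N                  # de-meaning operator
    # Column l of W is (M K M + N rho I)^{-1} M x_l, so all L outputs share one solve.
    W = np.linalg.solve(M @ K @ M + N * rho * np.eye(N), M @ X_train)
    return {"Y": Y_train, "W": W, "bandwidth": bandwidth,
            "x_mean": X_train.mean(axis=0),              # (1/N) x_l^T 1_N for each l
            "k_mean": K.mean(axis=1)}                    # (1/N) K 1_N

def perk_predict(model, Y_test):
    """Evaluate (2) at each row of Y_test; returns an array of shape (num_test, L)."""
    k = gaussian_kernel(Y_test, model["Y"], model["bandwidth"])   # rows are k(y)^T
    return model["x_mean"] + (k - model["k_mean"]) @ model["W"]

def toy_signal(X, times=(0.5, 1.0, 2.0)):
    """Hypothetical signal model: exp(-t / x) sampled at a few times t."""
    return np.exp(-np.asarray(times)[None, :] / X[:, :1])

rng = np.random.default_rng(0)
X_train = rng.uniform(0.5, 2.0, size=(200, 1))
Y_train = toy_signal(X_train) + 0.002 * rng.standard_normal((200, 3))
model = perk_train(Y_train, X_train, rho=1e-4, bandwidth=0.3)

X_test = rng.uniform(0.6, 1.9, size=(50, 1))
X_hat = perk_predict(model, toy_signal(X_test))
print("mean absolute error:", np.abs(X_hat - X_test).mean())
```

Because the matrix being inverted does not depend on $l$ when $\rho$ is shared, the linear solve is done once for all $L$ outputs. The cost that remains is forming and factoring the $N \times N$ Gram matrix, which is the concern raised in the last bullet.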
![Page 30: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/30.jpg)
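Backup: PERK as High-Dimensional Bayesian Linear Regression
Suppose there exists “approximate feature mapping” $\mathbf{z} : \mathcal{Y} \mapsto \mathbb{R}^Z$ such that $\mathbf{Z} := [\mathbf{z}(\mathbf{y}_1), \ldots, \mathbf{z}(\mathbf{y}_N)]$ has, for $\dim(\mathcal{Y}) \ll Z \ll N$,
$$\mathbf{K} \approx \mathbf{Z}^T\mathbf{Z}. \tag{3}$$
Plugging (3) into PERK solution (2) and rearranging gives
$$\hat{x}_l(\cdot) \approx \frac{1}{N}\mathbf{x}_l^T\mathbf{1}_N + \frac{1}{N}\mathbf{x}_l^T\mathbf{M}\mathbf{Z}^T \left( \frac{1}{N}\mathbf{Z}\mathbf{M}\mathbf{Z}^T + \rho_l \mathbf{I}_Z \right)^{-1} \left( \mathbf{z}(\cdot) - \frac{1}{N}\mathbf{Z}\mathbf{1}_N \right)$$
Does such a $\mathbf{z}$ exist and work well in practice?
• Yes, e.g. for kernels of form $k(\mathbf{y}, \mathbf{y}') \equiv k(\mathbf{y} - \mathbf{y}')$ [Rahimi and Recht, 2007]
• In such cases, can reduce from $\sim N^2$ to $\sim NZ$ computations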
![Page 31: Shallow Learning with Kernels for Dictionary-Free Magnetic ...web.eecs.umich.edu/~gmingjie/doc/nataraj-17-slw_presentation.pdf · for Dictionary-Free Magnetic Resonance Fingerprinting](https://reader033.vdocument.in/reader033/viewer/2022051904/5ff69e691f842249760704eb/html5/thumbnails/31.jpg)
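Backup: PERK as High-Dimensional Bayesian Linear Regression
Suppose there exists “approximate feature mapping” $\mathbf{z} : \mathcal{Y} \mapsto \mathbb{R}^Z$ such that $\mathbf{Z} := [\mathbf{z}(\mathbf{y}_1), \ldots, \mathbf{z}(\mathbf{y}_N)]$ has, for $\dim(\mathcal{Y}) \ll Z \ll N$,
$$\mathbf{K} \approx \mathbf{Z}^T\mathbf{Z}. \tag{3}$$
Plugging (3) into KRR solution (2) and rearranging gives
$$\hat{x}_l(\cdot) \approx m_{x_l} + \mathbf{c}_{x_l \mathbf{z}}^T \left( \mathbf{C}_{\mathbf{z}\mathbf{z}} + \rho_l \mathbf{I}_Z \right)^{-1} \left( \mathbf{z}(\cdot) - \mathbf{m}_{\mathbf{z}} \right) \tag{4}$$
which is regularized (“ridge”) $Z$-dimensional affine regression!
Does such a $\mathbf{z}$ exist and work well in practice?
• Yes, e.g. for kernels of form $k(\mathbf{y}, \mathbf{y}') \equiv k(\mathbf{y} - \mathbf{y}')$ [Rahimi and Recht, 2007]
• In such cases, can reduce from $\sim N^2$ to $\sim NZ$ computations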
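A minimal sketch of (4) with random Fourier features follows, assuming NumPy. It reads (4) term by term against the explicit expression on the preceding slide, so that $m_{x_l} = \frac{1}{N}\mathbf{x}_l^T\mathbf{1}_N$, $\mathbf{c}_{x_l \mathbf{z}} = \frac{1}{N}\mathbf{Z}\mathbf{M}\mathbf{x}_l$, $\mathbf{C}_{\mathbf{z}\mathbf{z}} = \frac{1}{N}\mathbf{Z}\mathbf{M}\mathbf{Z}^T$, and $\mathbf{m}_{\mathbf{z}} = \frac{1}{N}\mathbf{Z}\mathbf{1}_N$. The Gaussian kernel, its bandwidth, the shared $\rho$, the feature count, and the toy exponential signal model are all assumptions made for illustration.

```python
import numpy as np

def make_rff(dim_y, Z, bandwidth, rng):
    """Draw random Fourier features for a Gaussian kernel with the given bandwidth."""
    omega = rng.normal(scale=1.0 / bandwidth, size=(Z, dim_y))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=Z)

    def z(Y):
        # Rows of the result are z(y)^T, so z(Y_train) plays the role of Z^T.
        return np.sqrt(2.0 / Z) * np.cos(Y @ omega.T + phase)

    return z

def fit_affine_ridge(F, X, rho):
    """Fit (4) from features F (N, Z) and regressands X (N, L)."""
    N, Z = F.shape
    m_x, m_z = X.mean(axis=0), F.mean(axis=0)
    Fc = F - m_z                              # de-meaned features, M Z^T
    C_zz = Fc.T @ Fc / N                      # (1/N) Z M Z^T, size Z x Z
    C_zx = Fc.T @ (X - m_x) / N               # column l is c_{x_l z}
    B = np.linalg.solve(C_zz + rho * np.eye(Z), C_zx)
    return m_x, m_z, B

def predict(z, m_x, m_z, B, Y_test):
    """Evaluate (4) at each row of Y_test; returns (num_test, L)."""
    return m_x + (z(Y_test) - m_z) @ B

rng = np.random.default_rng(0)
times = np.array([0.5, 1.0, 2.0])             # hypothetical sample times
X_train = rng.uniform(0.5, 2.0, size=(500, 1))
Y_train = np.exp(-times / X_train) + 0.002 * rng.standard_normal((500, 3))

z = make_rff(dim_y=3, Z=100, bandwidth=0.3, rng=rng)
m_x, m_z, B = fit_affine_ridge(z(Y_train), X_train, rho=1e-4)

X_test = rng.uniform(0.6, 1.9, size=(50, 1))
X_hat = predict(z, m_x, m_z, B, np.exp(-times / X_test))
print("mean absolute error:", np.abs(X_hat - X_test).mean())
```

In this form the $N \times N$ Gram matrix is never built. Training works with an $N \times Z$ feature matrix and a $Z \times Z$ linear system, which is where the reduction from $\sim N^2$ to $\sim NZ$ comes from.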