Prof. Hui Jiang
Department of Computer Science and Engineering
York University, Toronto, Ont. M3J 1P3, CANADA
Email: [email protected]
Large Margin HMMs for Speech Recognition
(This is joint work with Xinwei Li and Chao-Jun Liu.)

Outline
• Background
  – Discriminative Training for ASR
  – Large Margin Classifiers: concept & theory
• Large Margin HMMs for ASR
  – A new estimation criterion for HMM
  – Analysis of Margin in CDHMMs
  – Large Margin Estimation (LME): Constrained minimax optimization
• Optimization Methods
  – Gradient Descent (GD) search
  – Semi-definite Programming (SDP)
• Experiments
  – The ISOLET recognition task
  – The TIDIGITS connected digit string recognition
• Summary

Automatic Speech Recognition
• Statistical Speech Model

[Figure: three-state left-to-right HMM with self-loops a11, a22, a33 and transitions a12, a23, emitting the observation sequence X = x1 x2 x3 x4 x5.]

[Figure: noisy-channel view of ASR: word sequence W → noisy channel → speech signal X → channel decoding → word sequence.]

• Bayesian Decision Rule:

$$\hat{W} = \arg\max_{W \in \Omega} p(W \mid X) = \arg\max_{W \in \Omega} P(W)\, p(X \mid W) = \arg\max_{W \in \Omega} F(X \mid \lambda_W)$$

where $P(W)$ is the language model, $p(X \mid W)$ is the acoustic model, and $F(X \mid \lambda_W)$ is the discriminant function.
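
The Bayesian decision rule can be illustrated with a minimal sketch, assuming log-domain scores so that the discriminant function is F(X|W) = log P(W) + log p(X|W). The function name `decode` and all numbers are hypothetical and only serve the illustration.

```python
import math

def decode(acoustic_loglik, lm_prob):
    """Pick the word sequence W with the largest F(X|W) = log P(W) + log p(X|W)."""
    scores = {w: math.log(lm_prob[w]) + acoustic_loglik[w] for w in acoustic_loglik}
    best = max(scores, key=scores.get)
    return best, scores

# Toy values (assumptions, not taken from the talk).
acoustic_loglik = {"yes": -12.0, "no": -11.5, "maybe": -14.0}  # log p(X|W)
lm_prob = {"yes": 0.6, "no": 0.3, "maybe": 0.1}                # P(W)

best, scores = decode(acoustic_loglik, lm_prob)
print(best)
```

In this toy case the acoustic model alone slightly prefers "no", but the language model prior tips the decision to "yes".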

HMM Estimation Methods
• Maximum Likelihood Estimation (MLE)
  – The Baum-Welch algorithm: the EM algorithm for HMM
• Discriminative Training (DT)
  – Maximum Mutual Information Estimation (MMIE):
    • MPE, MWE, etc.
  – Minimum Classification Error (MCE)
• Discriminative training can improve over the standard ML training.
• Can we do better than DT?

Large Margin Classifier: Support Vector Machine (SVM)

[Figure: SVM decision boundaries, with the annotation "larger margin".]

Large Margin Classifiers
• Why do large margin classifiers yield better generalization performance?
• Conceptually, large margin ⇒
  – Robustness w.r.t. data patterns
  – Robustness w.r.t. classifier parameters
• Theoretically …

Statistical Learning Theory
• In pattern classification, the generalization upper bound holds with probability $1-\delta$ (Vapnik et al.):

$$R(\theta) \le R_{emp}(\theta) + \sqrt{\frac{V\left(\log\frac{2N}{V} + 1\right) - \log\frac{\delta}{4}}{N}}$$

where $R(\theta)$ is the test error rate, $R_{emp}(\theta)$ is the training error rate, and the square-root term is the VC confidence.

N: size of training set; V: VC dimension
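
To get a feel for the VC confidence term, here is a minimal sketch that evaluates sqrt((V(log(2N/V) + 1) − log(δ/4)) / N) for a few settings. The function names and the numeric values are illustrative assumptions.

```python
import math

def vc_confidence(n, v, delta):
    """VC confidence term for training-set size n, VC dimension v, confidence 1 - delta."""
    return math.sqrt((v * (math.log(2.0 * n / v) + 1.0) - math.log(delta / 4.0)) / n)

def risk_bound(emp_risk, n, v, delta=0.05):
    """Upper bound on the test error: training error plus VC confidence."""
    return emp_risk + vc_confidence(n, v, delta)

# The bound tightens as N grows and loosens as V grows.
for n in (1_000, 10_000, 100_000):
    print(n, round(risk_bound(0.02, n, v=100), 4))
```

The bound is only informative when N is large relative to V, which is one reason a margin-based bound is of interest.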

Statistical Learning Theory
• In pattern classification, the generalization upper bound holds with probability $1-\delta$ (Vapnik et al.):

$$R(\theta) \le R_{d}(\theta) + \sqrt{\frac{C}{N}\left(\frac{V}{d^2}\log^2\!\left(\frac{N}{V}\right) + \log\frac{1}{\delta}\right)}$$

where $R(\theta)$ is the test error, $R_d(\theta)$ is the margin error, and the square-root term is the VC confidence.

V: VC dimension; d: margin; N: size of training set; C: universal constant

How about using SVM for Speech Recognition?
• Done in some simple ASR tasks:
  – phoneme recognition; speaker recognition
  – small vocabulary isolated speech recognition
• Hard to extend to large-scale continuous speech recognition.
• No significant improvement is reported.
  – still not a mainstream method
• Why?
  – SVM: binary, static classifier.
  – Lack of a proper kernel function to map speech samples from one dynamic high-dimension space to another high-dimension space that is suitable for linear classifiers.

Large Margin HMM-based Classifier

[Figure: two classes represented by model 1 (λ1) and model 2 (λ2), with the separation boundary F(X|λ1) − F(X|λ2) = 0.]

Large Margin HMM-based Classifier

[Figure: the models λ1, λ2 are adjusted to λ'1, λ'2, which moves the original separation boundary F(X|λ1) − F(X|λ2) = 0 to the new separation boundary F(X|λ'1) − F(X|λ'2) = 0.]

How to define separation margin? (1)
• In a 2-class separable problem:
  – For a data token, $x_1$, of class 1: $d(x_1) = F(x_1 \mid \lambda_1) - F(x_1 \mid \lambda_2) > 0$
  – For a data token, $x_2$, of class 2: $d(x_2) = F(x_2 \mid \lambda_2) - F(x_2 \mid \lambda_1) > 0$

How to define separation margin? (2)
• Extend to the multiple-class problem:
  – N classes $\lambda_1, \lambda_2, \ldots, \lambda_N$
  – For a data token, $x_i$, of class $i$:

$$d(x_i) = F(x_i \mid \lambda_i) - \max_{j \ne i} F(x_i \mid \lambda_j) = \min_{j \ne i} \left[ F(x_i \mid \lambda_i) - F(x_i \mid \lambda_j) \right]$$
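
The multi-class margin can be computed directly from per-class discriminant scores. The sketch below assumes the scores F(x|λ_j) are already available as a dictionary; the names `margin` and `scores` are hypothetical.

```python
def margin(scores, true_class):
    """d(x_i) = F(x_i|lambda_i) - max_{j != i} F(x_i|lambda_j)."""
    best_rival = max(s for c, s in scores.items() if c != true_class)
    return scores[true_class] - best_rival

def margin_min_form(scores, true_class):
    """Equivalent form: min_{j != i} [F(x_i|lambda_i) - F(x_i|lambda_j)]."""
    return min(scores[true_class] - s for c, s in scores.items() if c != true_class)

# Toy log-likelihood scores for one token under three class models.
scores = {"a": -10.0, "b": -12.5, "c": -11.0}
print(margin(scores, "a"))  # positive: the token is correctly classified
print(margin(scores, "b"))  # negative: the token would be misclassified
```

A positive margin means correct classification, and its size measures how far the token is from the nearest competing model.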

Large Margin Estimation of HMMs
• An N-class problem: each class is represented by an HMM:

$$\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_N\}$$

• Given a training set $D$, define a subset, called the support token set $S$, as:

$$S = \{ X_i \mid X_i \in D \ \text{and} \ 0 \le d(X_i) \le \epsilon \}$$

• Large-Margin Estimation (LME) of HMMs:

$$\hat{\Lambda} = \arg\max_{\Lambda} \min_{X_i \in S} d(X_i) \quad (\text{subject to all } d(X_i) > 0)$$
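
Selecting the support token set S = {X_i : 0 ≤ d(X_i) ≤ ε} and evaluating the LME objective (the minimum margin over S) can be sketched as follows. The margins and the threshold `eps` are toy values, and the helper names are hypothetical.

```python
def support_tokens(margins, eps):
    """Indices of tokens whose margin lies in [0, eps]: correctly classified but close to the boundary."""
    return [i for i, d in enumerate(margins) if 0.0 <= d <= eps]

def min_margin(margins, support):
    """The quantity LME maximizes: the smallest margin among the support tokens."""
    return min(margins[i] for i in support)

# Toy margins d(X_i) for five training tokens.
margins = [0.4, 3.2, -0.7, 1.1, 9.5]
support = support_tokens(margins, eps=2.0)
print(support, min_margin(margins, support))
```

Token 2 is misrecognized (negative margin) and tokens 1 and 4 are far from the boundary, so none of them enter S under this definition.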

Large Margin Estimation of HMMs
• Convert to a minimax optimization problem.
• Assume $X_i$ belongs to class $i$:

$$\hat{\Lambda} = \arg\min_{\Lambda} \max_{X_i \in S,\; j \ne i} \left[ F(X_i \mid \lambda_j) - F(X_i \mid \lambda_i) \right]$$

subject to constraints:

$$F(X_i \mid \lambda_j) - F(X_i \mid \lambda_i) < 0 \quad \text{for all } X_i \in S \text{ and } j \ne i.$$

Analysis of Margins in CDHMM
• The margin in CDHMM is unbounded without additional constraints.
• CDHMM parameters can be adjusted in certain ways to increase the margin without limit.
• Adopt the Viterbi approximation:

Analysis of Margins in CDHMM
• Each dimension: independent
• Linear: same variance
• Quadratic: different variances

[Figure panels: Linear; Quadratic]

Margin: Linear Dimensions

[Figure: two Gaussian class densities C1 and C2 with the same variance, plotted over x from 0 to 15, with example tokens marked "x" and "o".]

$$d(x) = \frac{\mu_1 - \mu_2}{\sigma^2} \left( x - \frac{\mu_1 + \mu_2}{2} \right)$$
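
For two one-dimensional Gaussians that share a variance σ², the difference of log-likelihoods is linear in x and equals ((μ1 − μ2)/σ²)(x − (μ1 + μ2)/2). The sketch below checks that identity numerically; the parameter values are arbitrary assumptions.

```python
import math

def log_gauss(x, mu, var):
    """Log density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)

def linear_margin(x, mu1, mu2, var):
    """Closed-form margin of a class-1 token x when both classes share the variance."""
    return (mu1 - mu2) / var * (x - (mu1 + mu2) / 2.0)

mu1, mu2, var = 5.0, 9.0, 4.0
for x in (2.0, 6.5, 7.0, 11.0):
    direct = log_gauss(x, mu1, var) - log_gauss(x, mu2, var)
    print(x, round(direct, 6), round(linear_margin(x, mu1, mu2, var), 6))
```

Because the margin is linear in x, it can be made arbitrarily large by scaling the slope, which is why a constraint on the slope is needed for such dimensions.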

Margin: Quadratic Dimensions

[Figure: two Gaussian class densities C1 and C2 with different variances, plotted over x from 0 to 15, with tokens X1, X2 and X4 marked and the quantities d2(X1) and −d1(X2) indicated.]

Constraints in LME of CDHMM
• Impose constraints to make LME solvable:
  – Linear part: fix the norm of the slope to a constant, as a sum over frames $t$ and linear dimensions $d \in D_t^1$:

$$R_1(\lambda_i, \lambda_j \mid X_i) = g^2$$

  – Quadratic part: constrain the vertex to a range, as a sum over frames $t$ and quadratic dimensions $d \in D_t^2$:

$$R_2(\lambda_i, \lambda_j \mid X_i) \le G^2$$

LME: constrained minimax optimization
• Large Margin Estimation (LME) of CDHMM ⇒ a constrained minimax optimization problem:

$$\hat{\Lambda} = \arg\min_{\Lambda} \max_{X_i \in S,\; j \ne i} \left[ F(X_i \mid \lambda_j) - F(X_i \mid \lambda_i) \right]$$

subject to constraints:

$$R_1(\lambda_i, \lambda_j \mid X_i) = g^2, \qquad R_2(\lambda_i, \lambda_j \mid X_i) \le G^2 \quad \text{for all } X_i \in S \text{ and } j \ne i.$$

Optimization Methods
• Gradient Descent (GD) Search
  – approximate the objective function with a differentiable one
  – cast constraints as penalty terms
• Semi-definite Programming (SDP)
  – math manipulation
  – relaxation

LME Optimization: Gradient Descent
• Approximate $Q(\Lambda)$ with a summation of exponentials, where $d_{ij}(X_i) = F(X_i \mid \lambda_i) - F(X_i \mid \lambda_j)$:

$$Q(\Lambda) = -\min_{X_i \in S,\; j \ne i} d_{ij}(X_i)$$

$$Q(\Lambda) \approx Q_\eta(\Lambda) = \frac{1}{\eta} \log \left[ \sum_{X_i \in S,\; j \ne i} \exp\left[ -\eta\, d_{ij}(X_i) \right] \right], \qquad \lim_{\eta \to \infty} Q_\eta(\Lambda) = Q(\Lambda) \quad (\eta > 0)$$

• Constraints ⇒ Penalty terms:

$$P_1(\Lambda) = \sum_{i,j,\; j \ne i} \left( R_1(\lambda_i, \lambda_j \mid X_i) - g^2 \right)^2$$

$$P_2(\Lambda) = \sum_{i,j,\; j \ne i} \left( \max\left(0,\; R_2(\lambda_i, \lambda_j \mid X_i) - G^2 \right) \right)^2$$

$$O(\Lambda) = Q_\eta(\Lambda) + \tau_1 P_1(\Lambda) + \tau_2 P_2(\Lambda)$$
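
The smoothing step replaces the non-differentiable objective Q = −min d with the log-sum-exp surrogate Q_η = (1/η) log Σ exp(−η d). A minimal sketch, assuming the pairwise margins are given as a flat list of toy values:

```python
import math

def q_exact(margins):
    """Q = -min d: the non-differentiable minimax objective."""
    return -min(margins)

def q_smooth(margins, eta):
    """Q_eta = (1/eta) * log(sum(exp(-eta * d))), computed in a numerically stable way."""
    z = [-eta * d for d in margins]
    m = max(z)
    return (m + math.log(sum(math.exp(v - m) for v in z))) / eta

margins = [0.5, 1.2, 3.0]  # toy pairwise margins d_ij(X_i)
for eta in (1.0, 10.0, 50.0):
    print(eta, round(q_smooth(margins, eta), 6), q_exact(margins))
```

The surrogate always lies above Q by at most log(K)/η for K terms, so larger η gives a tighter but less smooth approximation.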

LME Optimization: Gradient Descent
• The gradient descent optimization:

$$\hat{\Lambda}^{(n+1)} = \hat{\Lambda}^{(n)} - \epsilon \left. \frac{\partial O(\Lambda)}{\partial \Lambda} \right|_{\Lambda = \hat{\Lambda}^{(n)}}$$

• Gradient descent optimization:
  – Many parameters to be tuned experimentally:
    • step size, penalty coefficients, $\eta$, etc.
  – Slow convergence speed.
  – Local optimum.
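
The update rule is the usual fixed-step gradient descent iteration. The sketch below applies it to a toy quadratic objective rather than to the actual LME objective; the function `gradient_descent` and the objective are assumptions made for illustration.

```python
def gradient_descent(grad, theta0, step, n_iter):
    """Iterate theta <- theta - step * grad(theta) for n_iter steps."""
    theta = list(theta0)
    for _ in range(n_iter):
        g = grad(theta)
        theta = [t - step * gi for t, gi in zip(theta, g)]
    return theta

def toy_grad(theta):
    """Gradient of the toy objective O(theta) = (theta0 - 1)^2 + 2 * (theta1 + 3)^2."""
    return [2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 3.0)]

theta = gradient_descent(toy_grad, [0.0, 0.0], step=0.1, n_iter=200)
print(theta)
```

On this convex toy problem the iteration converges to the unique minimum; the LME objective is not convex in general, which is where the step size tuning and local optimum issues listed above come from.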

How to calculate the gradient for continuous density HMM? (1)

With $d_{ij}(X_i) = F(X_i \mid \lambda_i) - F(X_i \mid \lambda_j)$:

$$\frac{\partial Q_\eta(\Lambda)}{\partial \Lambda} = - \frac{\sum_{X_i \in S,\; j \ne i} \exp\left[ -\eta\, d_{ij}(X_i) \right] \dfrac{\partial d_{ij}(X_i)}{\partial \Lambda}}{\sum_{X_i \in S,\; j \ne i} \exp\left[ -\eta\, d_{ij}(X_i) \right]}$$

$$\frac{\partial d_{ij}(X_i)}{\partial \lambda_i} = \frac{\partial F(X_i \mid \lambda_i)}{\partial \lambda_i}, \qquad \frac{\partial d_{ij}(X_i)}{\partial \lambda_j} = - \frac{\partial F(X_i \mid \lambda_j)}{\partial \lambda_j}$$
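
The gradient of the smoothed objective is a softmax-weighted average of the individual margin gradients, with the weights concentrating on the smallest margins as η grows. A minimal sketch with hypothetical names and toy inputs:

```python
import math

def softmax_weights(margins, eta):
    """Weights exp(-eta * d_k) / sum_k exp(-eta * d_k), computed stably."""
    z = [-eta * d for d in margins]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def grad_q_smooth(margins, margin_grads, eta):
    """dQ_eta/dtheta = -sum_k w_k * dd_k/dtheta, one entry per parameter."""
    w = softmax_weights(margins, eta)
    dim = len(margin_grads[0])
    return [-sum(wk * g[p] for wk, g in zip(w, margin_grads)) for p in range(dim)]

margins = [0.5, 2.0]                     # toy margins d_k
margin_grads = [[1.0, 0.0], [0.0, 1.0]]  # toy gradients of each d_k w.r.t. two parameters
w = softmax_weights(margins, eta=2.0)
g = grad_q_smooth(margins, margin_grads, eta=2.0)
print(w, g)
```

Here the token with the smaller margin dominates the gradient, so a descent step mostly increases that margin.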

How to calculate the gradient for continuous density HMM? (2)
• Assumption 1: adjust CDHMM mean vectors only
• Assumption 2: diagonal precision matrices
• Assumption 3: use the Viterbi approximation

$$F(X_i \mid \lambda_i) \approx C' - \frac{1}{2} \sum_{t=1}^{T} \sum_{d=1}^{D} r^{(i)}_{s_t l_t d} \left( x_{itd} - m^{(i)}_{s_t l_t d} \right)^2$$

$$F(X_i \mid \lambda_j) \approx C'' - \frac{1}{2} \sum_{t=1}^{T} \sum_{d=1}^{D} r^{(j)}_{s'_t l'_t d} \left( x_{itd} - m^{(j)}_{s'_t l'_t d} \right)^2$$
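
Under the three assumptions, the score along a fixed Viterbi path is a weighted sum of squared distances to the aligned Gaussian means, so its gradient with respect to each mean is a precision-weighted sum of residuals. The sketch below assumes the path is given as a list of (state, mixture) keys; all names and values are hypothetical.

```python
def viterbi_score(frames, path, means, precisions, const=0.0):
    """F(X|lambda) ~ const - 0.5 * sum_t sum_d r[s_t,l_t][d] * (x[t][d] - m[s_t,l_t][d])**2."""
    total = 0.0
    for x_t, key in zip(frames, path):
        total += sum(r * (x - m) ** 2 for x, m, r in zip(x_t, means[key], precisions[key]))
    return const - 0.5 * total

def mean_gradient(frames, path, means, precisions):
    """dF/dm[key][d] = sum over frames aligned to key of r[key][d] * (x[t][d] - m[key][d])."""
    grad = {key: [0.0] * len(m) for key, m in means.items()}
    for x_t, key in zip(frames, path):
        for d, (x, m, r) in enumerate(zip(x_t, means[key], precisions[key])):
            grad[key][d] += r * (x - m)
    return grad

frames = [[1.0, 2.0], [2.0, 0.0]]           # two 2-dimensional feature vectors
path = [(0, 0), (1, 0)]                     # (state, mixture) aligned to each frame
means = {(0, 0): [0.0, 2.0], (1, 0): [1.0, 1.0]}
precisions = {(0, 0): [1.0, 1.0], (1, 0): [2.0, 0.5]}

score = viterbi_score(frames, path, means, precisions)
grad = mean_gradient(frames, path, means, precisions)
print(score, grad)
```

Holding the path fixed is what makes the score quadratic in the means; in practice the alignment would be recomputed as the models change.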

Semi-definite Programming (SDP)
• An active area in the optimization community nowadays.
• The standard SDP form:

$$\min_{X_1, X_2, \ldots, X_N} \sum_{j=1}^{N} C_j \cdot X_j \qquad \text{subject to} \quad \sum_{j=1}^{N} A_{ij} \cdot X_j = b_i, \quad i = 1, \ldots, M, \qquad X_j \succeq 0$$

• Linear function of symmetric matrices in the semi-definite matrix cone
• SDP can solve nonlinear optimization if configured properly.
• Convex conic optimization ⇒ Global optimal solution
• New efficient algorithms are developed.

LME Optimization: SDP
• LME: convert the constrained minimax optimization ⇒ semi-definite programming (SDP)
• Introduce a new constraint to the minimax optimization problem:

LME-SDP: Minimization
• Minimax optimization ⇒ minimization
  – Replace max with a common upper bound:

LME-SDP: Matrix Form
• Transform into matrix form

LME-SDP: Relaxation
• Matrix relaxation: equality ⇒ inequality
• Relaxation to an SDP problem

LME-SDP: Relaxation Analysis
• Relaxation – geometry explanation
  – Augment x and u to a higher dimension
  – Solve the problem in the augmented space

LME-SDP: training procedure

Experiments: overview
• Implemented under the HTK framework.
• Added more training tools:
  – MCE training tool: HMce.c
  – LME-GD training tool: HCLme.c
  – LME-SDP training tool: C programs + Matlab program + DSDP package
• ASR Tasks
  – OGI ISOLET E-set recognition
  – TIDIGITS

LME Training System

Experimental Result: ISOLET
• Feature vector is of 39 dimensions: (12 MFCC + E) + Δ + ΔΔ
• MLE models (14 states per HMM) are trained by HTK.
• MLE ⇒ MCE; MCE ⇒ LME.
• Alphabet (26-letter) recognition (training: 3120 utterances; test: 1560 utterances):
  – OGI (96%), Cambridge (96.73%).
  – Ours: MLE (95.4%), MCE (96.1%), LME (96.92%).

Experimental Result: ISOLET E-set
• ISOLET E-set: {B, C, D, E, G, P, T, V, Z}
• Training: 1080 utterances; Test: 540 utterances
• Word accuracy (in %) on ISOLET E-set test data:

| Criterion | mix-1 | mix-2 | mix-4 |
|-----------|-------|-------|-------|
| ML        | 85.56 | 90.56 | 91.48 |
| MCE       | 91.48 | 94.07 | 93.89 |
| LME-GD    | 92.96 | 95.00 | 94.44 |
| LME-SDP   | 92.96 | 95.19 | 95.00 |
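
One way to read the E-set accuracies is as relative error reduction, i.e. the fraction of the baseline's errors that the new criterion removes. The short sketch below uses the MCE and LME-SDP rows quoted above; the helper name is hypothetical.

```python
def rel_error_reduction(acc_base, acc_new):
    """Relative reduction (in %) of the error rate when going from acc_base to acc_new."""
    err_base = 100.0 - acc_base
    err_new = 100.0 - acc_new
    return 100.0 * (err_base - err_new) / err_base

# Word accuracies (%) of MCE and LME-SDP for the three model sizes in the table.
mce = [91.48, 94.07, 93.89]
lme_sdp = [92.96, 95.19, 95.00]
reductions = [rel_error_reduction(b, n) for b, n in zip(mce, lme_sdp)]
print([round(r, 1) for r in reductions])
```

On these numbers LME-SDP removes roughly 17 to 19 percent of the errors left by MCE for each model size.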

Experimental Result: TIDIGITS
• Connected digit strings: '1' to '9' plus 'oh' and 'zero'
• Training with 8623 sentences; test with 8700 sentences.
• Feature vector is of 39 dimensions: (12 MFCC + E) + Δ + ΔΔ.
• Unknown-length digit string recognition.
• Context-independent whole-word HMM models.
• MLE models (12 states per HMM) are trained by HTK.
• MLE ⇒ MCE; MCE ⇒ LME.
• MCE/LME training: N-best (N=5) based string-level.

String-Level LME Training
1. Identify support tokens
2. LME optimization
3. Check convergence; if not converged, repeat from step 1

Experimental Result: TIDIGITS
• String accuracy (in %) on test data

Experimental Result: TIDIGITS
• WER (in %) on TIDIGITS test set

TIDIGITS Results: LME-GD vs. LME-SDP

[Figure: two panels, "Gradient Descent" and "SDP", each plotting string accuracy (%) against the number of iterations for the training and test sets. The gradient descent panel spans 0 to 80 iterations; the SDP panel spans 0 to 12 iterations.]

TIDIGITS results: LME vs. others

| Criterion | Model | String error rate (%) | WER (%) |
|-----------|-------|-----------------------|---------|
| MLE (with HTK) | context-indep whole word model | 1.34 | 0.45 |
| MCE (Juang et al. '97) | context-indep whole word model | 0.95 | n/a |
| MCE (Juang et al. '97) | context-dep head-body-tail model | 0.72 | 0.24 |
| MMIE (Normandin '94) | context-indep, two models per word | 0.89 | 0.29 |
| LME-SDP | context-indep whole word model | 0.53 | 0.18 |
| LME-GD | context-indep whole word model | 0.66 | 0.22 |

Summary
• HMM Estimation methods for ASR

| Criterion | Optimization methods |
|-----------|----------------------|
| Maximum Likelihood Estimation (MLE) | EM or Baum-Welch (BW) |
| Maximum Mutual Information Estimation (MMIE) (Maximum Conditional Likelihood) | growth-transformation or extended BW (EBW) |
| Minimum Classification Error (MCE) | gradient descent, GPD, Quickprop, etc. |
| Large Margin Estimation (LME) | gradient descent; semi-definite programming (SDP) |
| Maximum Relative Margin Estimation (MRME) | gradient descent |

Large Margin Estimation (LME) vs. Discriminative Training (DT)
• Discriminative Training: the MCE or MMIE criterion is only an asymptotic bound of the Bayes error.

$$R(\theta) \le \lim_{N \to \infty} Q_{MCE}(\theta, N), \qquad R(\theta) \le - \lim_{N \to \infty} Q_{MMI}(\theta, N)$$

• Large Margin Estimation: Vapnik's generalization bound holds for a finite body of training data.

$$R(\theta) \le R_{emp}(\theta) + \sqrt{\frac{V\left(\log\frac{2N}{V} + 1\right) - \log\frac{\delta}{4}}{N}}$$

Ongoing Works
• How to handle training errors?
  – combined objective function: margin + training errors (ICASSP'06)
• Extend to large-scale subword-based speech recognition:
  – WSJ-5K
  – SPINE