source separation tutorial mini-series injb/teaching/sstutorial/part...speech enhancement: a source...
TRANSCRIPT
![Page 1: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/1.jpg)
Source Separation Tutorial Mini-Series ISpeech enhancement algorithms
Eunjoon Cho
Stanford University, EE
April 2nd, 2013
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 1 / 41
![Page 2: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/2.jpg)
Speech enhancement: A source separation perspective
Source separation: Decoupling of two or more sources with no, littleor some prior information.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 2 / 41
![Page 3: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/3.jpg)
Speech enhancement: A source separation perspective
Source separation: Decoupling of two or more sources with no, littleor some prior information.
Speech enhancement is a natural application for source separation.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 2 / 41
![Page 4: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/4.jpg)
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.A solution independent of the noise environment.Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
![Page 5: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/5.jpg)
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.A solution independent of the noise environment.Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
![Page 6: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/6.jpg)
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.
A solution independent of the noise environment.Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
![Page 7: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/7.jpg)
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.A solution independent of the noise environment.
Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
![Page 8: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/8.jpg)
Speech enhancement: The application perspective
Goal: Increase the intelligibility/quality of noisy speech.
Practical constraints.
Computationally efficient: Real-time applications on mobile phones,teleconferences.A solution independent of the noise environment.Stronger emphasis on reconstructing speech (differentobjective/subjective measures).
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 3 / 41
![Page 9: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/9.jpg)
Objective
Present some of the well-established methods in the speechenhancement literature and discuss the relationship between them.
Shed insight on how such methods differ in approach andassumptions with methods that rely on matrix factorization and/orprior training of sources.
Working code that can act as baselines for any speech enhancementwork.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 4 / 41
![Page 10: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/10.jpg)
Objective
Present some of the well-established methods in the speechenhancement literature and discuss the relationship between them.
Shed insight on how such methods differ in approach andassumptions with methods that rely on matrix factorization and/orprior training of sources.
Working code that can act as baselines for any speech enhancementwork.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 4 / 41
![Page 11: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/11.jpg)
Objective
Present some of the well-established methods in the speechenhancement literature and discuss the relationship between them.
Shed insight on how such methods differ in approach andassumptions with methods that rely on matrix factorization and/orprior training of sources.
Working code that can act as baselines for any speech enhancementwork.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 4 / 41
![Page 12: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/12.jpg)
Speech enhancement: Model sources
Under-determined problem: Y (ω) = X(ω) +D(ω)
Train dictionaries (bases) of noise and/or speech to use as prior.
Model of the noise can be inaccurate.Training online can be computationally expensive.
Assumption that noise varies more slowly compared to speech andthat speech is temporally sparse.
Use voice activity detectors and estimate noise when there is no speech.Keep track of the minimum level of spectrum at certain frequency.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 5 / 41
![Page 13: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/13.jpg)
Speech enhancement: Model sources
Under-determined problem: Y (ω) = X(ω) +D(ω)
Train dictionaries (bases) of noise and/or speech to use as prior.
Model of the noise can be inaccurate.Training online can be computationally expensive.
Assumption that noise varies more slowly compared to speech andthat speech is temporally sparse.
Use voice activity detectors and estimate noise when there is no speech.Keep track of the minimum level of spectrum at certain frequency.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 5 / 41
![Page 14: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/14.jpg)
Speech enhancement: Model sources
Under-determined problem: Y (ω) = X(ω) +D(ω)
Train dictionaries (bases) of noise and/or speech to use as prior.
Model of the noise can be inaccurate.Training online can be computationally expensive.
Assumption that noise varies more slowly compared to speech andthat speech is temporally sparse.
Use voice activity detectors and estimate noise when there is no speech.Keep track of the minimum level of spectrum at certain frequency.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 5 / 41
![Page 15: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/15.jpg)
Speech enhancement: Separate sources
Based on an estimate of the noise, how can we estimate the speech.
Spectral subtraction [Bol79]Wiener filtering: MMSE Estimator [LO79]Spectral Amplitude MMSE Estimator [EM84]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 6 / 41
![Page 16: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/16.jpg)
Frequency domain approaches
Discuss approaches in the frequency domain: Y (ω) = X(ω) +D(ω)
Overall flow of STFT processing
y(n)w(n − mR)
ym(n) Ym(w)
Xm(w)x(n) xm(n)Overlap add IFT
FT
Estimation
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 7 / 41
![Page 17: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/17.jpg)
Spectral subtraction
|Ym(ω)|
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
![Page 18: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/18.jpg)
Spectral subtraction
|Ym(ω)|, |Dm(ω)|
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
![Page 19: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/19.jpg)
Spectral subtraction
|Ym(ω)|, E [|Dm(ω)|]
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
![Page 20: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/20.jpg)
Spectral subtraction
|Xm(ω)| = |Ym(ω)| − E [|Dm(ω)|]
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
![Page 21: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/21.jpg)
Spectral subtraction
|Xm(ω)| = max{|Ym(ω)| − E [|Dm(ω)|] , 0}
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
![Page 22: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/22.jpg)
Spectral subtraction
|Xm(ω)| = max{|Ym(ω)| − E [|Dm(ω)|] , 0}∠Xm(ω) = ∠Ym(ω)
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−90
−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Po
we
r m
ag
nitu
de
(d
B)
Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 8 / 41
![Page 23: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/23.jpg)
Power spectral subtraction
|Xm(ω)|α = max{|Ym(ω)|α − E [|Dm(ω)|α] , 0}
The power spectral subtraction: α = 2
|Xm(ω)|2 = max{|Ym(ω)|2 − E[|Dm(ω)|2
], 0}
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 9 / 41
![Page 24: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/24.jpg)
Power spectral subtraction
|Xm(ω)|α = max{|Ym(ω)|α − E [|Dm(ω)|α] , 0}
The power spectral subtraction: α = 2
|Xm(ω)|2 = max{|Ym(ω)|2 − E[|Dm(ω)|2
], 0}
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 9 / 41
![Page 25: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/25.jpg)
Gain for power spectral subtraction
From the power spectral subtraction,
|Xm(ω)|2 = |Ym(ω)|2 − E[|Dm(ω)|2
]the gain can be expressed as follows
Hm(ω) =|Xm(ω)||Ym(ω)|
=
√|Ym(ω)|2 − E [|Dm(ω)|2]
|Ym(ω)|2=
√γ(ω)− 1
γ(ω)
γ(ω) is called the a-posteriori SNR. γ(ω) = |Ym(ω)|2E[|Dm(ω)|2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 10 / 41
![Page 26: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/26.jpg)
Gain for power spectral subtraction
From the power spectral subtraction,
|Xm(ω)|2 = |Ym(ω)|2 − E[|Dm(ω)|2
]the gain can be expressed as follows
Hm(ω) =|Xm(ω)||Ym(ω)|
=
√|Ym(ω)|2 − E [|Dm(ω)|2]
|Ym(ω)|2=
√γ(ω)− 1
γ(ω)
γ(ω) is called the a-posteriori SNR. γ(ω) = |Ym(ω)|2E[|Dm(ω)|2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 10 / 41
![Page 27: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/27.jpg)
Gain for power spectral subtraction
From the power spectral subtraction,
|Xm(ω)|2 = |Ym(ω)|2 − E[|Dm(ω)|2
]the gain can be expressed as follows
Hm(ω) =|Xm(ω)||Ym(ω)|
=
√|Ym(ω)|2 − E [|Dm(ω)|2]
|Ym(ω)|2=
√γ(ω)− 1
γ(ω)
γ(ω) is called the a-posteriori SNR. γ(ω) = |Ym(ω)|2E[|Dm(ω)|2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 10 / 41
![Page 28: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/28.jpg)
Gain for power spectral subtraction
Hm(ω) =
√γ(ω)− 1
γ(ω)
0 5 10 15 20−25
−20
−15
−10
−5
0
SNR (dB), 10log(γ(ω))
Ga
in (
dB
), 2
0lo
g(H
(ω))
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 11 / 41
![Page 29: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/29.jpg)
Gain for general spectral subtraction
|Xm(ω)|α = |Ym(ω)|α − E [|Dm(ω)|α]Different gain functions for various α
Hm(ω) =|Xm(ω)||Ym(ω)|
=
(γ(ω)α/2 − 1
γ(ω)α/2
)1/α
0 5 10 15 20−50
−45
−40
−35
−30
−25
−20
−15
−10
−5
0
SNR (dB), 10log(γ(ω))
Ga
in (
dB
), 2
0lo
g(H
(ω))
α = 1
α = 2
α = 3
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 12 / 41
![Page 30: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/30.jpg)
Example of spectral subtraction
Noisy speech
Power spectral subtraction
Magnitude spectral subtraction
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 13 / 41
![Page 31: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/31.jpg)
Musical noise
Q. Why do we get musical noise?
A1. Inaccurate estimate of unknown variables.
From Ym(ω) = Xm(ω) +Dm(ω),
|Xm(ω)|2 = |Ym(ω)|2 − |Dm(ω)|2− (Xm(ω)D∗m(ω) +X∗m(ω)Dm(ω))
|Dm(ω)|2 ≈ E[|Dm(ω)|2
]2Re{Xm(ω)D∗m(ω)} ≈ E [2Re{Xm(ω)D∗m(ω)}] = 0
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 14 / 41
![Page 32: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/32.jpg)
Musical noise
Q. Why do we get musical noise?
A1. Inaccurate estimate of unknown variables.
From Ym(ω) = Xm(ω) +Dm(ω),
|Xm(ω)|2 = |Ym(ω)|2 − |Dm(ω)|2− (Xm(ω)D∗m(ω) +X∗m(ω)Dm(ω))
|Dm(ω)|2 ≈ E[|Dm(ω)|2
]2Re{Xm(ω)D∗m(ω)} ≈ E [2Re{Xm(ω)D∗m(ω)}] = 0
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 14 / 41
![Page 33: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/33.jpg)
Noise estimation
Simple method is to average power spectra when there is no speechactivity |Dm(ω)|2 = E
[|Ym(ω)|2
]= E
[|Dm(ω)|2
]Need an accurate voice activity detector (VAD)Issues with non-stationary noise (babble noise) conditions
Issues with |Dm(ω)|2 = E[|Dm(ω)|2
]The noise power spectrum (periodogram) has high variance withrespect to the underlying power spectral density
20 40 60 80 100 120−30
−25
−20
−15
−10
−5
0
5
FFT Index
Po
we
r (d
B)
Periodogram
True power
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 15 / 41
![Page 34: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/34.jpg)
Cross term spectrum
2Re{Xm(ω)D∗m(ω)} requires the phase information which is difficult
to estimate
0 1000 2000 3000 4000 5000 6000 7000 8000−100
−80
−60
−40
−20
0
20
40
Freq (Hz)
Pow
er
Magnitude (
dB
)
Frame 18
|X(ω)|2
|Y(ω)|2
|2Re(X(ω)D*(ω))|
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 16 / 41
![Page 35: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/35.jpg)
Musical noise
Q. Why do we get musical noise?
A1. In accurate estimate of unknown variables.
A2. How we engineer situations when we have a bad estimate.
Half rectify negative values.
|Xm(ω)|2 = max{|Ym(ω)|2 − E[|Dm(ω)|2
], 0}
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 17 / 41
![Page 36: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/36.jpg)
Artifacts from half-wave rectifying
1
1Image from [Bol79]Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 18 / 41
![Page 37: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/37.jpg)
Oversubtraction [BSM79]
Over-subtract the noise estimate to reduce noise peaks
|Xm(ω)|2 = |Ym(ω)|2 − αE[|Dm(ω)|2
]Comes at the expense of attenuating the underlying signal
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 1, Frame 25
noisy speech
noise estimate
denoised speech
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 2, Frame 25
noisy speech
noise estimate
denoised speech
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 3, Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 19 / 41
![Page 38: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/38.jpg)
Oversubtraction [BSM79]
Fill in valleys at frequencies to mask residue noise
|Xm(ω)|2 = max{|Ym(ω)|2 − αE[|Dm(ω)|2
],
βE[|Dm(ω)|2
]}
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 2, Frame 25
noisy speech
noise estimate
denoised speech
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 2, β = 0.001, Frame 25
noisy speech
noise estimate
denoised speech
0 1000 2000 3000 4000 5000 6000 7000 8000−80
−70
−60
−50
−40
−30
−20
−10
0
10
Freq (Hz)
Pow
er
magnitude (
dB
)
α = 2, β = 0.5, Frame 25
noisy speech
noise estimate
denoised speech
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 20 / 41
![Page 39: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/39.jpg)
Oversubtraction [BSM79]
α should be dependent on the frame segmental SNR (γ)
Less attenuation (small α) for high SNR, and more attenuation (largeα) for low SNR
−10 −5 0 5 10 15 20 250
1
2
3
4
5
6
7
8
SNR (dB)
α
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 21 / 41
![Page 40: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/40.jpg)
Example of over spectral subtraction
Noisy speech
Power spectral subtraction
Spectral over subtraction: α = 15, β = 0.01
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 22 / 41
![Page 41: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/41.jpg)
Wiener filter
Find optimal linear filter that outputs the desired signal (clean speech)
e(n) = x(n)− x(n) = x(n)−M−1∑k=0
hky(n− k)
Find h∗ that minimizes E[e2(n)
]by solving
∂E[e2(n)]∂h = 0
h∗ = R−1yy ryx = (Rxx + Rdd)
−1rxx
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 23 / 41
![Page 42: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/42.jpg)
Wiener filter in frequency domain
If we assume a non-causal IIR filter, using the convolution theorem,i.e., x(n) ∗ h(n)↔ X(w)H(w)
E(ω) = X(w)−H(w)Y (w)
If we minimize E[|E(ω)|2
]with respect to H(ω), we have
H(ω) =E [X(ω)Y ∗(ω)]
E [|Y (w)|2] =E [X(ω)(X(ω)∗ +D(ω)∗)]
E [|Y (w)|2]
=E[|X(ω)|2
]E [|Y (ω)|2] =
E[|X(ω)|2
]E [|X(ω)|2] + E [|D(ω)|2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 24 / 41
![Page 43: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/43.jpg)
Parametric wiener filters
Wiener filter
H(ω) =E[|X(ω)|2
]E [|Y (ω)|2] =
E[|X(ω)|2
]E [|X(ω)|2] + E [|D(ω)|2]
More generally,
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 25 / 41
![Page 44: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/44.jpg)
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
![Page 45: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/45.jpg)
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
![Page 46: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/46.jpg)
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
![Page 47: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/47.jpg)
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
![Page 48: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/48.jpg)
Connection with spectral subtraction
H(ω) =
(E[|X(ω)|2
]E [|X(ω)|2] + αE [|D(ω)|2]
)β
E[|X(ω)|2
]is unknown
If α = 1, β = 1/2, and E[|X(ω)|2
]= |X(ω)|2 then...
|X(ω)| = H(ω)|Y (ω)| =√
|X(ω)|2|X(ω)|2 + E [|D(ω)|2]
|Y (ω)|
|X(ω)|2(|X(ω)|2 + E[|D(ω)|2
]) = |X(ω)|2|Y (ω)|2
gives two solutions |X(ω)|2 = |Y (ω)|2 −E[|D(ω)|2
]or |X(ω)|2 = 0.
which is essentially the power spectral subtraction algorithm
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 26 / 41
![Page 49: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/49.jpg)
Wiener filter gain
If we replace E[|X(ω)|2] = |Y (ω)|2 − E[|D(ω)|2
]Wiener filter
H(ω) =E[|X(ω)|2]
E[|X(ω)|2] + E [|D(ω)|2] =γ(ω)− 1
γ(ω)
, where γ(ω) = |Y (ω)|2E[|D(ω)|2] .
The square root wiener filter = power spectral subtraction
H(ω) 12=
√γ(ω)− 1
γ(ω)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 27 / 41
![Page 50: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/50.jpg)
Wiener filter gain
0 5 10 15 20−45
−40
−35
−30
−25
−20
−15
−10
−5
0
A−posteriori SNR (dB), 10log(γ(ω))
Ga
in (
dB
), 2
0lo
g(H
(ω))
Wiener filter
Square root wiener filter(i.e., Power spectral subtraction
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 28 / 41
![Page 51: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/51.jpg)
Connection with spectral subtraction
If α 6= 1, using the same method
|X(ω)|2 = |Y (ω)|2 − αE[|D(ω)|2
]which is the spectral over subtraction method
Note the Wiener filter is zero-phase, and thus ∠X(ω) = ∠Y (ω), justlike the spectral subtraction method
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 29 / 41
![Page 52: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/52.jpg)
MMSE-STSA Estimator
Suggested by Eprahim and Malah [EM84]
Estimator that minimizes the mean square error of the spectralmagnitude
Given X(ωk) = Xkej∠X(ωk),
minE[(Xk − Xk)
2]
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 30 / 41
![Page 53: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/53.jpg)
Comparison of MMSE-STSA Estimator with Wiener filter
1 MMSE in the complex spectrum vs. magnitude spectrum
Wiener: minE[(X(ωk)− X(ωk))
2)]
MMSE-STSA: minE[(Xk − Xk)
2]
2 Linear assumption vs. assumption on distribution of Xk
Wiener: minE[(X(ωk)−H(ωk)Y (ωk))
2]
MMSE-STSA: minE[(Xk − Xk)
2], where expectation is taken over
p(Y (ωk), Xk)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 31 / 41
![Page 54: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/54.jpg)
MMSE-STSA Estimator
minE[(Xk − Xk)
2]
From Bayesian statistics the optimal MMSE estimator is,
Xk = E [Xk|Y (ωk)]
=
∫ ∞0
xkp(xk|Y (ωk))dxk
=
∫∞0 xkp(Y (ωk)|xk)p(xk)dxk
p(Y (ωk))
We need knowledge on the distribution of X(ωk) and Y (ωk)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 32 / 41
![Page 55: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/55.jpg)
Distribution assumption for X(wk), D(wk), and Y (wk)
Fourier transform coefficients (of both speech and noise) are Gaussiandistributed.
From central limit theorem: Y (ωk) =∑N−1
n=0 y(n)e−jωkn
CLT holds for weakly dependent signals tooThe variance of the distribution E|Y (ωk)|2 is time varying
X(wk) ∼ N (0, E[|X(wk)|2
])
D(wk) ∼ N (0, E[|D(wk)|2
])
Y (wk) ∼ N (0, E[|X(wk)|2
]+ E
[|D(wk)|2
])
Xk ∼ Rayleigh(σ), with σ =√E [|X(wk)|2] /2
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 33 / 41
![Page 56: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/56.jpg)
Spectral gain of MMSE-STSA estimator
The spectral gain can be represented with two variables.
The a-priori SNR: ξk =E[|X(ωk)|2]E[|D(ωk)|2]
The a-posteriori SNR: γk = |Y (ωk)|2E[|D(ωk)|2]
Using a temporary variable, νk =ξk
1+ξkγk,
Xk =
√π
2
√νkγk
exp(−νk
2
) [(1 + νk)I0
(νk2
)+ νkI1
(νk2
)]Yk
= G(ξk, γk)Yk
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 34 / 41
![Page 57: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/57.jpg)
Gain as a function of a-priori SNR
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 35 / 41
![Page 58: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/58.jpg)
Estimating the a-priori SNR
ξk =E[|X(ωk)|2
]E [|D(ωk)|2]
Instantaneous SNR: ξk =|Y (ωk)|2−E[|D(ωk)|2]
E[|D(ωk)|2]= |Y (ωk)|2
E[|D(ωk)|2]− 1
Decision directed approach
ξk(m) = aX2k(m− 1)
E [|D(ωk,m− 1)|2] + (1− a)( |Y (ωk)|2E [|D(ωk)|2]
− 1
)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 36 / 41
![Page 59: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/59.jpg)
Effect of smoothed SNR
Instantaneous SNR:
ξk = γ(ωk)− 1 =|Y (ωk)|2
E [|D(ωk)|2]− 1
Decision directed approach
ξk(m) = aX2k(m− 1)
E [|D(ωk,m− 1)|2] + (1− a)( |Y (ωk)|2E [|D(ωk)|2]
− 1
)
0 20 40 60 80 100 120 140 160 180−50
−40
−30
−20
−10
0
10
20
30
Time
SN
R (
dB
)
ξ
k
γk − 1
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 37 / 41
![Page 60: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/60.jpg)
Examples with decision directed a-priori SNR estimation
Noisy speech
MMSE-STSA (Instantaneous SNR)
MMSE-STSA (Decision Directed SNR)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 38 / 41
![Page 61: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/61.jpg)
Examples with decision directed a-priori SNR estimation
Noisy speech
Wiener Filter (Instantaneous SNR)
Wiener Filter (Decision Directed SNR)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 39 / 41
![Page 62: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/62.jpg)
Summary
Model noise when speech is absent. |D(ω)| = E [|D(ω)|]Separate speech by applying gain on the noisy spectrum.
1 Spectral subtraction: |X(ω)| = |Y (ω)| − |D(ω)|2 Wiener filter: X(ω) =
E[|X(ω)|2]E[|X(ω)|2]+E[|D(ω)|2]Y (ω)
3 STSA-MSME: X(ω) = G(ξ(ω), γ(ω))Y (ω)
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 40 / 41
![Page 63: Source Separation Tutorial Mini-Series Injb/teaching/sstutorial/part...Speech enhancement: A source separation perspective Source separation: Decoupling of two or more sources with](https://reader035.vdocument.in/reader035/viewer/2022062920/5f026bb37e708231d4043010/html5/thumbnails/63.jpg)
References I
Steven Boll, Suppression of acoustic noise in speech using spectralsubtraction, Acoustics, Speech and Signal Processing, IEEETransactions on 27 (1979), no. 2, 113–120.
M. Berouti, M. Schwartz, and J. Makhoul, Enhancement of speechcorrupted by acoustic noise, IEEE Int. Conf. Acoust., Speech, SignalProcessing (ICASSP ’79), 1979, pp. 208–211.
Y. Ephraim and D. Malah, Speech enhancement using a minimummean-square error short-time spectral amplitude estimator, IEEETransactions on Acoustics, Speech and Signal Processing (1984),1109–1121.
Jae S Lim and Alan V Oppenheim, Enhancement and bandwidthcompression of noisy speech, Proceedings of the IEEE 67 (1979),no. 12, 1586–1604.
Eunjoon Cho ( Stanford University, EE ) Source Separation Tutorial Mini-Series I April 2nd, 2013 41 / 41