Zhijian Ou, Yang Zhang.
Probabilistic acoustic tube: a probabilistic generative model of speech for
speech analysis/synthesis. In: Proc. International Conference on Artificial
Intelligence and Statistics (AISTATS), La Palma, Spain, April 2012.
Supplementary Material
This web page contains links to audio and PDF files that help explain and
demonstrate the performance of PAT. The model and the experiments are
described in our paper; the relevant section of the paper is given beside each
subtitle.
1 Speech Synthesis (see Section 3.2 in the paper)
We chose 4 of the 100 sentences in the Edinburgh dataset and list them in the
following table together with their Z_SYNTHESIS, MU_SYNTHESIS and
LPC-synthesized versions. Click the links to listen. Note that the LPC-based and
PAT-based syntheses both use the classic overlap-add (OLA) method, so the
comparison is fair. Both could benefit from more sophisticated signal-processing
methods for converting the spectrogram to a waveform.
| | Original | Z_SYNTHESIS | MU_SYNTHESIS | LPC |
| Sentence 1 | | | | |
| Sentence 2 | | | | |
| Sentence 3 | | | | |
| Sentence 4 | | | | |
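Both syntheses above end with an overlap-add (OLA) step that stitches short frames back into a waveform. The sketch below illustrates only that step, under assumptions of our own: a periodic Hann window with 50% overlap (the window and hop used in the paper are not stated on this page), and plain windowed copies of the input standing in for the per-frame output of a PAT or LPC synthesizer. The helper names `hann`, `split_frames` and `overlap_add` are hypothetical.

```python
import math

def hann(n):
    # Periodic Hann window: with hop = n // 2 (n even), shifted copies sum to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]

def split_frames(x, frame_len, hop):
    # Analysis side: cut x into windowed frames. We prepend `hop` zeros so that
    # every original sample is covered by exactly two overlapping windows.
    w = hann(frame_len)
    padded = [0.0] * hop + list(x)
    frames = []
    for start in range(0, len(padded), hop):
        seg = padded[start:start + frame_len]
        seg += [0.0] * (frame_len - len(seg))  # zero-pad the last frame(s)
        frames.append([s * wk for s, wk in zip(seg, w)])
    return frames

def overlap_add(frames, hop, length):
    # Synthesis side: add each frame into the output at offset i * hop.
    # In a real synthesizer, `frames` would be frames generated from spectra.
    frame_len = len(frames[0])
    out = [0.0] * ((len(frames) - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        for k, v in enumerate(frame):
            out[i * hop + k] += v
    # Drop the lead-in padding and trim to the requested length.
    return out[hop:hop + length]

# Usage: analysis followed by OLA reconstructs the input up to rounding error.
x = [math.sin(0.05 * n) for n in range(1000)]
y = overlap_add(split_frames(x, 256, 128), 128, len(x))
```

Because the Hann windows at 50% overlap sum to a constant, OLA needs no extra normalization here; other window/hop pairs would require dividing by the summed window.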
2 Phoneme Clustering (see Section 3.3 in the paper)
The utterance used for phoneme clustering contains /ɑ:/ and /u:/ spoken with a
rising tone.
3 Speech Enhancement (see Section 3.4 in the paper)
The clean speech, the noisy speech corrupted by white Gaussian noise at 0 dB
SNR, and the enhanced speech are listed in the following table. Click the links
to listen. It is widely known that most signal-filtering methods for speech
enhancement, e.g. spectral subtraction and Wiener filtering, leave residual
noise known as musical noise. Note that musical noise is barely audible in the
PAT-based enhanced speech.
| Clean Speech | Noisy Speech (0 dB SNR) | Enhanced Speech |
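A 0 dB SNR means the noise has the same average power as the speech, since SNR (dB) = 10 log10(P_signal / P_noise). The sketch below shows one common way to build such a test signal by scaling white Gaussian noise to a target SNR. It is an illustration under our own assumptions, not necessarily how the noisy files above were produced: the SNR is computed globally over the whole utterance, and the helpers `power`, `add_white_noise` and `snr_db` are hypothetical.

```python
import math
import random

def power(x):
    # Mean power of a signal (average of squared samples).
    return sum(v * v for v in x) / len(x)

def add_white_noise(clean, target_snr_db, seed=0):
    # Draw unit-variance white Gaussian noise, then rescale it so that
    # 10 * log10(P_clean / P_noise) equals target_snr_db exactly.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    target_noise_power = power(clean) / (10 ** (target_snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    noise = [scale * v for v in noise]
    noisy = [s + n for s, n in zip(clean, noise)]
    return noisy, noise

def snr_db(clean, noise):
    # Global SNR in dB of a clean signal relative to additive noise.
    return 10 * math.log10(power(clean) / power(noise))

# Usage: a synthetic stand-in for clean speech, corrupted at 0 dB SNR.
clean = [math.sin(0.03 * n) for n in range(4000)]
noisy, noise = add_white_noise(clean, 0.0)
```

Scaling the drawn noise, rather than choosing its variance in advance, makes the realized SNR match the target exactly instead of only in expectation.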
4 Parameter Estimation (see Section 2.4 in the paper)
The details of parameter
estimation for PAT using the EM algorithm are provided as an appendix.