Zhijian Ou, Yang Zhang. Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis. In: Proc. International Conference on Artificial Intelligence and Statistics (AISTATS), La Palma, Spain, April 2012.

Supplementary Material

 

 

This web page links to audio and PDF files that help explain or demonstrate the performance of PAT. The model and the experiments are described in our paper; the relevant section is listed beside each heading.

 

1  Speech Synthesis (See section 3.2 in the paper)

We chose 4 of the 100 sentences from the Edinburgh dataset and list them in the following table, along with their Z_SYNTHESIS, MU_SYNTHESIS and LPC synthesized versions. Click the links to listen. Both the LPC-based and the PAT-based syntheses use the classic overlap-add (OLA) method, so the comparison is fair. Both could benefit from a more sophisticated signal-processing method for converting the spectrogram to a waveform.

 

 

            Original                   Z_SYNTHESIS             MU_SYNTHESIS             LPC
Sentence 1  speech_syn\1_original.wav  speech_syn\1_z_syn.wav  speech_syn\1_mu_syn.wav  speech_syn\1_lpc.wav
Sentence 2  speech_syn\2_original.wav  speech_syn\2_z_syn.wav  speech_syn\2_mu_syn.wav  speech_syn\2_lpc.wav
Sentence 3  speech_syn\3_original.wav  speech_syn\3_z_syn.wav  speech_syn\3_mu_syn.wav  speech_syn\3_lpc.wav
Sentence 4  speech_syn\4_original.wav  speech_syn\4_z_syn.wav  speech_syn\4_mu_syn.wav  speech_syn\4_lpc.wav
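
For readers unfamiliar with the OLA step mentioned above, the following is a minimal sketch of overlap-add in plain Python. It is not the code used to produce the audio files. The function names, the Hann window, the frame length and the hop size are all illustrative assumptions, and the frames here are cut from a toy sine wave rather than generated by PAT or LPC.

```python
import math


def hann(n):
    """Periodic Hann window of length n (assumed window, for illustration)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def split_frames(signal, frame_len, hop, window):
    """Cut a signal into windowed, overlapping frames."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [
        [signal[k * hop + i] * window[i] for i in range(frame_len)]
        for k in range(n_frames)
    ]


def overlap_add(frames, hop):
    """Sum equally spaced frames into one signal; frame k starts at k * hop."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for k, frame in enumerate(frames):
        start = k * hop
        for i, sample in enumerate(frame):
            out[start + i] += sample
    return out


# Toy check: a periodic Hann window with 50% overlap sums to 1, so
# unmodified frames add back to the input away from the two edges.
frame_len, hop = 8, 4
signal = [math.sin(0.3 * n) for n in range(40)]
frames = split_frames(signal, frame_len, hop, hann(frame_len))
rebuilt = overlap_add(frames, hop)
max_err = max(abs(a - b) for a, b in zip(signal[hop:-hop], rebuilt[hop:-hop]))
```

In an analysis/synthesis system, each frame passed to `overlap_add` would be the waveform synthesized for that frame instead of a windowed copy of the input.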

 

2  Phoneme Clustering (See section 3.3 in the paper)

 

The utterance used for phoneme clustering contains /ɑ:/ and /u:/ with a rising tone.

phone_cluster\freq_robust.wav

 

3  Speech Enhancement (See section 3.4 in the paper)

The clean speech, the noisy speech (corrupted by white Gaussian noise at 0 dB SNR), and the enhanced speech are listed in the following table. Click the links to listen. It is widely known that most signal-filtering methods for speech enhancement, e.g. spectral subtraction and Wiener filtering, leave a residual noise known as musical noise. Note that hardly any musical noise is audible in the PAT-based enhanced speech.

 

Clean Speech                     Noisy Speech (0dB SNR)           Enhanced Speech
speech_enhance\clean_speech.wav  speech_enhance\noisy_speech.wav  speech_enhance\enhanced_speech.wav
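
As an illustration of what "0 dB SNR" means here, the sketch below scales white Gaussian noise so that its average power equals that of the clean signal before mixing. It is a generic recipe, not the script used to create the files above. The function names, the random seed and the 200 Hz test tone sampled at 8 kHz are assumptions made for the example.

```python
import math
import random


def power(x):
    """Average power (mean squared value) of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(clean, target_snr_db, seed=0):
    """Return (noisy, noise) with noise scaled to the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # SNR_dB = 10 * log10(P_clean / P_noise), solved for the noise power.
    target_noise_power = power(clean) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * v for v in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


def measure_snr_db(clean, noise):
    """SNR in dB computed from the clean signal and the added noise."""
    return 10.0 * math.log10(power(clean) / power(noise))


# Toy stand-in for a clean utterance: one second of a 200 Hz tone at 8 kHz.
clean = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(8000)]
noisy, noise = add_white_noise(clean, target_snr_db=0.0)
measured = measure_snr_db(clean, noise)
```

At 0 dB the noise carries as much power as the speech, which makes it a demanding test condition for an enhancement method.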

 

4  Parameter Estimation (See section 2.4 in the paper)

The details of parameter estimation for PAT using the EM algorithm are provided as an appendix.

 

pat_appendix.pdf