COMBINING HMM-BASED MELODY EXTRACTION AND NMF-BASED SOFT MASKING FOR SEPARATING VOICE AND ACCOMPANIMENT FROM MONAURAL AUDIO, Yun Wang & Zhijian Ou

Yun Wang (Maigo), Zhijian Ou
Department of Electronic Engineering, Tsinghua University

Contents of Durrieu's database:

Durrieu's database consists of three subsets:

Subset A: SiSEC professionally produced material
Subset B: M. Lagrange's database
Subset C: Shannon Hurley's songs (Creative Commons licence)

Below is a list of the songs in each subset. Songs longer than 1 minute are cut into 1-minute clips, and the final clip is discarded if it contains no human voice. The "Clip ID" consists of two letters and a digit: the first letter specifies the subset, the second letter identifies the song, and the final digit is the index of the clip within that song.

Subset	Clip ID	Title	Gender of Singer	Length
A	Ab1	Bearlin -- Roads	Male	14''
A	At1	Tamy -- Que Pena Tanto Faz	Female	13''
B	Bb1 ~ Bb4	Bent Out of Shape	Male	2'40''
B	Bc1 ~ Bc4	Chevalier Bran	Male	4'56''
B	Bp1 ~ Bp4	Le Pub	Male	4'36''
B	Bs1 ~ Bs3	Schizosonic	Male	3'08''
B	Bu1 ~ Bu4	Into the Unknown	Male	2'38''
C	Ch1 ~ Ch5	Shame	Female	4'20''
C	Ci1 ~ Ci4	Silence	Female	4'12''
C	Cl1 ~ Cl4	We Are In Love	Female	3'42''
C	Cm1 ~ Cm5	Matter of Time	Female	4'37''
C	Cu1 ~ Cu4	Sunrise	Female	3'16''
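The Clip ID scheme described above (subset letter, song letter, clip index) can be decoded mechanically. The following is a minimal sketch, not part of the original material: the names `ClipId`, `parse_clip_id` and `expand_range` are hypothetical, and the pattern assumes what the table suggests, namely an uppercase subset letter A, B or C, a lowercase song letter, and a single digit.

```python
import re
from typing import List, NamedTuple


class ClipId(NamedTuple):
    subset: str  # first letter: 'A', 'B' or 'C'
    song: str    # second letter: identifies the song within the subset
    index: int   # final digit: index of the clip within the song


# Assumed pattern, inferred from the IDs listed in the table.
_CLIP_RE = re.compile(r"([ABC])([a-z])(\d)")


def parse_clip_id(clip_id: str) -> ClipId:
    """Split a Clip ID such as 'Bb1' into subset, song and clip index."""
    match = _CLIP_RE.fullmatch(clip_id)
    if match is None:
        raise ValueError(f"not a valid Clip ID: {clip_id!r}")
    return ClipId(match.group(1), match.group(2), int(match.group(3)))


def expand_range(first: str, last: str) -> List[str]:
    """Expand a table entry such as 'Bs1 ~ Bs3' into individual Clip IDs."""
    a, b = parse_clip_id(first), parse_clip_id(last)
    if (a.subset, a.song) != (b.subset, b.song) or a.index > b.index:
        raise ValueError(f"not a valid clip range: {first} ~ {last}")
    return [f"{a.subset}{a.song}{i}" for i in range(a.index, b.index + 1)]
```

For example, `expand_range("Bs1", "Bs3")` yields the three clips of "Schizosonic", and `parse_clip_id("At1").subset` gives the subset letter of the Tamy clip.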

Download:

Separation examples from Durrieu's database:

Separation examples from the MIR-1K database:

* Because the A/U/V decision is not carefully applied in Durrieu's algorithm, the SDRs here are calculated only on the annotated voiced segments.
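
Restricting the SDR to the annotated voiced segments can be illustrated as follows. This is only a sketch under stated assumptions: it uses the simple energy-ratio definition SDR = 10 log10(||s||^2 / ||s - s_hat||^2), which may differ from the evaluation toolkit actually used for the reported figures, and the function name `sdr_voiced_db` and the per-sample boolean mask `voiced` are hypothetical.

```python
import math
from typing import Sequence


def sdr_voiced_db(reference: Sequence[float],
                  estimate: Sequence[float],
                  voiced: Sequence[bool]) -> float:
    """Energy-ratio SDR in dB, computed only where `voiced` is True.

    Assumption: simplified SDR definition (signal energy over error
    energy); the original evaluation may use a different decomposition.
    """
    if not (len(reference) == len(estimate) == len(voiced)):
        raise ValueError("all inputs must have the same length")
    signal_energy = 0.0
    error_energy = 0.0
    for ref, est, is_voiced in zip(reference, estimate, voiced):
        if is_voiced:  # samples outside annotated voiced segments are skipped
            signal_energy += ref * ref
            error_energy += (ref - est) ** 2
    if signal_energy == 0.0 or error_energy == 0.0:
        raise ValueError("SDR is undefined for zero signal or zero error energy")
    return 10.0 * math.log10(signal_energy / error_energy)
```

With such a mask, errors made by a separation algorithm outside the voiced segments do not affect the score, which is the effect the footnote describes.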