COMBINING HMM-BASED MELODY EXTRACTION AND NMF-BASED SOFT MASKING
FOR SEPARATING VOICE AND ACCOMPANIMENT FROM MONAURAL AUDIO

Yun Wang (Maigo), Zhijian Ou
Department of Electronic Engineering, Tsinghua University


Contents of Durrieu's database:

Durrieu's database consists of three subsets, labelled A, B and C.

Below is a list of the songs in each subset. Songs longer than 1 minute are cut into 1-minute clips, and the final clip is discarded if it contains no singing voice. Each Clip ID consists of two letters and a digit: the first letter specifies the subset, the second letter identifies the song, and the digit is the index of the clip within the song. Lengths are given in minutes (') and seconds ('').

Subset  Clip ID    Title                       Gender of Singer  Length
A       Ab1        Bearlin -- Roads            Male              14''
A       At1        Tamy -- Que Pena Tanto Faz  Female            13''
B       Bb1 ~ Bb4  Bent Out of Shape           Male              2'40''
B       Bc1 ~ Bc4  Chevalier Bran              Male              4'56''
B       Bp1 ~ Bp4  Le Pub                      Male              4'36''
B       Bs1 ~ Bs3  Schizosonic                 Male              3'08''
B       Bu1 ~ Bu4  Into the Unknown            Male              2'38''
C       Ch1 ~ Ch5  Shame                       Female            4'20''
C       Ci1 ~ Ci4  Silence                     Female            4'12''
C       Cl1 ~ Cl4  We Are In Love              Female            3'42''
C       Cm1 ~ Cm5  Matter of Time              Female            4'37''
C       Cu1 ~ Cu4  Sunrise                     Female            3'16''
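The clip-splitting rule and the Clip ID scheme described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and has_voice is a hypothetical stand-in for whatever check was used to decide whether the final clip contains singing.

```python
import math
import re
from typing import Callable, List, Tuple

CLIP_SECONDS = 60.0  # clip length stated on this page

# Hypothetical predicate: has_voice(start_s, end_s) -> True if the span contains singing.
VoiceCheck = Callable[[float, float], bool]


def split_into_clips(song_seconds: float, has_voice: VoiceCheck) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of the clips cut from one song."""
    if song_seconds <= CLIP_SECONDS:
        # Songs of at most one minute are kept whole.
        return [(0.0, song_seconds)]
    n_clips = math.ceil(song_seconds / CLIP_SECONDS)
    clips = [
        (i * CLIP_SECONDS, min((i + 1) * CLIP_SECONDS, song_seconds))
        for i in range(n_clips)
    ]
    # Only the final clip is checked; it is dropped if it contains no voice.
    if not has_voice(*clips[-1]):
        clips.pop()
    return clips


def make_clip_id(subset: str, song_letter: str, index: int) -> str:
    """Build an ID such as 'Bc3': subset letter, song letter, 1-based clip index."""
    return f"{subset}{song_letter}{index}"


CLIP_ID_RE = re.compile(r"^([ABC])([a-z])(\d)$")


def parse_clip_id(clip_id: str) -> Tuple[str, str, int]:
    """Split a Clip ID into (subset, song letter, clip index)."""
    match = CLIP_ID_RE.match(clip_id)
    if match is None:
        raise ValueError(f"not a valid Clip ID: {clip_id!r}")
    subset, song_letter, index = match.groups()
    return subset, song_letter, int(index)


if __name__ == "__main__":
    # "Shame" (4'20'' = 260 s), assuming the last clip contains voice.
    clips = split_into_clips(260.0, lambda start, end: True)
    print([make_clip_id("C", "h", i + 1) for i in range(len(clips))])
    print(parse_clip_id("Bc3"))
```

With all five clips kept, the example yields the IDs Ch1 to Ch5, which matches the row for "Shame" in the table.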


Separation examples from Durrieu's database:

Clip ID    Original    SDR (dB)    Durrieu's Output    SDR (dB)    Our Output    SDR (dB)    Comments

Separation examples from the MIR-1K database:

Clip ID    Original    SDR* (dB)    Durrieu's Output    SDR* (dB)    Our Output    SDR* (dB)    Comments

* Because Durrieu's algorithm does not apply a careful A/U/V decision, the SDRs in this table are calculated only over the annotated voiced segments.
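Restricting the SDR to the annotated voiced segments, as the footnote describes, can be illustrated with the sketch below. It rests on two assumptions that this page does not state: it uses the plain energy-ratio form SDR = 10 log10(||s||^2 / ||s - s_hat||^2) rather than the BSS Eval definition (which first projects the estimate onto allowed distortions of the reference), and it takes the voiced annotation as a hypothetical list of half-open (start, end) sample ranges. All function names are illustrative.

```python
from typing import Iterable, Tuple

import numpy as np


def sdr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Simplified SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    error = reference - estimate
    return float(10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2)))


def segments_to_mask(segments: Iterable[Tuple[int, int]], n_samples: int) -> np.ndarray:
    """Boolean mask that is True inside each half-open [start, end) sample range."""
    mask = np.zeros(n_samples, dtype=bool)
    for start, end in segments:
        mask[start:end] = True
    return mask


def voiced_sdr_db(reference, estimate, voiced_segments) -> float:
    """SDR of a voice estimate computed only over the annotated voiced samples."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mask = segments_to_mask(voiced_segments, len(reference))
    return sdr_db(reference[mask], estimate[mask])


if __name__ == "__main__":
    voice = np.array([1.0, 2.0, 3.0, 4.0])
    # Large errors in the last two samples fall outside the voiced span and are ignored.
    separated = np.array([1.1, 2.2, 100.0, -50.0])
    print(round(voiced_sdr_db(voice, separated, [(0, 2)]), 2))
```

Here the estimate is off by 10% on the two voiced samples, so the script prints 20.0; the grossly wrong samples outside the annotated span do not affect the score, which is the point of the restriction.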