COMBINING HMM-BASED MELODY EXTRACTION AND NMF-BASED SOFT MASKING
FOR SEPARATING VOICE AND ACCOMPANIMENT FROM MONAURAL AUDIO
Yun Wang (Maigo), Zhijian Ou
Department of Electronic Engineering, Tsinghua University
 
Durrieu's database consists of three subsets:
Below is a list of the songs in each subset. Songs longer than 1 minute are cut into 1-minute clips, and the final clip is discarded if it contains no human voice. Each "Clip ID" consists of two letters and a digit: the first letter specifies the subset, the second letter identifies the song, and the digit is the index of the clip.
Subset | Clip ID | Title | Gender of Singer | Length |
---|---|---|---|---|
A | Ab1 | Bearlin -- Roads | Male | 14'' |
A | At1 | Tamy -- Que Pena Tanto Faz | Female | 13'' |
B | Bb1 ~ Bb4 | Bent Out of Shape | Male | 2'40'' |
B | Bc1 ~ Bc4 | Chevalier Bran | Male | 4'56'' |
B | Bp1 ~ Bp4 | Le Pub | Male | 4'36'' |
B | Bs1 ~ Bs3 | Schizosonic | Male | 3'08'' |
B | Bu1 ~ Bu4 | Into the Unknown | Male | 2'38'' |
C | Ch1 ~ Ch5 | Shame | Female | 4'20'' |
C | Ci1 ~ Ci4 | Silence | Female | 4'12'' |
C | Cl1 ~ Cl4 | We Are In Love | Female | 3'42'' |
C | Cm1 ~ Cm5 | Matter of Time | Female | 4'37'' |
C | Cu1 ~ Cu4 | Sunrise | Female | 3'16'' |
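The clip naming and cutting scheme described above can be sketched in a few lines of Python. This is only an illustration, not the script used to prepare the database: the function names `parse_clip_id` and `clip_bounds` and the `final_clip_has_voice` flag are hypothetical, and whether the final clip contains voice is assumed to be known from an annotation.

```python
import math
import re

CLIP_SECONDS = 60  # clip length stated in the text


def parse_clip_id(clip_id):
    """Split a Clip ID such as 'Bc3' into (subset, song, clip index)."""
    match = re.fullmatch(r"([A-Z])([a-z])(\d)", clip_id)
    if match is None:
        raise ValueError(f"not a valid Clip ID: {clip_id!r}")
    subset, song, index = match.groups()
    return subset, song, int(index)


def clip_bounds(duration_seconds, final_clip_has_voice=True):
    """Return (start, end) times in seconds for consecutive clips of at most 1 minute.

    final_clip_has_voice is a hypothetical flag: when it is False and the song
    yields more than one clip, the final clip is dropped.
    """
    count = math.ceil(duration_seconds / CLIP_SECONDS)
    bounds = [
        (i * CLIP_SECONDS, min((i + 1) * CLIP_SECONDS, duration_seconds))
        for i in range(count)
    ]
    if count > 1 and not final_clip_has_voice:
        bounds.pop()  # discard a trailing clip that contains no human voice
    return bounds
```

For example, a song of 4'20'' (260 seconds) yields five clips, the last one 20 seconds long, which is consistent with the clip range Ch1 ~ Ch5 listed for "Shame".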
Clip ID | Original | SDR | Durrieu's Output | SDR | Our Output | SDR | Comments |
---|---|---|---|---|---|---|---|
Clip ID | Original | SDR* | Durrieu's Output | SDR* | Our Output | SDR* | Comment |
---|---|---|---|---|---|---|---|
* Because the A/U/V decision is not carefully applied in Durrieu's algorithm, the SDRs here are calculated only on the annotated voiced segments.
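To illustrate what restricting the SDR to voiced segments means, here is a minimal sketch. It assumes a simplified SDR definition (energy of the reference over energy of the error, in dB) and a per-sample boolean voicing annotation; the evaluation behind the numbers on this page may use a different SDR definition, and the name `sdr_db` is hypothetical.

```python
import math


def sdr_db(reference, estimate, voiced):
    """Simplified SDR in dB, computed only over samples annotated as voiced.

    Assumption: SDR = 10 * log10(sum(ref^2) / sum((ref - est)^2)).
    This is not necessarily the metric used for the tables above.
    """
    signal_energy = 0.0
    error_energy = 0.0
    for ref, est, keep in zip(reference, estimate, voiced):
        if keep:  # skip samples outside the annotated voiced segments
            signal_energy += ref * ref
            error_energy += (ref - est) ** 2
    if signal_energy == 0.0 or error_energy == 0.0:
        raise ValueError("SDR is undefined for zero signal or zero error energy")
    return 10.0 * math.log10(signal_energy / error_energy)
```

With such a mask, large errors in unvoiced regions (for example, residual accompaniment where the reference voice is silent) do not affect the score.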