Abstract:
Sound source separation is an active research topic in array signal processing, yet separating multiple sound sources remains difficult in reverberant environments. To address this problem, a two-stage multiple sound source separation method is proposed based on the fully-convolutional time-domain audio separation network (Conv-TasNet) and a deep neural network (DNN). In the first stage, the end-to-end separation network (Conv-TasNet) separates the sound sources from the signal recorded in a reverberant environment: an encoder generates a time-domain representation of the recorded signal, the Conv-TasNet model estimates a mask, and each separated source signal is recovered by multiplying the encoder output by the mask. In the second stage, each separated signal is enhanced with a single DNN. The training target of the DNN is the ideal ratio mask (IRM), computed from the magnitudes of the frequency-domain coefficients of the separated signal and the clean signal. The magnitude of the frequency-domain coefficients of the separated signal is used as the DNN input to predict the IRM; the predicted IRM is multiplied by the original magnitude to obtain the enhanced magnitude of the frequency-domain coefficients, which is combined with the phase to reconstruct the enhanced source signal. Subjective and objective evaluations show that, compared with the reference methods, the proposed method achieves better separation quality in both reverberant and anechoic acoustic environments. © 2021 IEEE.
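To make the second-stage mask-and-resynthesize step concrete, the following is a minimal NumPy/SciPy sketch, assuming the common STFT-domain IRM definition |S| / (|S| + |N|) with the residual N taken as the difference between the stage-one output and the clean reference. The abstract does not specify the paper's exact STFT parameters, mask formula, or DNN architecture, so the function names and settings below are illustrative, not the authors' implementation.

    import numpy as np
    from scipy.signal import stft, istft

    EPS = 1e-8  # avoids division by zero in silent time-frequency bins

    def ideal_ratio_mask(separated, clean, n_fft=512):
        # IRM training target per time-frequency bin: |S| / (|S| + |N|).
        # 'separated' is the stage-one (Conv-TasNet) output, 'clean' the
        # reference source; N is the residual left after stage one (assumption).
        _, _, S = stft(clean, nperseg=n_fft)
        _, _, Y = stft(separated, nperseg=n_fft)
        N = Y - S
        return np.abs(S) / (np.abs(S) + np.abs(N) + EPS)

    def enhance(separated, predicted_irm, n_fft=512):
        # Apply a (predicted) IRM to the separated signal's magnitude and
        # resynthesize using the separated signal's own phase.
        _, _, Y = stft(separated, nperseg=n_fft)
        enhanced_mag = np.abs(Y) * predicted_irm
        phase = np.angle(Y)
        _, x = istft(enhanced_mag * np.exp(1j * phase), nperseg=n_fft)
        return x

At training time, ideal_ratio_mask would supply the target and np.abs(Y) the input feature; at inference, the DNN's predicted mask is passed to enhance in place of the ideal one.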
Year: 2021
Page: 264-269
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0