Abstract:
Speech enhancement (SE) technology has advanced significantly through the application of deep neural networks (DNNs). However, most current approaches require a parallel corpus consisting of noisy signals, the corresponding clean speech signals, and the noise during the DNN training stage. This means that large amounts of realistic noisy speech, which lack such parallel counterparts, are difficult to use for training the DNNs. As a result, the performance of the DNNs is restricted. In this research, a new weakly supervised speech enhancement approach is proposed to overcome this restriction, using the cycle-consistent generative adversarial network (CycleGAN). Our method has two stages. In the training stage, a forward generator is employed to estimate the ideal time-frequency (T-F) mask and an inverse generator is used to produce the noisy speech magnitude spectrum (MS). Additionally, two discriminators are used to distinguish real clean speech and real noisy speech, respectively, from generated speech. In the enhancement stage, the T-F mask is estimated directly by the trained forward generator for speech enhancement. Experimental results indicate that our strategy not only achieves satisfactory performance on non-parallel data, but also obtains higher scores in speech quality and intelligibility compared with DNN-based speech enhancement using parallel data. © 2020 IEEE.
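The two-generator structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-layer "generators", the weight matrices `w_f` and `w_i`, the L1 form of the cycle-consistency loss, and all shapes are assumptions chosen only to show how a mask-estimating forward generator and an inverse generator might be chained on non-parallel data. The discriminators and the adversarial losses are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_generator(noisy_ms, w_f):
    # Hypothetical forward generator: estimates a T-F mask with values in (0, 1).
    return sigmoid(noisy_ms @ w_f)

def inverse_generator(clean_ms, w_i):
    # Hypothetical inverse generator: maps a clean MS to a non-negative noisy MS.
    return np.maximum(clean_ms @ w_i, 0.0)

def enhance(noisy_ms, w_f):
    # Enhancement stage: apply the estimated mask to the noisy magnitude spectrum.
    return forward_generator(noisy_ms, w_f) * noisy_ms

def cycle_consistency_loss(noisy_ms, clean_ms, w_f, w_i):
    # noisy -> enhanced -> reconstructed noisy
    noisy_cycle = inverse_generator(enhance(noisy_ms, w_f), w_i)
    # clean -> generated noisy -> re-enhanced clean
    clean_cycle = enhance(inverse_generator(clean_ms, w_i), w_f)
    # Assumed L1 reconstruction penalty on both cycles.
    return (np.mean(np.abs(noisy_cycle - noisy_ms))
            + np.mean(np.abs(clean_cycle - clean_ms)))

n_bins = 8
# Non-parallel toy data: the noisy and clean sets need not be paired
# or even contain the same number of frames.
noisy_ms = np.abs(rng.normal(size=(4, n_bins)))
clean_ms = np.abs(rng.normal(size=(5, n_bins)))

# Randomly initialised, untrained weights (illustrative only).
w_f = rng.normal(scale=0.1, size=(n_bins, n_bins))
w_i = rng.normal(scale=0.1, size=(n_bins, n_bins))

enhanced_ms = enhance(noisy_ms, w_f)
loss = cycle_consistency_loss(noisy_ms, clean_ms, w_f, w_i)
print(enhanced_ms.shape, float(loss))
```

Because the mask lies in (0, 1), the enhanced magnitude in this sketch never exceeds the noisy magnitude in any T-F bin. In a full system the two cycle terms would be combined with the adversarial losses from the two discriminators.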
Year: 2020
Language: English
WoS CC Cited Count: 0
SCOPUS Cited Count: 2
ESI Highly Cited Papers on the List: 0