
Author:

Yang, Xue | Bao, Changchun | Chen, Xianhong

Indexed by:

EI Scopus SCIE

Abstract:

Although time-domain speech separation methods have exhibited outstanding performance in anechoic scenarios, their effectiveness is considerably reduced in reverberant scenarios. Compared to time-domain methods, speech separation methods in the time-frequency (T-F) domain mainly concern structured T-F representations and have recently shown great potential. In this paper, we propose a coarse-to-fine speech separation method in the T-F domain, which involves two steps: 1) a rough separation conducted in the coarse phase and 2) a precise extraction accomplished in the refining phase. In the coarse phase, the speech signals of all speakers are initially separated in a rough manner, resulting in some level of distortion in the estimated signals. In the refining phase, the T-F representation of each estimated signal acts as a guide to extract the residual T-F representation for the corresponding speaker, which helps to reduce the distortions introduced in the coarse phase. In addition, the networks specially designed for the coarse and refining phases are jointly trained for superior performance. Furthermore, by utilizing the recurrent attention with parallel branches (RAPB) block to fully exploit the contextual information contained in the whole T-F features, the proposed model achieves competitive performance on clean datasets with a small number of parameters. The proposed method is also more robust and achieves state-of-the-art results on more realistic datasets. © 2023 Elsevier B.V.
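The two-phase idea in the abstract can be illustrated with a minimal toy sketch. This is not the authors' model: the coarse phase is stood in for by hypothetical random masks, and the refining phase simply redistributes the residual T-F energy left over by the coarse estimates, guided by each coarse estimate (the real method uses jointly trained networks with RAPB blocks for both phases).

```python
import numpy as np

def coarse_separate(mix_tf, num_speakers, rng):
    # Hypothetical coarse phase: rough (here, random) masks per speaker.
    # The real method uses a trained separation network for this step.
    masks = rng.uniform(0.3, 0.6, size=(num_speakers,) + mix_tf.shape)
    return masks * mix_tf  # rough per-speaker T-F estimates

def refine(mix_tf, coarse_estimates):
    # Hypothetical refining phase: each coarse estimate guides extraction
    # of the residual T-F representation the coarse phase left behind.
    residual = mix_tf - coarse_estimates.sum(axis=0)
    # Distribute the residual in proportion to each coarse estimate's energy.
    weights = np.abs(coarse_estimates)
    weights /= weights.sum(axis=0, keepdims=True) + 1e-8
    return coarse_estimates + weights * residual

rng = np.random.default_rng(0)
mix_tf = rng.standard_normal((64, 100))   # toy spectrogram (freq x frames)
coarse = coarse_separate(mix_tf, num_speakers=2, rng=rng)
refined = refine(mix_tf, coarse)
# Consistency check: the refined estimates sum back to the mixture.
print(np.allclose(refined.sum(axis=0), mix_tf, atol=1e-6))
```

The sketch only shows the data flow (rough estimates, then residual-guided correction); the quality gain in the paper comes from learning both phases jointly, not from this fixed redistribution rule.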

Keyword:

Frequency domain analysis; Recurrent neural networks; Refining; Speech enhancement; Speech analysis; Source separation; Time domain analysis

Author Community:

  • [ 1 ] [Yang, Xue] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [ 2 ] [Bao, Changchun] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [ 3 ] [Chen, Xianhong] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Source:

Speech Communication

ISSN: 0167-6393

Year: 2023

Volume: 155

Impact Factor: 3.200 (JCR@2022)

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0

30 Days PV: 9
