
Author:

Yang, Xue | Bao, Changchun | Zhang, Xu | Chen, Xianhong

Indexed by:

EI

Abstract:

Target speaker extraction (TSE) is a practical solution to the cocktail party problem. Recently, a novel embedding-free TSE method was proposed, in which the enrollment and the mixed signal interact directly to exploit the contextual information within the enrollment. In the absence of noise, the derived guidance exhibits onsets, offsets, and voice activity similar to those of the mixed signal. In the presence of noise, however, this similarity may be destroyed, because the enrollment interacts with both the speech and the noise in the mixture. If the noise (e.g., babble noise) contains components that resemble the enrollment to some extent, the direct interaction may generate misleading guidance. To tackle this issue, this paper designs an additional enhancer that derives an auxiliary guidance emphasizing the active speech. Specifically, the enhancer consists of a processing block and an interaction block. The processing block mainly uses recurrent layers to model the temporal dynamics of the enrollment and the mixed signal. In this block, speech and noise are modeled in different manners, which reduces the similarity between the enrollment and the noise. The interaction block then uses the processed representations of the enrollment and the mixed signal to derive an enhanced representation. This enhanced representation emphasizes the active speech and serves as an auxiliary guidance for the extraction. Experimental results demonstrate the effectiveness of the proposed method in complex acoustic environments. © 2024 IEEE.
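
The two-block structure described in the abstract can be illustrated with a small sketch. The abstract does not specify the layer types, sizes, or the form of the interaction, so everything below is an assumption made for illustration only: the plain tanh recurrent layer, the dot-product attention used as the interaction, the random untrained weights, and all names (`simple_rnn`, `enhancer`, `params`) are hypothetical and are not the authors' implementation. The sketch only shows how a processing block followed by an interaction block could yield one guidance vector per mixture frame.

```python
import numpy as np

def simple_rnn(x, w_in, w_rec):
    """Minimal tanh recurrent layer over frames x of shape (T, D_in)."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for frame in x:
        h = np.tanh(frame @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)  # shape (T, H)

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def enhancer(mixture, enrollment, params):
    """Hypothetical enhancer: processing block, then interaction block."""
    # Processing block (assumed form): separate recurrent layers model the
    # temporal dynamics of the mixed signal and of the enrollment.
    mix_h = simple_rnn(mixture, params["w_in_mix"], params["w_rec_mix"])
    enr_h = simple_rnn(enrollment, params["w_in_enr"], params["w_rec_enr"])
    # Interaction block (assumed form): every mixture frame attends over
    # the processed enrollment frames via scaled dot-product attention.
    scores = mix_h @ enr_h.T / np.sqrt(mix_h.shape[1])  # (T_mix, T_enr)
    weights = softmax(scores, axis=1)
    guidance = weights @ enr_h  # (T_mix, H), one vector per mixture frame
    return guidance, weights

# Random stand-ins for feature frames and untrained weights.
rng = np.random.default_rng(0)
D, H = 8, 16  # feature and hidden sizes, chosen arbitrarily
params = {
    "w_in_mix": 0.1 * rng.standard_normal((D, H)),
    "w_rec_mix": 0.1 * rng.standard_normal((H, H)),
    "w_in_enr": 0.1 * rng.standard_normal((D, H)),
    "w_rec_enr": 0.1 * rng.standard_normal((H, H)),
}
mixture = rng.standard_normal((50, D))     # 50 frames of the mixed signal
enrollment = rng.standard_normal((30, D))  # 30 frames of the enrollment

guidance, weights = enhancer(mixture, enrollment, params)
print(guidance.shape)  # (50, 16)
```

In this sketch the guidance has the same number of frames as the mixture, so it could be passed frame by frame to a downstream extractor. How the actual method models speech and noise differently inside the processing block is not stated in the abstract and is not reproduced here.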

Keyword:

Speech enhancement | Background noise | Audio signal processing

Author Community:

  • [ 1 ] [Yang, Xue]Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing; 100124, China
  • [ 2 ] [Bao, Changchun]Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing; 100124, China
  • [ 3 ] [Zhang, Xu]Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing; 100124, China
  • [ 4 ] [Chen, Xianhong]Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing; 100124, China

Year: 2024

Language: English
