Target Speaker Extraction Method by Emphasizing the Active Speech with an Additional Enhancer

Author：

Yang, Xue | Bao, Changchun | Zhang, Xu | Chen, Xianhong

Abstract：

Target speaker extraction (TSE) is a practical solution to the cocktail party problem. Recently, a novel embedding-free TSE method was proposed, in which the enrollment and the mixed signal interact directly to exploit the contextual information within the enrollment. In the absence of noise, the derived guidance exhibits onset, offset, and voice activity similar to those of the mixed signal. In the presence of noise, however, this similarity may be destroyed because the enrollment interacts with both the speech and the noise in the mixture. If the noise (e.g., babble noise) contains components that resemble the enrollment to some extent, the direct interaction may generate misleading guidance. To tackle this issue, this paper designs an additional enhancer that derives an auxiliary guidance emphasizing the active speech. Specifically, the enhancer consists of a processing block and an interaction block. The processing block mainly uses recurrent layers to model the temporal dynamics of the enrollment and the mixed signal. In this block, the speech and noise signals are modeled in different manners, which reduces the similarity between the enrollment and the noise. The processed representations of the enrollment and the mixed signal are then used in the interaction block to derive an enhanced representation. This enhanced representation emphasizes the active speech and serves as an auxiliary guidance for the extraction. Experimental results demonstrate the effectiveness of the proposed method in complex acoustic environments. © 2024 IEEE.
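The failure mode described in the abstract can be illustrated with a toy sketch. It is not the paper's network: the cosine-similarity "interaction", the function names, and the hand-picked 2-D feature frames are all assumptions made purely for illustration. The sketch scores each mixture frame by its best match to any enrollment frame, which is one simple way for an enrollment to interact directly with a mixture.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def interaction_scores(mixture, enrollment):
    """For each mixture frame, the highest similarity to any enrollment frame."""
    return [max(cosine(m, e) for e in enrollment) for m in mixture]

# Hypothetical 2-D feature frames, chosen by hand for illustration only.
enrollment = [[1.0, 0.0], [0.8, 0.6]]
silence = [0.0, 0.0]
target_speech = [2.0, 0.0]      # same direction as the first enrollment frame
babble_like_noise = [0.9, 0.1]  # noise that happens to resemble the enrollment

# Clean mixture: speech surrounded by silence.
clean_scores = interaction_scores(
    [silence, target_speech, silence], enrollment)

# Noisy mixture: the same speech frame surrounded by babble-like noise.
noisy_scores = interaction_scores(
    [babble_like_noise, target_speech, babble_like_noise], enrollment)

print("clean:", clean_scores)
print("noisy:", noisy_scores)
```

In the clean case only the speech frame scores high, so the scores follow the onset and offset of the speech. In the noisy case the babble-like frames score almost as high as the speech frame, so the same interaction no longer marks where the target speaker is active. This is the kind of misleading guidance the additional enhancer is meant to counteract.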

Keyword：

Speech enhancement; Background noise; Audio signal processing

Author Community：

[ 1 ] [Yang, Xue] Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
[ 2 ] [Bao, Changchun] Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
[ 3 ] [Zhang, Xu] Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
[ 4 ] [Chen, Xianhong] Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China

Year： 2024

Language： English
