
Author:

Zhang, Xu | Bao, Changchun | Zhou, Jing | Yang, Xue

Indexed by:

EI

Abstract:

Recently, beamforming methods based on the time-domain audio separation network (Beam-TasNet) have shown satisfactory performance. For example, the performance of the minimum variance distortionless response (MVDR) beamformer can be effectively improved by using the time-domain audio separation network (TasNet). However, reverberation causes significant performance degradation in Beam-TasNet, since the multiple reflected sounds damage the time-domain features extracted by TasNet. Fortunately, recent studies show that frequency-domain features have better anti-interference capability in reverberant environments. Therefore, this paper proposes an MVDR beamforming method based on a time-frequency-domain Dual-Path Recurrent Neural Network (TFDPRNN) for the task of speech separation, which we call Beam-TFDPRNN. In this method, the TFDPRNN uses a path-scanning mechanism to capture time-frequency features more comprehensively by repeatedly scanning the input speech signal in both the time and frequency dimensions. The scanned time-frequency features describe the characteristics of the speech sources in reverberant environments more robustly, so the TFDPRNN produces a better pre-separation result. Furthermore, by using the pre-separated signals to calculate the spatial covariance matrices, a more robust MVDR beamformer is obtained, so that speech separation in reverberant environments is achieved efficiently. Experimental results on the WSJ0-2mix corpus show that the proposed method achieves superior performance compared to the reference methods. © 2023 IEEE.
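The pipeline the abstract describes — estimating per-source spatial covariance matrices from pre-separated signals, then deriving MVDR weights from them — can be sketched as below. This is a minimal illustration, not the authors' implementation: the function name, the mask-based interface (where the mask would come from a separator such as TFDPRNN), and the reference-channel MVDR formulation are all assumptions.

```python
import numpy as np

def mvdr_from_preseparation(mix_stft, mask, ref_ch=0, eps=1e-8):
    """Mask-driven MVDR beamformer (illustrative sketch, not the paper's code).

    mix_stft: (C, F, T) complex multichannel mixture STFT
    mask:     (F, T) real-valued mask for the target source, assumed to come
              from a pre-separation network (e.g. a TFDPRNN-style separator)
    Returns the beamformed (F, T) STFT of the target source.
    """
    C, F, T = mix_stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        X = mix_stft[:, f, :]                       # (C, T) per-bin snapshots
        m = mask[f]                                 # (T,) target mask
        # Spatial covariance matrices from mask-weighted observations
        phi_s = (m * X) @ X.conj().T / max(m.sum(), eps)
        phi_n = ((1.0 - m) * X) @ X.conj().T / max((1.0 - m).sum(), eps)
        phi_n += eps * np.eye(C)                    # regularize the inversion
        # Reference-channel MVDR: w = Phi_n^{-1} Phi_s u / tr(Phi_n^{-1} Phi_s)
        num = np.linalg.solve(phi_n, phi_s)
        w = num[:, ref_ch] / max(abs(np.trace(num)), eps)
        out[f] = w.conj() @ X
    return out
```

The better the pre-separation mask, the more cleanly the speech and interference covariances are disentangled, which is the mechanism by which a stronger separator improves the beamformer.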

Keyword:

Beamforming; Reverberation; Source separation; Recurrent neural networks; Speech analysis; Frequency domain analysis; Covariance matrix; Time domain analysis

Author Community:

  • [ 1 ] [Zhang, Xu] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [ 2 ] [Bao, Changchun] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [ 3 ] [Zhou, Jing] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
  • [ 4 ] [Yang, Xue] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Source :

Year: 2023

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 2

ESI Highly Cited Papers on the List: 0
