Coarse-to-fine speech separation method in the time-frequency domain

Author：

Yang, Xue | Bao, Changchun | Chen, Xianhong

Indexed by：

EI Scopus SCIE

Abstract：

Although time-domain speech separation methods have exhibited the outstanding performance in anechoic scenarios, their effectiveness is considerably reduced in the reverberant scenarios. Compared to the time-domain methods, the speech separation methods in time-frequency (T-F) domain mainly concern the structured T-F representations and have shown a great potential recently. In this paper, we propose a coarse-to-fine speech separation method in the T-F domain, which involves two steps: 1) a rough separation conducted in the coarse phase and 2) a precise extraction accomplished in the refining phase. In the coarse phase, the speech signals of all speakers are initially separated in a rough manner, resulting in some level of distortion in the estimated signals. In the refining phase, the T-F representation of each estimated signal acts as a guide to extract the residual T-F representation for the corresponding speaker, which helps to reduce the distortions caused in the coarse phase. Besides, the specially designed networks used for the coarse and refining phases are jointly trained for superior performance. Furthermore, utilizing the recurrent attention with parallel branches (RAPB) block to fully exploit the contextual information contained in the whole T-F features, the proposed model demonstrates competitive performance on clean datasets with a small number of parameters. Additionally, the proposed method shows more robustness and achieves state-of-the-art results on more realistic datasets. © 2023 Elsevier B.V.
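
The two-step data flow described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' model: the functions `coarse_phase` and `refining_phase` are hypothetical stand-ins that use random soft masks and a simple mask-sharpening rule in place of the trained networks and RAPB blocks, and the assumption that the final estimate is the coarse estimate plus a predicted residual is an interpretation of the abstract, not something it states explicitly.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Naive STFT with a Hann window; returns an array of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def coarse_phase(mix_tf, n_speakers, rng):
    """Hypothetical stand-in for the coarse network.

    Produces one rough (distorted) T-F estimate per speaker by applying
    random soft masks that sum to 1 in every T-F bin.
    """
    logits = rng.standard_normal((n_speakers,) + mix_tf.shape)
    masks = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
    return masks * mix_tf

def refining_phase(mix_tf, coarse_tf):
    """Hypothetical stand-in for the refining network.

    Uses each coarse estimate as a guide to extract a residual T-F
    representation from the mixture, then adds it to the coarse estimate
    (assumed combination rule).
    """
    power = np.abs(coarse_tf) ** 2
    # Sharpened per-speaker share of each T-F bin, derived from the coarse guide.
    guide = power / np.maximum(power.sum(axis=0, keepdims=True), 1e-12)
    residual = 0.5 * (guide * mix_tf - coarse_tf)
    return coarse_tf + residual

# Toy two-speaker mixture: two noise signals of 8000 samples each.
rng = np.random.default_rng(0)
mixture = rng.standard_normal(8000) + rng.standard_normal(8000)

mix_tf = stft(mixture)                        # T-F representation of the mixture
coarse = coarse_phase(mix_tf, 2, rng)         # step 1: rough separation
refined = refining_phase(mix_tf, coarse)      # step 2: guided residual extraction
```

In the method as described, both phases would be neural networks trained jointly, so the loss on the refined output would also shape the coarse estimates; the sketch only shows how the mixture and the coarse estimates feed the second phase.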

Keyword：

Frequency domain analysis; Recurrent neural networks; Refining; Speech enhancement; Speech analysis; Source separation; Time domain analysis

Author Community：

[ 1 ] [Yang, Xue]Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China
[ 2 ] [Bao, Changchun]Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China
[ 3 ] [Chen, Xianhong]Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China

Related Articles：

A Beam-TFDPRNN Based Speech Separation Method in Reverberant Environments
2023，2023 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2023
Multiple Sound Sources Separation Using Two-stage Network Model
2021，4th International Conference on Information Communication and Signal Processing, ICICSP 2021
End-to-end speech enhancement using fully convolutional networks with skip connections
2019，2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
HMM-based speech enhancement using vector Taylor series and parallel modeling in Mel-frequency domain
2014，2014 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2014

Source ：

Speech Communication

ISSN： 0167-6393

Year： 2023

Volume： 155

Impact Factor： 3.200 (JCR@2022)

SCOPUS Cited Count： 4

ESI Highly Cited Papers on the List： 0
