Triple-Path RNN Network: A Time-and-Frequency Joint Domain Speech Separation Model

Indexed by：

EI Scopus

Abstract：

Speech separation research has achieved significant success in recent years. To separate mixture signals correctly, it is critical to encode them into an appropriate latent space. Existing speech separation methods transform the mixed signals into either a frequency-domain or a time-domain space. The frequency-domain features (the spectrogram) are generated by the STFT; they are closely related to speech articulation and directly reflect the energy of speech. The time-domain features are learned from a latent embedding space, and separation is facilitated by the end-to-end structure. However, these methods rely on representations from only one domain, which is insufficient to provide an encoding space in which the speech is completely separable. Therefore, a Triple-Path Recurrent Neural Network (TPRNN) that fuses features from the two domains is proposed. It employs a spectrogram as auxiliary information to improve the performance of speech separation. Experimental results on the Wall Street Journal (WSJ0) dataset show that this approach improves speech separation performance. © 2024, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Keyword：

Speech separation; waveform; deep learning; spectrogram

Author Community：

[ 1 ] [Zhai Y.-H.] College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
[ 2 ] [Hua Q.] College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
[ 3 ] [Wang X.-W.] College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
[ 4 ] [Dong C.-R.] College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
[ 5 ] [Zhang F.] College of Mathematics and Information Science, Hebei University, Baoding, 071002, China
[ 6 ] [Xu D.-C.] Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing, 100124, China

Related Keywords：

Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments
2023，EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING
DPTNet-based Beamforming for Speech Separation
2022，
TARGET SPEAKER EXTRACTION BY DIRECTLY EXPLOITING CONTEXTUAL INFORMATION IN THE TIME-FREQUENCY DOMAIN
2024，2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024)
A deep-learning method for radar micro-doppler spectrogram restoration
2020，Sensors (Switzerland)

ISSN： 1876-1100

Year： 2024

Volume： 1112 LNEE

Page： 239-248

Language： English

WoS CC Cited Count： 0

ESI Highly Cited Papers on the List： 0

30 Days PV： 12
