DPTNet-based Beamforming for Speech Separation

Authors:

Zhao, T. | Bao, C. | Yang, X. | Zhang, X.

Indexed by:

EI; Scopus

Abstract:

The filter-and-sum beamforming framework can separate speech effectively in complicated acoustic scenarios by using a dual-path recurrent neural network (DPRNN) to estimate the beamforming filters. However, because the DPRNN models the relevant context information through the intermediate states of its recurrent layers, only suboptimal separation performance can be achieved. To improve performance, this paper employs the dual-path transformer network (DPTNet) instead of the DPRNN to estimate the beamforming filters, because the DPTNet takes advantage of the self-attention mechanism and lets the elements of high-dimensional feature sequences interact directly. Specifically, to provide the spatial and context information of the multi-channel speech signals, the cosine similarities between different channels are first concatenated with the transformed speech signals to serve as the input. Then, the DPTNet and the transform-averaged-concatenation operation are used to extract context information for estimating the beamforming filter of each channel. Finally, the observed signal of each channel is filtered, and the filtered signals are summed to obtain the desired speech. Compared with the existing FaSNet, the proposed method achieves better separation performance. © 2022 IEEE.
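
As a rough illustration of the first and last steps described in the abstract, the following NumPy sketch computes a per-frame cosine similarity between two channels and applies per-frame filters followed by summation across channels. The function names, the array shapes, and the use of full convolution are assumptions made for this example and are not taken from the paper. The DPTNet that would estimate the filters is omitted, so the filters are simply passed in as an argument.

```python
import numpy as np


def cosine_similarity_features(frames_ref, frames_other, eps=1e-8):
    """Per-frame cosine similarity between two channels.

    frames_ref, frames_other: arrays of shape (num_frames, frame_len).
    Returns an array of shape (num_frames,) with values in [-1, 1].
    """
    num = np.sum(frames_ref * frames_other, axis=-1)
    den = np.linalg.norm(frames_ref, axis=-1) * np.linalg.norm(frames_other, axis=-1)
    # eps guards against division by zero for silent frames.
    return num / (den + eps)


def filter_and_sum(frames, filters):
    """Filter each channel's frames and sum the results over channels.

    frames:  (channels, num_frames, frame_len) observed signal frames.
    filters: (channels, num_frames, filter_len) one filter per channel and frame
             (in the paper these would come from the network; here they are given).
    Returns an array of shape (num_frames, frame_len + filter_len - 1).
    """
    channels, num_frames, frame_len = frames.shape
    filter_len = filters.shape[-1]
    out = np.zeros((num_frames, frame_len + filter_len - 1))
    for c in range(channels):
        for t in range(num_frames):
            # Full linear convolution of one frame with its filter.
            out[t] += np.convolve(frames[c, t], filters[c, t])
    return out
```

With a unit impulse as every filter, filter_and_sum reduces to summing the channels frame by frame, which is a convenient sanity check for the shapes.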

Keywords:

filter-and-sum network; self-attention mechanism; speech separation; deep learning; microphone array

Author Affiliations:

[1] [Zhao T.] Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, 100124, China
[2] [Bao C.] Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, 100124, China
[3] [Yang X.] Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, 100124, China
[4] [Zhang X.] Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, 100124, China

Related Articles:

Iteratively Refined Multi-Channel Speech Separation
2024, APPLIED SCIENCES-BASEL
Research Situation and Prospects of Multi-speaker Separation and Target Speaker Extraction [多说话人分离与目标说话人提取的研究现状与展望]
2024, Journal of Data Acquisition and Processing
Multi-speaker Speech Separation under Reverberation Conditions Using Conv-Tasnet
2023, Journal of Advances in Information Technology
Triple-Path RNN Network: A Time-and-Frequency Joint Domain Speech Separation Model
2024

Year: 2022

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0

30 Days PV: 7
