
Author:

Zhao, T. | Bao, C. | Yang, X. | Zhang, X.

Indexed by:

EI, Scopus

Abstract:

The filter-and-sum beamforming framework can effectively separate speech in complicated acoustic scenarios by using a dual-path recurrent neural network (DPRNN) to estimate the beamforming filters. However, because the relevant context information is modeled by recurrent layers over the intermediate states, only suboptimal separation performance can be achieved. To improve performance, this paper employs the dual-path transformer network (DPTNet) instead of DPRNN to estimate the beamforming filters, because DPTNet exploits the self-attention mechanism and lets the elements of high-dimensional feature sequences interact directly. Specifically, to provide the spatial and context information of the multi-channel speech signals, the cosine similarities between different channels are first concatenated with the transformed speech signals to form the input. Then, DPTNet and the transform-average-concatenate (TAC) operation are used to extract context information for estimating the beamforming filter of each channel. Finally, the observed signal of each channel is filtered and the filtered signals are summed to obtain the desired speech. Compared with the existing FaSNet, the proposed method achieves better separation performance. © 2022 IEEE.
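The signal-processing pieces the abstract names around the network can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation: all function names are invented for the example; the lag-based cosine features are an assumption loosely modeled on FaSNet-style inter-channel similarity; TAC is reduced to its transform, average, and concatenate core with caller-supplied transforms (the full operation uses learned layers with nonlinearities); and each channel gets one time-invariant FIR filter. DPTNet itself, which would map these features to the filters, is not shown.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical helper names; not taken from the authors' implementation.


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length segments (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + eps)


def frame_cosine_features(ref: Sequence[float], other: Sequence[float],
                          start: int, frame_len: int, max_lag: int) -> List[float]:
    """Cosine similarity between one reference-channel frame and the other channel's
    frame shifted by every lag in [-max_lag, max_lag]; out-of-range samples are zeros."""
    def segment(x: Sequence[float], offset: int) -> List[float]:
        return [x[i] if 0 <= i < len(x) else 0.0 for i in range(offset, offset + frame_len)]

    ref_seg = segment(ref, start)
    return [cosine_similarity(ref_seg, segment(other, start + lag))
            for lag in range(-max_lag, max_lag + 1)]


def transform_average_concatenate(
        features: List[List[float]],
        transform: Callable[[List[float]], List[float]],
        average_transform: Callable[[List[float]], List[float]]) -> List[List[float]]:
    """Simplified TAC step: transform each channel, average across channels,
    transform the average, then concatenate it back onto every channel."""
    transformed = [transform(f) for f in features]
    mean = [sum(col) / len(transformed) for col in zip(*transformed)]
    shared = average_transform(mean)
    return [t + shared for t in transformed]


def filter_and_sum(channels: List[List[float]], filters: List[List[float]]) -> List[float]:
    """y[n] = sum_m sum_k h_m[k] * x_m[n - k]: FIR-filter each channel, then sum."""
    n_samples = len(channels[0])
    out = [0.0] * n_samples
    for x, h in zip(channels, filters):
        for n in range(n_samples):
            out[n] += sum(coeff * x[n - k] for k, coeff in enumerate(h) if n - k >= 0)
    return out


if __name__ == "__main__":
    ref = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    other = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # ref delayed by 2 samples
    feats = frame_cosine_features(ref, other, start=0, frame_len=6, max_lag=2)
    print([round(v, 3) for v in feats])  # peak at lag +2 (last entry)
    print(filter_and_sum([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
                         [[0.0, 1.0], [1.0, 0.0]]))  # [0.0, 2.0, 0.0, 0.0]
```

In the usage example, the similarity peaks at the lag that aligns the delayed channel with the reference, which is the kind of spatial cue such features are meant to expose. Delaying the first channel by one sample and passing the second through unchanged aligns the two impulses, so they add coherently. FaSNet-style systems estimate filters frame by frame; a single time-invariant filter is used here only to keep the example short.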

Keyword:

filter-and-sum network; self-attention mechanism; speech separation; deep learning; microphone array

Author Affiliations:

  • [ 1 ] [Zhao T.]Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, 100124, China
  • [ 2 ] [Bao C.]Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, 100124, China
  • [ 3 ] [Yang X.]Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, 100124, China
  • [ 4 ] [Zhang X.]Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, 100124, China

Year: 2022

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0