Abstract:
The filter-and-sum beamforming framework can separate speech effectively in complicated acoustic scenarios by using a dual-path recurrent neural network (DPRNN) to estimate the beamforming filters. Because the relevant context information is modeled only through the intermediate states of the recurrent layers, the achievable separation performance is suboptimal. To improve performance, this paper employs the dual-path transformer network (DPTNet) instead of DPRNN to estimate the beamforming filters, because the self-attention mechanism of DPTNet lets the elements of high-dimensional feature sequences interact directly. Specifically, to provide the spatial and context information of multi-channel speech signals, the cosine similarities between different channels are first concatenated with the transformed speech signals to serve as the input. Then, the DPTNet and the transform-average-concatenate operation are used to extract context information for estimating the beamforming filter of each channel. Finally, the observed signal of each channel is filtered, and the filtered signals are summed to obtain the desired speech. Compared with the existing FaSNet, the proposed method achieves better separation performance. © 2022 IEEE.
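The two signal-level steps named in the abstract, the inter-channel cosine similarity and the final filter-and-sum stage, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `cosine_similarity` and `filter_and_sum` are hypothetical, the similarity is computed on a single pair of frames as a simplification, and the filters are plain arrays standing in for the ones that DPTNet would estimate in the paper.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length 1-D frames.

    Hypothetical helper: a simplified stand-in for the inter-channel
    feature mentioned in the abstract. eps avoids division by zero.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def filter_and_sum(frames, filters):
    """Filter each channel with its own FIR filter, then sum over channels.

    frames:  array of shape (channels, frame_len)
    filters: array of shape (channels, filter_len)
    Returns an array of shape (frame_len,). The filters are treated as
    causal, so the full convolution is truncated to the frame length.
    In the paper the filters come from a network; here they are inputs
    (an assumption made for illustration).
    """
    frames = np.asarray(frames, dtype=float)
    filters = np.asarray(filters, dtype=float)
    frame_len = frames.shape[1]
    out = np.zeros(frame_len)
    for x, h in zip(frames, filters):
        out += np.convolve(x, h, mode="full")[:frame_len]
    return out
```

With unit-impulse filters the beamformer reduces to a plain sum of the channels, which makes a convenient sanity check before substituting learned filters.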
Year: 2022
Language: English
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0