Author:

Li, Yue | Li, Ruwei | Li, Man

Indexed by:

EI

Abstract:

Most existing audio-visual fusion mechanisms combine audio and visual cues by direct concatenation or summation, which does not fully exploit the complementary information in the two modalities. To address this problem, this paper proposes a multimodal target speech extraction algorithm based on a long short-term attention (LSTA) mechanism. First, audio and lip features are extracted by a convolutional neural network and split into chunks along the time axis with an overlap factor of 50%. Second, the LSTA mechanism computes the short-term and long-term correlations between the sequences. Finally, the resulting target speech mask sequence is multiplied with the target speech sequence and passed through the decoder to obtain the estimated speech of the target speaker. Experimental results show that, compared with the baseline algorithms, the proposed method achieves better scale-invariant signal-to-noise ratio improvement (SI-SNRi) and perceptual evaluation of speech quality (PESQ) scores, and improves consistently in cross-dataset evaluation. © 2023 IEEE.
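
The chunking and attention steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (chunk_with_overlap, self_attention, long_short_term_pass) are hypothetical, and plain unparameterized scaled dot-product self-attention stands in for the LSTA module, whose actual design is not given in this record. The sketch only shows how a feature sequence might be split into 50%-overlapping chunks, then attended within each chunk (short-term) and across chunks (long-term).

```python
import numpy as np

def chunk_with_overlap(features, chunk_len):
    """Split a (T, D) feature sequence into chunks with 50% overlap.

    Hypothetical helper: zero-pads the end so the last chunk is full.
    Returns an array of shape (n_chunks, chunk_len, D).
    """
    if chunk_len % 2 != 0:
        raise ValueError("chunk_len must be even for a 50% overlap")
    hop = chunk_len // 2  # 50% overlap means the hop is half a chunk
    T, D = features.shape
    n_chunks = max(1, int(np.ceil((T - chunk_len) / hop)) + 1)
    padded_len = (n_chunks - 1) * hop + chunk_len
    padded = np.zeros((padded_len, D), dtype=features.dtype)
    padded[:T] = features
    return np.stack([padded[i * hop : i * hop + chunk_len]
                     for i in range(n_chunks)])

def self_attention(x):
    """Scaled dot-product self-attention without learned weights.

    x has shape (..., L, D); attention runs over the L axis.
    Assumption: a stand-in for the paper's attention blocks.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)   # (..., L, L)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def long_short_term_pass(chunks):
    """Apply attention within chunks, then across chunks."""
    # Short-term: each chunk attends over its own frames.
    short = self_attention(chunks)                 # (n_chunks, chunk_len, D)
    # Long-term: each intra-chunk position attends over all chunks.
    across = np.swapaxes(short, 0, 1)              # (chunk_len, n_chunks, D)
    long = self_attention(across)
    return np.swapaxes(long, 0, 1)                 # (n_chunks, chunk_len, D)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((10, 8))           # 10 frames, 8-dim features
    chunks = chunk_with_overlap(feats, chunk_len=4)
    out = long_short_term_pass(chunks)
    print(chunks.shape, out.shape)
```

With 10 frames and a chunk length of 4, the hop is 2 frames, so consecutive chunks share half their frames; a real system would add learned projections, normalization, and an overlap-add step before decoding.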

Keyword:

Quality control; Convolutional neural networks; Signal to noise ratio; Extraction

Author Community:

  • [ 1 ] [Li, Yue]Beijing University of Technology, Faculty of Information Technology, Department of Information and Communication Engineering, Beijing, China
  • [ 2 ] [Li, Ruwei]Beijing University of Technology, Faculty of Information Technology, Department of Information and Communication Engineering, Beijing, China
  • [ 3 ] [Li, Man]Beijing University of Technology, Faculty of Information Technology, Department of Information and Communication Engineering, Beijing, China

Year: 2023

Language: English

WoS CC Cited Count: 0

ESI Highly Cited Papers on the List: 0
