A Multimodal Target Speech Extraction Algorithm Based on Long Short Term Attention Mechanism

Author：

Li, Yue | Li, Ruwei | Li, Man

Abstract：

Most existing audio-visual fusion mechanisms combine audio and visual cues by direct concatenation or summation, which does not fully exploit the complementary information in the two modalities. To address this problem, this paper proposes a multimodal target speech extraction algorithm based on a long short term attention (LSTA) mechanism. First, audio and lip features are extracted by a convolutional neural network and split into chunks along the time axis with an overlap factor of 50%. Second, the LSTA mechanism computes the short-term and long-term correlations between the sequences. Finally, the resulting target speech mask sequence is multiplied with the target speech sequence and passed through the decoder to obtain the estimated speech of the target speaker. Experimental results show that, compared with the baseline algorithms, the proposed method achieves better scale-invariant signal-to-noise ratio improvement (SI-SNRi) and perceptual evaluation of speech quality (PESQ) scores, and improves consistently in cross-dataset evaluation. © 2023 IEEE.
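
The three steps in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the record gives no architectural details, so the function names (`chunk_with_overlap`, `long_short_term_attention`, `apply_mask`), the use of plain scaled dot-product attention without learned projections, the choice of audio features as queries and lip features as keys and values, and the sigmoid mask applied to the chunked audio features are all assumptions made for illustration.

```python
import numpy as np

def chunk_with_overlap(features, chunk_len):
    """Split (T, D) features into chunks of chunk_len frames with 50% overlap.

    The tail is zero-padded so the last chunk is complete.
    Returns an array of shape (num_chunks, chunk_len, D).
    """
    hop = chunk_len // 2  # 50% overlap means the hop is half the chunk length
    T, D = features.shape
    num_chunks = 1 if T <= chunk_len else int(np.ceil((T - chunk_len) / hop)) + 1
    padded = np.zeros(((num_chunks - 1) * hop + chunk_len, D), dtype=features.dtype)
    padded[:T] = features
    return np.stack([padded[i * hop:i * hop + chunk_len] for i in range(num_chunks)])

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention; batches over the leading axis."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def long_short_term_attention(audio_chunks, lip_chunks):
    """Assumed LSTA layout: attention within chunks, then across chunks."""
    # Short term: inside each chunk, audio frames attend to lip frames.
    short = attention(audio_chunks, lip_chunks, lip_chunks)   # (N, K, D)
    # Long term: for each intra-chunk position, attend across all chunks.
    across = np.swapaxes(short, 0, 1)                         # (K, N, D)
    long = attention(across, across, across)
    return np.swapaxes(long, 0, 1)                            # (N, K, D)

def apply_mask(fused, audio_chunks):
    """Turn fused features into a (0, 1) mask and apply it element-wise."""
    mask = 1.0 / (1.0 + np.exp(-fused))  # sigmoid
    return mask * audio_chunks

rng = np.random.default_rng(0)
audio = rng.standard_normal((10, 4))  # 10 frames, 4-dim audio features (toy sizes)
lip = rng.standard_normal((10, 4))    # lip features assumed already aligned to audio
audio_chunks = chunk_with_overlap(audio, chunk_len=4)
lip_chunks = chunk_with_overlap(lip, chunk_len=4)
masked = apply_mask(long_short_term_attention(audio_chunks, lip_chunks), audio_chunks)
```

In a full system the masked chunks would be recombined by overlap-add and passed to the decoder; that stage, the CNN encoders, and any learned weights are omitted here.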

Keyword：

Quality control; Convolutional neural networks; Signal to noise ratio; Extraction

Author Community：

[ 1 ] [Li, Yue] Beijing University of Technology, Faculty of Information Technology, Department of Information and Communication Engineering, Beijing, China
[ 2 ] [Li, Ruwei] Beijing University of Technology, Faculty of Information Technology, Department of Information and Communication Engineering, Beijing, China
[ 3 ] [Li, Man] Beijing University of Technology, Faculty of Information Technology, Department of Information and Communication Engineering, Beijing, China

Year： 2023

Language： English
