
Author:

Fu, Pengbin | Liu, Daxing | Yang, Huirong

Indexed by:

EI; Scopus

Abstract:

Recently, Transformer-based models have shown promising results in automatic speech recognition (ASR), outperforming models based on recurrent neural networks (RNNs) and convolutional neural networks (CNNs). However, directly applying a Transformer to ASR does not effectively exploit the correlation among speech frames, leaving the model trapped in a sub-optimal solution. To this end, we propose a local attention Transformer model for speech recognition that exploits the high correlation among speech frames. First, we use relative positional embedding, rather than absolute positional embedding, to improve the generalization of the Transformer to speech sequences of different lengths. Second, we add local attention based on parametric positional relations to the self-attention module, explicitly incorporating prior knowledge so that training is insensitive to hyperparameters and performance improves. Experiments on the LibriSpeech dataset show that our approach achieves word error rates of 2.3%/5.5% with language model fusion and without any external data, a 17.8%/9.8% reduction over the baseline. These results are close to, or better than, those of other state-of-the-art end-to-end models.
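The abstract names the two ingredients (relative positional embedding and a parametric locality term added to self-attention) without giving the exact formulation. The following is a minimal PyTorch sketch of that general idea, assuming a single attention head, a learnable relative-position embedding table, and a Gaussian locality bias with learnable width; the class and parameter names (LocalRelativeAttention, max_rel_dist) are hypothetical and not taken from the paper.

```python
import math
import torch
import torch.nn as nn

class LocalRelativeAttention(nn.Module):
    """Single-head self-attention with (a) learnable relative positional
    embeddings and (b) a Gaussian locality bias on the attention logits.
    A sketch of the general technique described in the abstract; the
    paper's exact parametrization is not given there, so the Gaussian
    form and all names here are assumptions."""

    def __init__(self, d_model: int, max_rel_dist: int = 64):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Embeddings for clipped relative distances in [-max_rel_dist, max_rel_dist].
        self.rel_emb = nn.Embedding(2 * max_rel_dist + 1, d_model)
        self.max_rel_dist = max_rel_dist
        # Learnable width of the locality window (log-parametrized to stay positive).
        self.log_sigma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        B, T, D = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)

        # Content-content attention scores.
        scores = torch.matmul(q, k.transpose(1, 2)) / math.sqrt(D)  # (B, T, T)

        # Relative distances between frames, clipped to the embedding range.
        pos = torch.arange(T, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist, self.max_rel_dist)
        r = self.rel_emb(rel + self.max_rel_dist)  # (T, T, D)

        # Content-position scores: q_i . r_{ij}, added to the content scores.
        scores = scores + torch.einsum("btd,tsd->bts", q, r) / math.sqrt(D)

        # Gaussian locality bias: nearby frames receive larger logits.
        sigma = self.log_sigma.exp()
        dist = (pos[None, :] - pos[:, None]).float()
        scores = scores - dist.pow(2) / (2 * sigma.pow(2))

        attn = scores.softmax(dim=-1)
        return torch.matmul(attn, v)
```

As a quick smoke test, `LocalRelativeAttention(256)(torch.randn(2, 100, 256))` returns a tensor of shape (2, 100, 256). Because the locality width sigma is learned rather than hand-set, this style of bias matches the abstract's claim that training becomes insensitive to hyperparameter choices.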

Keyword:

end-to-end model; Transformer; local attention; speech recognition

Author Community:

  • [ 1 ] [Fu, Pengbin]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [ 2 ] [Liu, Daxing]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
  • [ 3 ] [Yang, Huirong]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

Source:

INFORMATION

Year: 2022

Issue: 5

Volume: 13

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 4

ESI Highly Cited Papers on the List: 0
