Abstract:
To address the challenges of poor representation capability and low data utilization in end-to-end speech recognition models, this study proposes an end-to-end speech recognition model based on multi-scale feature fusion and multi-view self-supervised learning (MM-ASR), trained under a multi-task learning paradigm. The proposed method emphasizes the importance of inter-layer information within shared encoders, aiming to enhance the model's representation capability via the multi-scale feature fusion module. Moreover, we apply multi-view self-supervised learning to effectively exploit data information. Our approach is rigorously evaluated on the Aishell-1 dataset, and its effectiveness is further validated on the English WSJ corpus. The experimental results demonstrate a noteworthy 4.6% reduction in character error rate, indicating significantly improved speech recognition performance. These findings showcase the effectiveness and potential of the proposed MM-ASR model for end-to-end speech recognition tasks. © The Author(s) 2024.
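The abstract describes fusing inter-layer information from the shared encoder, but gives no formula. A common realization of such multi-scale layer fusion is a learnable softmax-weighted sum of intermediate encoder outputs; the sketch below (names `fuse_layers` and `layer_weights` are hypothetical, not from the paper) illustrates that idea, not the authors' actual implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_layers(layer_outputs, layer_weights):
    """Softmax-weighted combination of encoder layer outputs.

    layer_outputs: list of L arrays, each of shape (T, D)
                   (hidden states of T frames, D features per layer)
    layer_weights: L learnable scalars (hypothetical parameters)
    Returns a fused (T, D) representation.
    """
    w = softmax(np.asarray(layer_weights, dtype=float))  # (L,)
    stacked = np.stack(layer_outputs)                    # (L, T, D)
    return np.tensordot(w, stacked, axes=1)              # (T, D)

# Toy usage: 3 encoder layers, 4 frames, 2 features each.
layers = [np.full((4, 2), float(i)) for i in range(3)]
fused = fuse_layers(layers, [0.0, 0.0, 0.0])  # equal weights -> layer mean
```

With equal weights the fusion reduces to a plain average of the layers; training would adjust the weights so that layers carrying more useful acoustic or linguistic information dominate.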
Source:
Neural Processing Letters
ISSN: 1370-4621
Year: 2024
Issue: 4
Volume: 56
Impact Factor: 3.100 (JCR@2022)
Cited Count:
SCOPUS Cited Count: 2
ESI Highly Cited Papers on the List: 0