Author:

Zhao, Jingyu | Li, Ruwei | Tian, Maocun | An, Weidong

Indexed by:

EI Scopus SCIE

Abstract:

To address the poor representation capability and low data utilization of end-to-end speech recognition models in deep learning, this study proposes an end-to-end speech recognition model based on multi-scale feature fusion and multi-view self-supervised learning (MM-ASR). The model is trained under a multi-task learning paradigm. The proposed method emphasizes the importance of inter-layer information within shared encoders and aims to enhance the model's representation capability through a multi-scale feature fusion module. Moreover, we apply multi-view self-supervised learning to exploit the information in the data effectively. Our approach is evaluated on the Aishell-1 dataset, and its effectiveness is further validated on the English corpus WSJ. The experimental results demonstrate a noteworthy 4.6% reduction in character error rate, indicating significantly improved speech recognition performance. These findings showcase the effectiveness and potential of the proposed MM-ASR model for end-to-end speech recognition tasks.

Keyword:

Multi-task learning paradigm; Multi-scale feature fusion; Multi-view self-supervised learning; End-to-end speech recognition

Author Community:

  • [ 1 ] [Zhao, Jingyu]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 2 ] [Li, Ruwei]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 3 ] [Tian, Maocun]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China
  • [ 4 ] [An, Weidong]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China

Reprint Author's Address:

  • [Li, Ruwei]Beijing Univ Technol, Fac Informat Technol, Beijing, Peoples R China

Source:

NEURAL PROCESSING LETTERS

ISSN: 1370-4621

Year: 2024

Issue: 4

Volume: 56

Impact Factor (JCR@2022): 3.100

ESI Highly Cited Papers on the List: 0
