Abstract:
To improve speech recognition performance when network capacity is limited, this paper proposes an end-to-end speech recognition algorithm based on multi-layer-enriched supervision (MES-ASR), which enables the model to learn more spoken-content information despite the capacity constraint. Specifically, an additional supervision loss is introduced at an intermediate layer of the network. Optimizing this intermediate-layer loss forces the lower layers to focus on acquiring spoken-content information, which improves recognition performance. Because the supervision loss is added only at the intermediate layer, training remains simple, and no additional memory or computation is incurred at test time. The proposed approach is evaluated on the Aishell-1 dataset with WeNet as the baseline model. The experimental results show a reduction in character error rate, indicating improved recognition performance. © 2023 IEEE.
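The abstract does not state the exact form of the intermediate-layer loss or how it is weighted, so the following is only a minimal sketch under stated assumptions. It assumes that both the final layer and one intermediate layer are supervised with a CTC loss, and that the two losses are mixed with a hypothetical weight `inter_weight`. The function names `ctc_nll` and `intermediate_supervised_loss` are illustrative and are not taken from the paper or from WeNet. The CTC loss is computed with the standard forward algorithm in log space, using only the Python standard library.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Convert one frame of raw scores into log-probabilities."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def ctc_nll(log_probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of `target` via the forward algorithm.

    log_probs: T frames, each a list of V log-probabilities.
    target: label ids (must not contain `blank`).
    Returns inf if no alignment of `target` fits in T frames.
    """
    ext = [blank]  # blank-interleaved target: [b, y1, b, y2, ..., b]
    for label in target:
        ext += [label, blank]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            acc = alpha[s]  # stay on the same symbol
            if s >= 1:
                acc = log_add(acc, alpha[s - 1])  # advance one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                acc = log_add(acc, alpha[s - 2])  # skip a blank between distinct labels
            if acc != NEG_INF:
                new[s] = acc + log_probs[t][ext[s]]
        alpha = new
    total = alpha[S - 1] if S == 1 else log_add(alpha[S - 1], alpha[S - 2])
    return -total


def intermediate_supervised_loss(final_log_probs, inter_log_probs, target,
                                 inter_weight: float = 0.3, blank: int = 0) -> float:
    """Hypothetical training objective: weighted sum of the final-layer and
    intermediate-layer CTC losses. Used only during training; at inference
    only the final layer's output is decoded, so the extra branch costs nothing."""
    if not 0.0 <= inter_weight <= 1.0:
        raise ValueError("inter_weight must lie in [0, 1]")
    final_loss = ctc_nll(final_log_probs, target, blank)
    inter_loss = ctc_nll(inter_log_probs, target, blank)
    return (1.0 - inter_weight) * final_loss + inter_weight * inter_loss


if __name__ == "__main__":
    # Toy 3-frame outputs over a 3-symbol vocabulary (index 0 = blank).
    final = [log_softmax(f) for f in [[2.0, 0.5, 0.1], [0.2, 2.5, 0.3], [1.5, 0.2, 0.4]]]
    inter = [log_softmax(f) for f in [[1.0, 0.8, 0.6], [0.9, 1.1, 0.7], [1.0, 0.6, 0.9]]]
    print(intermediate_supervised_loss(final, inter, target=[1]))
```

Because the intermediate loss's gradient flows only into the layers below the supervised layer, those lower layers receive a direct signal about the transcript, which matches the motivation described in the abstract. The choice of CTC and the default weight of 0.3 are illustrative only.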
Year: 2023
Language: English