Abstract:
Retrieval-augmented language models have succeeded in a range of natural language processing (NLP) tasks, but their use in automatic speech recognition (ASR) has been limited by the difficulty of constructing fine-grained audio-text datastores. This paper presents kNN-CTC, a novel approach that overcomes this difficulty by using Connectionist Temporal Classification (CTC) pseudo labels to establish frame-level audio-text key-value pairs, circumventing the need for precise ground-truth alignments. We further introduce a “skip-blank” strategy, which ignores CTC blank frames, to reduce datastore size. By incorporating a k-nearest neighbors retrieval mechanism into pretrained CTC ASR systems and leveraging a fine-grained, pruned datastore, kNN-CTC consistently achieves substantial improvements in performance under various experimental settings. Our code is available at https://github.com/NKUHLT/KNN-CTC. © 2024 IEEE.
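The mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes NumPy, a blank index of 0, squared L2 distance for retrieval, a softmax over negative distances, and a kNN-LM-style linear interpolation with weight `lam`. None of these details are stated in the abstract, and all function and parameter names are hypothetical.

```python
import numpy as np

BLANK_ID = 0  # assumption: index of the CTC blank symbol


def build_datastore(features, log_probs, skip_blank=True):
    """Pair each frame's feature (key) with its CTC pseudo label (value).

    features:  (T, D) frame-level encoder outputs
    log_probs: (T, V) frame-level CTC log-probabilities
    """
    pseudo_labels = log_probs.argmax(axis=1)  # greedy CTC pseudo labels
    if skip_blank:
        keep = pseudo_labels != BLANK_ID      # "skip-blank": drop blank frames
    else:
        keep = np.ones(len(pseudo_labels), dtype=bool)
    return features[keep], pseudo_labels[keep]


def knn_distribution(query, keys, values, vocab_size, k=4, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over labels."""
    if len(keys) == 0:
        raise ValueError("datastore is empty")
    dists = np.sum((keys - query) ** 2, axis=1)   # squared L2 (assumed metric)
    nearest = np.argsort(dists)[: min(k, len(keys))]
    shifted = dists[nearest] - dists[nearest].min()  # for numerical stability
    weights = np.exp(-shifted / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nearest], weights)    # sum weights per label
    return p_knn


def interpolate(p_ctc, p_knn, lam=0.5):
    """Assumed combination rule: linear interpolation of the two distributions."""
    return lam * p_knn + (1.0 - lam) * p_ctc


# Toy usage: 4 frames, 3 symbols (0 = blank). Frames 0 and 3 are blank.
feats = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
probs = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8], [0.9, 0.05, 0.05]])
keys, values = build_datastore(feats, np.log(probs))
p_knn = knn_distribution(np.array([0.9, 0.1]), keys, values, vocab_size=3, k=1)
p_final = interpolate(probs[1], p_knn, lam=0.5)
```

In the toy example, skipping blanks halves the datastore (two of four frames are kept), which is the pruning effect the abstract attributes to the skip-blank strategy. A practical system would typically replace the brute-force distance computation with an approximate nearest-neighbor index.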
ISSN: 1520-6149
Year: 2024
Page: 11006-11010
Language: English
SCOPUS Cited Count: 2
ESI Highly Cited Papers on the List: 0