Abstract:
Under the real-time constraints of network transmission, packet loss frequently occurs in voice communication such as voice over Internet Protocol (VoIP). Packet loss concealment (PLC) techniques typically use previously received packets to recover a lost packet and thereby improve the quality of speech communication. In this paper, a novel deep PLC approach is proposed that uses the so-called Demucs network structure, i.e., a deep U-Net architecture with a long short-term memory (LSTM) network, to predict the lost packet in the time domain. First, by combining convolutions with gated linear units (GLUs), the network's encoder systematically extracts high-level features from each speech frame. Second, LSTM layers learn the long-term dependencies between speech frames. Finally, the U-Net architecture uses skip connections to improve the flow of gradient information, which strengthens the decoder's ability to reconstruct the lost speech frames. In addition, the network is optimized with multiple loss functions in the time and frequency domains. Experimental results show that the proposed method achieves better performance in terms of perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI). © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
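The abstract states that the network is optimized with several losses in the time and frequency domains but does not name them. The sketch below assumes one common pairing for Demucs-style models: an L1 loss on the waveform plus a multi-resolution STFT loss made of spectral-convergence and log-magnitude terms, as in the Demucs speech denoiser. It is not necessarily the loss used in this paper. The function names (`stft_magnitudes`, `spectral_loss`, `combined_loss`), the frame/hop sizes, and the weight `alpha` are hypothetical illustration choices.

```python
import cmath
import math

# Minimal sketch of a combined time- and frequency-domain loss.
# ASSUMPTION: the paper does not name its losses here; L1 + multi-resolution
# STFT loss is a common choice for Demucs-style models, not confirmed as theirs.


def stft_magnitudes(signal, frame_len, hop):
    """Return |DFT| of Hann-windowed frames (bins 0..frame_len//2).

    Uses a naive O(frame_len^2) DFT per frame so the sketch needs only the
    standard library; a practical implementation would use an FFT.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins only
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def spectral_loss(ref, est, frame_len, hop, eps=1e-7):
    """Spectral convergence plus mean absolute log-magnitude error at one resolution."""
    diff_sq = ref_sq = log_err = 0.0
    count = 0
    for r_frame, e_frame in zip(stft_magnitudes(ref, frame_len, hop),
                                stft_magnitudes(est, frame_len, hop)):
        for r, e in zip(r_frame, e_frame):
            diff_sq += (r - e) ** 2
            ref_sq += r ** 2
            log_err += abs(math.log(r + eps) - math.log(e + eps))
            count += 1
    spectral_convergence = math.sqrt(diff_sq) / (math.sqrt(ref_sq) + eps)
    return spectral_convergence + log_err / max(count, 1)


def combined_loss(ref, est, resolutions=((64, 16), (128, 32)), alpha=1.0):
    """L1 waveform loss + alpha * spectral loss averaged over (frame_len, hop) pairs.

    The resolutions and the weight alpha are hypothetical illustration values.
    """
    assert len(ref) == len(est), "reference and estimate must be the same length"
    time_l1 = sum(abs(r - e) for r, e in zip(ref, est)) / len(ref)
    freq = sum(spectral_loss(ref, est, n, h) for n, h in resolutions) / len(resolutions)
    return time_l1 + alpha * freq
```

For instance, `combined_loss(clean, clean)` is exactly 0.0, while adding a weak 1.5 kHz tone to a 440 Hz sinusoid yields a positive value. Averaging over several frame lengths penalizes errors at both fine time resolution and fine frequency resolution; in practice the naive DFT would be replaced by an FFT such as `numpy.fft.rfft`.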
ISSN: 1865-0929
Year: 2024
Volume: 2006
Page: 227-234
Language: English
WoS CC Cited Count: 0
ESI Highly Cited Papers on the List: 0