Implementation of Multichannel Speech Coding Based on the Opus Codec and Spatial Parameters

Abstract:

Multichannel speech processing has been widely studied because multichannel signals contain spatial information that can be exploited. To transmit such signals efficiently while preserving this spatial information, multichannel speech coding is required. Recently, a multichannel speech coding method based on the Opus codec and spatial parameters was proposed. In the encoding stage, the speech signal of a reference channel is encoded with the Opus codec, the multichannel speech signals are decomposed by a Gammatone filter bank, and spatial parameters are extracted and quantized for each sub-band signal. In the decoding stage, the encoded reference-channel signal is decoded with the Opus codec and then combined with the quantized spatial parameters to recover the speech signals of the remaining channels. This paper details an improved implementation of this coding method. Specifically, it proposes a framing pattern better suited to multichannel speech coding with the Gammatone filter bank. In addition, a newly designed window is applied to each sub-band signal so that the spatial parameters can be extracted precisely in the frequency domain, and the extracted spatial parameters are quantized non-uniformly. Experimental results demonstrate the effectiveness of the proposed implementation: it achieves high speech quality at a reduced bitrate and better preserves the spatial information contained in the multichannel speech signals. © 2024 IEEE.
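The abstract does not state which spatial parameters are used, how frames are formed, what the newly designed window looks like, or which quantization table is applied. As a rough illustration of the general parametric idea only, the following Python sketch rests on several assumptions that are not taken from the paper: the decoded Opus reference channel is assumed to be already available (no Opus calls are made); the spatial parameter is assumed to be a per-band inter-channel level difference (ILD); sub-bands come from truncated FIR versions of 4th-order gammatone filters with center frequencies spaced on the ERB-rate scale of Glasberg and Moore; the whole signal is treated as one frame with no window; and `ILD_CODEBOOK` is a hypothetical non-uniform table. All function names are illustrative.

```python
import math

# Hypothetical non-uniform ILD codebook in dB (an assumption, not the paper's table):
# fine steps near 0 dB, coarse steps for large level differences.
ILD_CODEBOOK = [-18.0, -12.0, -8.0, -5.0, -3.0, -1.0, 0.0, 1.0, 3.0, 5.0, 8.0, 12.0, 18.0]


def erb_space(f_low, f_high, num_bands):
    """Center frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    def to_rate(f):
        return 21.4 * math.log10(4.37e-3 * f + 1.0)

    def from_rate(e):
        return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3

    lo, hi = to_rate(f_low), to_rate(f_high)
    step = (hi - lo) / (num_bands - 1)
    return [from_rate(lo + i * step) for i in range(num_bands)]


def gammatone_ir(fc, fs, order=4, duration=0.025):
    """Truncated, sampled gammatone impulse response scaled to unit peak magnitude."""
    b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)  # bandwidth derived from the ERB at fc
    h = []
    for k in range(int(duration * fs)):
        t = k / fs
        h.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                 * math.cos(2 * math.pi * fc * t))
    peak = max(abs(v) for v in h)
    return [v / peak for v in h]


def fir_filter(x, h):
    """Causal FIR filtering (direct convolution), output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(min(len(h), n + 1))) for n in range(len(x))]


def analyze(x, fs, centers):
    """Split one channel into gammatone sub-band signals."""
    return [fir_filter(x, gammatone_ir(fc, fs)) for fc in centers]


def band_ild_db(ref_band, other_band, eps=1e-12):
    """Assumed spatial parameter: energy ratio of one sub-band, in dB."""
    e_ref = sum(v * v for v in ref_band)
    e_other = sum(v * v for v in other_band)
    return 10.0 * math.log10((e_other + eps) / (e_ref + eps))


def quantize_ild(ild_db):
    """Non-uniform quantization: index of the nearest codebook entry."""
    return min(range(len(ILD_CODEBOOK)), key=lambda i: abs(ILD_CODEBOOK[i] - ild_db))


def synthesize(ref_bands, indices):
    """Decoder side: scale each reference sub-band by its dequantized amplitude gain."""
    gains = [10.0 ** (ILD_CODEBOOK[i] / 20.0) for i in indices]
    n = len(ref_bands[0])
    return [sum(g * band[k] for g, band in zip(gains, ref_bands)) for k in range(n)]


if __name__ == "__main__":
    fs = 8000
    ref = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(400)]
    other = [0.5 * v for v in ref]  # second microphone: same signal, about 6 dB quieter
    centers = erb_space(100.0, 3500.0, 8)
    ref_bands, other_bands = analyze(ref, fs, centers), analyze(other, fs, centers)
    idx = [quantize_ild(band_ild_db(r, o)) for r, o in zip(ref_bands, other_bands)]
    print([ILD_CODEBOOK[i] for i in idx])  # -6.02 dB maps to -5.0 in every band
```

Only the codebook indices would need to be transmitted alongside the Opus bitstream in a scheme like this. The hypothetical table spends finer steps near 0 dB on the common rationale that small level differences matter most; the quantizer, window, and framing described in the paper may differ, and a real decoder would also need to restore inter-channel time or phase differences, which this sketch ignores.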

Keywords:

Audio signal processing; Channel coding; Decoding; Frequency domain analysis; Quantization (signal); Speech enhancement; Signal encoding; Encoding (symbols); Microphone array; Image coding

Author Community:

[1] Yang, Xue: Institute of Speech and Audio Information Processing, Beijing University of Technology, Faculty of Information Technology, China
[2] Bao, Changchun: Institute of Speech and Audio Information Processing, Beijing University of Technology, Faculty of Information Technology, China
[3] Zhou, Jing: Institute of Speech and Audio Information Processing, Beijing University of Technology, Faculty of Information Technology, China
[4] Zhang, Xu: Institute of Speech and Audio Information Processing, Beijing University of Technology, Faculty of Information Technology, China
[5] Duan, Haiwei: Institute of Speech and Audio Information Processing, Beijing University of Technology, Faculty of Information Technology, China
[6] Zhao, Yunhao: Institute of Speech and Audio Information Processing, Beijing University of Technology, Faculty of Information Technology, China
[7] Li, Wenwen: Institute of Speech and Audio Information Processing, Beijing University of Technology, Faculty of Information Technology, China

Related Keywords:

A Dual-path Conformer-based Network for Neural Speech Coding
2024, 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
Linear Prediction-based Part-defined Auto-encoder Used for Speech Enhancement
2019, 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
A New Parametric Coding Method Combined Linear Microphone Array Topology
2022, 2022 Data Compression Conference, DCC 2022
Multi-stage encoding scheme for multiple audio objects using compressed sensing
2015, Cybernetics and Information Technologies

Year: 2024

Language: English

ESI Highly Cited Papers on the List: 0
