Query:
Scholar name: Jia Maoshen (贾懋珅)
Abstract :
Audio coding has made significant progress with the development of deep neural networks. Recently, neural speech codecs based on the vector quantized variational autoencoder have become increasingly popular among researchers due to their elegant design and superior performance, but their application to high-bitrate audio coding remains largely unexplored. In this paper, we propose a novel high-fidelity end-to-end neural audio codec, the time-frequency fusion codec (TFF-Codec), which reconstructs 32 kHz audio with high quality in the time-frequency domain at 48 and 64 kbps. A dual-path time-frequency filtering module is proposed to capture the local structure of the spectrogram and the long-term temporal dependencies between consecutive frames. The proposed codec is composed of an encoder, the time-frequency filtering module, a vector quantizer, and a decoder. First, the input audio is fed into the encoder to obtain its latent representation. Then, this representation is modeled in the frequency domain by the time-frequency filtering module. Subsequently, it is further compressed by the vector quantizer. Finally, the reconstructed audio is obtained from the decoder. TFF-Codec is trained with a combination of loss functions to balance the reconstructed audio between objective metrics and subjective listening experience. To evaluate the performance of TFF-Codec, comparative experiments are conducted against the traditional audio codec Opus and several recent neural audio codecs. Both subjective and objective evaluations demonstrate the superiority of the proposed method.
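Editor's note: the pipeline described above (encoder, time-frequency filtering, vector quantization, decoder) can be made concrete with a minimal PyTorch sketch. The layer sizes, the LSTM stand-in for the dual-path time-frequency filtering module, and the straight-through vector quantizer are illustrative assumptions, not the TFF-Codec architecture.

```python
# Minimal sketch of the encoder -> TF filtering -> VQ -> decoder pipeline.
# All module choices here are assumptions for illustration only.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=1024, dim=128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                   # z: (batch, frames, dim)
        d = torch.cdist(z, self.codebook.weight.unsqueeze(0))  # distances to all codes
        idx = d.argmin(-1)                                  # nearest-code indices
        q = self.codebook(idx)
        return z + (q - z).detach(), idx                    # straight-through estimator

class ToyCodec(nn.Module):
    def __init__(self, n_fft=512, dim=128):
        super().__init__()
        self.n_fft = n_fft
        self.enc = nn.Linear(n_fft // 2 + 1, dim)           # per-frame encoder
        # LSTM as a placeholder for the dual-path time-frequency filtering module
        self.tf_filter = nn.LSTM(dim, dim, batch_first=True)
        self.vq = VectorQuantizer(dim=dim)
        self.dec = nn.Linear(dim, n_fft // 2 + 1)

    def forward(self, wav):                                 # wav: (batch, samples)
        spec = torch.stft(wav, self.n_fft,
                          window=torch.hann_window(self.n_fft),
                          return_complex=True)
        mag = spec.abs().transpose(1, 2)                    # (batch, frames, bins)
        z = self.enc(mag)                                   # latent representation
        z, _ = self.tf_filter(z)                            # time-frequency modeling
        q, _ = self.vq(z)                                   # quantized latents
        return self.dec(q)                                  # reconstructed magnitudes

wav = torch.randn(2, 32000)                                 # 1 s of 32 kHz audio
print(ToyCodec()(wav).shape)
```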
Keyword :
Audio codec; End-to-end neural network; High-fidelity audio generation; Autoencoder
Cite:
GB/T 7714: Zhao, Yuhao, Jia, Maoshen, Ru, Jiawei, et al. TFF-Codec: A High Fidelity End-to-End Neural Audio Codec [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2025.
MLA: Zhao, Yuhao, et al. "TFF-Codec: A High Fidelity End-to-End Neural Audio Codec." CIRCUITS SYSTEMS AND SIGNAL PROCESSING (2025).
APA: Zhao, Yuhao, Jia, Maoshen, Ru, Jiawei, Wang, Lizhong, & Wen, Liang. TFF-Codec: A High Fidelity End-to-End Neural Audio Codec. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2025.
Abstract :
In recent years, the speaker-independent, single-channel speech separation problem has seen significant progress with the development of deep neural networks (DNNs). However, separating the speech of each speaker of interest from an environment that also contains the speech of other speakers, background noise, and room reverberation remains challenging. To address this problem, a speech separation method for noisy reverberant environments is proposed. First, a time-domain end-to-end network, the deep encoder/decoder dual-path neural network, is introduced for speech separation. Second, to prevent the model from falling into a local optimum during training, a loss function called the stretched optimal scale-invariant signal-to-noise ratio (SOSISNR) is proposed, inspired by the scale-invariant signal-to-noise ratio (SISNR). In addition, to make training better match the human auditory system, the joint loss function is extended with the short-time objective intelligibility (STOI) measure. Third, an alignment operation is proposed to reduce the influence of the time delay caused by reverberation on separation performance. Combining these methods, subjective and objective evaluation metrics show that the proposed approach achieves better separation performance in complex sound fields than the baseline methods.
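Editor's note: the paper's SOSISNR and STOI-extended joint loss are not reproduced here; the sketch below shows only the standard SISNR that SOSISNR builds on, so the stretching and STOI terms are omitted.

```python
# Minimal NumPy sketch of the standard scale-invariant signal-to-noise
# ratio (SISNR); the paper's SOSISNR stretching and STOI terms are omitted.
import numpy as np

def sisnr(est, ref, eps=1e-8):
    est = est - est.mean()                                   # zero-mean, as is conventional
    ref = ref - ref.mean()
    target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref  # projection onto reference
    noise = est - target
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
print(sisnr(clean + 0.1 * rng.standard_normal(16000), clean))  # high SISNR for a good estimate
```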
Keyword :
Speech enhancement; Deep learning; Speech separation; SISNR
Cite:
GB/T 7714: Wang, Chunxi, Jia, Maoshen, Zhang, Xinfeng. Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (1).
MLA: Wang, Chunxi, et al. "Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments." EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2023.1 (2023).
APA: Wang, Chunxi, Jia, Maoshen, & Zhang, Xinfeng. Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (1).
Abstract :
Multisource localization occupies an important position in the field of acoustic signal processing and is widely applied in scenarios such as human-machine interaction and spatial acoustic parameter acquisition. The direction of arrival (DOA) of a sound source is convenient for rendering spatial sound in the audio metaverse. A multisource localization method for reverberant environments is proposed based on the angle distribution of time-frequency (TF) points recorded by a first-order ambisonics (FOA) microphone. The method proceeds in three steps. 1) By exploring the angle distribution of TF points, a single-source zone (SSZ) detection method is proposed using a standard deviation-based measure that reveals how strongly the TF-point angles within a zone converge. 2) To reduce the effect of outliers on localization, an outlier removal method is designed that discards TF points whose angles are far from the real DOAs, where the median angle of each detected zone is used to construct the outlier set. 3) DOA estimates of the multiple sources are obtained by postprocessing the angle histogram. Experimental results in both simulated and real scenarios verify the effectiveness of the proposed method in reverberant environments and show that it outperforms the reference methods.
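Editor's note: step 1 above reduces to a convergence test on the angles within a zone. The sketch below uses a circular standard deviation and an assumed 5-degree threshold as one plausible instance of the paper's standard deviation-based measure.

```python
# Illustrative sketch of SSZ detection: flag a TF zone as single-source
# when the circular standard deviation of its TF-point angles is small.
# The threshold is an assumed value, not taken from the paper.
import numpy as np

def is_single_source_zone(angles_deg, threshold_deg=5.0):
    """angles_deg: DOA angles of the TF points in one zone."""
    a = np.deg2rad(angles_deg)
    # circular statistics avoid wrap-around artifacts at 0/360 degrees
    R = np.hypot(np.mean(np.sin(a)), np.mean(np.cos(a)))
    circ_std = np.sqrt(-2.0 * np.log(max(R, 1e-12)))
    return np.rad2deg(circ_std) < threshold_deg

print(is_single_source_zone([40, 42, 41, 39]))    # True: angles converge
print(is_single_source_zone([40, 120, 41, 250]))  # False: mixed sources
```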
Keyword :
Speech processing; Signal processing
Cite:
GB/T 7714: Tao, Liang, Jia, Maoshen, Li, Lu, et al. Multisource localization based on angle distribution of time-frequency points using an FOA microphone [J]. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2023, 8 (3): 807-823.
MLA: Tao, Liang, et al. "Multisource localization based on angle distribution of time-frequency points using an FOA microphone." CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 8.3 (2023): 807-823.
APA: Tao, Liang, Jia, Maoshen, Li, Lu, Wang, Jing, & Xiang, Yang. Multisource localization based on angle distribution of time-frequency points using an FOA microphone. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2023, 8 (3), 807-823.
Abstract :
This paper proposes a diffuseness estimation-based single-source time-frequency point (SSTP) detection method for multisource direction of arrival (DOA) estimation. According to their composition, time-frequency (TF) points are divided into three types: single-source, multisource, and interference TF points. SSTPs and multisource TF points together are defined as weak interference time-frequency points (WITPs). An SSTP is a TF point consisting only of the direct component of one sound source, which makes it well suited to DOA estimation; multisource DOA estimation is therefore transformed into single-source DOA estimation by SSTP detection. Diffuseness estimation is introduced for a sound field microphone array, and WITPs are detected with a diffuseness estimation-based detection method. Phase similarity determination is then adopted to identify SSTPs among the detected WITPs. Finally, multiple sound source localization is completed by searching for peaks in the normalized histogram of DOA estimates at the detected SSTPs. Experiments demonstrate that the proposed method detects SSTPs precisely, and evaluations show that it achieves superior accuracy in counting and localizing multiple sound sources in reverberant and noisy environments.
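Editor's note: a common DirAC-style diffuseness estimate for first-order (B-format) signals is sketched below; low diffuseness marks TF points dominated by a direct path, i.e., candidates for the WITPs the paper detects. The channel scaling and the averaging zone are simplifying assumptions, not the paper's exact estimator.

```python
# Hedged sketch of DirAC-style diffuseness over a short TF averaging zone:
# psi = 1 - ||<I>|| / <E>, with intensity I from the omni channel W and the
# dipoles X, Y, Z. Scaling conventions are simplified for illustration.
import numpy as np

def diffuseness(W, X, Y, Z, eps=1e-12):
    """W, X, Y, Z: complex STFT coefficients over one TF averaging zone."""
    V = np.stack([X, Y, Z])                        # (3, npoints) velocity proxy
    I = np.real(np.conj(W) * V)                    # active intensity per point
    E = 0.5 * (np.abs(W) ** 2 + np.sum(np.abs(V) ** 2, axis=0))  # energy density
    return 1.0 - np.linalg.norm(I.mean(axis=1)) / (E.mean() + eps)

rng = np.random.default_rng(1)
W = rng.standard_normal(64) + 1j * rng.standard_normal(64)
X, Y, Z = 0.9 * W, 0.1 * W, 0.05 * W               # one dominant direction
print(diffuseness(W, X, Y, Z))                     # low value -> near-direct field
```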
Keyword :
Sparsity component analysis; Reverberation; Diffuseness estimation; Direction of arrival
Cite:
GB/T 7714: Zhang, Yu, Jia, Maoshen, Gao, Shang, et al. Diffuseness Estimation-Based SSTP Detection for Multiple Sound Source Localization in Reverberant Environments [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (8): 4713-4739.
MLA: Zhang, Yu, et al. "Diffuseness Estimation-Based SSTP Detection for Multiple Sound Source Localization in Reverberant Environments." CIRCUITS SYSTEMS AND SIGNAL PROCESSING 42.8 (2023): 4713-4739.
APA: Zhang, Yu, Jia, Maoshen, Gao, Shang, & Wang, Jing. Diffuseness Estimation-Based SSTP Detection for Multiple Sound Source Localization in Reverberant Environments. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (8), 4713-4739.
Abstract :
Multiple speech source separation plays an important role in many applications, such as automatic speech recognition, acoustic surveillance, and teleconferencing. In this study, we propose a method for separating multiple speech sources in a reverberant environment based on sparse component enhancement. In a recorded signal (i.e., a mixture of multiple speech sources), there are always time-frequency points where only one source is active or dominant; this property is the sparsity of speech signals, and such time-frequency points are called sparse component points. In a reverberant environment, however, the sparsity of the speech signal is degraded, reducing the number of sparse component points in the recorded signal and thus the quality of the separated source signals. In this study, for mixture signals recorded by a soundfield microphone (a microphone array), we first experimentally analyze the negative impact of reverberation on sparse components and then develop a sparse component enhancement method that increases the number of these points. The sparse components are then identified and classified according to the DOA estimates of the sources. Next, the sparse components are used to guide the recovery of the non-sparse components. Finally, multiple source separation is achieved by jointly restoring the sparse and non-sparse components of each source. The proposed method has low computational complexity and applies to underdetermined scenarios. Its effectiveness is verified through a series of subjective and objective evaluation experiments.
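Editor's note: the classification step above, assigning each detected sparse point to a source by DOA, can be sketched as a nearest-DOA rule. The circular angular distance and the example DOAs below are assumptions for illustration.

```python
# Illustrative sketch: each sparse TF point is assigned to the source
# whose estimated DOA is angularly closest. Example values are assumed.
import numpy as np

def classify_sparse_points(point_doas_deg, source_doas_deg):
    """Return, for each sparse TF point, the index of the closest source DOA."""
    diff = np.abs(np.subtract.outer(point_doas_deg, source_doas_deg))
    diff = np.minimum(diff, 360.0 - diff)          # circular angular distance
    return diff.argmin(axis=1)

points = np.array([38.0, 41.5, 119.0, 355.0])      # DOAs of sparse TF points
sources = np.array([40.0, 120.0, 350.0])           # DOA estimates of three sources
print(classify_sparse_points(points, sources))     # -> [0 0 1 2]
```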
Keyword :
Multiple source separation; Sparse component; Reverberation; Soundfield microphone
Cite:
GB/T 7714: Li, Lu, Jia, Maoshen, Liu, Jinxiang, et al. Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (10): 6001-6028.
MLA: Li, Lu, et al. "Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement." CIRCUITS SYSTEMS AND SIGNAL PROCESSING 42.10 (2023): 6001-6028.
APA: Li, Lu, Jia, Maoshen, Liu, Jinxiang, & Pai, Tun-Wen. Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (10), 6001-6028.
Abstract :
Estimating the direction of arrival (DOA) is an important topic in array signal processing. This paper addresses multisource localization in a closed environment. We propose a single source zone (SSZ) detection method based on the first-order relative harmonic coefficient (RHC) and design a dynamic SSZ detection rule. Finally, 2-D kernel density estimation (KDE) and peak search are used to obtain multisource DOA estimates. The proposed method is evaluated in simulation experiments and compared with reference methods, verifying its effectiveness.
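Editor's note: the final KDE-plus-peak-search step is sketched below in one azimuth dimension for brevity (the paper uses 2-D KDE); the bandwidth, peak threshold, and synthetic DOA clusters are assumptions.

```python
# Hedged 1-D sketch of the last step: kernel density estimation over DOA
# estimates from detected single-source zones, then a peak search.
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import find_peaks

rng = np.random.default_rng(2)
# synthetic DOA estimates clustered around two true sources at 60 and 150 deg
doas = np.concatenate([rng.normal(60, 3, 200), rng.normal(150, 3, 150)])

grid = np.arange(0.0, 360.0, 1.0)
density = gaussian_kde(doas, bw_method=0.05)(grid)
peaks, _ = find_peaks(density, height=0.2 * density.max())
print(grid[peaks])                                 # expected near [60, 150]
```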
Cite:
GB/T 7714: Tao, Liang, Jia, Maoshen, Bu, Bing, et al. Single Source Zone Detection in the Spherical Harmonic Domain for Multisource Localization [J]. 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023: 996-1001.
MLA: Tao, Liang, et al. "Single Source Zone Detection in the Spherical Harmonic Domain for Multisource Localization." 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC (2023): 996-1001.
APA: Tao, Liang, Jia, Maoshen, Bu, Bing, & Yao, Dingding. Single Source Zone Detection in the Spherical Harmonic Domain for Multisource Localization. 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, 996-1001.
Abstract :
This paper presents a method for direction of arrival (DOA) estimation of multiple speech sources based on the temporal correlation and local-frequency stationarity of speech signals. A distribution analysis of single-source points (SSPs) in a recorded signal shows that, in the time-frequency (T-F) domain, SSPs occur in small clusters. Based on this distribution, a DOA estimation method for multiple sound sources is developed that exploits the continuity between adjacent T-F points. In addition, low-reverberation single-source (LRSS) points are detected based on phase consistency and used as guidance for deciding whether adjacent T-F points are SSPs. The direction deviations between adjacent frequency points and between adjacent frames serve as the SSP detection criteria, reflecting the temporal correlation and local-frequency stationarity. Kernel density estimation and peak search are performed to obtain a dynamic DOA estimation range for each source. Finally, DOA estimates of each source are obtained by statistical weighting-based fine localization. Experiments under both simulated and real conditions show that the proposed method achieves better localization performance than several existing methods.
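Editor's note: the adjacency criterion above can be read as region growing from reliable seed points. The sketch below accepts a neighboring TF point as an SSP when its DOA deviates little from the current point along frequency (same frame) or time (same bin); both tolerances and the toy DOA map are assumptions.

```python
# Illustrative region-growing sketch of the SSP detection criterion.
import numpy as np

def grow_ssps(doa_map, seeds, freq_tol=6.0, time_tol=6.0):
    """doa_map: (frames, bins) DOA per TF point; seeds: list of (t, f) LRSS points."""
    ssp = set(seeds)
    frontier = list(seeds)
    while frontier:
        t, f = frontier.pop()
        # neighbors along frequency (same frame) and time (same bin)
        for (nt, nf, tol) in [(t, f - 1, freq_tol), (t, f + 1, freq_tol),
                              (t - 1, f, time_tol), (t + 1, f, time_tol)]:
            if 0 <= nt < doa_map.shape[0] and 0 <= nf < doa_map.shape[1] \
                    and (nt, nf) not in ssp \
                    and abs(doa_map[nt, nf] - doa_map[t, f]) < tol:
                ssp.add((nt, nf))
                frontier.append((nt, nf))
    return ssp

doa_map = np.full((4, 6), 200.0)                    # interfered points by default
doa_map[1, 1:5] = [60.0, 61.0, 59.5, 62.0]          # a small single-source cluster
print(sorted(grow_ssps(doa_map, seeds=[(1, 2)])))   # recovers the whole cluster
```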
Keyword :
Direction of arrival estimation; Single-source point detection; Temporal correlation
Cite:
GB/T 7714: Li, Lu, Jia, Maoshen, Wang, Jing. DOA estimation of multiple speech sources based on the single-source point detection using an FOA microphone [J]. APPLIED ACOUSTICS, 2022, 195.
MLA: Li, Lu, et al. "DOA estimation of multiple speech sources based on the single-source point detection using an FOA microphone." APPLIED ACOUSTICS 195 (2022).
APA: Li, Lu, Jia, Maoshen, & Wang, Jing. DOA estimation of multiple speech sources based on the single-source point detection using an FOA microphone. APPLIED ACOUSTICS, 2022, 195.
Abstract :
Multiple sound source separation in reverberant environments has attracted growing attention in recent years. To improve the quality of the separated signals under reverberation, this paper proposes a separation method based on a DOA cue and a deep neural network (DNN). First, a pre-processing model based on non-negative matrix factorization (NMF) dereverberates the recorded signal, making source separation more efficient. We then propose a multi-source separation algorithm that combines the recovery of sparse and non-sparse component points to obtain each source signal from the dereverberated signal. For sparse component points, the dominant sound source at each point is determined by a DOA cue; for non-sparse component points, a DNN is used to recover each source signal. Finally, the signals separated from the sparse and non-sparse component points are matched by temporal correlation to obtain each sound source signal. Both objective and subjective evaluation results indicate that, compared with the existing methods, the proposed approach performs better in highly reverberant environments.
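Editor's note: the DNN stage for non-sparse component points is sketched below as a generic mask-estimation network in PyTorch. The network shape, the softmax mask formulation, and the two-source setup are stand-in assumptions, not the paper's model.

```python
# Minimal PyTorch stand-in: a small network mapping mixture magnitude
# spectra to per-source TF masks for recovering non-sparse points.
import torch
import torch.nn as nn

n_bins, n_sources = 257, 2
net = nn.Sequential(
    nn.Linear(n_bins, 256), nn.ReLU(),
    nn.Linear(256, n_bins * n_sources),
)

mix_mag = torch.rand(8, n_bins)                        # 8 frames of mixture magnitudes
masks = net(mix_mag).view(8, n_sources, n_bins).softmax(dim=1)  # masks sum to 1 across sources
separated = masks * mix_mag.unsqueeze(1)               # per-source magnitude estimates
print(separated.shape)                                 # torch.Size([8, 2, 257])
```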
Keyword :
Direction of arrival; Multi-source separation; Deep neural network; Dereverberation
Cite:
GB/T 7714: Zhang, Yu, Jia, Maoshen, Jia, Xinyu, et al. A Multi-Source Separation Approach Based on DOA Cue and DNN [J]. APPLIED SCIENCES-BASEL, 2022, 12 (12).
MLA: Zhang, Yu, et al. "A Multi-Source Separation Approach Based on DOA Cue and DNN." APPLIED SCIENCES-BASEL 12.12 (2022).
APA: Zhang, Yu, Jia, Maoshen, Jia, Xinyu, & Pai, Tun-Wen. A Multi-Source Separation Approach Based on DOA Cue and DNN. APPLIED SCIENCES-BASEL, 2022, 12 (12).
Abstract :
Speech emotion recognition (SER) is a hot topic in speech signal processing. When the training data and the test data come from different corpora, their feature distributions differ, which degrades recognition performance. To address this problem, this paper proposes a cross-corpus speech emotion recognition method based on subspace learning and domain adaptation. Specifically, the training set and the test set form the source domain and target domain, respectively. The Hessian matrix is then introduced to obtain a subspace for the features extracted in both domains. In addition, an information entropy-based domain adaptation method is introduced to construct a common space in which the difference between the source-domain and target-domain feature distributions is reduced as much as possible. To evaluate the proposed method, extensive cross-corpus speech emotion recognition experiments are conducted. The results show that the proposed method outperforms several existing subspace learning and domain adaptation methods.
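Editor's note: the paper's Hessian-based subspace learning and entropy-based adaptation are not reproduced here. As a generic illustration of shrinking the feature-distribution gap between a source (training) corpus and a target (test) corpus, the sketch below applies CORAL, a different and simpler technique: whiten the source features, then re-color them with the target covariance (with an added mean shift).

```python
# CORAL-style domain alignment as a generic stand-in, not the paper's method.
import numpy as np

def coral(Xs, Xt, eps=1e-6):
    """Align source features Xs (n_s, d) to target features Xt (n_t, d)."""
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])
    def mat_pow(C, p):
        # matrix (inverse) square root via eigendecomposition of a symmetric matrix
        w, V = np.linalg.eigh(C)
        return V @ np.diag(np.maximum(w, eps) ** p) @ V.T
    return (Xs - Xs.mean(0)) @ mat_pow(Cs, -0.5) @ mat_pow(Ct, 0.5) + Xt.mean(0)

rng = np.random.default_rng(3)
Xs = rng.normal(0, 1, (500, 10))            # source-corpus features
Xt = rng.normal(2, 3, (400, 10))            # target corpus with shifted statistics
Xa = coral(Xs, Xt)
print(np.round([Xa.mean(), Xa.std()], 2))   # now close to the target statistics
```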
Keyword :
Cross-corpus; Domain adaptation; Subspace learning; Speech emotion recognition
Cite:
GB/T 7714: Cao, Xuan, Jia, Maoshen, Ru, Jiawei, et al. Cross-corpus speech emotion recognition using subspace learning and domain adaption [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, 2022 (1).
MLA: Cao, Xuan, et al. "Cross-corpus speech emotion recognition using subspace learning and domain adaption." EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2022.1 (2022).
APA: Cao, Xuan, Jia, Maoshen, Ru, Jiawei, & Pai, Tun-wen. Cross-corpus speech emotion recognition using subspace learning and domain adaption. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, 2022 (1).
Abstract :
Multiple sound source localization has been a topic of broad concern in recent years. Single Source Zone (SSZ) based localization methods achieve good performance by detecting and exploiting the time-frequency (T-F) zones in which only one source is dominant. However, the detected SSZ sometimes also includes T-F points composed of components from multiple sources; such a point is defined as an outlier. The presence of outliers within the detected SSZ is usually an unavoidable problem for SSZ-based methods. To solve it, this paper proposes multi-source localization using offset residual weights. The method builds on the following assumption: the directions estimated from the T-F points within a detected SSZ deviate from the actual source directions, but this deviation is much smaller for the desired points than for the outliers. After verifying this assumption experimentally, the Point Offset Residual Weight (PORW) and Source Offset Residual Weight (SORW) are proposed to reduce the influence of outliers on the localization results. A composite weight formed by combining PORW and SORW effectively distinguishes outliers from desired points, and the outliers are then removed according to this weight. Finally, a statistical histogram of the DOA estimates with outliers removed is used for multi-source localization. An objective evaluation conducted in various simulated environments shows that the proposed method localizes sources better than the reference methods.
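Editor's note: the outlier-removal idea above is sketched below with a simple median-deviation rule standing in for the PORW/SORW composite weight; the tolerance and histogram resolution are assumed values.

```python
# Hedged sketch: discard TF points whose DOA estimates deviate strongly
# from the zone's median direction, then localize via the DOA histogram.
import numpy as np

def remove_outliers(doas_deg, tol_deg=15.0):
    dev = np.abs(doas_deg - np.median(doas_deg))
    dev = np.minimum(dev, 360.0 - dev)             # circular deviation
    return doas_deg[dev < tol_deg]

zone = np.array([60.0, 61.0, 59.0, 140.0, 62.0])   # 140 is a multi-source outlier
kept = remove_outliers(zone)
hist, edges = np.histogram(kept, bins=36, range=(0, 360))
print(kept, edges[hist.argmax()])                  # DOA peak near 60 degrees
```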
Keyword :
Multiple sound source localization; Direction of arrival estimation; Soundfield microphone; Reverberation
Cite:
GB/T 7714: Jia, Maoshen, Gao, Shang, Bao, Changchun. Multi-source localization by using offset residual weight [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (1).
MLA: Jia, Maoshen, et al. "Multi-source localization by using offset residual weight." EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2021.1 (2021).
APA: Jia, Maoshen, Gao, Shang, & Bao, Changchun. Multi-source localization by using offset residual weight. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (1).