
Query:

Scholar name: Jia Maoshen (贾懋珅)

TFF-Codec: A High Fidelity End-to-End Neural Audio Codec SCIE
Journal Article | 2025 | CIRCUITS SYSTEMS AND SIGNAL PROCESSING

Abstract:

Audio coding has made significant progress with the development of deep neural networks. Recently, neural speech codecs based on the vector-quantized variational autoencoder have become increasingly popular among researchers due to their elegant design and superior performance, but their application to high-bitrate audio coding has not been explored as thoroughly. In this paper, we propose a novel high-fidelity end-to-end neural audio codec called the time-frequency fusion codec (TFF-Codec), which is capable of high-quality reconstruction of 32 kHz audio in the time-frequency domain at 48 and 64 kbps. A dual-path time-frequency filtering module is proposed to capture the local structure of the spectrogram and the long-term temporal dependence between consecutive frames. The proposed codec is composed of an encoder, the time-frequency filtering module, a vector quantizer, and a decoder. First, the input audio is fed into the encoder to obtain its latent representation. Then, it is modeled in the frequency domain by the time-frequency filtering module. Subsequently, it is further compressed by the vector quantizer. Finally, the reconstructed audio is obtained from the decoder. We also use a combination of multiple loss functions in TFF-Codec to ensure that the reconstructed audio is balanced in terms of objective metrics and subjective listening experience. To evaluate the performance of TFF-Codec, comparative experiments are conducted with the traditional audio codec Opus and several recent neural audio codecs. Both subjective and objective evaluation tests demonstrate the superiority of the proposed method.
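
As a concrete illustration of the vector-quantization stage described above, the following is a minimal sketch of nearest-neighbor codebook lookup as used in VQ-VAE-style codecs; the function name, codebook size, and latent dimension are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Replace each latent frame with its nearest codebook entry (L2 distance).

    latents:  (T, D) encoder outputs, one D-dim vector per frame
    codebook: (K, D) learned code vectors (hypothetical, not from the paper)
    """
    # Squared distance from every frame to every code vector
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)          # index of the nearest code per frame
    return codebook[idx], idx        # quantized latents and code indices

rng = np.random.default_rng(0)
z = rng.standard_normal((100, 64))   # dummy latent sequence (100 frames)
cb = rng.standard_normal((512, 64))  # dummy 512-entry codebook
z_q, codes = vector_quantize(z, cb)  # the code indices are what a codec transmits
```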

Keyword:

Audio codec; End-to-end neural network; High-fidelity audio generation; Autoencoder

Cite:

GB/T 7714: Zhao, Yuhao, Jia, Maoshen, Ru, Jiawei, et al. TFF-Codec: A High Fidelity End-to-End Neural Audio Codec [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2025.
MLA: Zhao, Yuhao, et al. "TFF-Codec: A High Fidelity End-to-End Neural Audio Codec." CIRCUITS SYSTEMS AND SIGNAL PROCESSING (2025).
APA: Zhao, Yuhao, Jia, Maoshen, Ru, Jiawei, Wang, Lizhong, & Wen, Liang. TFF-Codec: A High Fidelity End-to-End Neural Audio Codec. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2025.
Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments SCIE
Journal Article | 2023, 2023 (1) | EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING
WoS CC Cited Count: 2

Abstract:

In recent years, the speaker-independent, single-channel speech separation problem has made significant progress with the development of deep neural networks (DNNs). However, separating the speech of each speaker of interest from an environment that includes the speech of other speakers, background noise, and room reverberation remains challenging. To solve this problem, a speech separation method for noisy reverberant environments is proposed. Firstly, the time-domain end-to-end network structure of a deep encoder/decoder dual-path neural network is introduced for speech separation. Secondly, to keep the model from falling into a local optimum during training, a loss function called the stretched optimal scale-invariant signal-to-noise ratio (SOSISNR) is proposed, inspired by the scale-invariant signal-to-noise ratio (SISNR). At the same time, to make training better match the human auditory system, the joint loss function is extended with the short-time objective intelligibility (STOI) measure. Thirdly, an alignment operation is proposed to reduce the influence of the time delay caused by reverberation on separation performance. Combining the above methods, subjective and objective evaluation metrics show that the proposed method achieves better separation performance in complex sound field environments than the baseline methods.
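
For reference, the scale-invariant signal-to-noise ratio (SISNR) that the proposed SOSISNR loss builds on can be computed as below; the stretched-optimal variant itself is specific to the paper and is not reproduced here.

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (dB) of an estimate against a reference."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Optimal scaling: project the estimate onto the reference
    s_target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))
```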

Keyword:

Speech enhancement; Deep learning; Speech separation; SISNR

Cite:

GB/T 7714: Wang, Chunxi, Jia, Maoshen, Zhang, Xinfeng. Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (1).
MLA: Wang, Chunxi, et al. "Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments." EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2023.1 (2023).
APA: Wang, Chunxi, Jia, Maoshen, & Zhang, Xinfeng. Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2023, 2023 (1).
Multisource localization based on angle distribution of time-frequency points using an FOA microphone SCIE
Journal Article | 2023, 8 (3), 807-823 | CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY
WoS CC Cited Count: 1

Abstract:

Multisource localization occupies an important position in the field of acoustic signal processing and is widely applied in scenarios such as human-machine interaction and spatial acoustic parameter acquisition. The direction of arrival (DOA) of a sound source is convenient for rendering spatial sound in the audio metaverse. A multisource localization method for reverberant environments is proposed based on the angle distribution of time-frequency (TF) points using a first-order ambisonics (FOA) microphone. The method is implemented in three steps. 1) By exploring the angle distribution of TF points, a single-source zone (SSZ) detection method is proposed using a standard deviation-based measure, which reveals the degree of convergence of TF-point angles within a zone. 2) To reduce the effect of outliers on localization, an outlier removal method is designed to remove the TF points whose angles are far from the real DOAs, where the median angle of each detected zone is adopted to construct the outlier set. 3) DOA estimates of multiple sources are obtained by post-processing of the angle histogram. Experimental results in both simulated and real scenarios verify the effectiveness of the proposed method in reverberant environments and show that it outperforms the reference methods.
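
A minimal sketch of step 1, the standard deviation-based SSZ test, under the assumption that per-TF-point azimuth estimates are already available; the zone length and threshold are illustrative, and a production version would use circular statistics near the 0°/360° wrap.

```python
import numpy as np

def detect_ssz(tf_angles_deg, zone_len=8, max_std=5.0):
    """Return start indices of zones whose TF-point angles converge.

    tf_angles_deg: 1-D array of per-TF-point azimuth estimates (degrees)
    A zone with a small standard deviation is assumed to contain one source.
    """
    starts = []
    for s in range(0, len(tf_angles_deg) - zone_len + 1, zone_len):
        if np.std(tf_angles_deg[s:s + zone_len]) < max_std:
            starts.append(s)
    return starts
```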

Keyword:

speech processing; signal processing

Cite:

GB/T 7714: Tao, Liang, Jia, Maoshen, Li, Lu, et al. Multisource localization based on angle distribution of time-frequency points using an FOA microphone [J]. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2023, 8 (3): 807-823.
MLA: Tao, Liang, et al. "Multisource localization based on angle distribution of time-frequency points using an FOA microphone." CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY 8.3 (2023): 807-823.
APA: Tao, Liang, Jia, Maoshen, Li, Lu, Wang, Jing, & Xiang, Yang. Multisource localization based on angle distribution of time-frequency points using an FOA microphone. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2023, 8 (3), 807-823.
Diffuseness Estimation-Based SSTP Detection for Multiple Sound Source Localization in Reverberant Environments SCIE
Journal Article | 2023, 42 (8), 4713-4739 | CIRCUITS SYSTEMS AND SIGNAL PROCESSING

Abstract:

This paper proposes a diffuseness estimation-based single-source time-frequency point (SSTP) detection method for multisource direction of arrival (DOA) estimation. According to their composition, time-frequency (TF) points are divided into three types: SSTPs, multisource TF points, and interference TF points. SSTPs and multisource TF points are jointly defined as weak interference time-frequency points (WITPs). An SSTP is a TF point consisting only of the direct component of one sound source, which is beneficial for DOA estimation; therefore, multisource DOA estimation is transformed into single-source DOA estimation by SSTP detection. Diffuseness estimation is introduced for a sound field microphone array, and WITPs are detected by a diffuseness estimation-based detection method. Phase similarity determination is then adopted to identify SSTPs among the detected WITPs. Multiple sound source localization is completed by searching for peaks in the normalized histogram of DOA estimates corresponding to the detected SSTPs. Experiments demonstrate that the proposed method achieves precise detection of SSTPs, and evaluations show that it attains superior accuracy in multiple sound source counting and localization in reverberant and noisy environments.
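
The abstract does not spell out the diffuseness estimator; the sketch below shows one common DirAC-style formulation for first-order signals, assuming pressure and particle-velocity channels in consistent units, and may differ from the authors' exact estimator.

```python
import numpy as np

def diffuseness(p, v, eps=1e-8):
    """DirAC-style diffuseness estimate from first-order (FOA-like) signals.

    p: (T,) complex pressure (omni) channel in the STFT domain
    v: (T, 3) complex particle-velocity (X, Y, Z) channels, same units as p
    Returns a value in [0, 1]; near 0 = directional, near 1 = diffuse.
    """
    intensity = np.real(np.conj(p)[:, None] * v)                  # active intensity per frame
    energy = 0.5 * (np.abs(p) ** 2 + np.sum(np.abs(v) ** 2, axis=1))
    num = np.linalg.norm(intensity.mean(axis=0))                  # magnitude of mean intensity
    den = energy.mean() + eps                                     # mean energy density
    return 1.0 - num / den
```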

Keyword:

Sparsity component analysis; Reverberation; Diffuseness estimation; Direction of arrival

Cite:

GB/T 7714: Zhang, Yu, Jia, Maoshen, Gao, Shang, et al. Diffuseness Estimation-Based SSTP Detection for Multiple Sound Source Localization in Reverberant Environments [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (8): 4713-4739.
MLA: Zhang, Yu, et al. "Diffuseness Estimation-Based SSTP Detection for Multiple Sound Source Localization in Reverberant Environments." CIRCUITS SYSTEMS AND SIGNAL PROCESSING 42.8 (2023): 4713-4739.
APA: Zhang, Yu, Jia, Maoshen, Gao, Shang, & Wang, Jing. Diffuseness Estimation-Based SSTP Detection for Multiple Sound Source Localization in Reverberant Environments. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (8), 4713-4739.
Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement SCIE
Journal Article | 2023, 42 (10), 6001-6028 | CIRCUITS SYSTEMS AND SIGNAL PROCESSING

Abstract:

Multiple speech source separation plays an important role in many applications such as automatic speech recognition, acoustic surveillance, and teleconferencing. In this study, we propose a method for the separation of multiple speech sources in a reverberant environment based on sparse component enhancement. In a recorded signal (i.e., a mixture of multiple speech sources), there are always time-frequency points where only one source is active or dominant; this property is the sparsity of speech signals, and such time-frequency points are called sparse component points. In a reverberant environment, however, the sparsity of the speech signal is degraded, reducing the number of sparse component points in the recorded signal and thus the quality of the separated source signals. In this study, for mixture signals recorded by a soundfield microphone (a microphone array), we first experimentally analyze the negative impact of reverberation on sparse components and then develop a sparse component enhancement method to increase the number of these points. The sparse components are then identified and classified according to the direction-of-arrival estimates of the sources. Next, the sparse components are used to guide the recovery of the non-sparse components. Finally, multiple source separation is achieved by the joint restoration of the sparse and non-sparse components of each source. The proposed method has low computational complexity and applies to underdetermined scenarios. Its effectiveness is verified through a series of subjective and objective evaluation experiments.
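
As an illustration of the classification step above, the following simplified stand-in assigns each sparse component point to the closest source DOA; the angular tolerance is an illustrative parameter, not taken from the paper.

```python
import numpy as np

def assign_sparse_points(point_doas_deg, source_doas_deg, width=15.0):
    """Give each sparse TF point to the closest source DOA, if close enough.

    Returns an (N,) array of source indices, or -1 for unassigned points.
    """
    pts = np.asarray(point_doas_deg, dtype=float)[:, None]    # (N, 1)
    srcs = np.asarray(source_doas_deg, dtype=float)[None, :]  # (1, S)
    diff = np.abs(pts - srcs)
    diff = np.minimum(diff, 360.0 - diff)   # wrap-around angular distance
    nearest = diff.argmin(axis=1)
    return np.where(diff.min(axis=1) < width, nearest, -1)
```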

Keyword:

Multiple source separation; Sparse component; Reverberation; Soundfield microphone

Cite:

GB/T 7714: Li, Lu, Jia, Maoshen, Liu, Jinxiang, et al. Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (10): 6001-6028.
MLA: Li, Lu, et al. "Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement." CIRCUITS SYSTEMS AND SIGNAL PROCESSING 42.10 (2023): 6001-6028.
APA: Li, Lu, Jia, Maoshen, Liu, Jinxiang, & Pai, Tun-Wen. Separation of Multiple Speech Sources in Reverberant Environments Based on Sparse Component Enhancement. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2023, 42 (10), 6001-6028.
Single Source Zone Detection in the Spherical Harmonic Domain for Multisource Localization CPCI-S
Conference Paper | 2023, 996-1001 | 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC

Abstract:

Estimating the direction of arrival (DOA) is an important topic in array signal processing. This paper addresses the issue of multisource localization in a closed environment. We propose a single-source zone (SSZ) detection method based on the first-order relative harmonic coefficient (RHC) and design a dynamic SSZ detection rule. Finally, 2-D kernel density estimation (KDE) and peak search are used to achieve multisource DOA estimation. The proposed method is evaluated in simulation experiments and compared with the reference methods, and its effectiveness is verified.
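
A one-dimensional sketch of the closing KDE-and-peak-search step (the paper uses a 2-D azimuth/elevation KDE); the grid resolution and peak-height threshold are illustrative.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde

def doa_from_kde(azimuths_deg):
    """Estimate source azimuths as peaks of a KDE over per-point DOA estimates."""
    grid = np.arange(0.0, 360.0, 1.0)
    density = gaussian_kde(azimuths_deg)(grid)          # smooth angle distribution
    peaks, _ = find_peaks(density, height=0.3 * density.max())
    return grid[peaks]
```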

Cite:

GB/T 7714: Tao, Liang, Jia, Maoshen, Bu, Bing, et al. Single Source Zone Detection in the Spherical Harmonic Domain for Multisource Localization [C]. 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023: 996-1001.
MLA: Tao, Liang, et al. "Single Source Zone Detection in the Spherical Harmonic Domain for Multisource Localization." 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC (2023): 996-1001.
APA: Tao, Liang, Jia, Maoshen, Bu, Bing, & Yao, Dingding. Single Source Zone Detection in the Spherical Harmonic Domain for Multisource Localization. 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, 996-1001.
DOA estimation of multiple speech sources based on the single-source point detection using an FOA microphone SCIE
Journal Article | 2022, 195 | APPLIED ACOUSTICS
WoS CC Cited Count: 11

Abstract:

This paper presents a method for direction of arrival (DOA) estimation of multiple speech sources based on the temporal correlation and local-frequency stationarity of speech signals. The distribution analysis of single-source points (SSPs) in a recorded signal shows that in the time-frequency (T-F) domain, SSPs are distributed in small clusters. Based on this distribution, a method for DOA estimation of multiple sound sources is developed that exploits the continuity between adjacent T-F points. In addition, low-reverberation single-source (LRSS) points are detected based on phase consistency and used as guidance to decide whether adjacent T-F points are SSPs. The direction deviations between adjacent frequency points and between adjacent frames are used as the SSP detection criteria, reflecting the temporal correlation and local-frequency stationarity. Kernel density estimation and peak search are performed to obtain the dynamic DOA estimation range of each source. Finally, DOA estimates of each source are obtained by statistical weighting-based fine localization. Experiments under both simulated and real conditions show that the proposed method achieves better localization performance than several existing methods.
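
A minimal sketch of the adjacency criteria described above, flagging TF points whose DOA agrees with the next frequency bin (local-frequency stationarity) and the next frame (temporal correlation); the tolerances are illustrative and the LRSS guidance step is omitted.

```python
import numpy as np

def ssp_by_continuity(doa_map_deg, freq_tol=5.0, time_tol=5.0):
    """Flag TF points whose DOA agrees with the next frequency bin and frame.

    doa_map_deg: (frames, bins) per-TF-point DOA estimates in degrees
    Returns a boolean map; True marks a candidate single-source point.
    """
    d = np.asarray(doa_map_deg, dtype=float)
    near_f = np.abs(d[:, 1:] - d[:, :-1]) < freq_tol  # adjacent frequency bins agree
    near_t = np.abs(d[1:, :] - d[:-1, :]) < time_tol  # adjacent frames agree
    ssp = np.zeros(d.shape, dtype=bool)
    ssp[:-1, :-1] = near_f[:-1, :] & near_t[:, :-1]
    return ssp
```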

Keyword:

Direction of arrival estimation; Single-source point detection; Temporal correlation

Cite:

GB/T 7714: Li, Lu, Jia, Maoshen, Wang, Jing. DOA estimation of multiple speech sources based on the single-source point detection using an FOA microphone [J]. APPLIED ACOUSTICS, 2022, 195.
MLA: Li, Lu, et al. "DOA estimation of multiple speech sources based on the single-source point detection using an FOA microphone." APPLIED ACOUSTICS 195 (2022).
APA: Li, Lu, Jia, Maoshen, & Wang, Jing. DOA estimation of multiple speech sources based on the single-source point detection using an FOA microphone. APPLIED ACOUSTICS, 2022, 195.
A Multi-Source Separation Approach Based on DOA Cue and DNN SCIE
Journal Article | 2022, 12 (12) | APPLIED SCIENCES-BASEL

Abstract:

Multiple sound source separation in a reverberant environment has become popular in recent years. To improve the quality of the separated signals in a reverberant environment, a separation method based on a DOA cue and a deep neural network (DNN) is proposed in this paper. Firstly, a pre-processing model based on non-negative matrix factorization (NMF) is utilized to dereverberate the recorded signal, which makes source separation more efficient. Then, we propose a multi-source separation algorithm combining sparse and non-sparse component point recovery to obtain each sound source signal from the dereverberated signal. For sparse component points, the dominant sound source of each point is determined by a DOA cue; for non-sparse component points, a DNN is used to recover each sound source signal. Finally, the signals separated from the sparse and non-sparse component points are matched by temporal correlation to obtain each sound source signal. Both objective and subjective evaluation results indicate that, compared with the existing method, the proposed separation approach performs better in high-reverberation environments.
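
The NMF building block behind the dereverberation pre-processing can be sketched with scikit-learn as below; this only factors a magnitude spectrogram and is not the authors' full dereverberation model.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = np.abs(rng.standard_normal((257, 200)))  # dummy magnitude spectrogram (bins x frames)

model = NMF(n_components=16, init="nndsvd", max_iter=300)
W = model.fit_transform(V)   # spectral basis vectors, shape (257, 16)
H = model.components_        # temporal activations, shape (16, 200)
V_hat = W @ H                # non-negative low-rank approximation of V
```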

Keyword:

direction of arrival; multi-source separation; deep neural network; dereverberation

Cite:

GB/T 7714: Zhang, Yu, Jia, Maoshen, Jia, Xinyu, et al. A Multi-Source Separation Approach Based on DOA Cue and DNN [J]. APPLIED SCIENCES-BASEL, 2022, 12 (12).
MLA: Zhang, Yu, et al. "A Multi-Source Separation Approach Based on DOA Cue and DNN." APPLIED SCIENCES-BASEL 12.12 (2022).
APA: Zhang, Yu, Jia, Maoshen, Jia, Xinyu, & Pai, Tun-Wen. A Multi-Source Separation Approach Based on DOA Cue and DNN. APPLIED SCIENCES-BASEL, 2022, 12 (12).
Cross-corpus speech emotion recognition using subspace learning and domain adaption SCIE
Journal Article | 2022, 2022 (1) | EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING
WoS CC Cited Count: 4

Abstract:

Speech emotion recognition (SER) is a hot topic in speech signal processing. When the training data and the test data come from different corpora, their feature distributions differ, which degrades recognition performance. To solve this problem, a cross-corpus speech emotion recognition method based on subspace learning and domain adaptation is proposed in this paper. Specifically, the training set and the test set form the source domain and the target domain, respectively. The Hessian matrix is then introduced to obtain the subspace for the features extracted in both the source and target domains. In addition, an information entropy-based domain adaptation method is introduced to construct a common space in which the difference between the feature distributions of the source and target domains is reduced as much as possible. To evaluate the proposed method, extensive experiments are conducted on cross-corpus speech emotion recognition. The results show that the proposed method outperforms several existing subspace learning and domain adaptation methods.
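
The information-entropy quantity underlying the domain adaptation step can be computed as below; treating mean prediction entropy as the adaptation criterion is a simplification, not the paper's exact formulation.

```python
import numpy as np

def mean_prediction_entropy(probs, eps=1e-12):
    """Average Shannon entropy (nats) of per-utterance class posteriors.

    probs: (N, C) array whose rows are softmax outputs summing to 1.
    Lower entropy on the target corpus indicates more confident predictions.
    """
    p = np.clip(probs, eps, 1.0)
    return float(-(p * np.log(p)).sum(axis=1).mean())
```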

Keyword:

Cross-corpus; Domain adaption; Subspace learning; Speech emotion recognition

Cite:

GB/T 7714: Cao, Xuan, Jia, Maoshen, Ru, Jiawei, et al. Cross-corpus speech emotion recognition using subspace learning and domain adaption [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, 2022 (1).
MLA: Cao, Xuan, et al. "Cross-corpus speech emotion recognition using subspace learning and domain adaption." EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2022.1 (2022).
APA: Cao, Xuan, Jia, Maoshen, Ru, Jiawei, & Pai, Tun-wen. Cross-corpus speech emotion recognition using subspace learning and domain adaption. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, 2022 (1).
Multi-source localization by using offset residual weight SCIE
Journal Article | 2021, 2021 (1) | EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING
WoS CC Cited Count: 2

Abstract:

Multiple sound source localization has been a topic of wide concern in recent years. Single Source Zone (SSZ) based localization methods achieve good performance by detecting and exploiting the time-frequency (T-F) zones where only one source is dominant. However, T-F points containing components from multiple sources are sometimes also included in the detected SSZ; once a T-F point in an SSZ is contributed to by multiple components, that point is defined as an outlier. The presence of outliers within the detected SSZ is usually an unavoidable problem for SSZ-based methods. To solve it, a multi-source localization method using offset residual weights is proposed in this paper. The method rests on an assumption: the directions estimated from T-F points within the detected SSZ deviate from the actual source directions, but this deviation is much smaller for desired points than for outliers. After verifying this assumption experimentally, the Point Offset Residual Weight (PORW) and Source Offset Residual Weight (SORW) are proposed to reduce the influence of outliers on the localization results. A composite weight is then formed by combining PORW and SORW, which effectively distinguishes outliers from desired points, and the outliers are removed using it. Finally, a statistical histogram of the DOA estimates with outliers removed is used for multi-source localization. An objective evaluation of the proposed method is conducted in various simulated environments; the results show that it achieves better source localization performance than the reference methods.
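
A simplified stand-in for the outlier removal and statistical-histogram steps above, using a plain median-deviation test in place of the PORW/SORW composite weight; the tolerance and bin width are illustrative, and a multi-source version would peak-search the histogram rather than take a single maximum.

```python
import numpy as np

def localize_after_outlier_removal(doas_deg, dev_tol=10.0, bin_width=2.0):
    """Drop DOA estimates far from the median, then pick the histogram peak."""
    doas = np.asarray(doas_deg, dtype=float)
    med = np.median(doas)
    kept = doas[np.abs(doas - med) < dev_tol]    # remove outlier T-F points
    hist, edges = np.histogram(kept, bins=np.arange(0.0, 360.0 + bin_width, bin_width))
    return edges[hist.argmax()] + bin_width / 2  # center of the strongest bin
```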

Keyword:

Multiple sound sources localization; Direction of arrival estimation; Soundfield microphone; Reverberation

Cite:

GB/T 7714: Jia, Maoshen, Gao, Shang, Bao, Changchun. Multi-source localization by using offset residual weight [J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (1).
MLA: Jia, Maoshen, et al. "Multi-source localization by using offset residual weight." EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2021.1 (2021).
APA: Jia, Maoshen, Gao, Shang, & Bao, Changchun. Multi-source localization by using offset residual weight. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (1).