Multi-Source Localization Method Based on the Log-Mel Spectrum Augmented Noise Subspace

Author：

Duan, Haiwei | Bao, Changchun | Zhou, Jing

Abstract：

Deep learning (DL) based direction-of-arrival (DOA) estimation is an active research topic, and many methods have been proposed recently. However, most of these methods suffer serious performance degradation because of the adverse effects of overlapping sources, noise, and reverberation. One primary reason is that performance is sensitive to the pre-extracted features, which often lead to spectral aliasing and peak confusion in complex scenarios. In this paper, a new feature that stacks the log-Mel spectrum with the noise subspace of the covariance matrix of the relative sound pressure is proposed and used for DL-based DOA estimation; it is referred to as the log-Mel spectrum augmented noise subspace (LMNS). The LMNS is more robust than conventional features because it represents both spectral and spatial information effectively. The LMNS is fed as the input feature to a Conformer-based residual network that maps it to the spatial pseudo-spectrum, from which the DOAs of the sound sources are obtained. The experimental results show that the proposed method achieves better DOA estimation performance, which verifies that the LMNS feature is more robust and effective in scenarios with multiple sources, noise, and reverberation. © 2023 IEEE.
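To make the idea of stacking spectral and spatial information concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a generic multichannel snapshot matrix in place of the paper's relative sound pressure, takes the noise subspace as the eigenvectors of the sample covariance matrix with the smallest eigenvalues, and concatenates it with a log-Mel vector. The function names, the HTK-style Mel formula, and the concatenation layout are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def noise_subspace(snapshots, num_sources):
    """Eigenvectors of the sample covariance with the smallest eigenvalues.

    snapshots: complex array of shape (num_mics, num_snapshots).
    Assumption: the number of sources is known and smaller than num_mics.
    """
    num_mics, num_snapshots = snapshots.shape
    cov = snapshots @ snapshots.conj().T / num_snapshots
    # eigh returns eigenvalues in ascending order for Hermitian matrices,
    # so the first (num_mics - num_sources) columns span the noise subspace.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, : num_mics - num_sources]


def mel_filterbank(num_mels, n_fft, sample_rate):
    """Triangular Mel filters (HTK-style formula, chosen for illustration)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             num_mels + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((num_mels, n_fft // 2 + 1))
    for m in range(num_mels):
        lo, ctr, hi = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def stacked_feature(power_spectrum, bank, snapshots, num_sources, eps=1e-10):
    """Concatenate a log-Mel vector with the real and imaginary parts of the
    noise subspace. The layout is a hypothetical choice for this sketch."""
    log_mel = np.log(bank @ power_spectrum + eps)
    e_n = noise_subspace(snapshots, num_sources)
    return np.concatenate([log_mel, e_n.real.ravel(), e_n.imag.ravel()])
```

In this sketch the spatial part carries the property that MUSIC-style methods rely on: the steering vector of a true source direction is approximately orthogonal to the noise subspace. A network such as the Conformer-based residual network described above would then take a feature of this kind as input; the network itself is outside the scope of this sketch.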

Keyword：

Deep learning; Reverberation; Direction of arrival; Covariance matrix; Acoustic noise

Author Community：

[ 1 ] [Duan, Haiwei] Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, China
[ 2 ] [Bao, Changchun] Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, China
[ 3 ] [Zhou, Jing] Beijing University of Technology, Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing, China

Related Keywords：

Binaural Target Sound Source Localization Based on Time-frequency Units Selection
2019, Journal of Electronics and Information Technology
Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments
2023, EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING
Robust Coherent sources Localization Based on Hankel Matrix Reconstruction
2024, 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
Real time estimation and tracking method for the direction of arrival of single sound source based on Kalman filtering and frequency focusing
2024, Journal of Tsinghua University

Year： 2023

Language： English
