Research Situation and Prospects of Multi-speaker Separation and Target Speaker Extraction [多说话人分离与目标说话人提取的研究现状与展望]

Author：

Bao, C. | Yang, X.

Indexed by：

Scopus

Abstract：

As a cutting-edge technology in speech signal processing, speech separation has significant research value and broad application prospects. Typically, the signal captured by the microphones contains speech signals from multiple speakers, noise, and reverberation. To improve the user experience and the performance of backend devices, it is necessary to perform speech separation. Speech separation originated from the well-known cocktail party problem and aims to separate the individual speech signals from the mixed signal. In recent years, researchers have proposed a large number of speech separation methods, which have significantly improved separation performance. This paper systematically reviews and summarizes these methods. First, based on whether auxiliary information about the target speaker is leveraged, speech separation is divided into two categories, i.e., multi-speaker separation and target speaker extraction. Second, these methods are introduced in detail, following the progression from conventional approaches to deep learning-based techniques. Finally, the existing challenges in speech separation are discussed and directions for future research are highlighted. © 2024 Nanjing University of Aeronautics and Astronautics. All rights reserved.

Keyword：

cocktail party problem; deep learning; target speaker extraction; multi-speaker separation; speech separation

Author Community：

[ 1 ] [Bao C.] Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China
[ 2 ] [Yang X.] Institute of Speech and Audio Information Processing, School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China

Related Articles：

TARGET SPEAKER EXTRACTION BY DIRECTLY EXPLOITING CONTEXTUAL INFORMATION IN THE TIME-FREQUENCY DOMAIN
2024，2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2024)
Multi-speaker Speech Separation under Reverberation Conditions Using Conv-Tasnet
2023，Journal of Advances in Information Technology
Triple-Path RNN Network: A Time-and-Frequency Joint Domain Speech Separation Model
2024，
Deep encoder/decoder dual-path neural network for speech separation in noisy reverberation environments
2023，EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING

Source ：

Journal of Data Acquisition and Processing

ISSN： 1004-9037

Year： 2024

Issue： 5

Volume： 39

Page： 1044-1061
