Joint ideal ratio mask and generative adversarial networks for monaural speech enhancement

Author：

Yuan, Jing | Bao, Changchun (鲍长春)

Indexed by：

EI Scopus

Abstract：

Speech enhancement is the task of improving some perceptual aspects of noisy speech. Recently, generative adversarial networks (GANs) have become a popular deep learning method, and different GAN structures have been proposed [1], [2]. In this paper, we propose a new GAN-based framework for the speech enhancement task. We train two models: a generative model G and a discriminative model D. Both G and D are defined by feedforward multilayer perceptrons (MLPs) [3]. The difference between the two is that the generator G employs a deep neural network (DNN) based on the masking technique, in which the magnitude spectrum of the noise and the magnitude spectrum of the clean speech are estimated simultaneously from noisy speech features. Meanwhile, the discriminator D uses the MLP structure to directly predict the clean speech magnitude spectrum. The model D discriminates whether data comes from clean speech or from speech generated by the G network. Moreover, in our work, the G network is used to perform the speech enhancement. Objective evaluation and experimental results show that the proposed framework significantly improves on the performance of traditional DNN-based and recent GAN-based speech enhancement methods. © 2018 IEEE.
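
The abstract does not state the mask formula, so the following is only a minimal sketch of the masking idea it describes. It assumes the commonly used square-root form of the ideal ratio mask, computed per frequency bin from estimated clean-speech and noise magnitudes, and then applied to the noisy magnitude spectrum. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math

def ratio_mask(clean_mag, noise_mag, eps=1e-12):
    """Per-bin ratio mask sqrt(S^2 / (S^2 + N^2)); assumed form, not the paper's."""
    return [
        math.sqrt(s * s / (s * s + n * n + eps))  # eps avoids division by zero
        for s, n in zip(clean_mag, noise_mag)
    ]

def apply_mask(noisy_mag, mask):
    """Scale each noisy magnitude bin by its mask value."""
    return [y * m for y, m in zip(noisy_mag, mask)]

# Hypothetical magnitudes for one frame with four frequency bins.
clean_est = [3.0, 0.0, 1.0, 4.0]   # estimated clean-speech magnitudes
noise_est = [4.0, 2.0, 1.0, 0.0]   # estimated noise magnitudes
noisy = [5.0, 2.0, 1.5, 4.0]       # observed noisy magnitudes

mask = ratio_mask(clean_est, noise_est)
enhanced = apply_mask(noisy, mask)
```

Under this assumed definition the mask stays between 0 and 1, so bins dominated by noise are attenuated and bins dominated by speech pass nearly unchanged. In the framework described above, a trained generator would supply the two magnitude estimates.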

Keyword：

Speech enhancement; Deep learning; Deep neural networks; Learning systems; Neural networks; Queueing networks; Signal processing

Author Community：

[ 1 ] [Yuan, Jing]Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China
[ 2 ] [Bao, Changchun]Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China


Related Keywords：

Speech Enhancement with Phase Correction based on Modified DNN Architecture
2018, 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018
Speech enhancement based on binaural cues
2017, 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
An ideal Wiener filter correction-based cIRM speech enhancement method using deep neural networks with skip connections
2018, 14th IEEE International Conference on Signal Processing, ICSP 2018
IRM with phase parameterization for speech enhancement
2019, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2019

Source ：

Year： 2018

Volume： 2018-August

Page： 276-280

Language： English

Cited Count：

SCOPUS Cited Count： 2

ESI Highly Cited Papers on the List： 0

Affiliated Colleges：

School of Information Science and Technology (data not clearly attributed to a specific school/department)
