
Author:

Wang, Xianyun | Bao, Changchun (Scholars: 鲍长春)

Indexed by:

EI Scopus SCIE

Abstract:

In deep neural network (DNN)-based speech enhancement, time-frequency (T-F) masks are commonly used as the training target. Most such masks do not account for phase information, although recent studies have shown that incorporating phase information into the T-F mask can improve the quality of the enhanced speech. In this paper, we present two techniques for obtaining a T-F mask that takes phase information into account. In the first technique, we first discuss the spectral structure of two phase differences (PDs): the PD between clean and noisy speech and the PD between noise and noisy speech. Then, based on the specific characteristics of the two PDs, we propose a parametric ideal ratio mask (IRM) whose parameters are controlled by the cosines of these two PDs, termed the bounded IRM with phase constraint (BIRMP). In the second technique, we propose an optimal estimator based on the generalized maximum a posteriori (GMAP) probability of the complex speech spectrum, defined as the optimal GMAP estimation of complex spectrum (OGMAPC). The OGMAPC estimator can dynamically adjust the scale of the prior information on spectral magnitude and phase. Because speech phase is difficult to predict with DNN-based methods, the second technique uses the spectral magnitude part of the OGMAPC estimator to calculate an optimal magnitude mask that incorporates phase information, and the ideal value of this mask is used for DNN training. Experiments show that the proposed methods outperform the reference methods. © 2019 Elsevier Ltd. All rights reserved.
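As a rough illustration of the quantities the abstract refers to, the sketch below computes the cosines of the two phase differences together with two widely used reference masks: the standard IRM and the standard phase-sensitive mask. It does not implement BIRMP or OGMAPC, whose formulas are not given in this record. The function names, the `eps` constant, and the additive mixing model `Y = S + N` are assumptions made for the example.

```python
import numpy as np

EPS = 1e-12  # assumed small constant to avoid division by zero


def phase_difference_cosines(S, N):
    """Cosines of the PD between clean/noisy and noise/noisy spectra.

    S and N are complex STFT arrays of clean speech and noise;
    the noisy spectrum is assumed to be Y = S + N.
    """
    Y = S + N
    cos_sy = np.real(S * np.conj(Y)) / (np.abs(S) * np.abs(Y) + EPS)
    cos_ny = np.real(N * np.conj(Y)) / (np.abs(N) * np.abs(Y) + EPS)
    return cos_sy, cos_ny


def ideal_ratio_mask(S, N):
    """Standard IRM (magnitude only): sqrt(|S|^2 / (|S|^2 + |N|^2))."""
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return np.sqrt(ps / (ps + pn + EPS))


def phase_sensitive_mask(S, N):
    """Standard phase-sensitive mask: (|S| / |Y|) * cos(theta_S - theta_Y)."""
    Y = S + N
    return np.real(S * np.conj(Y)) / (np.abs(Y) ** 2 + EPS)


# Toy spectra standing in for real STFT frames (frequency bins x frames).
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
N = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

cos_sy, cos_ny = phase_difference_cosines(S, N)
irm = ideal_ratio_mask(S, N)
psm = phase_sensitive_mask(S, N)
```

The IRM always lies in [0, 1], whereas the phase-sensitive mask can fall outside that range because of the cosine factor, which gives some intuition for why a bounded, phase-constrained mask may be attractive as a DNN training target.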

Keyword:

Phase-sensitive; Deep neural network; Monaural speech enhancement; Mask estimation; MAP

Author Community:

  • [1] [Wang, Xianyun] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China
  • [2] [Bao, Changchun] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China

Reprint Author's Address:

  • 鲍长春

    [Bao, Changchun] Beijing Univ Technol, Fac Informat Technol, Speech & Audio Signal Proc Lab, Beijing 100124, Peoples R China


Source:

APPLIED ACOUSTICS

ISSN: 0003-682X

Year: 2019

Volume: 156

Page: 101-112

3.400 (JCR@2022)

ESI Discipline: PHYSICS

ESI HC Threshold: 123

JCR Journal Grade: 2

Cited Count:

WoS CC Cited Count: 9

SCOPUS Cited Count: 13

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:

