Author:

Chen, Xianhong | Bao, Changchun (Scholars: 鲍长春)

Indexed by:

EI Scopus SCIE

Abstract:

Variations in speech content increase the difficulty of speaker verification. In this paper, to alleviate the negative effect of these variations, a phoneme-unit-specific time-delay neural network (PUSTDNN) is proposed and applied to the state-of-the-art x-vector system. It models each phoneme unit with an individual time-delay neural network (TDNN); that is, each TDNN mainly handles one phoneme unit. A TDNN that handles a single phoneme unit can extract more discriminative speaker information than one that handles all phoneme units together, which improves system performance. Two realizations of the PUSTDNN are proposed. The first retains speech temporal information. The second further combines all the TDNNs in a PUSTDNN into a larger TDNN to reduce computational complexity. To avoid model overfitting, the phoneme units are obtained by clustering phonemes based on phonetic knowledge and the degree of phonetic sparsity. The PUSTDNN is also compared with two other techniques, namely the phonetic vector and multitask techniques. Experiments on the Fisher, NIST SRE10, and VoxCeleb datasets show that the phonetic vector technique is the most robust to phoneme unit recognition accuracy. When the accuracy is high enough, the multitask technique performs better than the phonetic vector technique, and the PUSTDNN performs best, achieving over 10% relative improvement compared with the x-vector baseline. © 2014 IEEE.
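
The idea of giving each phoneme unit its own TDNN can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify layer sizes, context widths, or how frames are assigned to units, so the hard per-frame routing, the names `tdnn_layer` and `phoneme_unit_specific_layer`, and all dimensions below are assumptions made for illustration. Each unit's layer is run over the whole utterance and only the frames labeled with that unit are kept, which is one plausible way to keep the frames in their original temporal order.

```python
import numpy as np

def tdnn_layer(x, weight, bias, context=(-1, 0, 1)):
    """One TDNN layer: affine transform of spliced context frames, then ReLU.

    x: (T, D) frames; weight: (len(context) * D, H); bias: (H,).
    Edges are padded by repeating the first/last frame, so the output has T frames.
    """
    T = x.shape[0]
    idx = np.arange(T)
    spliced = np.concatenate(
        [x[np.clip(idx + c, 0, T - 1)] for c in context], axis=1
    )  # (T, len(context) * D)
    return np.maximum(spliced @ weight + bias, 0.0)

def phoneme_unit_specific_layer(x, unit_ids, weights, biases, context=(-1, 0, 1)):
    """Route each frame to the TDNN of its phoneme unit (hypothetical hard routing).

    unit_ids: (T,) integer phoneme-unit label per frame (assumed to come from
    a separate phoneme recognizer). weights / biases: one set per phoneme unit.
    """
    T = x.shape[0]
    H = biases[0].shape[0]
    out = np.zeros((T, H))
    for u, (w, b) in enumerate(zip(weights, biases)):
        mask = unit_ids == u
        if mask.any():
            # Run on the full sequence so context frames stay in temporal order,
            # then keep only the frames labeled with unit u.
            out[mask] = tdnn_layer(x, w, b, context)[mask]
    return out

# Toy example: 8 frames, 4-dim features, 5 hidden units, 3 phoneme units.
rng = np.random.default_rng(0)
T, D, H, U = 8, 4, 5, 3
x = rng.standard_normal((T, D))
unit_ids = np.array([0, 0, 1, 1, 1, 2, 2, 0])
weights = [rng.standard_normal((3 * D, H)) for _ in range(U)]
biases = [np.zeros(H) for _ in range(U)]
y = phoneme_unit_specific_layer(x, unit_ids, weights, biases)
print(y.shape)  # (8, 5)
```

Running every unit's layer over the full sequence costs roughly U times a single TDNN, which is the kind of overhead the abstract's second realization (merging the TDNNs into one larger TDNN) is described as reducing; that merged form is not sketched here.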

Keyword:

Timing circuits; Speech recognition; Linguistics; Time delay; Neural networks; Vectors

Author Community:

  • [ 1 ] [Chen, Xianhong]Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China
  • [ 2 ] [Bao, Changchun]Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China

Reprint Author's Address:

  • 鲍长春

    [Bao, Changchun]Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing; 100124, China

Source:

IEEE/ACM Transactions on Audio, Speech, and Language Processing

ISSN: 2329-9290

Year: 2021

Volume: 29

Page: 1243-1255

Impact Factor: 5.400 (JCR@2022)

ESI Discipline: ENGINEERING;

ESI HC Threshold:87

JCR Journal Grade:1

SCOPUS Cited Count: 20

ESI Highly Cited Papers on the List: 0