Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification

Author：

Chen, Xianhong | Bao, Changchun (鲍长春)

Indexed by：

EI Scopus SCIE

Abstract：

Variations of speech content increase the difficulty of speaker verification. In this paper, to alleviate the negative effect of the variations, a phoneme-unit-specific time-delay neural network (PUSTDNN) is proposed and applied to the state-of-the-art x-vector system. It models each phoneme unit with an individual time-delay neural network (TDNN). That is to say, each TDNN mainly deals with a phoneme unit. Compared with handling all phoneme units together, when handling a phoneme unit, a TDNN can extract more discriminative speaker information, thus improving the system performance. Two realizations of the PUSTDNN are proposed. The first one can retain speech temporal information. The second one further combines all the TDNNs in a PUSTDNN into a larger TDNN to reduce computational complexity. To avoid model overfitting, the phoneme units are obtained by clustering phonemes based on the phonetic knowledge and phonetic sparsity degree. The PUSTDNN is also compared with two other techniques, i.e., phonetic vector and multitask. Experiments on the Fisher, NIST SRE10, and VoxCeleb datasets show that the phonetic vector technique is most robust to the phoneme unit recognition accuracy. When the accuracy is high enough, the multitask performs better than the phonetic vector, and the PUSTDNN performs best and can achieve over 10% relative improvement compared with the x-vector baseline. © 2014 IEEE.
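
The routing idea in the abstract (one TDNN per phoneme unit, with frame order preserved) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a hard phoneme-unit label per frame, a single layer per unit with a context of (-1, 0, 1), and plain mean pooling in place of the x-vector statistics pooling. All names (`splice`, `affine_relu`, `unit_specific_layer`, `mean_pool`) and all sizes are hypothetical.

```python
import random

def splice(frames, t, context):
    """Concatenate the context frames around position t (edges are clamped)."""
    last = len(frames) - 1
    out = []
    for c in context:
        out.extend(frames[min(max(t + c, 0), last)])
    return out

def affine_relu(vec, weight, bias):
    """ReLU(weight . vec + bias); weight is a list of H rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weight, bias)]

def unit_specific_layer(frames, unit_ids, layers, context=(-1, 0, 1)):
    """Apply to each frame the layer that belongs to its phoneme unit.

    layers[k] is a (weight, bias) pair for unit k (an assumed layout).
    The output follows the input order, so temporal order is kept.
    """
    return [affine_relu(splice(frames, t, context), *layers[unit_ids[t]])
            for t in range(len(frames))]

def mean_pool(hidden):
    """Average over time to get a fixed-length utterance vector."""
    n = len(hidden)
    return [sum(col) / n for col in zip(*hidden)]

# Toy data: T frames of dimension D, K phoneme units, H hidden units.
random.seed(0)
T, D, H, K = 20, 4, 3, 2
frames = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
unit_ids = [random.randrange(K) for _ in range(T)]  # assumed frame labels
layers = [([[random.gauss(0, 0.1) for _ in range(3 * D)] for _ in range(H)],
           [0.0] * H) for _ in range(K)]

hidden = unit_specific_layer(frames, unit_ids, layers)
embedding = mean_pool(hidden)
print(len(hidden), len(hidden[0]), len(embedding))  # 20 3 3
```

In this sketch the unit label only selects which parameters are applied to a frame. The spliced context still comes from neighbouring frames of any unit, which is one simple way to keep temporal information available to every unit-specific layer.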

Keyword：

Timing circuits; Speech recognition; Linguistics; Time delay; Neural networks; Vectors

Author Community：

[1] [Chen, Xianhong] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
[2] [Bao, Changchun] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Reprint Author's Address：

鲍长春
[Bao, Changchun] Speech and Audio Signal Processing Laboratory, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China

Email：

baochch@bjut.edu.cn

Related Keywords：

Knowledge-based neural models for modelling high-frequency electronics circuits
2019，6th International Conference on Systems and Informatics, ICSAI 2019
Synchronic analysis in a neural network based on FitzHugh-Nagumo equations with time delay
2013，2013 7th ICME International Conference on Complex Medical Engineering, CME 2013
A novel time delay estimation method in noisy and reverberant environments
2013，2013 IEEE International Conference on Signal Processing, Communications and Computing, ICSPCC 2013
Source localization based on time delay estimation in complex environment
2014，Journal on Communications

Source ：

IEEE/ACM Transactions on Audio, Speech, and Language Processing

ISSN： 2329-9290

Year： 2021

Volume： 29

Page： 1243-1255

5.400 (JCR@2022)

ESI Discipline： ENGINEERING;

ESI HC Threshold：87

JCR Journal Grade：1

Cited Count：

WoS CC Cited Count：

SCOPUS Cited Count： 20

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 1

Affiliated Colleges：

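School of Information Science and Technology: records not explicitly attributed to a specific unit within this school/department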
