Author:

Fu, Pengbin | Ma, Yuchen | Yang, Huirong

Indexed by:

EI | Scopus | SCIE

Abstract:

Speaker diarization is the task of automatically distinguishing the speakers in an audio recording without any prior information about them. The introduction of the self-attention mechanism in End-to-End Neural Speaker Diarization (EEND) has handled the problem of overlapping speakers well. The Transformer model, equipped with a self-attention mechanism, has shown great potential for collecting global information and has yielded strong results in a variety of tasks. However, individual speaker characteristics are predominantly reflected in contextual information, which conventional self-attention does not adequately capture. In this study, we propose a hierarchical encoder model that improves the encoders' acquisition of speaker information in two distinct ways: (1) constraining the receptive field of the self-attention mechanism with left-right windows or Gaussian weights to highlight contextual information; (2) using a pre-trained speaker embedding extractor based on a time-delay neural network to compensate for the model's limited ability to extract speaker features. We evaluate the proposed methods on a simulated two-speaker dataset and on a real conversation dataset. The best of the proposed models achieves a diarization error rate of 7.74% on the simulated dataset and 21.92% on MagicData-RAMC after adaptation. These results demonstrate the effectiveness of the proposed methods. © 2023 - IOS Press. All rights reserved.
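
The first enhancement named in the abstract, restricting self-attention with left-right windows or Gaussian weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no formulas, so the additive-bias formulation, the function names (`window_bias`, `gaussian_bias`, `local_self_attention`) and the parameters `left`, `right` and `sigma` are all assumptions made for illustration. Learned query, key and value projections and multiple heads are omitted for brevity.

```python
import math


def softmax(row):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def window_bias(n, left, right):
    """Hard window (assumed form): frame i attends only to frames i-left .. i+right."""
    return [[0.0 if -left <= j - i <= right else -math.inf for j in range(n)]
            for i in range(n)]


def gaussian_bias(n, sigma):
    """Soft locality (assumed form): penalty -(i-j)^2 / (2*sigma^2) added to the logits."""
    return [[-((i - j) ** 2) / (2.0 * sigma ** 2) for j in range(n)]
            for i in range(n)]


def local_self_attention(x, bias):
    """Single-head scaled dot-product self-attention with an additive locality bias.

    x is a list of n frame vectors of dimension d. Queries, keys and values
    are x itself here; a real encoder would apply learned projections first.
    """
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        logits = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) + bias[i][j]
                  for j in range(n)]
        weights = softmax(logits)
        out.append([sum(weights[j] * x[j][k] for j in range(n)) for k in range(d)])
    return out


# Toy input: four 2-dimensional "frames".
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
hard = local_self_attention(frames, window_bias(len(frames), left=1, right=1))
soft = local_self_attention(frames, gaussian_bias(len(frames), sigma=1.0))
```

In this sketch the hard window removes distant frames from the softmax entirely, while the Gaussian bias only down-weights them, so a smaller `sigma` gives more local attention. Whether the paper applies the weights before or after the softmax is not stated in this record.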

Keyword:

Embeddings | Audio recordings | Signal encoding

Author Affiliations:

  • [1] [Fu, Pengbin] Faculty of Information Technology, Beijing University of Technology, Xidawang Road, Beijing, China
  • [2] [Ma, Yuchen] Faculty of Information Technology, Beijing University of Technology, Xidawang Road, Beijing, China
  • [3] [Yang, Huirong] Faculty of Information Technology, Beijing University of Technology, Xidawang Road, Beijing, China

Source:

Journal of Intelligent and Fuzzy Systems

ISSN: 1064-1246

Year: 2023

Issue: 5

Volume: 45

Page: 9169-9180

JCR@2022: 2.000
