
Author:

Zhang, Yumei | Jia, Maoshen | Cao, Xuan | Zhao, Zichen

Indexed by:

EI

Abstract:

Speech Emotion Recognition (SER) has long been an important topic in the field of human-computer interaction. Most existing methods use hand-crafted features, which may ignore emotion-related information contained in raw speech signals. In recent years, speech Self-Supervised Learning (SSL) models such as Wav2vec 2.0 (W2V2) have emerged and been employed to extract general speech representations for downstream SER tasks. However, the large number of parameters introduced by a full SSL model is unnecessary for SER. In this paper, an SER model is proposed on the basis of the shallow structure of W2V2 and the attention mechanism. The W2V2-based module is constructed from the first seven Conv1d blocks of W2V2 to extract local feature representations from raw speech signals. The attention-based module then captures global contextual emotional information from these local feature representations. Within this module, three multi-head self-attention blocks are cascaded for multi-level feature fusion. Experimental results show that the proposed model achieves better performance than the baselines on the IEMOCAP and EMODB datasets. ©2024 IEEE.
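
Architecture Sketch:

The following PyTorch code is a minimal reading of the abstract, not the authors' implementation. The channel width (512) and the kernel/stride schedule of the seven Conv1d blocks are borrowed from the public Wav2vec 2.0 feature-encoder configuration; the head count (8), fusion by summation, mean pooling over time, and the four-class output head (a common IEMOCAP setup) are assumptions not stated in the abstract.

import torch
import torch.nn as nn

class W2V2ConvFrontend(nn.Module):
    """First seven Conv1d blocks, mirroring the public W2V2 feature encoder
    (normalization layers omitted for brevity)."""
    def __init__(self, dim=512):
        super().__init__()
        kernels = [10, 3, 3, 3, 3, 2, 2]   # W2V2 feature-encoder schedule
        strides = [5, 2, 2, 2, 2, 2, 2]    # total downsampling factor: 320
        blocks, in_ch = [], 1
        for k, s in zip(kernels, strides):
            blocks += [nn.Conv1d(in_ch, dim, k, stride=s), nn.GELU()]
            in_ch = dim
        self.encoder = nn.Sequential(*blocks)

    def forward(self, wav):                # wav: (batch, samples)
        feats = self.encoder(wav.unsqueeze(1))
        return feats.transpose(1, 2)       # (batch, frames, dim)

class AttentionFusion(nn.Module):
    """Three cascaded multi-head self-attention blocks; the outputs of all
    levels are fused (here by summation, an assumption) and mean-pooled."""
    def __init__(self, dim=512, heads=8, levels=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True)
            for _ in range(levels))

    def forward(self, x):                  # x: (batch, frames, dim)
        fused = 0
        for attn in self.blocks:
            x, _ = attn(x, x, x)           # global context over local features
            fused = fused + x              # multi-level feature fusion
        return fused.mean(dim=1)           # (batch, dim)

class SERModel(nn.Module):
    def __init__(self, num_classes=4, dim=512):
        super().__init__()
        self.frontend = W2V2ConvFrontend(dim)
        self.fusion = AttentionFusion(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, wav):
        return self.classifier(self.fusion(self.frontend(wav)))

logits = SERModel()(torch.randn(2, 16000))  # two 1-second clips at 16 kHz

Keeping only the convolutional front end of W2V2 (rather than the full transformer stack) is what keeps the parameter count small; the cascaded self-attention blocks then supply the global context that the local Conv1d features lack.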

Keyword:

Audio signal processing; Self-supervised learning; Emotion recognition; Speech analysis; Semi-supervised learning

Author Community:

  • [ 1 ] Zhang, Yumei, School of Information Science and Technology, Beijing University of Technology, Beijing, China
  • [ 2 ] Jia, Maoshen, School of Information Science and Technology, Beijing University of Technology, Beijing, China
  • [ 3 ] Cao, Xuan, School of Information Science and Technology, Beijing University of Technology, Beijing, China
  • [ 4 ] Zhao, Zichen, College of Engineering, Yanbian University, Yanji, China

Source:

Year: 2024

Page: 398-402

Language: English

