Abstract:
Analyzing speaker sentiment from audio is an essential yet challenging task. Convolutional and recurrent neural network approaches to audio sentiment analysis have achieved promising initial results, demonstrating that deep learning methods can be useful for this task. With the rise of multimodal research, fusing audio and text features has yielded even better sentiment analysis results. However, this approach follows a pipeline: a separate speech-to-text model must first produce the text, so transcription errors propagate into the subsequent sentiment judgment. To solve this problem, we propose a new end-to-end model based on the seq2seq architecture: the encoder encodes the audio, the decoder generates the corresponding text and models its content, and the final emotional state is obtained by fusing the features extracted by the encoder and the decoder. We evaluated our model on real psychological assistance hotline data, and the results show that our method significantly outperforms the baseline methods. © 2022 IEEE.
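The abstract describes an encoder-decoder model in which the decoder's text-generation features and the encoder's acoustic features are fused for emotion classification. The following is a minimal PyTorch-style sketch of that idea, not the authors' implementation: the GRU backbone, all layer sizes, the vocabulary size, and the mean-pooling fusion are illustrative assumptions, and a real seq2seq model would typically add attention between decoder and encoder.

```python
# Hedged sketch of the fusion architecture described in the abstract.
# All hyperparameters and layer choices here are assumptions for illustration.
import torch
import torch.nn as nn

class AudioSentimentSeq2Seq(nn.Module):
    def __init__(self, n_mels=80, vocab_size=5000, d_model=256, n_classes=3):
        super().__init__()
        # Encoder: maps acoustic frames (e.g. log-mel features) to hidden states.
        self.encoder = nn.GRU(n_mels, d_model, num_layers=2, batch_first=True)
        # Decoder: autoregressively generates text tokens from the encoded audio,
        # so textual content is learned without a separate speech-to-text model.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.GRU(d_model, d_model, num_layers=2, batch_first=True)
        self.token_head = nn.Linear(d_model, vocab_size)  # transcription loss head
        # Fusion: concatenate pooled encoder (acoustic) and decoder (textual)
        # features, then classify the emotional state.
        self.sentiment_head = nn.Linear(2 * d_model, n_classes)

    def forward(self, audio_feats, text_tokens):
        enc_out, enc_state = self.encoder(audio_feats)        # (B, T_audio, d)
        dec_in = self.embed(text_tokens)                      # teacher forcing
        dec_out, _ = self.decoder(dec_in, enc_state)          # (B, T_text, d)
        token_logits = self.token_head(dec_out)               # vs. reference transcript
        fused = torch.cat([enc_out.mean(dim=1), dec_out.mean(dim=1)], dim=-1)
        sentiment_logits = self.sentiment_head(fused)
        return token_logits, sentiment_logits

# Shape check with random inputs (batch of 2, 120 frames, 30 transcript tokens).
model = AudioSentimentSeq2Seq()
audio = torch.randn(2, 120, 80)
tokens = torch.randint(0, 5000, (2, 30))
token_logits, sentiment_logits = model(audio, tokens)
```

Trained jointly with a cross-entropy transcription loss on `token_logits` and a classification loss on `sentiment_logits`, a model of this shape avoids the pipeline's hard transcription decisions, which is the error-propagation problem the abstract raises.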
Year: 2022
Page: 1438-1443
Language: English