Abstract:
Long-tailed data distributions pose a significant challenge in text classification. When samples are unevenly distributed among categories, categories with few samples (the tail classes) are classified poorly, which degrades overall classification performance. To address this issue and accurately classify the texts of complaints and reports, this paper proposes a text classification method tailored to long-tailed distributions that uses a rebalanced loss function to adjust the weights of samples from different categories. The approach first modifies the classic loss function by replacing the conventional activation function with the Gumbel activation function. This modification assigns different gradients to head and tail classes, thereby mitigating classification bias. Then, to counteract overfitting on the tail classes, regularization constraints are introduced into the loss function to improve generalization. Experimental results show that, compared with alternative loss functions, the proposed method yields better classification results on multi-class problems with long-tailed text data. © 2024 IEEE.
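The abstract does not state the loss formula, so the following is only a minimal sketch of the general idea, under stated assumptions: the Gumbel activation is taken to be the Gumbel CDF exp(-exp(-z)), applied per class in a one-vs-all cross-entropy, and an L2 penalty on the logits stands in for the unspecified regularization constraints. The names `gumbel_activation`, `gumbel_loss`, and `l2_weight` are hypothetical and do not come from the paper.

```python
import math


def gumbel_activation(z):
    """Gumbel CDF used as an activation (assumed form): exp(-exp(-z))."""
    z = max(min(z, 50.0), -50.0)  # clamp to avoid overflow in exp
    return math.exp(-math.exp(-z))


def gumbel_loss(logits, target_index, l2_weight=0.0):
    """One-vs-all cross-entropy with a Gumbel activation for one sample.

    Assumption: the L2 penalty on the logits is a placeholder for the
    regularization constraints mentioned in the abstract, whose exact
    form is not given there.
    """
    loss = 0.0
    for k, z in enumerate(logits):
        z = max(min(z, 50.0), -50.0)
        e = math.exp(-z)
        if k == target_index:
            # -log(exp(-exp(-z))) simplifies to exp(-z)
            loss += e
        else:
            # -log(1 - exp(-exp(-z))), computed stably with expm1
            loss -= math.log(-math.expm1(-e))
    # Placeholder regularizer (assumption, not the paper's term)
    loss += l2_weight * sum(z * z for z in logits)
    return loss
```

In this sketch the positive-class term grows exponentially as the target logit decreases, while the negative-class term behaves differently, so the gradients are asymmetric between the target class and the remaining classes. Whether this matches the paper's exact formulation cannot be determined from the abstract alone.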
ISSN: 2689-6621
Year: 2024
Page: 1901-1906
Language: English
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0