Abstract:
Recently, the Transformer has demonstrated superior performance in object detection tasks owing to its powerful capability for modeling global image information. However, due to the Transformer's expensive computational overhead and high memory footprint, detection efficiency and performance often suffer when processing high-resolution images or long input sequences. Subsequent works reduced the size of the keys and values in the Transformer through pooling or convolution operations, which alleviated these problems to a certain extent. Nonetheless, owing to the inherent smoothness of image data, image patches in smooth regions contribute little to the model's final output, resulting in data redundancy. Therefore, we propose a novel Sparse Attention-based Pyramid Pooling Transformer Network (SA-P2T) for object detection. Specifically, in SA-P2T, we introduce a sparse attention module, which measures query sparsity via the Kullback-Leibler divergence and then screens out the queries that contribute most to the self-attention as the new queries. This module filters out redundant information while retaining essential information, enabling the model to process image data more efficiently and further reducing its computational complexity and memory footprint. Experimental results on the MS-COCO dataset show that SA-P2T not only reduces computational complexity but also improves detection accuracy and speed, demonstrating the effectiveness of the proposed method. The code will be released at https://github.com/Genbao-Xu/SA-P2T. © 2024 IEEE.
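The sketch below illustrates the general idea of KL-divergence-based query screening described in the abstract: score each query by how far its attention distribution is from uniform, keep only the most informative queries, and attend with those. This is a minimal, hypothetical re-implementation for illustration only; the function name, the `top_u` parameter, and the mean-value fallback for inactive queries are assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, top_u):
    """Hypothetical sketch of sparse attention with KL-based query screening.

    q: (B, Lq, d), k: (B, Lk, d), v: (B, Lk, d)
    top_u: number of "active" queries kept after the sparsity measurement.
    """
    B, Lq, d = q.shape
    scale = d ** -0.5

    # Full score matrix (an efficient implementation would subsample keys here).
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale        # (B, Lq, Lk)

    # Query sparsity measurement: max-mean surrogate of the KL divergence
    # between each query's attention distribution and the uniform distribution.
    # Queries whose distribution is far from uniform contribute most.
    sparsity = scores.logsumexp(dim=-1) - scores.mean(dim=-1)    # (B, Lq)

    # Keep only the top-u most informative queries as the new queries.
    top_idx = sparsity.topk(top_u, dim=-1).indices               # (B, top_u)
    q_active = torch.gather(q, 1, top_idx.unsqueeze(-1).expand(-1, -1, d))

    # Standard attention for the active queries only.
    attn = F.softmax(torch.matmul(q_active, k.transpose(-2, -1)) * scale, dim=-1)
    out_active = torch.matmul(attn, v)                           # (B, top_u, d)

    # Inactive (redundant) queries fall back to the mean of the values
    # (one common choice; an assumption here, not necessarily the paper's).
    out = v.mean(dim=1, keepdim=True).expand(B, Lq, d).clone()
    out.scatter_(1, top_idx.unsqueeze(-1).expand(-1, -1, d), out_active)
    return out
```

Because only `top_u` of the `Lq` queries participate in the full attention computation, the score matrix that must be materialized shrinks from Lq x Lk to top_u x Lk, which is the source of the complexity and memory savings claimed in the abstract.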
Year: 2024
Page: 946-951
Language: English
ESI Highly Cited Papers on the List: 0