Author:

Ma, Nan | Xu, Genbao | Han, Yiheng | Sun, Beining

Indexed by:

EI | Scopus

Abstract:

Recently, Transformers have demonstrated superior performance in object detection thanks to their strong ability to model global image information. However, because of their high computational overhead and memory footprint, detection efficiency and accuracy often suffer when processing high-resolution images or long input sequences. Subsequent work reduced the size of the keys and values in the Transformer through pooling or convolution operations, which alleviated these problems to some extent. Nonetheless, because image data is inherently smooth, patches in smooth regions contribute little to the model's final output, resulting in data redundancy. We therefore propose a novel Sparse Attention-based Pyramid Pooling Transformer Network (SA-P2T) for object detection. Specifically, SA-P2T introduces a sparse attention module that measures query sparsity via the Kullback-Leibler divergence and then selects the queries that contribute most to self-attention as the new queries. This module filters out redundant information while retaining essential information, enabling the model to process image data more efficiently and further reducing its computational complexity and memory usage. Experimental results on the MS-COCO dataset show that SA-P2T not only reduces computational complexity but also improves detection accuracy and speed, demonstrating the effectiveness of the proposed method. The code will be released at https://github.com/Genbao-Xu/SA-P2T. © 2024 IEEE.
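
The query-selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this sketch assumes that a query's sparsity score is the Kullback-Leibler divergence between its softmax attention distribution over the keys and the uniform distribution. The names `query_sparsity`, `select_active_queries`, and `top_u` are hypothetical and are not taken from the SA-P2T code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_sparsity(query, keys):
    """Assumed score: KL(attention distribution || uniform) for one query.

    A score near 0 means the query attends to all keys almost equally,
    so it adds little beyond a plain average of the values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    probs = softmax(scores)
    n = len(keys)
    # KL(p || u) = sum_j p_j * log(p_j / (1 / n)); zero-probability terms add 0.
    return sum(p * math.log(p * n) for p in probs if p > 0.0)


def select_active_queries(queries, keys, top_u):
    """Return the indices (ascending) of the top_u highest-scoring queries."""
    scored = [(query_sparsity(q, keys), i) for i, q in enumerate(queries)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by index
    return sorted(i for _, i in scored[:top_u])


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    queries = [[0.0, 0.0], [5.0, 0.0], [0.1, 0.0]]
    print(select_active_queries(queries, keys, top_u=2))
```

In this toy input the all-zero query yields uniform attention and a score of zero, so it is the one dropped. How SA-P2T handles the queries that are not selected is not stated in the abstract and is left out of the sketch.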

Keyword:

Image enhancement | Image segmentation | Object detection | Optical data processing | Metadata | Query processing | Data reduction | Human computer interaction | Information filtering | Object recognition

Author Affiliations:

  • [1] [Ma, Nan] Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China
  • [2] [Xu, Genbao] Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China
  • [3] [Han, Yiheng] Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China
  • [4] [Sun, Beining] Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China

Year: 2024

Page: 946-951

Language: English
