SA-P2T: Sparse Attention-based Pyramid Pooling Transformer Network for Object Detection

Author:

Ma, Nan | Xu, Genbao | Han, Yiheng | Sun, Beining

Indexed by:

EI, Scopus

Abstract:

Recently, the Transformer has demonstrated superior performance in object detection by virtue of its powerful capability to model global image information. However, because of the Transformer's expensive computational overhead and high memory occupation, detection efficiency and performance often suffer when processing high-resolution or long-sequence images. Subsequent works reduced the size of the key and value in the Transformer through pooling or convolution operations, which alleviated these problems to a certain extent. Nonetheless, because image data is inherently smooth, image patches in smooth regions contribute little to the model's final output, resulting in data redundancy. Therefore, we propose a novel Sparse Attention-based Pyramid Pooling Transformer Network (SA-P2T) for object detection. Specifically, SA-P2T introduces a sparse attention module that measures query sparsity via the Kullback-Leibler divergence and then selects the queries that contribute most to self-attention as the new queries. This module filters out redundant information while retaining essential information, enabling the model to process image data more efficiently and further reducing its computational complexity and memory footprint. Our experimental results on the MS-COCO dataset show that SA-P2T not only reduces computational complexity but also improves detection accuracy and speed, demonstrating the effectiveness of the proposed method. The code will be released at https://github.com/Genbao-Xu/SA-P2T. © 2024 IEEE.
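The sparse attention step described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code (which the abstract says will be released at the repository above). It assumes an Informer-style measurement: each query is scored by the KL divergence between a uniform distribution over the keys and that query's softmax attention distribution, and only the `top_u` highest-scoring queries are kept. The single-stride average pooling of keys and values (`avg_pool_tokens`) is a simplified stand-in for pyramid pooling, and every name here (`kl_sparsity`, `sparse_attention`, `top_u`, `pool_stride`) is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def avg_pool_tokens(x: Matrix, stride: int) -> Matrix:
    """Average non-overlapping windows of `stride` tokens (simplified K/V pooling)."""
    pooled = []
    for start in range(0, len(x), stride):
        window = x[start:start + stride]
        # Column-wise mean over the tokens in this window.
        pooled.append([sum(col) / len(window) for col in zip(*window)])
    return pooled


def scaled_scores(q: List[float], keys: Matrix) -> List[float]:
    """Dot-product scores q . k / sqrt(d) against every key."""
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(a * b for a, b in zip(q, k)) for k in keys]


def softmax(s: List[float]) -> List[float]:
    m = max(s)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in s]
    z = sum(e)
    return [v / z for v in e]


def kl_sparsity(q: List[float], keys: Matrix) -> float:
    """KL(uniform || softmax(scores)): 0 for uniform attention, larger when peaked."""
    s = scaled_scores(q, keys)
    m = max(s)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in s))
    return log_sum_exp - sum(s) / len(s) - math.log(len(s))


def sparse_attention(queries: Matrix, keys: Matrix, values: Matrix,
                     top_u: int, pool_stride: int = 1) -> Tuple[List[int], Matrix]:
    """Attend only with the top_u queries ranked by KL sparsity (hypothetical sketch)."""
    keys = avg_pool_tokens(keys, pool_stride)
    values = avg_pool_tokens(values, pool_stride)
    ranked = sorted(range(len(queries)),
                    key=lambda i: kl_sparsity(queries[i], keys), reverse=True)
    kept = sorted(ranked[:top_u])  # restore original token order
    outputs = []
    for i in kept:
        weights = softmax(scaled_scores(queries[i], keys))
        # Weighted sum of the (pooled) value vectors.
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return kept, outputs


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
queries = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
kept, out = sparse_attention(queries, keys, values, top_u=2)
print(kept)  # [1, 2]: the zero query attends uniformly and is dropped
```

A query whose attention is spread evenly over the keys (here, the zero vector) scores exactly 0 and is the first to be discarded, which loosely mirrors the abstract's point that patches in smooth regions contribute little. The abstract does not say how SA-P2T handles the discarded queries, so the sketch simply returns the indices of the kept queries alongside their outputs.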

Keywords:

Image enhancement; Image segmentation; Object detection; Optical data processing; Metadata; Query processing; Data reduction; Human computer interaction; Information filtering; Object recognition

Author Affiliations:

[1] [Ma, Nan] Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China
[2] [Xu, Genbao] Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China
[3] [Han, Yiheng] Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China
[4] [Sun, Beining] Beijing University of Technology, Faculty of Information Technology, Beijing 100124, China

Related Papers (by keyword):

Combining motion and statistical features for static object detection
2012, Journal of Beijing University of Technology
An On-Board Real-Time Target Detection and Recognition Method of High-Resolution Remote Sensing Satellite Images
2023, 2023 China Automation Congress, CAC 2023
An Object Detection and Pose Estimation Method for AR Application
2022, 7th International Symposium on Artificial Intelligence and Robotics, ISAIR 2022
BFNet: Brain-like Feedback Network for Object Detection under Severe Weather
2023, 3rd IEEE International Conference on Digital Twins and Parallel Intelligence, DTPI 2023

Source:

Year: 2024

Pages: 946-951

Language: English
