TransIFC: Invariant Cues-aware Feature Concentration Learning for Efficient Fine-grained Bird Image Classification

Indexed by：

EI Scopus SCIE

Abstract：

Fine-grained bird image classification (FBIC) is not only meaningful for endangered bird observation and protection but also a prevalent task for image classification in multimedia processing and computer vision. However, FBIC suffers from several challenges, such as bird molting, complex background, and arbitrary bird posture. To effectively tackle these challenges, we present a novel invariant cues-aware feature concentration Transformer (TransIFC), which learns invariant and core information in bird images. To this end, two novel modules are proposed to leverage the characteristics of bird images, namely, the hierarchy stage feature aggregation (HSFA) module and the feature in feature abstraction (FFA) module. The HSFA module aggregates the multiscale information of bird images by concatenating multilayer features. The FFA module extracts the invariant cues of birds through feature selection based on discrimination scores. Transformer is employed as the backbone to reveal the long-dependent semantic relationships in bird images. Moreover, abundant visualizations are provided to prove the interpretability of the HSFA and FFA modules in TransIFC. Comprehensive experiments demonstrate that TransIFC can achieve state-of-the-art performance on the CUB-200-2011 dataset (91.0%) and the NABirds dataset (90.9%). Finally, extended experiments have been conducted on the Stanford Cars dataset to suggest the potential of generalizing our method on other fine-grained visual classification tasks.
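
The two ideas named in the abstract, score-based feature selection (FFA) and concatenation of multilayer features (HSFA), can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how discrimination scores are computed, how many features are kept, or in what order the two modules are applied. The sketch assumes one score per token is already given, keeps the top k tokens of each stage, and concatenates the selections. The function names `select_top_k_tokens` and `aggregate_stages` and all sample values are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def select_top_k_tokens(
    tokens: Sequence[Vector], scores: Sequence[float], k: int
) -> List[Vector]:
    """Keep the k highest-scoring tokens, preserving their original order.

    Assumption: `scores` stands in for the "discrimination scores" mentioned
    in the abstract; how the paper computes them is not stated there.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])
    return [list(tokens[i]) for i in kept]


def aggregate_stages(
    stage_tokens: Sequence[Sequence[Vector]],
    stage_scores: Sequence[Sequence[float]],
    k: int,
) -> List[Vector]:
    """Select top-k tokens in every stage and concatenate the selections.

    Assumption: concatenation along the token axis is one plausible reading
    of "concatenating multilayer features".
    """
    aggregated: List[Vector] = []
    for tokens, scores in zip(stage_tokens, stage_scores):
        aggregated.extend(select_top_k_tokens(tokens, scores, k))
    return aggregated


# Hypothetical two-stage example with three 2-dimensional tokens per stage.
stage_tokens = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[2.0, 2.0], [3.0, 3.0], [4.0, 4.0]],
]
stage_scores = [
    [0.1, 0.7, 0.2],
    [0.9, 0.04, 0.06],
]
selected = aggregate_stages(stage_tokens, stage_scores, k=2)
print(selected)
```

In a real model the tokens would be tensors produced by Transformer layers and the aggregated sequence would feed a classification head; plain lists are used here only to keep the sketch dependency-free.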

Keyword：

Semantics; Image recognition; Invariant cues; Feature extraction; Task analysis; Birds; Transformers; Transformer; Image classification; Deep learning

Author Community：

[ 1 ] [Liu H.]National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, China
[ 2 ] [Zhang C.]National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, China
[ 3 ] [Deng Y.]College of Computer Science, Beijing University of Technology, Beijing, China
[ 4 ] [Xie B.]Department of Mechanical Engineering, City University of Hong Kong, Kowloon, Hong Kong
[ 5 ] [Liu T.]School of Education, Hubei University, Wuhan, Hubei, China
[ 6 ] [Zhang Z.]National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, China
[ 7 ] [Li Y.]Department of Mechanical Engineering, City University of Hong Kong, Kowloon, Hong Kong

Related Keywords：

OASNet: Object Affordance State Recognition Network with Joint Visual Features and Relational Semantic Embeddings
2023, IEEE Transactions on Circuits and Systems for Video Technology
Domain-aware Prototype Network for Generalized Zero-Shot Learning
2023, IEEE Transactions on Circuits and Systems for Video Technology
DHHG-TAC: Fusion of Dynamic Heterogeneous Hypergraphs and Transformer Attention Mechanism for Visual Question Answering Tasks
2024, IEEE Transactions on Industrial Informatics
GCFormer: Global Context-Aware Transformer for Remote Sensing Image Change Detection
2024, IEEE Transactions on Geoscience and Remote Sensing

Source ：

IEEE Transactions on Multimedia

ISSN： 1520-9210

Year： 2023

Page： 1-14

Impact Factor： 7.300 (JCR@2022)

ESI Discipline： COMPUTER SCIENCE;

ESI HC Threshold：19

Cited Count：

WoS CC Cited Count： 0

SCOPUS Cited Count： 64

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 16
