Author:

Zhang, Xiaodan | Jia, Aozhe | Ji, Junzhong (冀俊忠) | Qu, Liangqiong | Ye, Qixiang

Indexed by:

EI Scopus SCIE

Abstract:

Multi-head attention (MA), which allows a model to jointly attend to crucial information from diverse representation subspaces through its heads, has achieved remarkable results in image captioning. However, there is no explicit mechanism to ensure that MA attends to appropriate positions in diverse subspaces, which results in overfocused attention within each head and redundancy between heads. In this paper, we propose a novel Intra- and Inter-Head Orthogonal Attention (I²OA) that efficiently improves MA in image captioning by introducing a concise orthogonal regularization on the heads. Specifically, Intra-Head Orthogonal Attention enhances the attention learning of MA by applying an orthogonal constraint to each head, which decentralizes object-centric attention into more comprehensive content-aware attention. Inter-Head Orthogonal Attention reduces head redundancy by applying an orthogonal constraint between heads, which enlarges the diversity of the representation subspaces and improves the representation ability of MA. Moreover, the proposed I²OA can be flexibly combined with various multi-head-attention-based image captioning methods and improves their performance without increasing model complexity or parameter count. Experiments on the MS COCO dataset demonstrate the effectiveness of the proposed model.
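
The abstract does not give the exact form of the orthogonal regularization, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a generic soft-orthogonality penalty (the sum of squared off-diagonal entries of a Gram matrix of unit vectors); the function names `orthogonality_penalty`, `intra_head_penalty`, and `inter_head_penalty`, and the toy attention maps, are hypothetical and chosen for illustration.

```python
# Illustrative sketch only: a generic soft-orthogonality penalty, NOT the
# paper's actual formulation (the abstract does not specify it).
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def orthogonality_penalty(vectors):
    """Sum of squared off-diagonal entries of the Gram matrix of unit vectors.

    The value is 0 when all vectors are mutually orthogonal and grows as
    they overlap.
    """
    unit = [_normalize(v) for v in vectors]
    penalty = 0.0
    for i in range(len(unit)):
        for j in range(len(unit)):
            if i != j:
                dot = sum(a * b for a, b in zip(unit[i], unit[j]))
                penalty += dot * dot
    return penalty


def intra_head_penalty(attention):
    """Assumed intra-head term: within each head, penalize overlap between
    the attention rows of different query positions."""
    return sum(orthogonality_penalty(head) for head in attention)


def inter_head_penalty(attention):
    """Assumed inter-head term: flatten each head's attention map and
    penalize overlap between different heads."""
    flat = [[w for row in head for w in row] for head in attention]
    return orthogonality_penalty(flat)


# attention[h][q][k]: weight that head h assigns to key k for query q.
redundant = [
    [[1.0, 0.0], [1.0, 0.0]],  # head 0: both queries attend to key 0
    [[1.0, 0.0], [1.0, 0.0]],  # head 1: identical to head 0
]
diverse = [
    [[1.0, 0.0], [0.0, 1.0]],  # head 0: queries attend to different keys
    [[0.0, 1.0], [1.0, 0.0]],  # head 1: the complementary pattern
]

print(intra_head_penalty(redundant), inter_head_penalty(redundant))
print(intra_head_penalty(diverse), inter_head_penalty(diverse))
```

In a training setup one would presumably add such terms, scaled by a weight, to the captioning loss. Because a penalty of this kind only changes the objective, it adds no parameters to the model, which is consistent with the abstract's claim.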

Keyword:

Visualization; Transformers; multi-head attention (MA); Redundancy; Accuracy; orthogonal constraint; Head; Decoding; Feature extraction; Optimization; Dogs; Correlation; Image captioning

Author Community:

  • [1] [Zhang, Xiaodan] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
  • [2] [Jia, Aozhe] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
  • [3] [Ji, Junzhong] Beijing Univ Technol, Coll Comp Sci, Beijing 100124, Peoples R China
  • [4] [Qu, Liangqiong] Univ Hong Kong, Sch Comp & Data Sci, Hong Kong, Peoples R China
  • [5] [Ye, Qixiang] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China

Reprint Author's Address:

  • [Ye, Qixiang] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING

ISSN: 1057-7149

Year: 2025

Volume: 34

Page: 594-607

Impact Factor (JCR@2022): 10.600

ESI Highly Cited Papers on the List: 0
