Author:

Ji, Junzhong | Wang, Mingzhan | Zhang, Xiaodan | Lei, Minglong | Qu, Liangqiong

Indexed by:

EI; Scopus; SCIE

Abstract:

Self-attention based Transformers have been successfully introduced into the encoder-decoder framework of image captioning, where they excel at modeling the inner relations of the inputs, i.e., image regions or semantic words. However, the relations in self-attention are usually too dense to be fully optimized, which may result in noisy relations and attention. Meanwhile, prior relations, e.g., the visual and semantic relations between objects, which are essential for understanding and describing an image, are ignored by current self-attention. The relation learning of self-attention in image captioning is thus biased, which dilutes the concentration of attention. In this paper, we propose a Relation Constraint Self-Attention (RCSA) model that enhances the relation learning of self-attention in image captioning by constraining self-attention with prior relations. RCSA exploits the prior visual and semantic relation information from a scene graph as constraint factors, and builds constraints for self-attention through two sub-modules: an RCSA-E encoder module and an RCSA-D decoder module. RCSA-E introduces the visual relation information into self-attention in the encoder, which helps generate a sparse attention map by omitting the attention weights of irrelevant regions, thereby highlighting relevant visual features. RCSA-D extends the keys and values of self-attention in the decoder with the semantic relation information, which constrains the learning of semantic relations and improves the accuracy of the generated semantic words. Intuitively, RCSA-E gives the model the ability to decide, from visual relation information, which regions to omit and which to focus on; RCSA-D then strengthens the relation learning of the focused regions and improves sentence generation with semantic relation information. Experiments on the MSCOCO dataset demonstrate the effectiveness of the proposed RCSA. (c) 2022 Elsevier B.V. All rights reserved.
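
The abstract describes the two constraint mechanisms concretely enough to sketch them. Below is a minimal PyTorch sketch based only on that description, not the authors' code: the class names, `rel_adj`, and `sem_rel` are illustrative assumptions, and the paper's scene-graph construction, projections, and decoder causal masking are omitted.

```python
import torch
import torch.nn as nn


class RCSAEncoderSketch(nn.Module):
    """RCSA-E idea (sketch): a scene-graph visual-relation adjacency is used
    as an attention mask, so weights between unrelated regions are dropped
    and the attention map over regions becomes sparse."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, regions: torch.Tensor, rel_adj: torch.Tensor) -> torch.Tensor:
        # regions: (B, N, d_model) region features; rel_adj: (B, N, N) bool,
        # True where the scene graph links two regions by a visual relation.
        eye = torch.eye(regions.size(1), dtype=torch.bool, device=regions.device)
        blocked = ~(rel_adj | eye)  # keep self-loops so no row is fully masked
        # nn.MultiheadAttention takes a (B*n_heads, N, N) bool mask in which
        # True marks positions that must NOT be attended to.
        blocked = blocked.repeat_interleave(self.attn.num_heads, dim=0)
        out, _ = self.attn(regions, regions, regions, attn_mask=blocked)
        return out


class RCSADecoderSketch(nn.Module):
    """RCSA-D idea (sketch): the keys and values of the decoder self-attention
    are extended with semantic-relation embeddings from the scene graph."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, words: torch.Tensor, sem_rel: torch.Tensor) -> torch.Tensor:
        # words:   (B, T, d_model) hidden states of the generated words
        # sem_rel: (B, R, d_model) embedded semantic relations (e.g., "riding")
        kv = torch.cat([words, sem_rel], dim=1)  # (B, T+R, d_model)
        out, _ = self.attn(words, kv, kv)        # queries stay the word states
        return out


# Toy shapes only: 2 images, 5 regions, 7 words, 3 relations, d_model = 64.
enc = RCSAEncoderSketch(64, 8)
feats = enc(torch.randn(2, 5, 64), torch.rand(2, 5, 5) > 0.5)
dec = RCSADecoderSketch(64, 8)
hidden = dec(torch.randn(2, 7, 64), torch.randn(2, 3, 64))
```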

Keyword:

Scene graph; Image captioning; Transformer; Relation constraint self-attention

Author Community:

  • [ 1 ] [Ji, Junzhong]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
  • [ 2 ] [Wang, Mingzhan]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
  • [ 3 ] [Zhang, Xiaodan]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
  • [ 4 ] [Lei, Minglong]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
  • [ 5 ] [Ji, Junzhong]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China
  • [ 6 ] [Wang, Mingzhan]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China
  • [ 7 ] [Zhang, Xiaodan]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China
  • [ 8 ] [Lei, Minglong]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing 100124, Peoples R China
  • [ 9 ] [Qu, Liangqiong]Stanford Univ, Dept Biomed Data Sci, Palo Alto, CA 94304 USA

Reprint Author's Address:

Related Keywords:

Source:

NEUROCOMPUTING

ISSN: 0925-2312

Year: 2022

Volume: 501

Page: 778-789

Impact Factor: 6.000 (JCR@2022)

ESI Discipline: COMPUTER SCIENCE;

ESI HC Threshold: 46

JCR Journal Grade: 2

CAS Journal Grade: 2

Cited Count:

WoS CC Cited Count: 17

SCOPUS Cited Count: 18

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:

Affiliated Colleges:
