
Author:

Liu, L. | Jiao, Y. | Li, X. | Li, J. | Wang, H. | Cao, X.

Indexed by:

EI; Scopus

Abstract:

Image captioning aims to enable computers to automatically generate human-like sentences that describe a given image. To address insufficient accuracy in image feature extraction and the underutilization of visual information, we propose a Swin Transformer-based image captioning model with feature enhancement and multi-stage fusion. First, the Swin Transformer serves as the encoder to extract image features, and feature enhancement is applied to capture richer feature information. Then, a multi-stage image-semantic fusion module is constructed to exploit the semantic information from past time steps. Finally, an LSTM decodes the semantic and image information to generate captions. The proposed model achieves better results than baseline models on the public datasets Flickr8K and Flickr30K. © 2023 IEEE.
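The abstract outlines an encoder-fusion-decoder pipeline but does not specify the exact feature-enhancement or fusion designs. Below is a minimal PyTorch sketch of the described flow, assuming torchvision's swin_t as the Swin Transformer encoder; the CaptionModel class and its enhance and fuse modules are illustrative stand-ins under those assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import swin_t

class CaptionModel(nn.Module):
    """Illustrative encoder-fusion-decoder captioner (not the paper's exact design)."""
    def __init__(self, vocab_size, embed_dim=512, hidden_dim=512, feat_dim=768):
        super().__init__()
        # Swin-T encoder; torchvision's `features` stage outputs (B, H, W, C) maps
        self.encoder = swin_t(weights=None).features
        # Stand-in "feature enhancement": a residual MLP over each spatial feature
        self.enhance = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(), nn.LayerNorm(feat_dim))
        self.img_proj = nn.Linear(feat_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Attention used to fuse image features with accumulated semantics
        self.fuse = nn.MultiheadAttention(hidden_dim, num_heads=8, batch_first=True)
        self.lstm = nn.LSTMCell(embed_dim + hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images)            # (B, H, W, C)
        B, H, W, C = feats.shape
        feats = feats.reshape(B, H * W, C)
        feats = feats + self.enhance(feats)     # enhanced image features
        feats = self.img_proj(feats)            # (B, HW, hidden_dim)
        h = feats.mean(dim=1)                   # init decoder state from pooled image
        c = torch.zeros_like(h)
        semantics, logits = [h], []
        for t in range(captions.size(1) - 1):   # teacher forcing over caption tokens
            word = self.embed(captions[:, t])
            # "Multi-stage fusion" stand-in: query the image features with the
            # mean of all past hidden (semantic) states, so that semantic
            # information from earlier time steps informs the attention
            query = torch.stack(semantics, dim=1).mean(dim=1, keepdim=True)
            ctx, _ = self.fuse(query, feats, feats)
            h, c = self.lstm(torch.cat([word, ctx.squeeze(1)], dim=-1), (h, c))
            semantics.append(h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)       # (B, T-1, vocab_size)

# Smoke test with random data
model = CaptionModel(vocab_size=10000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 10000, (2, 12))
print(model(images, captions).shape)            # torch.Size([2, 11, 10000])
```

Accumulating past hidden states and averaging them into the attention query is one simple way to realize "utilizing semantic information from past time steps"; the paper's actual fusion module may differ.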

Keyword:

LSTM; Image captioning; Deep learning; Attention mechanism; Swin Transformer

Author Community:

  • [ 1 ] [Liu L.]Beijing University of Technology, Faculty of Science, Beijing, China
  • [ 2 ] [Liu L.]Beijing Institute for Scientific and Engineering Computing, Beijing University of Technology, Beijing, China
  • [ 3 ] [Jiao Y.]Beijing University of Technology, Faculty of Science, Beijing, China
  • [ 4 ] [Li X.]Beijing University of Technology, Faculty of Science, Beijing, China
  • [ 5 ] [Li J.]Beijing University of Technology, Faculty of Science, Beijing, China
  • [ 6 ] [Wang H.]China National Institute of Standardization, Fundamental Standardization, Beijing, China
  • [ 7 ] [Cao X.]China National Institute of Standardization, Fundamental Standardization, Beijing, China


Year: 2023

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 1

ESI Highly Cited Papers on the List: 0

