Abstract:
Image captioning technology has become an important way for intelligent robots to understand image content, and extracting image information effectively is key to generating accurate and reliable captions. In this paper, we propose a dual self-attention based network (DSAN) for image captioning. Specifically, we design a Dual Self-Attention Module (DSAM), embedded in an encoder-decoder architecture, to capture contextual information in the image by adaptively integrating local features with their global dependencies. By modeling rich contextual dependencies over local features, the DSAM significantly improves the generated captions. Experimental results on the MS COCO dataset show that the proposed DSAN outperforms existing methods.
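The abstract does not specify the internals of the DSAM, but a common way to realize a "dual" self-attention that integrates local features with global dependencies is to combine a position (spatial) branch and a channel branch, as in DANet-style designs. The sketch below is a hypothetical, minimal NumPy illustration of that idea, not the paper's actual module; the function name, the scaling factors, and the residual fusion are all assumptions, and learned projection matrices are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_self_attention(X):
    """Hypothetical dual self-attention over local features.

    X: (n, d) array of n local feature vectors (e.g. flattened
    spatial positions of a CNN feature map), each d-dimensional.
    Combines a position branch (locations attend to locations)
    and a channel branch (channels attend to channels), fused
    with the input by a residual sum.
    """
    n, d = X.shape
    # Position branch: (n, n) affinities between spatial locations.
    pos_scores = X @ X.T / np.sqrt(d)
    pos_out = softmax(pos_scores, axis=-1) @ X   # (n, d)
    # Channel branch: (d, d) affinities between feature channels.
    ch_scores = X.T @ X / np.sqrt(n)
    ch_out = X @ softmax(ch_scores, axis=-1)     # (n, d)
    # Residual fusion of both attended views with the local features.
    return X + pos_out + ch_out
```

In such a design, the position branch supplies global spatial context for each location, while the channel branch re-weights feature channels by their pairwise correlations; the published DSAM may fuse or parameterize these branches differently.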
Source:
PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021)
ISSN: 1948-9439
Year: 2021
Page: 1590-1595
WoS CC Cited Count: 0
ESI Highly Cited Papers on the List: 0