Abstract:
Image captioning aims to enable computers to automatically generate human-like sentences that describe a given image. To address insufficient accuracy in image feature extraction and underuse of visual information, we propose a Swin Transformer-based image captioning model with feature enhancement and multi-stage fusion. First, the Swin Transformer serves as the encoder to extract image features, and feature enhancement is applied to capture richer feature information. Then, a multi-stage image-semantic fusion module is constructed to exploit semantic information from past time steps. Finally, an LSTM decodes the semantic and image information to generate captions. The proposed model outperforms baselines on the public Flickr8K and Flickr30K datasets. © 2023 IEEE.
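As a rough illustration (not the authors' code), the fusion of image features with semantic information from past time steps could be sketched as a simple learned gate that blends the two feature vectors element-wise before they reach the LSTM decoder. The feature dimension, the gating form, and all weights below are assumptions for demonstration only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(image_feat, semantic_feat, W_g, b_g):
    """Blend visual and semantic features with a learned gate.

    The gate g (values in (0, 1)) decides, per dimension, how much
    visual vs. past-step semantic information is passed to the decoder.
    """
    g = sigmoid(np.concatenate([image_feat, semantic_feat]) @ W_g + b_g)
    return g * image_feat + (1.0 - g) * semantic_feat

rng = np.random.default_rng(0)
d = 512                                    # feature dimension (assumed)
image_feat = rng.standard_normal(d)        # pooled encoder feature (stand-in)
semantic_feat = rng.standard_normal(d)     # semantic state from past time steps
W_g = rng.standard_normal((2 * d, d)) * 0.01
b_g = np.zeros(d)

fused = gated_fusion(image_feat, semantic_feat, W_g, b_g)
print(fused.shape)  # (512,)
```

Because the gate produces a per-dimension convex combination, each fused value always lies between the corresponding visual and semantic values, so the decoder never receives information outside the range of its two inputs.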
Year: 2023
Language: English
WoS CC Cited Count: 0
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0