Spatio-Temporal Memory Attention for Image Captioning

Author：

Ji, Junzhong (Scholars：冀俊忠) | Xu, Cheng | Zhang, Xiaodan | Wang, Boyue | Song, Xinhang

Indexed by：

SSCI EI Scopus SCIE

Abstract：

Visual attention has been successfully applied in image captioning to selectively incorporate the most relevant areas to the language generation procedure. However, the attention in current image captioning methods is only guided by the hidden state of language model, e.g. LSTM (Long-Short Term Memory), indirectly and implicitly, and thus the attended areas are weakly relevant at different time steps. Besides the spatial relationship of attention areas, the temporal relationship in attention is crucial for image captioning according to the attention transmission mechanism of human vision. In this paper, we propose a new spatio-temporal memory attention (STMA) model to learn the spatio-temporal relationship in attention for image captioning. The STMA introduces the memory mechanism to the attention model through a tailored LSTM, where the new cell is used to memorize and propagate the attention information, and the output gate is used to generate attention weights. The attention in STMA transmits with memory adaptively and dependently, which builds strong temporal connections of attentions and learns the spatio-temporal relationship of attended areas simultaneously. Besides, the proposed STMA is flexible to combine with attention-based image captioning frameworks. Experiments on MS COCO dataset demonstrate the superiority of the proposed STMA model in exploring the spatio-temporal relationship in attention and improving the current attention-based image captioning.
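
The abstract gives no equations, so the following is only an illustrative sketch of the general idea it describes: an LSTM-style cell that carries attention information from one time step to the next, with the output-gate branch turned into attention weights over image regions. It is not the authors' implementation. The function names (`init_params`, `memory_attention_step`), the parameter names, the per-region memory layout, and the softmax scoring step are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())          # subtract max for numerical stability
    return e / e.sum()

def init_params(d_feat, d_hid, d_mem, rng):
    """Random weights for the hypothetical attention cell (assumed layout)."""
    d_in = d_feat + d_hid
    w = lambda *shape: rng.normal(0.0, 0.1, size=shape)
    return {
        "W_i": w(d_in, d_mem),       # input gate
        "W_f": w(d_in, d_mem),       # forget gate
        "W_g": w(d_in, d_mem),       # candidate memory
        "W_o": w(d_in, d_mem),       # output gate
        "w_a": w(d_mem),             # projects gated memory to one score per region
    }

def memory_attention_step(V, h, c_prev, p):
    """One decoding step.

    V      : (k, d_feat) region features of the image
    h      : (d_hid,)    hidden state of the language LSTM
    c_prev : (k, d_mem)  attention memory from the previous time step
    Returns attention weights (k,), context vector (d_feat,), new memory (k, d_mem).
    """
    k = V.shape[0]
    x = np.concatenate([V, np.tile(h, (k, 1))], axis=1)   # (k, d_feat + d_hid)
    i = sigmoid(x @ p["W_i"])
    f = sigmoid(x @ p["W_f"])
    g = np.tanh(x @ p["W_g"])
    c = f * c_prev + i * g            # memory cell propagates attention information
    o = sigmoid(x @ p["W_o"]) * np.tanh(c)                # output-gate branch
    alpha = softmax(o @ p["w_a"])     # attention weights over the k regions
    context = alpha @ V               # attended visual feature
    return alpha, context, c
```

In a captioning loop, `c` would be initialised to zeros and fed back at every step, so the weights at step t depend on what was attended at earlier steps rather than on the language hidden state alone.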

Keyword：

LSTM; memory attention; attention transmission; image captioning; spatio-temporal relationship

Author Community：

[ 1 ] [Ji, Junzhong]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China
[ 2 ] [Xu, Cheng]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China
[ 3 ] [Zhang, Xiaodan]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China
[ 4 ] [Wang, Boyue]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China
[ 5 ] [Ji, Junzhong]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
[ 6 ] [Xu, Cheng]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
[ 7 ] [Zhang, Xiaodan]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
[ 8 ] [Wang, Boyue]Beijing Univ Technol, Fac Informat Technol, Beijing Municipal Key Lab Multimedia & Intelligen, Beijing 100124, Peoples R China
[ 9 ] [Song, Xinhang]Chinese Acad Sci, Inst Comp Technol, CAS, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China

Reprint Author's Address：

[Zhang, Xiaodan]Beijing Univ Technol, Beijing Artificial Intelligence Inst, Beijing 100124, Peoples R China

Email：

jjz01@bjut.edu.cn |
xucheng2017@emails.bjut.edu.cn |
zhangx-iaodan@bjut.edu.cn |
wby@bjut.edu.cn |
xinhang.song@ict.ac.cn


Related Keywords：

Swin-Caption: Swin Transformer-Based Image Captioning with Feature Enhancement and Multi-Stage Fusion
2024，INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS
Swin Transformer-based Image Captioning with Feature Enhancement and Multi-stage Fusion
2023，
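Research on a Time-Series-Based Prediction Model for Grid-Based Urban Management Cases
2019，Geomatics World (地理信息世界)
Batch Process Fault Monitoring Based on a Recurrent Autoencoder
2020，CIESC Journal (化工学报)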

Source ：

IEEE TRANSACTIONS ON IMAGE PROCESSING

ISSN： 1057-7149

Year： 2020

Volume： 29

Page： 7615-7628

Impact Factor： 10.600 (JCR@2022)

ESI Discipline： ENGINEERING;

ESI HC Threshold：115

Cited Count：

WoS CC Cited Count： 60

SCOPUS Cited Count： 71

ESI Highly Cited Papers on the List： 0

WanFang Cited Count：

Chinese Cited Count：

30 Days PV： 7

Affiliated Colleges：

College of Architecture and Urban Planning (data not clearly attributed within this college/department)
