Abstract:
The temporal text localization (TTL) task aims to identify the segment within a long untrimmed video that semantically matches a given textual query. However, most methods require extensive manual annotation of temporal boundaries for each query, which restricts their scalability and practicality in real-world applications. Moreover, modeling temporal context is particularly crucial for the TTL task. In this paper, a Vision Token Rolling Transformer for weakly supervised temporal text localization (VTR-former) is developed. VTR-former does not rely on predefined temporal boundaries during training or testing. By rolling vision tokens and employing advanced transformer-based feature learning modules, it significantly improves the model's ability to capture temporal information and learn feature representations. Experiments on two challenging benchmarks, Charades-STA and ActivityNet Captions, demonstrate that VTR-former outperforms the baseline network and achieves leading performance. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
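The record does not describe the rolling operation in detail, so the sketch below is only one plausible reading of "rolling vision tokens": cyclically shifting per-frame tokens along the temporal axis (here via PyTorch's torch.roll) so that each position also carries a neighboring frame's features before transformer attention. The function name roll_vision_tokens and the additive fusion step are hypothetical illustrations, not the paper's actual design.

```python
import torch

def roll_vision_tokens(tokens: torch.Tensor, shift: int = 1) -> torch.Tensor:
    """Cyclically shift vision tokens along the temporal (frame) axis.

    tokens: (batch, num_frames, dim) per-frame vision tokens.
    A positive shift aligns each position with an earlier frame,
    mixing temporal context into the token sequence.
    """
    return torch.roll(tokens, shifts=shift, dims=1)

# Hypothetical usage: combine original and rolled tokens before a
# transformer encoder so attention sees temporally shifted context.
tokens = torch.randn(2, 64, 512)        # 2 clips, 64 frames, 512-d tokens
rolled = roll_vision_tokens(tokens, 1)  # frame t now paired with frame t-1
fused = tokens + rolled                 # simple additive fusion (illustrative)
```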
Source: Communications in Computer and Information Science
ISSN: 1865-0929
Year: 2025
Volume: 2302 CCIS
Page: 203-217
Language: English