Author:

Wang, Mingui | Cui, Di | Wu, Lifang (Scholars: 毋立芳) | Jian, Meng | Chen, Yukun | Wang, Dong | Liu, Xu

Indexed by:

EI Scopus SCIE

Abstract:

Weakly-supervised video object localization is a challenging yet important task. The system should spatially localize the object of interest in videos, given only descriptive sentences and their corresponding video segments in the training stage. Recent efforts apply image-based Multiple Instance Learning (MIL) theory to this video task and propagate the supervision from the video to its frames through different frame-weighting strategies. Despite their promising progress, the spatio-temporal correlation between different object regions in videos has been largely ignored. To fill this research gap, we introduce a simple but effective feature expression and aggregation framework, which uses the self-attention mechanism to capture the latent spatio-temporal correlation between multimodal object features. We also design a multimodal interaction module to model the similarity between the semantic query in sentences and the object regions in videos. Extensive experimental evaluation on the YouCookII and ActivityNet-Entities datasets demonstrates significant improvements over multiple competitive baselines. © 2021

Keyword:

Computation theory; Semantics; Object recognition

Author Community:

  • [ 1 ] [Wang, Mingui]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 2 ] [Cui, Di]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 3 ] [Wu, Lifang]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 4 ] [Jian, Meng]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 5 ] [Chen, Yukun]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 6 ] [Wang, Dong]Faculty of Information Technology, Beijing University of Technology, Beijing, China
  • [ 7 ] [Liu, Xu]Faculty of Information Technology, Beijing University of Technology, Beijing, China

Reprint Author's Address:

  • [Jian, Meng]Faculty of Information Technology, Beijing University of Technology, Beijing, China

Source:

Pattern Recognition Letters

ISSN: 0167-8655

Year: 2021

Volume: 145

Page: 232-239

JCR@2022: 5.100

ESI Discipline: ENGINEERING

ESI HC Threshold: 87

JCR Journal Grade:2

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 6

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:
