Author:

Guo, K. | Tian, D. | Hu, Y. | Lin, C. | Qian, Z. | Sun, Y. | Zhou, J. | Duan, X. | Gao, J. | Yin, B.

Indexed by:

EI Scopus SCIE

Abstract:

Traffic video question answering (TrafficVQA) is a specialized VideoQA task that targets both basic comprehension of and complex reasoning about traffic events in video. Recent VideoQA models use pretrained visual and textual encoders to bridge the feature-space gap between visual and textual data. The TrafficVQA task, however, raises three specific issues: (i) Dimension gap: there is a conspicuous dimension difference between the static and dynamic visual data produced by pretrained image (appearance feature) and video (motion feature) models; (ii) Scene gap: common real-world datasets and traffic event datasets differ in visual scene content; (iii) Modality gap: the feature distributions of traffic video and text data differ markedly. To alleviate these issues, we introduce the coarse-fine multimodal contrastive alignment network (CFMMC-Align). The model uses sequence-level and token-level multimodal features, with an unsupervised visual multimodal contrastive loss to mitigate the dimension and scene gaps and a supervised visual-textual contrastive loss to reduce the modality gap. On the challenging public TrafficVQA dataset SUTD-TrafficQA, the model outperforms the state-of-the-art method by a substantial margin (50.2% compared to 46.0%). The code is available at https://github.com/guokan987/CFMMC-Align.
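The abstract names contrastive losses but gives no formula. As a rough illustration of the general idea only, the sketch below implements a generic symmetric InfoNCE-style loss in plain Python. It is an assumption for illustration, not the authors' CFMMC-Align loss: the function names (`info_nce`, `symmetric_contrastive_loss`), the temperature value, and the toy two-dimensional "appearance" and "motion" features are all hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (an all-zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] is expected to match positives[i]; every other
    positives[j] in the batch serves as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(a)

def symmetric_contrastive_loss(feats_a, feats_b, temperature=0.1):
    """Average the loss in both directions (a to b and b to a)."""
    return 0.5 * (info_nce(feats_a, feats_b, temperature)
                  + info_nce(feats_b, feats_a, temperature))

# Hypothetical toy batch: two clips, each with an "appearance" and a "motion" vector.
appearance = [[1.0, 0.0], [0.0, 1.0]]
motion_aligned = [[0.9, 0.1], [0.1, 0.9]]    # clip i pairs with clip i
motion_shuffled = [[0.1, 0.9], [0.9, 0.1]]   # pairs deliberately swapped

loss_aligned = symmetric_contrastive_loss(appearance, motion_aligned)
loss_shuffled = symmetric_contrastive_loss(appearance, motion_shuffled)
```

In this toy setup the correctly paired batch yields a much smaller loss than the swapped one, which is the behavior an alignment objective of this kind relies on. For the actual losses, see the repository linked in the abstract.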

Keyword:

Semantics; Cognition; Task analysis; Contrastive Learning; Question answering (information retrieval); Visualization; Video Question Answering; Roads; Transformers

Author Community:

  • [ 1 ] [Guo K.] School of Transportation Science and Engineering, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems & Safety Control, Beihang University, Beijing, China
  • [ 2 ] [Tian D.] School of Transportation Science and Engineering, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems & Safety Control, Beihang University, Beijing, China
  • [ 3 ] [Hu Y.] Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [ 4 ] [Lin C.] School of Transportation Science and Engineering, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems & Safety Control, Beihang University, Beijing, China
  • [ 5 ] [Qian Z.] Civil and Environmental Engineering Department, H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, USA
  • [ 6 ] [Sun Y.] Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China
  • [ 7 ] [Zhou J.] School of Transportation Science and Engineering, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems & Safety Control, Beihang University, Beijing, China
  • [ 8 ] [Duan X.] School of Transportation Science and Engineering, Beijing Advanced Innovation Center for Big Data and Brain Computing, Beijing Key Laboratory for Cooperative Vehicle Infrastructure Systems & Safety Control, Beihang University, Beijing, China
  • [ 9 ] [Gao J.] Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, NSW, Australia
  • [ 10 ] [Yin B.] Faculty of Information Technology, Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, China

Source:

IEEE Transactions on Circuits and Systems for Video Technology

ISSN: 1051-8215

Year: 2024

Issue: 11

Volume: 34

Page: 1-1

Impact Factor: 8.400 (JCR@2022)

Cited Count:

WoS CC Cited Count:

SCOPUS Cited Count: 5

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:
