
Author:

Wang, L. | Fu, F. | Xu, K. | Xu, H. | Yin, B.

Indexed by:

Scopus

Abstract:

To address the problem that the contextual information captured by existing scene graph generation methods is limited, an effective context fusion module, the dual-stream multi-head attention module (DMA), was proposed. By applying DMA to both object classification and relationship classification, a dual-stream multi-head attention-based scene graph generation network (DMA-Net) was constructed. The proposed method consists of object detection, object semantic parsing, and relationship semantic parsing. First, the object detection module located the objects in the image and extracted their features. Second, the object dual-stream multi-head attention (O-DMA) in the object semantic parsing module produced features fused with node context, which were decoded by the object semantic decoder to obtain the object labels. Finally, the relationship dual-stream multi-head attention (R-DMA) in the relationship semantic parsing module produced features fused with edge context, which were decoded by the relationship semantic decoder to obtain the relationship labels. The proposed method was compared with mainstream scene graph generation methods on the publicly available Visual Genome (VG) dataset, and the graph-constraint recall and no-graph-constraint recall were computed for each method on three subtasks: scene graph detection, scene graph classification, and predicate classification. Results show that the proposed method fully exploits the contextual information in the scene, which enhances the representational capability of the features and improves the accuracy of the scene graph generation task. © 2024 Beijing University of Technology. All rights reserved.
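For readers skimming the abstract, the sketch below illustrates one plausible reading of the DMA context-fusion block described above: two parallel multi-head attention streams (self-attention among detected-object features and cross-attention from objects to scene context) whose outputs are concatenated, projected, and added back residually. This is a minimal sketch assuming a PyTorch implementation; the class name DualStreamMultiHeadAttention, the feature dimensions, and the specific two-stream layout are illustrative assumptions, not the paper's published code.

import torch
import torch.nn as nn

class DualStreamMultiHeadAttention(nn.Module):
    """Hypothetical DMA block: two attention streams fused into one context-aware feature."""
    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        # Stream 1: self-attention among the detected objects (node context).
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Stream 2: cross-attention from objects to global scene context tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Fuse the two streams and project back to the original feature dimension.
        self.fuse = nn.Linear(2 * dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, node_feats, context_feats):
        # node_feats:    (batch, num_objects, dim) features from the object detector
        # context_feats: (batch, num_context, dim) pooled scene/context features
        s, _ = self.self_attn(node_feats, node_feats, node_feats)
        c, _ = self.cross_attn(node_feats, context_feats, context_feats)
        fused = self.fuse(torch.cat([s, c], dim=-1))
        return self.norm(node_feats + fused)  # residual connection

if __name__ == "__main__":
    dma = DualStreamMultiHeadAttention()
    nodes = torch.randn(2, 10, 512)    # 10 detected objects per image
    context = torch.randn(2, 36, 512)  # e.g., 36 region/context tokens
    print(dma(nodes, context).shape)   # torch.Size([2, 10, 512])

In DMA-Net as described in the abstract, an O-DMA instance of such a block would feed the object semantic decoder and an R-DMA instance would feed the relationship semantic decoder.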

Keyword:

context fusion; dual-stream multi-head attention (DMA); relationship classification; scene graph generation; object detection; object classification

Author Community:

  • [ 1 ] [Wang L.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 2 ] [Wang L.]Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 3 ] [Fu F.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 4 ] [Fu F.]Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 5 ] [Xu K.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 6 ] [Xu K.]Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 7 ] [Xu H.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 8 ] [Xu H.]Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 9 ] [Yin B.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
  • [ 10 ] [Yin B.]Beijing Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China

Source:

Journal of Beijing University of Technology

ISSN: 0254-0037

Year: 2024

Issue: 10

Volume: 50

Page: 1198-1205
