
Author:

Zuo, Guo-Yu (Zuo, Guo-Yu.) | Wang, Zi-Hao (Wang, Zi-Hao.) | Zhao, Min (Zhao, Min.) | Yu, Shuang-Yue (Yu, Shuang-Yue.)

Indexed by:

EI Scopus

Abstract:

To enable robots to safely grasp target objects in cluttered environments, a precise understanding of the spatial relationships between target objects and their surrounding counterparts is essential. While convolutional neural networks (CNNs) show potential in relational reasoning, their primary focus on pixel-level feature extraction limits their ability to capture global context and critical object relationships, which in turn degrades inference accuracy. To address these limitations, we propose a relationship reasoning model based on Graph Attention Networks that improves the accuracy of spatial relationship understanding among objects. First, we employ EfficientNet-B0 combined with a Bidirectional Feature Pyramid Network (BiFPN) for RGB feature extraction during detection. To reduce the computational burden, we filter out object pairs that lack clear contextual spatial relationships, and then apply a sparsified Graph Attention Network with directional attention for relationship reasoning. The proposed model is trained and evaluated on the Visual Manipulation Relationship Dataset (VMRD), achieving the highest precision across the mAP, OR, and IA metrics, which reflects improvements in both object detection and object relationship reasoning. Specifically, mAP reaches 96.1%, indicating that the BiFPN structure in the EfficientDet network better integrates multi-scale features within the image, effectively raising average detection precision. The significant gains in Object Recall (OR) and Image Accuracy (IA) demonstrate that our method correctly infers a greater number of object relationship pairs during the reasoning phase.
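The abstract describes attention over a sparsified, directed graph of object pairs. As a rough illustration of how direction-aware edge attention can be computed over such a filtered edge list (the function names, dimensions, and tanh scoring are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def gat_edge_attention(h, edges, W, a_src, a_dst):
    """Single-head attention over a sparsified, directed object-pair graph.

    h:      (N, F) per-object feature vectors
    edges:  list of directed pairs (i, j) kept after filtering out pairs
            with no clear contextual spatial relationship
    W:      (F, F') shared linear projection
    a_src, a_dst: (F',) attention parameters for the source/destination
            roles, so edge (i, j) and edge (j, i) can score differently.
    Returns {dst: {src: weight}} with weights softmax-normalised over
    each node's incoming edges.
    """
    z = h @ W                                     # project node features
    scores = {}                                   # dst -> [(src, raw score)]
    for i, j in edges:
        e = np.tanh(z[i] @ a_src + z[j] @ a_dst)  # directional score
        scores.setdefault(j, []).append((i, e))
    attn = {}
    for j, pairs in scores.items():
        srcs, es = zip(*pairs)
        es = np.exp(np.array(es) - max(es))       # numerically stable softmax
        attn[j] = dict(zip(srcs, es / es.sum()))
    return attn
```

Because attention is computed only over the retained edges, the cost scales with the number of filtered pairs rather than with all N² object pairs.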
Comparative experiments against other methods on the same dataset show that our model significantly improves relationship-reasoning accuracy, demonstrating its applicability and extensibility to real robotic-arm grasping scenarios; the model achieves an image-based accuracy (IA) of 71.1% in relational reasoning tasks. To validate the proposed model, we employ Gradient-weighted Class Activation Mapping (Grad-CAM), a technique for interpreting the decision-making of deep convolutional neural networks by visualizing the network's attention over input images during classification. Grad-CAM visualizations further substantiate the model's capability to infer spatial relationships among multiple objects in cluttered scenes, underscoring its suitability for real-world robotic applications. Additionally, we built a visual grasping experimental platform based on the AUBO-i5 robotic arm, equipped with a two-finger electric gripper and a depth camera. To validate practical applicability and generalization, we constructed a new test set from real grasping scenes collected in a laboratory environment; the results indicate that our method still performs well on this new data. While our RGB-based Graph Attention Network effectively predicts relationships among visible objects, it has been validated only for scenes containing 2 to 5 objects. Future research will focus on integrating robotic manipulation actions and on inferring information about occluded objects from the positional relationships of visible ones.
We will also investigate strategies to enhance model performance in scenarios with more than five objects and conduct physical experiments in increasingly complex real-world operational environments to validate effectiveness and identify further areas for improvement. © 2025 Science Press. All rights reserved.
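The abstract leans on Grad-CAM for interpretation: gradients of the target class score with respect to the last convolutional layer's activations are global-average-pooled into per-channel weights, the feature maps are combined with those weights, and a ReLU keeps only positively contributing regions. A minimal sketch of that standard computation (independent of this paper's specific network):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM heat map from the last conv layer.

    feature_maps: (C, H, W) activations A^k
    gradients:    (C, H, W) gradients of the class score w.r.t. A^k
    Returns an (H, W) map, ReLU-ed and scaled to [0, 1].
    """
    weights = gradients.mean(axis=(1, 2))              # alpha_k: pooled gradients
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted channel sum
    cam = np.maximum(cam, 0)                           # keep positive evidence only
    if cam.max() > 0:
        cam /= cam.max()                               # normalise for display
    return cam
```

In practice the resulting map is upsampled to the input resolution and overlaid on the image to show which regions drove the prediction.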

Keyword:

Network theory (graphs); Robotic arms; Cams; Inference engines; Convolutional neural networks; Mapping; Graph neural networks; Deep neural networks

Author Community:

  • [ 1 ] [Zuo, Guo-Yu]1. School of Information Science and Technology, Beijing University of Technology, Beijing 100124 2. Beijing Key Laboratory of Computing Intelligence and Intelligent Systems, Beijing 100124
  • [ 2 ] [Wang, Zi-Hao]1. School of Information Science and Technology, Beijing University of Technology, Beijing 100124 2. Beijing Key Laboratory of Computing Intelligence and Intelligent Systems, Beijing 100124
  • [ 3 ] [Zhao, Min]1. School of Information Science and Technology, Beijing University of Technology, Beijing 100124 2. Beijing Key Laboratory of Computing Intelligence and Intelligent Systems, Beijing 100124
  • [ 4 ] [Yu, Shuang-Yue]1. School of Information Science and Technology, Beijing University of Technology, Beijing 100124 2. Beijing Key Laboratory of Computing Intelligence and Intelligent Systems, Beijing 100124


Source:

Chinese Journal of Computers

ISSN: 0254-4164

Year: 2025

Issue: 3

Volume: 48

Page: 572-585

ESI Highly Cited Papers on the List: 0
