Abstract:
Self-supervised visual odometry (VO) offers clear advantages over supervised methods, as it removes the need for annotated ground truth in the training data. However, most existing self-supervised VO methods, namely scene appearance-based methods, do not fully exploit the complementary properties of cross-modal information between scene appearance and scene structure. To this end, we propose a novel self-supervised VO based on a scene appearance-structure incremental fusion scheme. Specifically, a Global-Local Context awareness-based Depth estimation Network (GLC-DN) is designed to introduce scene structural cues, laying the foundation for scene appearance-structure incremental fusion. A Dual stream Pose estimation Network based on Scene Appearance-Structure Incremental Fusion (SASIF-DPN) is then devised, consisting of a Dual Stream Network (DSN) and multiple Cross-Modal Complementary Fusion Modules (CM-CFMs). Each CM-CFM fully leverages the complementary properties of the RGB information and the predicted depth information, and the combination of multiple CM-CFMs enables information interaction between the two modalities in an incremental fusion manner. Detailed evaluations of GLC-DN and SASIF-DPN confirm the effectiveness and the design principles of each proposed component. Extensive comparison experiments further verify the superiority of our method over current counterparts.
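The abstract gives no implementation details of the dual-stream pose network or the fusion modules. The following is a minimal sketch of the general idea, assuming a PyTorch-style encoder in which each fusion block re-weights one modality's features using channel attention computed from the other; all module names, channel sizes, and shapes here are illustrative assumptions, not the authors' CM-CFM or SASIF-DPN implementation.

```python
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Illustrative cross-modal fusion block (an assumption, not the paper's CM-CFM).

    Each stream is re-weighted by channel attention derived from the other
    modality, so RGB and depth features can complement each other.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()

        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )

        self.rgb_gate = gate()    # attention computed from RGB features
        self.depth_gate = gate()  # attention computed from depth features

    def forward(self, rgb_feat, depth_feat):
        # RGB features modulated by depth-derived attention, and vice versa.
        rgb_out = rgb_feat + rgb_feat * self.depth_gate(depth_feat)
        depth_out = depth_feat + depth_feat * self.rgb_gate(rgb_feat)
        return rgb_out, depth_out


class DualStreamPoseSketch(nn.Module):
    """Toy dual-stream pose regressor with one fusion step per encoder stage."""

    def __init__(self):
        super().__init__()
        chs = [16, 32, 64]

        def stage(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, 3, stride=2, padding=1), nn.ReLU(inplace=True)
            )

        # RGB stream takes two stacked RGB frames (6 ch); depth stream takes
        # the two corresponding predicted depth maps (2 ch).
        self.rgb_stages = nn.ModuleList(
            [stage(6, chs[0]), stage(chs[0], chs[1]), stage(chs[1], chs[2])]
        )
        self.depth_stages = nn.ModuleList(
            [stage(2, chs[0]), stage(chs[0], chs[1]), stage(chs[1], chs[2])]
        )
        self.fusions = nn.ModuleList([CrossModalFusion(c) for c in chs])
        self.head = nn.Linear(2 * chs[-1], 6)  # 6-DoF relative pose (illustrative)

    def forward(self, rgb_pair, depth_pair):
        r, d = rgb_pair, depth_pair
        for rgb_stage, depth_stage, fuse in zip(
            self.rgb_stages, self.depth_stages, self.fusions
        ):
            r, d = rgb_stage(r), depth_stage(d)
            r, d = fuse(r, d)  # incremental cross-modal fusion at every stage
        feat = torch.cat([r.mean(dim=(2, 3)), d.mean(dim=(2, 3))], dim=1)
        return self.head(feat)


if __name__ == "__main__":
    rgb = torch.randn(1, 6, 128, 416)    # two stacked RGB frames
    depth = torch.randn(1, 2, 128, 416)  # two predicted depth maps
    print(DualStreamPoseSketch()(rgb, depth).shape)  # torch.Size([1, 6])
```

Fusing at every encoder stage, rather than only at the input or output, is one plausible reading of the "incremental fusion manner" described in the abstract; the actual network design is given in the paper itself.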
Source: IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
ISSN: 1524-9050
Year: 2025
Impact Factor: 8.500 (JCR@2022)