Abstract:
Complementarily fusing RGB and depth images while effectively suppressing task-irrelevant noise is crucial for accurate indoor RGB-D semantic segmentation. In this paper, we propose a novel deep model that leverages dual-modal non-local context to guide the aggregation of complementary features and the suppression of noise at multiple stages. Specifically, we introduce a dual-modal non-local context encoding (DNCE) module to learn global representations for each modality at each stage, which are then utilized to facilitate cross-modal complementary clue aggregation (CCA). Subsequently, the enhanced features from both modalities are merged. Additionally, we propose a semantic-guided feature rectification (SGFR) module that exploits the rich semantic clues in the top-level merged features to suppress noise in the lower-stage merged features. Both the DNCE-CCA and SGFR modules provide dual-modal global views that are essential for effective RGB-D fusion. Experimental results on two public indoor datasets, NYU Depth V2 and SUN RGB-D, demonstrate that our method outperforms state-of-the-art models of similar complexity.
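The record does not include implementation details, but the DNCE-CCA idea described above can be illustrated with a minimal sketch: each modality's features are summarized into a global context vector, which then gates the other modality's features before the two enhanced streams are merged. All names here (DNCE_CCA, ctx_rgb, ctx_depth, merge) are hypothetical, and global average pooling stands in for the paper's non-local encoding; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class DNCE_CCA(nn.Module):
    """Hypothetical sketch of dual-modal global context + cross-modal aggregation."""

    def __init__(self, channels):
        super().__init__()
        # Per-modality global context encoders: pooling + 1x1 conv + sigmoid
        # (a simplification standing in for the non-local encoding).
        self.ctx_rgb = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.ctx_depth = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.merge = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, f_rgb, f_depth):
        # Cross-modal aggregation: each stream is re-weighted by the
        # other modality's global context, with a residual connection,
        # before the two enhanced streams are merged.
        g_rgb = self.ctx_rgb(f_rgb)        # global view of the RGB stream
        g_depth = self.ctx_depth(f_depth)  # global view of the depth stream
        f_rgb_enh = f_rgb * g_depth + f_rgb
        f_depth_enh = f_depth * g_rgb + f_depth
        return self.merge(torch.cat([f_rgb_enh, f_depth_enh], dim=1))

# Usage on one stage's feature maps (shapes are illustrative).
fused = DNCE_CCA(256)(torch.randn(1, 256, 30, 40), torch.randn(1, 256, 30, 40))
```

A similar gating scheme, driven by the top-level merged features instead of the opposite modality, would correspond to the SGFR module's rectification of lower-stage features.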
Source:
EXPERT SYSTEMS WITH APPLICATIONS
ISSN: 0957-4174
Year: 2024
Volume: 255
Impact Factor: 8.500 (JCR 2022)
Cited Count:
WoS CC Cited Count: 5
SCOPUS Cited Count: 5
ESI Highly Cited Papers on the List: 0