Abstract:
Cross-view geo-localization between satellite and unmanned aerial vehicle (UAV) imagery has attracted extensive attention because of its potential for navigation in global navigation satellite system (GNSS)-denied environments. However, inadequate feature representation across views, coupled with positional shifts and distance-scale uncertainty, remains a key challenge. Most existing research has focused on extracting comprehensive and fine-grained information, yet effective feature representation and feature alignment deserve equal attention. In this article, we propose TransFG, a transformer-based pipeline for robust cross-view image matching that incorporates a feature aggregation (FA) module and a gradient guidance (GG) module. TransFG uses FA and GG synergistically to balance feature representation and alignment. Specifically, the FA module implicitly learns salient features and dynamically aggregates contextual features from the vision transformer (ViT). The GG module uses the gradient information of local features to further enhance the cross-view feature representation and to align specific instances across views. Extensive experiments demonstrate that our pipeline outperforms existing methods in cross-view geo-localization, improving both R@1 and AP over state-of-the-art (SOTA) methods. The code has been released at https://github.com/happyboy1234/TransFG. © 1980-2012 IEEE.
Source:
IEEE Transactions on Geoscience and Remote Sensing
ISSN: 0196-2892
Year: 2024
Volume: 62
Page: 1-12
8.200 (JCR@2022)
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 19
ESI Highly Cited Papers on the List: 0