Sparse-view planar 3D reconstruction method based on hierarchical token pooling Transformer

Author：

Zhang, Jiahui | Yang, Jinfu (杨金福) | Fu, Fuji | Ma, Jiaqi

Indexed by：

EI | Scopus | SCIE

Abstract：

Sparse-view planar 3D reconstruction aims to recover scene information from a limited number of camera frames and is a fundamental problem in computer vision. Although previous methods have made significant progress, they do not adequately consider the multi-scale properties of the surrounding environment, which limits reconstruction performance. In addition, the conventional feed-forward network in the vanilla Transformer is built from fully connected layers and therefore lacks the ability to capture local information from image features. To address these two problems, this paper proposes a sparse-view planar 3D reconstruction method based on a hierarchical token pooling Transformer (HTP-Formers). Specifically, we use average pooling layers with various ratios in the Transformer model to capture multi-scale features. We then propose a depth-wise convolution based inverted residual feed-forward network to enhance local information extraction at negligible computational cost. To demonstrate the effectiveness of HTP-Formers on planar 3D reconstruction tasks, we thoroughly evaluate the proposed model on the public Matterport3D dataset. In particular, HTP-Formers improves translational and rotational errors by 6.1% and 18.3%, respectively, outperforming most existing planar 3D reconstruction methods in planar correspondence inference and relative camera pose estimation.
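The two components named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the non-overlapping pooling windows, the ratios (1, 2, 4), the concatenation of pooled tokens into one sequence, the 3x3 kernel with zero padding, the ReLU activation, and the placement of the residual connection are all assumptions made for illustration. Tokens are stored as an H x W grid of C-dimensional lists.

```python
def avg_pool_tokens(grid, ratio):
    """Average-pool an H x W grid of tokens with non-overlapping ratio x ratio windows."""
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    out = []
    for i in range(0, h - ratio + 1, ratio):
        row = []
        for j in range(0, w - ratio + 1, ratio):
            window = [grid[i + di][j + dj]
                      for di in range(ratio) for dj in range(ratio)]
            row.append([sum(t[k] for t in window) / len(window) for k in range(c)])
        out.append(row)
    return out


def hierarchical_pool(grid, ratios=(1, 2, 4)):
    """Pool at every ratio and concatenate the results into one multi-scale token list."""
    tokens = []
    for r in ratios:
        tokens.extend(tok for row in avg_pool_tokens(grid, r) for tok in row)
    return tokens


def linear(token, weight):
    """Apply a (c_out x c_in) weight matrix to a single token."""
    return [sum(wi * xi for wi, xi in zip(row, token)) for row in weight]


def depthwise_conv3x3(grid, kernels):
    """3x3 depth-wise convolution with zero padding; channel k uses only kernels[k]."""
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * c for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for k in range(c):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        y, x = i + di, j + dj
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernels[k][di + 1][dj + 1] * grid[y][x][k]
                out[i][j][k] = acc
    return out


def inverted_residual_ffn(grid, w_expand, kernels, w_project):
    """Expand channels, mix neighbours per channel, project back, add the input."""
    # Assumed activation: ReLU after the expansion layer.
    expanded = [[[max(0.0, v) for v in linear(t, w_expand)] for t in row]
                for row in grid]
    mixed = depthwise_conv3x3(expanded, kernels)
    projected = [[linear(t, w_project) for t in row] for row in mixed]
    return [[[a + b for a, b in zip(t_in, t_out)]
             for t_in, t_out in zip(r_in, r_out)]
            for r_in, r_out in zip(grid, projected)]
```

In this sketch the pooled tokens from all ratios would typically serve as a shortened multi-scale sequence for attention, while the depth-wise convolution gives the feed-forward block access to each token's spatial neighbours, which a purely fully connected layer does not have.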

Keyword：

Feed-forward network | Planar 3D reconstruction | Hierarchical token pooling | Depth-wise convolution

Author Affiliations：

[ 1 ] [Zhang, Jiahui]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 2 ] [Yang, Jinfu]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 3 ] [Fu, Fuji]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 4 ] [Ma, Jiaqi]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 5 ] [Yang, Jinfu]Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China

Reprint Author's Address：

杨金福
[Yang, Jinfu]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China; [Yang, Jinfu]Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China

Email：

zhangjiahui_2021@163.com | jfyang@bjut.edu.cn | fufj@emails.bjut.edu.cn | majiaqi@emails.bjut.edu.cn

Source ：

APPLIED SOFT COMPUTING

ISSN： 1568-4946

Year： 2025

Volume： 174

Impact Factor： 8.700 (JCR@2022)
