
Author:

Sajid, Muhammad (Sajid, Muhammad.) | Malik, Kaleem Razzaq (Malik, Kaleem Razzaq.) | Rehman, Ateeq Ur (Rehman, Ateeq Ur.) | Malik, Tauqeer Safdar (Malik, Tauqeer Safdar.) | Alajmi, Masoud (Alajmi, Masoud.) | Khan, Ali Haider (Khan, Ali Haider.) | Haider, Amir (Haider, Amir.) | Hussen, Seada (Hussen, Seada.)

Indexed by:

Scopus; SCIE

Abstract:

Although the Transformer architecture has become the de facto standard for natural language processing tasks, its applications in computer vision remain limited. In vision, attention is typically used either in conjunction with convolutional networks or to replace individual convolutional components while preserving the overall network design. Differences between the two domains, such as the large variation in the scale of visual entities and the much higher resolution of pixels in images compared to words in text, make it difficult to transfer Transformers from language to vision. Masked autoencoding is a promising self-supervised learning approach that has greatly advanced both computer vision and natural language processing. Pre-training on large image datasets has become standard practice for learning robust 2D representations. In contrast, the limited availability of 3D datasets, owing to the high cost of acquiring and processing 3D data, significantly impedes the learning of high-quality 3D features. We present a strong multi-scale MAE pre-training architecture that uses a pre-trained ViT and a 2D-to-3D representation model to enable self-supervised learning on 3D point clouds. We employ this rich 2D knowledge to guide a 3D masked autoencoder, which reconstructs masked point tokens with an encoder-decoder architecture during self-supervised pre-training. Specifically, we first use pre-trained 2D models to obtain multi-view visual features of the input point cloud. We then introduce a 2D-guided masking strategy that keeps semantically significant point tokens visible. Extensive experiments demonstrate the effectiveness of our method with pre-trained models and its generalization to a range of downstream tasks. In particular, our pre-trained model achieves 93.63% linear SVM accuracy on ScanObjectNN and 91.31% on ModelNet40. Our approach demonstrates that a straightforward architecture based solely on standard Transformers can outperform specialized Transformer models trained with supervised learning.
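The 2D-guided masking strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the `saliency` array stands in for per-token semantic scores that, in the paper's setting, would be derived from a pre-trained 2D model's multi-view features projected onto the point cloud; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def semantic_guided_mask(saliency, mask_ratio=0.6):
    """Split point-token indices into visible and masked sets.

    The least salient tokens are masked, so semantically significant
    tokens remain visible to the encoder (saliency scores are assumed
    to come from a pre-trained 2D model; here they are just an array).
    """
    n = saliency.shape[0]
    n_masked = int(n * mask_ratio)
    order = np.argsort(saliency)      # ascending: lowest saliency first
    masked = order[:n_masked]         # hidden, to be reconstructed by the decoder
    visible = order[n_masked:]        # fed to the encoder
    return visible, masked

rng = np.random.default_rng(0)
saliency = rng.random(100)            # stand-in for 2D-derived scores
visible, masked = semantic_guided_mask(saliency, mask_ratio=0.6)
```

In an actual MAE-style pipeline, the visible tokens would pass through the Transformer encoder, and a lightweight decoder would reconstruct the masked point tokens as the self-supervised objective.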

Keyword:

3D Vision Transformers; 2D Semantics; 2D Masked Autoencoders

Author Community:

  • [ 1 ] [Sajid, Muhammad]Air Univ, Dept Comp Sci, Islamabad 44230, Pakistan
  • [ 2 ] [Malik, Kaleem Razzaq]Air Univ, Dept Comp Sci, Islamabad 44230, Pakistan
  • [ 3 ] [Rehman, Ateeq Ur]Saveetha Inst Med & Tech Sci, Saveetha Sch Engn, Comp Sci & Engn, Chennai, Tamilnadu, India
  • [ 4 ] [Rehman, Ateeq Ur]Appl Sci Private Univ, Appl Sci Res Ctr, Amman, Jordan
  • [ 5 ] [Rehman, Ateeq Ur]Chandigarh Univ, Univ Ctr Res & Dev, Mohali 140413, Punjab, India
  • [ 6 ] [Malik, Tauqeer Safdar]Bahauddin Zakariya Univ, Dept Informat & Commun Technol, Multan 60800, Punjab, Pakistan
  • [ 7 ] [Alajmi, Masoud]Taif Univ, Coll Comp & Informat Technol, Dept Comp Engn, Taif 21944, Saudi Arabia
  • [ 8 ] [Khan, Ali Haider]Beijing Univ Technol, Sch Software Engn, Beijing 100081, Peoples R China
  • [ 9 ] [Haider, Amir]Sejong Univ, Dept Artificial Intelligence & Robot, Seoul, South Korea
  • [ 10 ] [Hussen, Seada]Adama Sci & Technol Univ, Dept Elect Power, Adama 1888, Ethiopia

Reprint Author's Address:

  • [Haider, Amir]Sejong Univ, Dept Artificial Intelligence & Robot, Seoul, South Korea; [Hussen, Seada]Adama Sci & Technol Univ, Dept Elect Power, Adama 1888, Ethiopia


Source :

SCIENTIFIC REPORTS

ISSN: 2045-2322

Year: 2025

Issue: 1

Volume: 15

Impact Factor: 4.600 (JCR@2022)

ESI Highly Cited Papers on the List: 0


