
Author:

Sajid, Muhammad (Sajid, Muhammad.) | Malik, Kaleem Razzaq (Malik, Kaleem Razzaq.) | Rehman, Ateeq Ur (Rehman, Ateeq Ur.) | Malik, Tauqeer Safdar (Malik, Tauqeer Safdar.) | Alajmi, Masoud (Alajmi, Masoud.) | Khan, Ali Haider (Khan, Ali Haider.) | Haider, Amir (Haider, Amir.) | Hussen, Seada (Hussen, Seada.)

Indexed by:

Scopus; SCIE

Abstract:

Although the Transformer architecture has become the de facto standard for natural language processing tasks, its applications in computer vision remain limited. In vision, attention is typically used either in conjunction with convolutional networks or to replace individual convolutional components while preserving the overall network design. Differences between the two domains, such as the large variation in the scale of visual entities and the much higher resolution of pixels in images compared to words in text, make it difficult to transfer Transformers from language to vision. Masked autoencoding is a promising self-supervised learning approach that has greatly advanced both computer vision and natural language processing. Pre-training on large image datasets has become standard practice for learning robust 2D representations. In contrast, the limited availability of 3D datasets, owing to the high cost of acquiring and processing 3D data, significantly impedes the learning of high-quality 3D features. We present a strong multi-scale MAE pre-training architecture that uses a pre-trained ViT and a 2D-to-3D representation model to enable self-supervised learning on 3D point clouds. We employ this rich 2D knowledge to guide a 3D masked autoencoder, which reconstructs masked point tokens with an encoder-decoder architecture during self-supervised pre-training. Specifically, we first use pre-trained 2D models to obtain multi-view visual features of the input point cloud. We then introduce a 2D-guided masking strategy that keeps semantically significant point tokens visible. Extensive experiments demonstrate the effectiveness of our method with pre-trained models and its generalization to a range of downstream tasks. In particular, our pre-trained model achieves 93.63% linear SVM accuracy on ScanObjectNN and 91.31% on ModelNet40. Our approach demonstrates that a straightforward architecture based solely on standard Transformers can outperform specialized Transformer models trained with supervised learning.
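The 2D-guided masking strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the `saliency` array stands in for per-token semantic scores that, in the paper's setting, would be derived from a pre-trained 2D model's multi-view features projected onto the point cloud; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def semantic_guided_mask(saliency, mask_ratio=0.6):
    """Split point-token indices into visible and masked sets.

    The least salient tokens are masked, so semantically significant
    tokens remain visible to the encoder (saliency scores are assumed
    to come from a pre-trained 2D model; here they are just an array).
    """
    n = saliency.shape[0]
    n_masked = int(n * mask_ratio)
    order = np.argsort(saliency)      # ascending: lowest saliency first
    masked = order[:n_masked]         # hidden, to be reconstructed by the decoder
    visible = order[n_masked:]        # fed to the encoder
    return visible, masked

rng = np.random.default_rng(0)
saliency = rng.random(100)            # stand-in for 2D-derived scores
visible, masked = semantic_guided_mask(saliency, mask_ratio=0.6)
```

In an actual MAE-style pipeline, the visible tokens would pass through the Transformer encoder, and a lightweight decoder would reconstruct the masked point tokens as the self-supervised objective.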

Keyword:

3D Vision Transformers; 2D Semantics; 2D Masked Autoencoders

Author Community:

  • [ 1 ] [Sajid, Muhammad]Air Univ, Dept Comp Sci, Islamabad 44230, Pakistan
  • [ 2 ] [Malik, Kaleem Razzaq]Air Univ, Dept Comp Sci, Islamabad 44230, Pakistan
  • [ 3 ] [Rehman, Ateeq Ur]Saveetha Inst Med & Tech Sci, Saveetha Sch Engn, Comp Sci & Engn, Chennai, Tamilnadu, India
  • [ 4 ] [Rehman, Ateeq Ur]Appl Sci Private Univ, Appl Sci Res Ctr, Amman, Jordan
  • [ 5 ] [Rehman, Ateeq Ur]Chandigarh Univ, Univ Ctr Res & Dev, Mohali 140413, Punjab, India
  • [ 6 ] [Malik, Tauqeer Safdar]Bahauddin Zakariya Univ, Dept Informat & Commun Technol, Multan 60800, Punjab, Pakistan
  • [ 7 ] [Alajmi, Masoud]Taif Univ, Coll Comp & Informat Technol, Dept Comp Engn, Taif 21944, Saudi Arabia
  • [ 8 ] [Khan, Ali Haider]Beijing Univ Technol, Sch Software Engn, Beijing 100081, Peoples R China
  • [ 9 ] [Haider, Amir]Sejong Univ, Dept Artificial Intelligence & Robot, Seoul, South Korea
  • [ 10 ] [Hussen, Seada]Adama Sci & Technol Univ, Dept Elect Power, Adama 1888, Ethiopia

Reprint Author's Address:

  • [Haider, Amir]Sejong Univ, Dept Artificial Intelligence & Robot, Seoul, South Korea; [Hussen, Seada]Adama Sci & Technol Univ, Dept Elect Power, Adama 1888, Ethiopia


Source :

SCIENTIFIC REPORTS

ISSN: 2045-2322

Year: 2025

Issue: 1

Volume: 15

Impact Factor: 4.600 (JCR@2022)

ESI Highly Cited Papers on the List: 0


