Latent diffusion transformer for point cloud generation

Author：

Ji, J. | Zhao, R. | Lei, M.

Indexed by：

EI Scopus SCIE

Abstract：

Diffusion models have recently been applied successfully to point cloud generation tasks. The main idea is to use a forward process to progressively add noise to point clouds and then use a reverse process to generate point clouds by denoising. However, since point cloud data is high-dimensional and exhibits complex structures, it is challenging to adequately capture the surface distribution of point clouds. Moreover, point cloud generation methods often resort to sampling methods and local operations to extract features, which inevitably ignores the global structures and overall shapes of point clouds. To address these limitations, we propose a latent diffusion model based on Transformers for point cloud generation. Instead of directly building a diffusion process on the points, we first propose a latent compressor to convert original point clouds into a set of latent tokens before feeding them into diffusion models. Converting point clouds into latent tokens not only improves expressiveness but also offers better flexibility, since the tokens can adapt to various downstream tasks. We carefully design the latent compressor based on an attention-based auto-encoder architecture to capture global structures in point clouds. Then, we propose to use transformers as the backbones of the latent diffusion module to maintain global structures. The powerful feature extraction ability of transformers guarantees the high quality and smoothness of generated point clouds. Experiments show that our method achieves superior performance in both unconditional generation on ShapeNet and multi-modal point cloud completion on ShapeNet-ViPC. Our code and samples are publicly available at https://github.com/Negai-98/LDT. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.

Keyword：

3D | Diffusion model | Transformers | Point cloud generation

Author Community：

[ 1 ] [Ji J.]Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China
[ 2 ] [Ji J.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 3 ] [Zhao R.]Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China
[ 4 ] [Zhao R.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China
[ 5 ] [Lei M.]Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China
[ 6 ] [Lei M.]Faculty of Information Technology, Beijing University of Technology, Beijing, 100124, China

Related Keywords：

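Leveraging two-dimensional pre-trained vision transformers for three-dimensional model generation via masked autoencoders
2025，SCIENTIFIC REPORTS
Research on a 3D-based interactive Tai Chi simulation teaching system
2012，The 9th National University Games of the People's Republic of China and Scientific Paper Conference
Analysis of the mechanical properties of lattice sandwich structures under bending loads
2017，The 23rd Annual Academic Conference of the Beijing Society of Mechanics
Lossless compression algorithm for hyperspectral images using 3D-SPIHT based on band grouping
2005，Journal of Image and Graphics (Series A)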

Source ：

Visual Computer

ISSN： 0178-2789

Year： 2024

Issue： 6

Volume： 40

Page： 3903-3917

JCR@2022： 3.500

ESI Highly Cited Papers on the List： 0

30 Days PV： 4
