Abstract:
With the rapid development of artificial intelligence and deep learning, and in particular the emergence of pre-trained large models, the enormous demand for computational power has driven a growing need for hardware accelerators. This trend poses significant challenges in computational efficiency and resource consumption. In this context, research on parameter sparsity is becoming increasingly important, as sparse matrix multiplication can eliminate unnecessary computation and lower storage requirements. However, traditional CPU and GPU architectures suffer from high power consumption, while FPGA development flows are often overly complex. Exploring sparse matrix multiplication on new platforms and architectures has therefore become essential. The Versal ACAP (Adaptive Compute Acceleration Platform) offers a promising solution for high-performance computing thanks to its low power consumption and short development cycles. This paper investigates efficient implementations of the sparse matrix-dense matrix multiplication (SpMM) algorithm on the Versal ACAP platform, leveraging the AIE Graph methodology for automatic strategy generation. By analyzing and optimizing the SpMM algorithm based on the CSR compression format, we demonstrate the advantages and potential of the Versal ACAP platform for parameter-sparsity research on large models. Experimental results show that the method achieves an average computation time of 1.64 µs in a single-core configuration and reduces PLIO resource consumption by approximately 73.4% in multi-core scenarios, highlighting the potential of the AIE architecture for SpMM computation. © 2024 Copyright held by the owner/author(s).
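Note: the paper's AIE Graph implementation is not reproduced in this record. As a point of reference only, the following is a minimal plain C++ sketch of the CSR-based SpMM computation the abstract describes (sparse A times dense B, accumulating into dense C). The type and function names (CsrMatrix, spmm_csr) and the toy data are illustrative assumptions, not the authors' code, and none of the AIE tiling or PLIO mapping discussed in the paper is shown.

    // Minimal reference sketch of CSR-based SpMM: C (M x N) = A (M x K, CSR) * B (K x N, dense).
    // Illustrative only; an AIE kernel would tile this loop nest across cores.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // CSR storage: row_ptr has rows+1 entries; col_idx/values hold nonzeros row by row.
    struct CsrMatrix {
        std::size_t rows, cols;
        std::vector<std::size_t> row_ptr;
        std::vector<std::size_t> col_idx;
        std::vector<float> values;
    };

    // B and the returned C are dense, row-major; N is the number of columns of B.
    std::vector<float> spmm_csr(const CsrMatrix& A,
                                const std::vector<float>& B,
                                std::size_t N) {
        std::vector<float> C(A.rows * N, 0.0f);
        for (std::size_t i = 0; i < A.rows; ++i) {
            for (std::size_t p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
                const float a = A.values[p];
                const std::size_t k = A.col_idx[p];
                // Accumulate a * B[k, :] into C[i, :]; only the nonzeros of A are touched,
                // which is where the compute and storage savings of sparsity come from.
                for (std::size_t j = 0; j < N; ++j) {
                    C[i * N + j] += a * B[k * N + j];
                }
            }
        }
        return C;
    }

    int main() {
        // 2x3 sparse A = [[1, 0, 2], [0, 3, 0]] in CSR form (toy values).
        CsrMatrix A{2, 3, {0, 2, 3}, {0, 2, 1}, {1.0f, 2.0f, 3.0f}};
        // 3x2 dense B, row-major: [[1, 2], [3, 4], [5, 6]].
        std::vector<float> B = {1, 2, 3, 4, 5, 6};
        auto C = spmm_csr(A, B, 2);  // expected C = [[11, 14], [9, 12]]
        for (std::size_t i = 0; i < 2; ++i) {
            std::cout << C[i * 2] << ' ' << C[i * 2 + 1] << '\n';
        }
        return 0;
    }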
Year: 2025
Page: 177-182
Language: English
ESI Highly Cited Papers on the List: 0