Hardware-Friendly 3-D CNN Acceleration With Balanced Kernel Group Sparsity

Author：

Sun, Mengshu | Xu, Kaidi | Lin, Xue | Hu, Yongli | Yin, Baocai

Indexed by：

EI Scopus SCIE

Abstract：

Being capable of extracting more information than 2-D convolutional neural networks (CNNs), 3-D CNNs have been playing a vital role in video analysis tasks like human action recognition, but their massive operations hinder real-time execution on edge devices with constrained computation and memory resources. Although various model compression techniques have been applied to accelerate 2-D CNNs, few efforts have investigated hardware-friendly pruning of 3-D CNNs and their acceleration on customizable edge platforms like FPGAs. This work first proposes a kernel group row-column (KGRC) weight sparsity pattern, which is fine-grained to achieve high pruning ratios with negligible accuracy loss, and balanced across kernel groups to achieve high computation parallelism on hardware. The reweighted pruning algorithm for this sparsity is then presented and performed on 3-D CNNs, followed by quantization under different precisions. Along with model compression, FPGA-based accelerators with four modes are designed in support of the kernel group sparsity in multiple dimensions. The co-design framework of the pruning algorithm and the accelerator is tested on two representative 3-D CNNs, namely C3D and R(2+1)D, with the Xilinx ZCU102 FPGA platform for action recognition. The experimental results indicate that the accelerator implementation with the KGRC sparsity and 8-bit quantization achieves a good balance between speedup and model accuracy, leading to acceleration ratios of 4.12x for C3D and 3.85x for R(2+1)D compared with the 16-bit baseline designs supporting only dense models.
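The abstract's two compression ideas, sparsity that is balanced across kernel groups and 8-bit quantization, can be illustrated with a small sketch. This is not the paper's method: the exact KGRC row-column pattern and the reweighted pruning algorithm are not specified in this record, so plain magnitude-based top-k selection is swapped in, and the names `prune_balanced` and `quantize_int8` are hypothetical. The sketch only shows why keeping the same number of nonzero weights in every group gives each parallel hardware lane an equal workload.

```python
from typing import List, Tuple


def prune_balanced(groups: List[List[float]], keep: int) -> List[List[float]]:
    """Zero all but the `keep` largest-magnitude weights in every group.

    Assumption: magnitude top-k stands in for the paper's reweighted
    pruning; each inner list stands in for one flattened kernel group.
    """
    pruned = []
    for g in groups:
        # Indices of the `keep` largest-magnitude weights in this group.
        top = sorted(range(len(g)), key=lambda i: abs(g[i]), reverse=True)[:keep]
        kept = set(top)
        pruned.append([w if i in kept else 0.0 for i, w in enumerate(g)])
    return pruned


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the signed range [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


# Two toy kernel groups of four weights each (illustrative values only).
groups = [
    [0.9, -0.1, 0.05, 0.4],
    [0.02, -0.7, 0.3, 0.01],
]
sparse = prune_balanced(groups, keep=2)

# Every group ends up with the same nonzero count, so the work is balanced.
nonzeros = [sum(1 for w in g if w != 0.0) for g in sparse]

# Quantize the surviving weights to 8-bit integers with one shared scale.
q, scale = quantize_int8([w for g in sparse for w in g])
```

Because every group holds exactly `keep` nonzero weights, a hardware design that assigns one group per processing lane would not leave lanes idle waiting for a denser group, which is the intuition behind the balance requirement described in the abstract.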

Keyword：

Quantization (signal) | edge device inference | Parallel processing | model compression | Convolutional neural networks | Kernel | Computational modeling | Field programmable gate arrays | Three-dimensional displays | FPGA | 3-D convolutional neural network (CNN) | weight pruning

Author Affiliations：

[ 1 ] [Sun, Mengshu]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 2 ] [Hu, Yongli]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 3 ] [Yin, Baocai]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China
[ 4 ] [Sun, Mengshu]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 5 ] [Hu, Yongli]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 6 ] [Yin, Baocai]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[ 7 ] [Xu, Kaidi]Drexel Univ, Dept Comp Sci, Philadelphia, PA 19104 USA
[ 8 ] [Lin, Xue]Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA

Reprint Author's Address：

[Yin, Baocai]Beijing Univ Technol, Beijing Inst Artificial Intelligence, Beijing Key Lab Multimedia & Intelligent Software, Beijing 100124, Peoples R China; [Yin, Baocai]Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China

Email：

sunms@bjut.edu.cn |
kx46@drexel.edu |
xue.lin@northeastern.edu |
huyongli@bjut.edu.cn |
ybc@bjut.edu.cn

Related Articles：

LSFQ: A Low-Bit Full Integer Quantization for High-Performance FPGA-Based CNN Acceleration
2022，IEEE MICRO
Lightweight Deep Learning for Missing Data Imputation in Wastewater Treatment With Variational Residual Auto-Encoder
2024，IEEE INTERNET OF THINGS JOURNAL
IoT Device Friendly and Communication-Efficient Federated Learning via Joint Model Pruning and Quantization
2022，IEEE Internet of Things Journal
Image Coding With Data-Driven Transforms: Methodology, Performance and Potential
2020，IEEE TRANSACTIONS ON IMAGE PROCESSING

Source ：

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS

ISSN： 0278-0070

Year： 2024

Issue： 10

Volume： 43

Page： 3027-3040

Impact Factor： 2.900 (JCR@2022)

ESI Highly Cited Papers on the List： 0
