Abstract:
Convolutional neural networks have become one of the mainstream deep learning algorithms, but they consume substantial computational resources. A sparsity-aware accelerator with low resource overhead and nearly lossless accuracy can effectively reduce model inference latency and power consumption. However, the computation fragmentation caused by sparse operations greatly reduces the efficiency of convolution. To alleviate this problem, this paper proposes an input channel expansion method that improves resource utilization. Because bandwidth often becomes the bottleneck of accelerator performance, a bandwidth-efficient data loopback structure is designed to reduce data transfers between the accelerator and off-chip memory. The proposed hardware architecture is implemented on a Xilinx VC709 board. It contains up to 1024 multiply-accumulate units, providing 409.6 GOP/s of peak computing power. Its computation speed reaches 315.8 GOP/s on the VGG-16 model, which is equivalent to 788 GOP/s for an accelerator without sparse-activation optimization. 54.2% of activation data transfers are eliminated through the data loopback bus, which eases the dependence on off-chip bandwidth. This flexible sparsity-aware accelerator architecture can be widely applied to large-scale inference with deep convolutional neural networks. © 2023 Elsevier B.V.
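As a quick sanity check on the headline numbers, the sketch below recomputes the quoted figures; the 200 MHz clock is an assumption inferred from the peak throughput (1024 MACs × 2 ops/cycle × 0.2 GHz = 409.6 GOP/s) rather than a figure stated in this record.

    # Minimal sanity check of the abstract's throughput figures.
    # The 200 MHz clock is inferred, not stated in this record.
    macs = 1024                 # multiply-accumulate units
    ops_per_mac = 2             # one multiply + one add per cycle
    clock_ghz = 0.2             # assumed clock frequency (inferred)

    peak_gops = macs * ops_per_mac * clock_ghz
    print(f"peak throughput:  {peak_gops:.1f} GOP/s")          # 409.6

    measured = 315.8            # VGG-16 throughput from the abstract
    dense_equiv = 788.0         # dense-equivalent figure from the abstract
    print(f"MAC utilization:  {measured / peak_gops:.1%}")     # ~77.1%
    print(f"sparsity speedup: {dense_equiv / measured:.2f}x")  # ~2.50x

The ~2.5× ratio between the dense-equivalent and measured throughput is consistent with the accelerator skipping roughly 60% of the multiplications that zero activations would otherwise incur.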
Source: Microprocessors and Microsystems
ISSN: 0141-9331
Year: 2023
Volume: 98
Impact Factor: 2.600 (JCR@2022)
ESI Discipline: COMPUTER SCIENCE
ESI HC Threshold: 19
Cited Count:
WoS CC Cited Count: 0
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0